The file system is really just a b-tree. If you're concerned about memory usage, you can implement an O(log n) map using the file system, where the entries are the different critical sections. Every node is a folder and every file is a leaf. Many package managers implement maps like this. I'd like
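The idea above could be sketched as a toy string-keyed map backed by the file system. This is a trie rather than a true b-tree, but it shows the folders-as-nodes, files-as-leaves shape; `root`, `fs_put`, and `fs_get` are names invented for this sketch.

```python
import pathlib
import tempfile

# Scratch directory standing in for the map's root node.
root = pathlib.Path(tempfile.mkdtemp())

def fs_put(key: str, value: str) -> None:
    # One folder per key character; the leaf file holds the value.
    node = root.joinpath(*key)
    node.mkdir(parents=True, exist_ok=True)
    (node / 'value').write_text(value)

def fs_get(key: str) -> str:
    return (root.joinpath(*key) / 'value').read_text()

fs_put('abc', 'hello')
result = fs_get('abc')
```

Lookup cost here is proportional to the key length in directory walks, so the O(log n) claim depends on how the on-disk tree is actually balanced.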
Chris Angelico writes:
> On Wed, Oct 10, 2018 at 5:09 AM Stephen J. Turnbull
> wrote:
> >
> > Chris Angelico writes:
> >
> > > Both processes are using the virtual memory. Either or both could be
> > > using physical memory. Assuming they haven't written to the pages
> > > (which is
On 10Oct2018 00:42, Stephen J. Turnbull
wrote:
Chris Angelico writes:
> On Tue, Oct 9, 2018 at 10:05 PM Greg Ewing
wrote:
> > Chris Angelico wrote:
> > > In contrast, a mmap'd file is memory that you do indeed own.
> >
> > Although it's not really accurate to say that it's owned by
> > a
Stephen J. Turnbull wrote:
Subject to COW, I presume. Probably in units smaller than the whole
file (per page?)
It can be COW or not, depending on the options passed to mmap.
And yes, it's mapped in units of pages.
--
Greg
_______________________________________________
Python-ideas mailing list
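The COW-or-not point above can be checked from Python: `mmap.ACCESS_COPY` gives private copy-on-write pages, so writes through the mapping never reach the file. A minimal sketch, using a throwaway temporary file:

```python
import mmap
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b'hello')
    f.flush()
    # ACCESS_COPY: writes go to private COW pages, not the file.
    # (ACCESS_WRITE would write through instead.)
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as mm:
        mm[0:5] = b'HELLO'
    f.seek(0)
    data = f.read()  # the file still holds the original bytes
```

And yes, the granularity is pages: `mmap.PAGESIZE` reports the unit on the running platform.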
On Wed, Oct 10, 2018 at 5:09 AM Stephen J. Turnbull
wrote:
>
> Chris Angelico writes:
>
> > Both processes are using the virtual memory. Either or both could be
> > using physical memory. Assuming they haven't written to the pages
> > (which is the case with executables - the system mmaps the
Chris Angelico writes:
> Both processes are using the virtual memory. Either or both could be
> using physical memory. Assuming they haven't written to the pages
> (which is the case with executables - the system mmaps the binary into
> your memory space as read-only), and assuming that those
On Wed, Oct 10, 2018 at 2:42 AM Stephen J. Turnbull
wrote:
>
> Chris Angelico writes:
> > On Tue, Oct 9, 2018 at 10:05 PM Greg Ewing
> wrote:
> > >
> > > Chris Angelico wrote:
> > > > In contrast, a mmap'd file is memory that you do indeed own.
> > >
> > > Although it's not really
Chris Angelico writes:
> On Tue, Oct 9, 2018 at 10:05 PM Greg Ewing
> wrote:
> >
> > Chris Angelico wrote:
> > > In contrast, a mmap'd file is memory that you do indeed own.
> >
> > Although it's not really accurate to say that it's owned by
> > a particular process. If two processes
On Tue, Oct 9, 2018 at 10:05 PM Greg Ewing wrote:
>
> Chris Angelico wrote:
> > In contrast, a mmap'd file is memory that you do indeed own.
>
> Although it's not really accurate to say that it's owned by
> a particular process. If two processes mmap the same file,
> the physical memory pages
Chris Angelico wrote:
In contrast, a mmap'd file is memory that you do indeed own.
Although it's not really accurate to say that it's owned by
a particular process. If two processes mmap the same file,
the physical memory pages holding it appear in the address
spaces of both processes.
--
On Mon, Oct 8, 2018 at 11:15 PM Anders Hovmöller wrote:
>
>
> However, another possibility is that the regexp is consuming lots of memory.
>
> The regexp seems simple enough (b'.'), so I doubt it is leaking memory like
> mad; I'm guessing you're just seeing the OS page in as much of the file as it
Thanks for your help everybody! I'm very happy to have learned about mmap.
On Mon, Oct 8, 2018 at 3:27 PM Richard Damon
wrote:
> On 10/8/18 8:11 AM, Ram Rachum wrote:
> > " Windows will aggressively fill up your RAM in cases like this
> > because after all why not? There's no use to having
On 10/8/18 8:11 AM, Ram Rachum wrote:
> " Windows will aggressively fill up your RAM in cases like this
> because after all why not? There's no use to having memory just
> sitting around unused."
>
> Two questions:
>
> 1. Is the "why not" sarcastic, as in you're agreeing it's a waste?
> 2. Will
>> However, another possibility is that the regexp is consuming lots of memory.
>>
>> The regexp seems simple enough (b'.'), so I doubt it is leaking memory like
>> mad; I'm guessing you're just seeing the OS page in as much of the file as it
>> can.
>
> Yup. Windows will aggressively fill up
" Windows will aggressively fill up your RAM in cases like this because
after all why not? There's no use to having memory just sitting around
unused."
Two questions:
1. Is the "why not" sarcastic, as in you're agreeing it's a waste?
2. Will this be different on Linux? Which command do I run on
On Mon, Oct 8, 2018 at 12:20 PM Cameron Simpson wrote:
>
> On 08Oct2018 10:56, Ram Rachum wrote:
> >That's incredibly interesting. I've never used mmap before.
> >However, there's a problem.
> >I did a few experiments with mmap now, this is the latest:
> >
> >path = pathlib.Path(r'P:\huge_file')
I'm not an expert on memory. I used Process Explorer to look at the
Process. The Working Set of the current run is 11GB. The Private Bytes is
708MB. Actually, see all the info here:
https://www.dropbox.com/s/tzoud028pzdkfi7/screenshot_TURING_2018-10-08_133355.jpg?dl=0
I've got 16GB of RAM on this
On 08Oct2018 10:56, Ram Rachum wrote:
That's incredibly interesting. I've never used mmap before.
However, there's a problem.
I did a few experiments with mmap now, this is the latest:
path = pathlib.Path(r'P:\huge_file')
with path.open('r') as file:
mmap = mmap.mmap(file.fileno(), 0,
That's incredibly interesting. I've never used mmap before.
However, there's a problem.
I did a few experiments with mmap now, this is the latest:
path = pathlib.Path(r'P:\huge_file')
with path.open('r') as file:
mmap = mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ)
for match in
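A runnable rebuild of the truncated snippet above, with two fixes: assigning the mapping to a variable called `mmap` shadows the module, so the map gets a different name, and the file is opened in binary mode since `re` over an mmap needs a bytes pattern. The path is a stand-in for the original `P:\huge_file`, and the pattern here is `rb'\n'` for a checkable result rather than the thread's `b'.'`.

```python
import mmap
import pathlib
import re
import tempfile

path = pathlib.Path(tempfile.gettempdir()) / 'huge_file'  # stand-in path
path.write_bytes(b'line one\nline two\n')

with path.open('rb') as f:
    # Length 0 maps the whole file; the OS pages it in on demand.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # re accepts the mmap directly as a bytes-like object.
        matches = [m.start() for m in re.finditer(rb'\n', mm)]
```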
On Sun, Oct 7, 2018 at 5:54 PM, Nathaniel Smith wrote:
> Are you imagining something roughly like this? (Ignoring chunk
> boundary handling for the moment.)
>
> def find_double_line_end(buf):
> start = 0
> while True:
> next_idx = buf.index(b"\n", start)
> if buf[next_idx
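One way to finish the sketch quoted above (this completion is my guess, not necessarily Nathaniel's original): use `bytes.index` to hop from newline to newline at C speed, and let its `ValueError` signal that no double line end exists.

```python
def find_double_line_end(buf: bytes) -> int:
    # Return the index of the first b"\n\n" in buf; raises ValueError
    # (from bytes.index) when no further newline exists.
    start = 0
    while True:
        next_idx = buf.index(b"\n", start)
        if buf[next_idx + 1:next_idx + 2] == b"\n":
            return next_idx
        start = next_idx + 1

pos = find_double_line_end(b"header: x\n\nbody")  # 9
```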
On Sun, Oct 7, 2018 at 5:09 PM, Terry Reedy wrote:
> On 10/6/2018 5:00 PM, Nathaniel Smith wrote:
>>
>> On Sat, Oct 6, 2018 at 12:22 AM, Ram Rachum wrote:
>>>
>>> I'd like to use the re module to parse a long text file, 1GB in size. I
>>> wish
>>> that the re module could parse a stream, so I
On 10/7/2018 12:32 AM, Ram Rachum wrote:
Does that mean I'll have to write that character-by-character algorithm?
I would not be surprised if you could make use of str.index, which scans
at C speed. See my answer to Nathaniel.
--
Terry Jan Reedy
On 10/6/2018 5:00 PM, Nathaniel Smith wrote:
On Sat, Oct 6, 2018 at 12:22 AM, Ram Rachum wrote:
I'd like to use the re module to parse a long text file, 1GB in size. I wish
that the re module could parse a stream, so I wouldn't have to load the
whole thing into memory. I'd like to iterate over
Jonathan Fine wrote:
Provided mmap releases memory when possible,
It will. The virtual memory system will read pages from the
file into RAM when needed, and re-use those RAM pages for other
purposes when needed. It should be pretty much the most
efficient solution possible.
--
Greg
Anders wrote
> An mmap object is one of the things you can make a memoryview of,
> although looking again, it seems you don't even need to, you can
> just re.search the mmap object directly.
>
> re.search'ing the mmap object means the operating system takes care of
> the streaming for you,
On 18-10-07 16.15, Ram Rachum wrote:
> I tested it now and indeed bytes patterns work on memoryview objects.
> But how do I use this to scan for patterns through a stream without
> loading it to memory?
An mmap object is one of the things you can make a memoryview of,
although looking again, it
I tested it now and indeed bytes patterns work on memoryview objects. But
how do I use this to scan for patterns through a stream without loading it
to memory?
On Sun, Oct 7, 2018 at 4:24 PM <2...@jmunch.dk> wrote:
> On 18-10-07 15.11, Ram Rachum wrote:
>
> > Unfortunately, it's not helpful. I
On 18-10-07 15.11, Ram Rachum wrote:
> Unfortunately, it's not helpful. I was developing a solution similar
to yours before I came to the conclusion that a multiline regex would be
more elegant.
How about memory mapping your 1GB file?
bytes patterns work on memoryviews.
regards, Anders
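A quick check of the claim above: bytes patterns do work on memoryview objects, since `re` accepts any bytes-like object.

```python
import re

data = memoryview(b"spam eggs spam")
m = re.search(b"eggs", data)  # bytes pattern against a memoryview
start = m.start()             # 5
```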
Hi Cameron,
Thanks for putting in the time to study my problem and sketch a solution.
Unfortunately, it's not helpful. I was developing a solution similar to
yours before I came to the conclusion that a multiline regex would be more
elegant. I find this algorithm to be quite complicated. It's
On Sat, Oct 6, 2018, 18:40 Steven D'Aprano wrote:
> The message I take from this is:
>
> - regex engines certainly can be written to support streaming data;
> - but few of them are;
> - and it is exceedingly unlikely to be able to easily (or at all)
> retro-fit that support to Python's
On 07Oct2018 07:32, Ram Rachum wrote:
On Sun, Oct 7, 2018 at 4:40 AM Steven D'Aprano wrote:
I'm sure that Python will never be as efficient as C in that regard
(although PyPy might argue the point) but is there something we can do
to ameliorate this? If we could make char-by-char processing
On 07Oct2018 07:30, Ram Rachum wrote:
I'm doing multi-color 3d-printing. The slicing software generates a GCode
file, which is a text file of instructions for the printer, each command
meaning something like "move the head to coordinates x,y,z while extruding
plastic at a rate of w" and lots of
On Sun, Oct 7, 2018 at 4:40 AM Steven D'Aprano wrote:
> I'm sure that Python will never be as efficient as C in that regard
> (although PyPy might argue the point) but is there something we can do
> to ameliorate this? If we could make char-by-char processing only 10
> times less efficient than
Hi Ned! I'm happy to see you here.
I'm doing multi-color 3d-printing. The slicing software generates a GCode
file, which is a text file of instructions for the printer, each command
meaning something like "move the head to coordinates x,y,z while extruding
plastic at a rate of w" and lots of
On Sat, Oct 06, 2018 at 02:00:27PM -0700, Nathaniel Smith wrote:
> Fortunately, there's an elegant and natural solution: Just save the
> regex engine's internal state when it hits the end of the string, and
> then when more data arrives, use the saved state to pick up the search
> where we left
On Sat, Oct 6, 2018 at 2:04 PM, Chris Angelico wrote:
> On Sun, Oct 7, 2018 at 8:01 AM Nathaniel Smith wrote:
>>
>> On Sat, Oct 6, 2018 at 12:22 AM, Ram Rachum wrote:
>> > I'd like to use the re module to parse a long text file, 1GB in size. I
>> > wish
>> > that the re module could parse a
On Sun, Oct 7, 2018 at 9:54 AM Nathaniel Smith wrote:
>
> On Sat, Oct 6, 2018 at 2:04 PM, Chris Angelico wrote:
> > On Sun, Oct 7, 2018 at 8:01 AM Nathaniel Smith wrote:
> >>
> >> On Sat, Oct 6, 2018 at 12:22 AM, Ram Rachum wrote:
> >> > I'd like to use the re module to parse a long text file,
On Sun, Oct 7, 2018 at 8:01 AM Nathaniel Smith wrote:
>
> On Sat, Oct 6, 2018 at 12:22 AM, Ram Rachum wrote:
> > I'd like to use the re module to parse a long text file, 1GB in size. I wish
> > that the re module could parse a stream, so I wouldn't have to load the
> > whole thing into memory.
On Sat, Oct 6, 2018 at 12:22 AM, Ram Rachum wrote:
> I'd like to use the re module to parse a long text file, 1GB in size. I wish
> that the re module could parse a stream, so I wouldn't have to load the
> whole thing into memory. I'd like to iterate over matches from the stream
> without keeping
On 10/6/18 7:25 AM, Ram Rachum wrote:
"This is a regular expression problem, rather than a Python problem."
Do you have evidence for this assertion, except that other regex
implementations have this limitation? Is there a regex specification
somewhere that specifies that streams aren't
I wrote:
> This is a regular expression problem, rather than a Python problem.
Ram wrote:
> Do you have evidence for this assertion, except that
> other regex implementations have this limitation?
Yes.
1. I've already supplied: https://svn.boost.org/trac10/ticket/11776
2.
"This is a regular expression problem, rather than a Python problem."
Do you have evidence for this assertion, except that other regex
implementations have this limitation? Is there a regex specification
somewhere that specifies that streams aren't supported? Is there a
fundamental reason that
Hi Ram
You wrote:
> I'd like to use the re module to parse a long text file, 1GB in size. I
> wish that the re module could parse a stream, so I wouldn't have to load
> the whole thing into memory. I'd like to iterate over matches from the
> stream without keeping the old matches and input in
It'll load as much as it needs to in order to match or rule out a match on
a pattern. If you'd try to match `a.*b` it'll load the whole thing. The use
cases that are relevant to a stream wouldn't have these kinds of problems.
On Sat, Oct 6, 2018 at 11:22 AM Serhiy Storchaka
wrote:
> 06.10.18
06.10.18 10:22, Ram Rachum writes:
I'd like to use the re module to parse a long text file, 1GB in size. I
wish that the re module could parse a stream, so I wouldn't have to load
the whole thing into memory. I'd like to iterate over matches from the
stream without keeping the old matches and
Hi,
I'd like to use the re module to parse a long text file, 1GB in size. I
wish that the re module could parse a stream, so I wouldn't have to load
the whole thing into memory. I'd like to iterate over matches from the
stream without keeping the old matches and input in RAM.
What do you think?