That's incredibly interesting. I've never used mmap before. However, there's a problem.
I've done a few experiments with mmap now; this is the latest:

    import mmap
    import pathlib
    import re

    path = pathlib.Path(r'P:\huge_file')
    with path.open('rb') as file:
        # Name the mapping `mm` so it doesn't shadow the mmap module.
        mm = mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ)
        for match in re.finditer(b'.', mm):
            pass

The file is 338 GB in size, and it seems that Python is trying to load it
into memory: the process is now using 4 GB of RAM and growing. I saw the
same behavior when searching for a pattern that doesn't occur in the file.

Should I open a Python bug for this?

On Sun, Oct 7, 2018 at 7:49 PM <2...@jmunch.dk> wrote:
> On 18-10-07 16.15, Ram Rachum wrote:
> > I tested it now and indeed bytes patterns work on memoryview objects.
> > But how do I use this to scan for patterns through a stream without
> > loading it to memory?
>
> An mmap object is one of the things you can make a memoryview of,
> although looking again, it seems you don't even need to, you can
> just re.search the mmap object directly.
>
> re.search'ing the mmap object means the operating system takes care of
> the streaming for you, reading in parts of the file only as necessary.
>
> regards, Anders
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
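For reference, here is a minimal sketch of the approach described in the quoted reply: passing the mmap object straight to the re module instead of reading the file into a bytes object. The helper name `find_offsets` and the demo file contents are hypothetical and exist only for illustration. The sketch shows API usage only. It makes no claim about how much of a 338 GB mapping ends up resident in RAM while the regex engine walks it.

```python
import mmap
import os
import re
import tempfile


def find_offsets(path: str, pattern: bytes) -> list[int]:
    """Hypothetical helper: start offset of every match of `pattern` in a file."""
    with open(path, 'rb') as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        # Note: mapping an empty file with length 0 raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # re accepts any bytes-like object, so the mapping is searched in
            # place; the OS pages file data in as the regex engine touches it.
            # The list is built fully before the mapping is closed.
            return [m.start() for m in re.finditer(pattern, mm)]


# Hypothetical demo file; real use would point at the large file instead.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'spam eggs spam ham spam')
    demo_path = tmp.name

print(find_offsets(demo_path, rb'spam'))  # [0, 10, 19]
os.remove(demo_path)
```

Using `mmap` as a context manager closes the mapping on exit. That is why the matches are collected into a list inside the `with` block rather than returned as a lazy iterator.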