So, iter(file).next() is slow?

Alex
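One way to check this outside madIS is a small standalone timing script. The following is only a sketch under stated assumptions: it is not the test.py from the quoted thread, it is written for Python 3 (so it uses plain iteration instead of iter(file).next()), and the helper names and the generated sample file are made up for illustration. It compares lazy line-by-line iteration with the read().splitlines() workaround quoted below.

```python
import itertools
import os
import tempfile
import time


def count_lines_iter(path, limit):
    """Count up to `limit` lines by iterating the file object lazily."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return sum(1 for _ in itertools.islice(f, limit))


def count_lines_splitlines(path, limit):
    """Count up to `limit` lines after reading the whole file into memory.

    Note: str.splitlines() also splits on a few separators (for example
    form feed) that file iteration does not treat as line ends, so the
    two counts can differ on unusual input.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        lines = f.read().splitlines()
    return min(len(lines), limit)


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call of func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def make_sample_file(num_lines):
    """Write a throwaway text file (hypothetical test data); return its path."""
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".txt", delete=False, encoding="utf-8"
    ) as f:
        for i in range(num_lines):
            f.write("line %d of some text\n" % i)
        return f.name


if __name__ == "__main__":
    path = make_sample_file(200000)
    try:
        for func in (count_lines_iter, count_lines_splitlines):
            count, seconds = timed(func, path, 100000)
            print("%s: %d lines in %.3f s" % (func.__name__, count, seconds))
    finally:
        os.remove(path)
```

Running the same script under both CPython and PyPy should show whether the gap comes from file iteration itself. Keep in mind that read().splitlines() holds the whole file in memory, which may not be acceptable for very large inputs.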
On Mon, Feb 18, 2013 at 10:51 AM, Amaury Forgeot d'Arc <amaur...@gmail.com> wrote:

> 2013/2/18 Eleytherios Stamatogiannakis <est...@gmail.com>
>
>> On 18/02/13 18:44, Maciej Fijalkowski wrote:
>>
>>> On Mon, Feb 18, 2013 at 6:20 PM, Eleytherios Stamatogiannakis
>>> <est...@gmail.com> wrote:
>>>
>>>> We have found another (very simple) madIS query where PyPy is around
>>>> 250x slower than CPython:
>>>>
>>>> CPython: 314msec
>>>> PyPy: 1min 16sec
>>>>
>>>> The query, if you would like to test it yourself, is the following:
>>>>
>>>> select count(*) from (file 'some_big_text_file.txt' limit 100000);
>>>>
>>>> To run it you'll need some big text file containing at least 100000
>>>> text lines (we ran the above query with a very big XML file). You can
>>>> also run the above query with a lower limit (the behaviour will be the
>>>> same), as such:
>>>>
>>>> select count(*) from (file 'some_big_text_file.txt' limit 10000);
>>>>
>>>> Make sure the file does not have a csv, tsv, json, db or gz ending,
>>>> because a different code path inside the "file" operator will be taken
>>>> than the one for simple text files.
>>>>
>>>> l.
>>>>
>>>> _______________________________________________
>>>> pypy-dev mailing list
>>>> pypy-dev@python.org
>>>> http://mail.python.org/mailman/listinfo/pypy-dev
>>>
>>> Hey
>>>
>>> It would be incredibly convenient if you could change it to be a
>>> standalone benchmark (say, reading a large string from a file and
>>> decoding it as a whole or in pieces);
>>
>> As it involves SQLite, CFFI and Python, it is very hard to extract the
>> full execution path that madIS goes through even in a simple query like
>> this.
>>
>> Nevertheless we extracted a part of the pure Python execution path, and
>> PyPy is around 50% slower than CPython:
>>
>> CPython: 21 sec
>> PyPy: 33 sec
>>
>> The full madIS execution path involves additional CFFI calls and
>> callbacks (from SQLite) to pass the data to SQLite.
>>
>> To run the test.py:
>>
>> test.py big_text_file
>
> Most of the time is spent in file iteration.
> I added
>     f = f.read().splitlines()
> and the query is almost instant.
>
> --
> Amaury Forgeot d'Arc

--
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero