On 18/02/13 18:44, Maciej Fijalkowski wrote:
On Mon, Feb 18, 2013 at 6:20 PM, Eleytherios Stamatogiannakis
<est...@gmail.com> wrote:
We have found another (very simple) madIS query where PyPy is around 250x
slower than CPython:

CPython: 314msec
PyPy: 1min 16sec

The query, if you would like to test it yourself, is the following:

select  count(*)  from   (file  'some_big_text_file.txt' limit 100000);

To run it you'll need a big text file containing at least 100000 lines of
text (we ran the above query with a very big XML file). You can also run
the above query with a lower limit (the behaviour will be the same), like this:

select  count(*)  from   (file  'some_big_text_file.txt' limit 10000);

Make sure the file name does not end in csv, tsv, json, db or gz,
because the "file" operator takes a different code path for those endings
than the one for simple text files.
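
If no suitable file is at hand, one could be generated with something like the sketch below. The helper name make_text_file and the line contents are made up for illustration and are not part of madIS; the only things taken from the description above are the line count and the plain ".txt" ending.

```python
def make_text_file(path, n_lines=100000):
    """Write n_lines of UTF-8 text to path, one short record per line.

    Hypothetical helper: the line contents are arbitrary filler.
    """
    with open(path, "wb") as out:
        for i in range(n_lines):
            # Include a few non-ASCII characters so that decoding does real work.
            line = u"line %d: some text \u03b1\u03b2\u03b3\n" % i
            out.write(line.encode("utf-8"))


if __name__ == "__main__":
    # A ".txt" ending keeps the "file" operator on the plain-text code path.
    make_text_file("some_big_text_file.txt")
```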

l.


_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev

Hey

It would be incredibly convenient if you could change it to be a
standalone benchmark (say, reading a large string from a file and
decoding it as a whole or in pieces).


As it involves SQLite, CFFI and Python, it is very hard to extract the full execution path that madIS goes through even in a simple query like this.

Nevertheless we extracted a part of the pure Python execution path, and PyPy is around 50% slower than CPython:

CPython: 21 sec
PyPy: 33 sec

The full madIS execution path involves additional CFFI calls and callbacks (from SQLite) to pass the data to SQLite.

To run test.py:

test.py big_text_file

l.
import sys
from codecs import utf_8_decode, utf_8_encode


def directfileutf8(f):
    # Yield each line of f as a 1-tuple holding the decoded unicode string.
    try:
        for line in f:
            yield (utf_8_decode(line.rstrip("\r\n"))[0],)
    except UnicodeDecodeError:
        raise Exception("File is not utf-8 encoded")


def inputstream(path):
    # Re-encode every decoded line back to a UTF-8 byte string.
    infile = open(path, "r", buffering=1000000)
    for row in directfileutf8(infile):
        yield utf_8_encode(row[0])[0]


# Decode each re-encoded line once more.
for i in inputstream(sys.argv[1]):
    a = utf_8_decode(i)[0]
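
To get comparable numbers from both interpreters, the same decode/encode/decode round trip could be wrapped in a timer. The following is only a sketch, not the script that produced the timings above: the names roundtrip_lines and timed are invented here, the generators are flattened into a single loop, and the file is opened in binary mode so that the same source runs on Python 2 and Python 3.

```python
import sys
import time
from codecs import utf_8_decode, utf_8_encode


def roundtrip_lines(path):
    """Decode, re-encode and decode again every line; return the line count."""
    count = 0
    # Third positional argument is the buffer size, as in the original script.
    with open(path, "rb", 1000000) as f:
        for line in f:
            text = utf_8_decode(line.rstrip(b"\r\n"))[0]
            encoded = utf_8_encode(text)[0]
            utf_8_decode(encoded)
            count += 1
    return count


def timed(path):
    """Return (line count, elapsed wall-clock seconds) for one pass over path."""
    start = time.time()
    n = roundtrip_lines(path)
    return n, time.time() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    n, elapsed = timed(sys.argv[1])
    print("%d lines in %.2f sec" % (n, elapsed))
```

Because the generators are gone, this measures slightly different work than test.py, so its numbers should not be compared directly with the ones quoted above.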
