On Oct 9, 2008, at 5:04 PM, Christoph Zwerschke wrote:
> Since there were no complaints, I have now checked that in:
> http://www.pygresql.org/cvsweb.cgi/pygresql/module/pg.py?view=log
>
Thanks,
It's a nice feature to have, and I finally got a chance to move my code
from test to production. Unfortunately, when I started to throw
real-world test data at it, it turned out to be a net performance hit
rather than a net performance gain.
Before access to copy I had this:
fooDataList = [l.rstrip() for l in open("foo_data.txt")]
db = pgdb.connect()
cur = db.cursor()
cur.execute("CREATE TEMPORARY TABLE raw_foo(" +
            "raw_foo_id serial not null primary key, foo_data text)")
cur.execute("PREPARE raw_foo_insert(text) AS " +
            "INSERT INTO raw_foo(foo_data) VALUES ($1)")
cur.executemany("EXECUTE raw_foo_insert(%s)", fooDataList)
cur.execute("DEALLOCATE raw_foo_insert")
...
cur.close()
db.commit()
db.close()
This took about 0.5~0.6 seconds to load 1000 rows and about 84.5
seconds to process them, for a total run time of about 85 seconds.
After copy I can do this:
classicDb = pg.connect()
db = pgdb.connect(cnx = classicDb)
cur = db.cursor()
cur.execute("CREATE TEMPORARY TABLE load_foo(foo_data text)")
classicDb.query("COPY load_foo FROM stdin")
for l in fooDataList:
    classicDb.putline(l)
classicDb.putline('\\.')
classicDb.endcopy()
cur.execute("CREATE TEMPORARY TABLE raw_foo(" +
            "raw_foo_id serial not null primary key, foo_data text)")
cur.execute("INSERT INTO raw_foo (foo_data) " +
            "SELECT foo_data FROM load_foo")
...
cur.close()
db.commit()
del db
classicDb.close()
This loads 1000 rows in about 0.25 seconds, so if just loading data is
your gig, then COPY table... is a win. But processing after the load
takes 89.25 seconds, for a total run time of 90 seconds. Somewhere I
lost 5 seconds. That's what I get for not analyzing the problem before
I began optimizing the code.
Even at a cost of 4.75 seconds I'm leaving it as COPY, because I can
wrap up the details in a class and basically add a LoadExternalTable()
method that takes a table name and a list of values. I expect that the
performance benefit will really hit me when I build my new PostgreSQL
server on more up-to-date hardware. I'm also planning to move temporary
tables to a special tablespace that lives on a memory filesystem rather
than on disk. I'm hoping that performs well. I'll publish results
when I get them.
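For what it's worth, the wrapper I have in mind might look something
like the sketch below. The names (CopyLoader, copy_escape,
LoadExternalTable) are my own invention, not part of PyGreSQL; the only
thing it assumes is a classic pg connection object with the query(),
putline() and endcopy() methods used in the example above. I also
escape the fields for COPY's text format and append an explicit
newline, since I believe putline() sends the string as-is.

```python
# Sketch of the planned wrapper class. CopyLoader, copy_escape and
# LoadExternalTable are illustrative names, not PyGreSQL API.
# classicDb only needs query(), putline() and endcopy(), as used in
# the COPY example in this mail.

def copy_escape(value):
    """Escape a text field for COPY's text format: backslashes first,
    then tabs and newlines, so the data can't break the stream."""
    return (value.replace('\\', '\\\\')
                 .replace('\t', '\\t')
                 .replace('\n', '\\n'))

class CopyLoader:
    def __init__(self, classicDb):
        # classicDb is a classic pg connection, e.g. pg.connect()
        self.db = classicDb

    def LoadExternalTable(self, tablename, values):
        """Bulk-load a list of one-column text values into tablename
        via COPY ... FROM stdin."""
        self.db.query("COPY %s FROM stdin" % tablename)
        for v in values:
            # explicit newline: putline() sends the data as-is
            self.db.putline(copy_escape(v) + '\n')
        self.db.putline('\\.\n')   # end-of-data marker
        self.db.endcopy()
```

With that, the load in the example above collapses to
CopyLoader(classicDb).LoadExternalTable("load_foo", fooDataList).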
-- Chris
Chris Hilton tildeChris -- http://myblog.vindaloo.com
email -- chris/at/vindaloo/dot/com
~.--.~~.--.~~.--.~~.--.~~.--.~~.--.~~.--.~~.--.~~.--.~~.--.~~.--.~~.
"I'm on the outside looking inside, What do I see?
Much confusion, disillusion, all around me."
-- Ian McDonald / Peter Sinfield
_______________________________________________
PyGreSQL mailing list
[email protected]
http://mailman.vex.net/mailman/listinfo/pygresql