On Wed, 2007-07-04 at 07:52 +0200, Filip de Waard wrote:
> Hello,
>
> Until today, I've never had a single worry about performance in my
> short but exciting Python experience.
Lucky you :). You obviously haven't had experience with memory leaks in
long-running daemon processes working with big datasets :). IMHO Python
memory management leaves a lot to be desired. Python is still a great
tool, though.

> However, now I'm trying to index over six million books from a MySQL
> database using PyLucene and I'd like to speed it up.

Cool dataset! :)

> I have posted my indexer script at http://pastie.textmate.org/75938.
> Tomorrow I'll start playing with a profiler, but in the meantime: does
> anyone have any recommendations as to how to be most efficient in
> regard to the Python code, database interaction and of course the
> PyLucene indexing process? Or maybe I'm doing something horribly wrong
> in my script?

I don't think DictCursor is the best option for you; what about trying
SSCursor? It streams rows from the server instead of buffering the whole
result set in client memory. However, you are probably not losing much
time in Python itself but in the Python-to-Lucene call conversions and
in Lucene proper. Considering the simplicity of your program, wouldn't
it be really easy to throw Python out of the equation and write it in
Java entirely?

> Any pointer would be most appreciated.
>
> Regards,
>
> Filip de Waard

_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
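To make the DictCursor-vs-SSCursor point concrete: the difference is buffered
versus streamed fetching. Here's a minimal, self-contained sketch of the two
access patterns — sqlite3 stands in for MySQLdb purely so the snippet runs
anywhere, and the table name/sizes are made up; with MySQLdb the streaming
behaviour comes from passing `cursorclass=MySQLdb.cursors.SSCursor` to
`MySQLdb.connect()`.

```python
import sqlite3

# A small in-memory table standing in for the MySQL "books" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO books (title) VALUES (?)",
                 [("Book %d" % i,) for i in range(1000)])

# Buffered style: fetchall() materializes every row in Python memory at
# once. With six million rows, this is where a DictCursor-style approach
# (a dict object per row, all held at the same time) hurts.
cur = conn.execute("SELECT id, title FROM books")
all_rows = cur.fetchall()

# Streaming style: iterate over the cursor and handle one row at a time,
# so only the current row is alive in Python. This is the access pattern
# SSCursor gives you against a MySQL server.
count = 0
for row_id, title in conn.execute("SELECT id, title FROM books"):
    count += 1  # index the row here, then let it be garbage-collected

print(len(all_rows), count)
```

One caveat with a real server-side cursor: you must consume (or close) the
result set before issuing another query on the same connection.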
