On Wed, 2007-07-04 at 07:52 +0200, Filip de Waard wrote:
> Hello,
> 
> 
> Until today, I've never had a single worry about performance in my
> short but exciting Python experience. 

Lucky you :). You obviously haven't had experience with memory leaks in
long-running daemon processes working with big datasets :). IMHO Python
memory management leaves a lot to be desired.

Python is still a great tool though.

> However, now I'm trying to index over six million books from a MySQL
> database using PyLucene and I'd like to speed it up.

Cool dataset! :)

> I have posted my indexer script at http://pastie.textmate.org/75938.

> Tomorrow I'll start playing with a profiler, but in the meantime: does
> anyone have any recommendations as to how to be most efficient in
> regard to the Python code, database interaction and of course the
> PyLucene indexing process? Or maybe I'm doing something horribly wrong
> in my script? 


I don't think DictCursor is the best option for you; what about trying
SSCursor instead? It's a server-side cursor, so rows are streamed one at
a time rather than the whole result set being buffered in client memory.
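For reference, here is a minimal sketch of what I mean (the connection
parameters, table, and column names are made up; adapt them to your schema
and plug the row handling into your existing PyLucene indexing code):

    import MySQLdb
    import MySQLdb.cursors

    # Hypothetical connection settings -- replace with your own.
    conn = MySQLdb.connect(host="localhost", user="indexer", passwd="secret",
                           db="books",
                           cursorclass=MySQLdb.cursors.SSCursor)
    cursor = conn.cursor()

    # With SSCursor the result set stays on the MySQL server, so the six
    # million rows are fetched as you iterate instead of being loaded into
    # memory up front.
    cursor.execute("SELECT id, title, author FROM book")  # columns are a guess
    for row in cursor:
        book_id, title, author = row
        # ... build the Lucene Document here and add it to your IndexWriter ...

    cursor.close()
    conn.close()

One caveat: with SSCursor you have to consume (or close) the whole result
set before issuing another query on the same connection, so do any
secondary lookups over a separate connection.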

However, you are probably not losing much time in the Python code itself,
but rather in the Python-to-Lucene call conversions and in Lucene itself.

Considering the simplicity of your program, wouldn't it be fairly easy
to take Python out of the equation and write it entirely in Java?

> Any pointer would be most appreciated.
> 
> Regards,
> 
> 
> Filip de Waard

_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
