On 12.07.2011 10:08, alexander sulz wrote:

Hi all,

Are there any typical indexing times for PDFs in relation to
their size?
I have here a 10MB PDF (50 pages) which takes about 30 seconds to index!
Is that normal?
Depends on your hardware. PDF parsing is a lot more tedious than XML parsing,
and besides being parsed, the content is also analyzed, stored, and maybe even
committed. Is it a problem, or do you have many thousands of files of this size?
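
One way to see where the time goes is to time the extraction step on its own and compare it with a full index-and-commit request. The following is only a rough sketch under several assumptions: Solr is reachable at the default example URL, the ExtractingRequestHandler is mapped to /update/extract, and the helper names (build_extract_url, timed_post) are made up for illustration. The parameters extractOnly, literal.id and commit are the handler's own.

```python
import sys
import time
import urllib.parse
import urllib.request

# Assumption: default example location of the ExtractingRequestHandler.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_url(base_url, doc_id, extract_only=False):
    """Build the request URL (hypothetical helper, not part of Solr).

    extract_only=True asks Solr to run Tika and return the extracted
    text without indexing it; otherwise the document is indexed under
    the given id and committed right away.
    """
    if extract_only:
        params = {"extractOnly": "true"}
    else:
        params = {"literal.id": doc_id, "commit": "true"}
    return base_url + "?" + urllib.parse.urlencode(params)


def timed_post(url, pdf_path):
    """POST the PDF as the raw request body and return elapsed seconds."""
    with open(pdf_path, "rb") as handle:
        body = handle.read()
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/pdf"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        response.read()  # wait for the complete response
    return time.perf_counter() - start


if __name__ == "__main__":
    pdf = sys.argv[1]
    parse_only = timed_post(build_extract_url(SOLR_EXTRACT_URL, pdf, True), pdf)
    full_index = timed_post(build_extract_url(SOLR_EXTRACT_URL, pdf), pdf)
    print("extract only: %.1fs, extract + index + commit: %.1fs"
          % (parse_only, full_index))
```

If the extract-only request already takes most of the time, the PDF parsing is the bottleneck rather than analysis, storage, or the commit.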

Luckily I don't; there are only about 500 of them all in all. About 100 of them are bigger, and 10 of them are even problematically big, so that my PHP script stops working, but that's another problem. Unfortunately, I don't have a clue about the server specs, nor do I know anyone who does.
greetings
   alex

So I figured out that I had my "bleeding-edge" version of Solr running.
It was 3.3 with the latest Tika pulled from SVN (tika1.0-SNAPSHOT).
I reverted to the stable 0.9 release, and now I get a 2-second index time for the same PDF! Still, why the PHP script stopped working correctly is beyond me, but it seems to be fixed now.

regards
 alex
