On 12.07.2011 10:08, alexander sulz wrote:
Hi all,
Is there some kind of average indexing time for PDFs in relation to
their size?
I have a 10 MB PDF (50 pages) here which takes about 30 seconds to
index!
Is that normal?
That depends on your hardware. PDF parsing is a lot more tedious than XML,
and besides being parsed, the document is also analyzed, stored and maybe
even committed. Is it a problem, or do you have many thousands of files
of this size?
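To see which of those stages dominates, one could time a parse-only request against a full indexing request. Below is a minimal Python sketch, assuming Solr Cell (the ExtractingRequestHandler) is mapped at `/update/extract` on `localhost:8983` as in the stock example configuration; the URL, the `timing-test` id and the helper names `build_extract_url` and `timed_post` are hypothetical. It POSTs the raw PDF bytes, using `extractOnly=true` for a Tika-only run and `commit=true` for the full run.

```python
import sys
import time
import urllib.parse
import urllib.request

# Assumption: Solr Cell mapped at the stock example path; adjust to your setup.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_url(base_url, doc_id, extract_only=False, commit=False):
    """Build an ExtractingRequestHandler URL with the given options."""
    params = {"literal.id": doc_id}
    if extract_only:
        # Tika parses the file and Solr returns the text without indexing it.
        params["extractOnly"] = "true"
    if commit:
        # Index the document and commit it in the same request.
        params["commit"] = "true"
    return base_url + "?" + urllib.parse.urlencode(params)


def timed_post(url, pdf_path):
    """POST the raw PDF bytes to url and return the elapsed wall time in seconds."""
    with open(pdf_path, "rb") as f:
        body = f.read()
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/pdf"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        response.read()  # wait for Solr to finish before stopping the clock
    return time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    pdf = sys.argv[1]
    doc_id = "timing-test"  # hypothetical id for the test document
    parse_only = timed_post(
        build_extract_url(SOLR_EXTRACT_URL, doc_id, extract_only=True), pdf)
    index_only = timed_post(build_extract_url(SOLR_EXTRACT_URL, doc_id), pdf)
    index_commit = timed_post(
        build_extract_url(SOLR_EXTRACT_URL, doc_id, commit=True), pdf)
    print(f"parse only (Tika):      {parse_only:.2f} s")
    print(f"parse + index:          {index_only:.2f} s")
    print(f"parse + index + commit: {index_commit:.2f} s")
```

Run it as `python time_extract.py some.pdf` (hypothetical file name). If the parse-only time is already close to the full time, the bottleneck is most likely Tika rather than Solr's analysis or commit.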
Luckily I don't; there are only about 500 of them in all, and about 100
of them are bigger.
10 of them are even so problematically big that my PHP script stops
working, but that's another problem.
Unfortunately I don't have a clue about the server specs, nor do I know
anyone who does.
greetings
alex
So I figured out that I had my "bleeding-edge" version of Solr running.
It was 3.3 with the latest Tika pulled from SVN (tika1.0-SNAPSHOT).
I reverted to the stable Tika 0.9 release and now I get an index time
of 2 seconds for the same PDF!
Still, why the PHP script stopped working correctly is beyond me, but it
seems to be fixed now.
regards
alex