Some tuning results: play with what you have and you might be
surprised! A simple tweak, running Java with the "-server" switch,
gave a ~13% improvement for a readdb, as noted below. The -server tweak
did not help query results via Tomcat, but for basic Nutch DB work it
did pretty well.
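Before comparing timings, it can help to confirm which VM a given launch actually selected. The following is a minimal sketch, not part of Nutch; the class name JvmCheck and its methods are made up for illustration, and it assumes a HotSpot-style JVM whose name contains "Server" when the server VM is in use.

```java
// Hypothetical helper (not part of Nutch): reports which VM is running,
// so a launch with and without the -server switch can be compared.
class JvmCheck {
    // Name of the running VM as reported by the standard system property.
    static String vmName() {
        return System.getProperty("java.vm.name");
    }

    // Assumption: HotSpot-style VMs include "Server" in their name
    // when the server compiler is selected.
    static boolean looksLikeServerVm() {
        String name = vmName();
        return name != null && name.contains("Server");
    }

    public static void main(String[] args) {
        System.out.println("VM: " + vmName());
        System.out.println("Server VM (by name): " + looksLikeServerVm());
    }
}
```

Running it once as "java JvmCheck" and once as "java -server JvmCheck" shows whether the switch changes anything on a given machine; on some platforms the server VM is already the default.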
I made no changes to the logging configuration, which worked fine at 0.8,
but at 0.9 I get this once I do a query (the query returns just fine):
INFO: Server startup in 1947 ms
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: / (Is a directory)
at java.io.FileOutputStream.open
Thanks for everyone's help so far with my postings.
Here is another question.
I am currently merging my crawls, but am wondering if I can skip a few steps
and how to do it.
I inject a whole slew of URLs into a crawl each time, and then merge it with
the previous crawl.
The urls injected
I found this message while looking for a way to update the subcollection
field on reindexing. I had no explanation for my issue: I fetched/indexed
some sites, using subcollection.xml, then I made changes in the
subcollection.xml and reindexed. While inspecting the db with Luke, or using
the we
If I am understanding what you are asking, in the getRecordReader method
of the InputFormat inner class in DeleteDuplicates it gets the hash
score from the document. You could put your algorithm there and return
some type of numeric value based on analysis of the document fields.
You would n
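As a rough illustration of "return some type of numeric value based on analysis of the document fields", here is a minimal sketch. It deliberately avoids the Nutch and Lucene classes: the class CustomDedupScore, the field names "content" and "title", and the weighting are all assumptions made up for this example, with a plain Map standing in for the document.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a custom duplicate-ranking score.
// A Map of field name to value plays the role of the indexed document.
class CustomDedupScore {
    // Placeholder rule: longer content scores higher, and a non-empty
    // title earns a fixed bonus. Replace with your own analysis.
    static float score(Map<String, String> fields) {
        String content = fields.getOrDefault("content", "");
        String title = fields.getOrDefault("title", "");
        float value = content.length();
        if (!title.isEmpty()) {
            value += 10.0f;
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("content", "some page text");
        doc.put("title", "A title");
        System.out.println("score = " + score(doc));
    }
}
```

Whatever rule is used, it only needs to be deterministic so that the same document always gets the same value when duplicates are compared.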
On Wed, 2006-12-20 at 12:38 +0100, Andrzej Bialecki wrote:
> This is the problem - you need to increase the heap space in your
> Tomcat. Since you expanded your index, the bigger index won't fit in the
> same heap space as before ... especially when you run searches that
> touch more of the index
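To see how much heap a JVM actually has available before and after changing its settings, something like the following can be used. It is a minimal sketch, not part of Nutch or Tomcat; the class name HeapReport is made up for illustration. Tomcat's heap limit is commonly raised by passing -Xmx through the JAVA_OPTS or CATALINA_OPTS environment variable, though the right value depends on the index size.

```java
// Hypothetical helper (not part of Nutch or Tomcat): prints the heap
// limit and current usage of the JVM it runs in.
class HeapReport {
    // Convert a byte count to whole megabytes.
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Summarise the heap: the ceiling set by -Xmx (or the default),
    // and how much of the currently allocated heap is in use.
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();
        long used = rt.totalMemory() - rt.freeMemory();
        return "max heap " + toMegabytes(max) + " MB, used "
                + toMegabytes(used) + " MB";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it with different -Xmx values (for example "java -Xmx512m HeapReport") confirms that the setting is being picked up; the same Runtime calls work from inside a web application.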
Robin Haswell wrote:
Hey there
I'm having issues searching with my newly (vastly) expanded database.
Could anyone shed any light on this? Basically, on a newly started
server, I search for "test", and this appears in catalina.out:
2006-12-20 10:51:40,710 INFO NutchBean - creating new bean
2006-12-20 10:51:40,725 INF