Hi all,
There are 4 document fields in my index that I am no longer indexing,
and I have 4 new fields I need to add to my index, so I created a new
indexing filter.
How can I add these new fields while preserving the removed fields in
the existing docs?
At the moment when I run bin/index
I think in the end what Ken Krugler did with Bixo (limiting crawl time) and
what Julien added in https://issues.apache.org/jira/browse/NUTCH-770 (plus
https://issues.apache.org/jira/browse/NUTCH-769) are solutions to this problem,
in addition to what Andrzej described below.
Can you try
Hej,
I am a newbie in Nutch and I need some help with a problem because I do not
find clear documentation.
In the crawling process, when each FetcherThread gets the content,
it is formatted in a way that deletes the newline characters (\n)
and transforms useful characters in Spanish
Yep, I will try right after this run ends... Which is likely tomorrow
by the sound of it.
Still how come there is a factor 6+ difference from one run to the
next ... Timing hosts blocking the queue maybe, but the probability to
get one in the queue cannot be so different from one run to the next.
Hello All,
I was wondering if there is any way to check the integrity of a segment? As it
stands, I can't create the index I want due to a number of my segments freaking
out like below:
Is there any way to check if my segments are OK? I guess I could always re-fetch
them if need be.
Mischa Tuffield wrote:
Hello All,
http://people.apache.org/~hossman/#threadhijack
When starting a new discussion on a mailing list, please do not reply
to an existing message, instead start a fresh email. Even if you change
the subject line of your email, other mail headers still track
hi
have you tried to change this property:
parser.character.encoding.default
Hej,
I am a newbie in Nutch and I need some help with a problem because I do not
find clear documentation.
In the crawling process, when each FetcherThread gets the content,
it is formatted in a
Hi,
I need to add a parse-wml plugin to Nutch. If one has already been finished,
please give me some advice.
Thanks!
Interesting updates on the current run of 450K URLs:
+ 30 minutes @ 3 Mbit/s
+ drop to 1 Mbit/s (1/x shape)
+ gradual improvement to 1.5 Mbit/s and steady for 7 hours
+ sudden drop to 0.9 Mbit/s and steady for 4 hours
+ up to 1.7 Mbit/s for 1 hour
+ staircasing down to 0.5 Mbit/s by steps of 1 hour
Andrzej Bialecki wrote:
Sami Siren wrote:
Lots of good thoughts and ideas, easy to agree with.
Something for the ease of use category:
-allow running on top of plain vanilla hadoop
What does plain vanilla mean here? Do you mean the current DB
implementation? That's the idea, we should