date:20091126

remove fields

2009-11-26 Thread Fadzi Ushewokunze

Hi all, there are 4 document fields in my index that I am no longer indexing, and 4 new fields I need to add to my index, so I created a new indexing filter. How can I add these new fields while preserving the removed fields in the existing docs? At the moment when I run bin/index

Re: 100 fetches per second?

2009-11-26 Thread Otis Gospodnetic

I think in the end what Ken Krugler did with Bixo (limiting crawl time) and what Julien added in https://issues.apache.org/jira/browse/NUTCH-770 (plus https://issues.apache.org/jira/browse/NUTCH-769) are solutions to this problem, in addition to what Andrzej described below. Can you try

Encoding the content got from Fetcher

2009-11-26 Thread Santiago Pérez

Hi, I am a newbie in Nutch and I need some help with a problem because I cannot find clear documentation. In the crawling process, when each FetcherThread gets the content, it is formatted in a way that deletes the newline characters (\n) and transforms useful characters in Spanish

Re: 100 fetches per second?

2009-11-26 Thread MilleBii

Yep, I will try right after this run ends... which is likely tomorrow by the sound of it. Still, how come there is a factor 6+ difference from one run to the next? Timing hosts blocking the queue maybe, but the probability of getting one in the queue cannot be so different from one run to the next.

Broken segments ?

2009-11-26 Thread Mischa Tuffield

Hello All, I was wondering if there is any way to check the integrity of a segment? As it stands, I can't create the index I want because a number of my segments are failing like below: Is there any way to check if my segments are OK? I guess I could always re-fetch them if need be.

Re: Broken segments ?

2009-11-26 Thread Andrzej Bialecki

Mischa Tuffield wrote: Hello All, http://people.apache.org/~hossman/#threadhijack When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track

Re: Encoding the content got from Fetcher

2009-11-26 Thread fadzi

Hi, have you tried changing this property: parser.character.encoding.default
Santiago Pérez wrote: Hi, I am a newbie in Nutch and I need some help with a problem because I cannot find clear documentation. In the crawling process, when each FetcherThread gets the content, it is formatted in a
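The symptom described in this thread (Spanish characters being transformed) is consistent with UTF-8 bytes being decoded with a single-byte fallback encoding, which is what a default-encoding property such as parser.character.encoding.default would control. The sketch below is only an illustration in plain Python, not Nutch code: the sample word and the choice of windows-1252 (cp1252) as the wrong fallback are assumptions, since the original message does not show the affected content.

```python
# Illustration only: how UTF-8 text is garbled when decoded with the wrong
# fallback encoding. The sample word and the cp1252 fallback are assumptions.
original = "año"

# Bytes as a UTF-8 web page would deliver them.
raw_bytes = original.encode("utf-8")

# Decoding with a single-byte fallback turns each accented character into two.
garbled = raw_bytes.decode("cp1252")

# Reversing the wrong decode recovers the text, as long as no bytes were lost.
repaired = garbled.encode("cp1252").decode("utf-8")

print(garbled)   # aÃ±o
print(repaired)  # año
```

If the fetched pages are known to be UTF-8, setting the fallback encoding to match avoids the garbling at the source instead of repairing it afterwards.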

add parse-wml plugin to Nutch!

2009-11-26 Thread yangfeng

Hi, I have to add a parse-wml plugin to Nutch. If one has already been written, please give me some advice. Thanks!

Re: 100 fetches per second?

2009-11-26 Thread MilleBii

Interesting updates on the current run of 450K URLs:
+ 30 minutes @ 3 Mbit/s
+ drop to 1 Mbit/s (1/X shape)
+ gradual improvement to 1.5 Mbit/s and steady for 7 hours
+ sudden drop to 0.9 Mbit/s and steady for 4 hours
+ up to 1.7 Mbit/s for 1 hour
+ staircasing down to 0.5 Mbit/s by steps of 1 hour

Re: Nutch near future - strategic directions

2009-11-26 Thread Sami Siren

Andrzej Bialecki wrote: Sami Siren wrote: Lots of good thoughts and ideas, easy to agree with. Something for the ease of use category: -allow running on top of plain vanilla hadoop What does it mean plain vanilla here? Do you mean the current DB implementation? That's the idea, we should
