date:20070920

Re: Nutch recrawl script for 0.9 doesn't work with trunk. Help

2007-09-20 Thread Lyndon Maydwell

I've been having problems with the merge portion of the script too. My solution was to check the success status of the merge ( $? ), and if it failed, try again, or wait until next time. nutch_bin/nutch mergesegs $merged_segment -dir $segments if [ $? -ne 0 ] then echo merging segments
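The retry-on-failure idea described in this post could be sketched roughly as below. This is only an illustration under assumptions: the `retry` helper, the attempt count, and the delay are hypothetical and not taken from the original script. Only the `mergesegs` invocation and the `$merged_segment` and `$segments` variables come from the post.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original recrawl script):
# run a command and, if it exits non-zero, run it again,
# up to max_attempts times with delay_seconds between tries.
retry() {
    max_attempts=$1
    delay_seconds=$2
    shift 2
    attempt=1
    while :; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay_seconds"
    done
}

# Possible use with the merge command quoted in the post
# (3 attempts and a 60-second pause are arbitrary choices):
#   retry 3 60 nutch_bin/nutch mergesegs "$merged_segment" -dir "$segments" \
#       || echo "merge failed; leaving it until the next run"
```

Keeping the retry logic in a function leaves the `$?` check in one place, so the same wrapper could be reused for other steps of the script that fail intermittently.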

Blank result page

2007-09-20 Thread balachanthar palanivelu

Dear Nutch users, I have a problem with Nutch results: sometimes it gives me a blank page without any error, but when I look at the log file I see some errors. I don't understand how to solve it; I have tried everything I could think of, so I thought of asking you. I am using fedora 6 + tomcat5 + jre6+

Re: Nutch recrawl script for 0.9 doesn't work with trunk. Help

2007-09-20 Thread Tomislav Poljak

Hi, I had the same problem using the re-crawl scripts from the wiki. They all work fine with nutch versions up to 0.9 (0.9 included), but when using nutch-1.0-dev (from trunk) they break at the merge of indexes. The reason is that merge in nutch-0.9 (from re-crawl scripts): bin/nutch merge crawl/indexes

Re: maybe dumb question about nutch index and segments file

2007-09-20 Thread Martin Kuen

hi, regarding hit summaries: The summaries are generated at search time. This is necessary, since different queries will generate different summaries (and different terms will be highlighted). The parsed text is stored in the various segments/timestamp folders. I don't know which directory it

Re: Nutch recrawl script for 0.9 doesn't work with trunk. Help

2007-09-20 Thread Alexis Votta

Hi Tomislav and Nutch users I could not solve the problem with your instructions. I crawled two times. In re-crawl. It generated crawl/NEWindexes. crawl/indexes was generated in 1st crawl. I merged == bin/nutch merge crawl/index crawl/indexes/ crawl/NEWindexes/ Now search.jsp is showing

Re: Nutch recrawl script for 0.9 doesn't work with trunk. Help

2007-09-20 Thread Alexis Votta

I have merged the old as well as new segments into segments dir. Still the same error comes. On 9/20/07, Tomislav Poljak [EMAIL PROTECTED] wrote: Hi Alexis, I think that your problem is not so much in index (or merging indexes) but in segments, because if you look at the exception you will see

Re: Nutch recrawl script for 0.9 doesn't work with trunk. Help

2007-09-20 Thread Susam Pal

We can do two things to solve this problem. SOLUTION 'A' 1. Once the 'depth' loop is complete, merge the segments in 'crawl/segments/'. ('crawl/segments/' will have one merged segment of the past plus all the segments generated in the depth loop, one for each iteration of the loop.) They are now
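Step 1 of SOLUTION 'A' (merging everything under 'crawl/segments/' once the depth loop has finished) might look something like the sketch below. The function name, its arguments, and the guard against an existing output directory are assumptions made for illustration. The steps that follow in the original message are cut off in this listing, so they are not reproduced here.

```shell
#!/bin/sh
# Hypothetical wrapper: merge all segments under <crawl_dir>/segments
# into a single new segment directory using "nutch mergesegs".
#   $1 = path to the nutch launcher (e.g. bin/nutch)
#   $2 = crawl directory (e.g. crawl)
#   $3 = output directory for the merged segment (name is an assumption)
merge_depth_loop_segments() {
    nutch_cmd=$1
    crawl_dir=$2
    merged_dir=$3
    # Do not merge into a directory left over from an earlier run.
    if [ -e "$merged_dir" ]; then
        echo "refusing to overwrite existing $merged_dir" >&2
        return 1
    fi
    "$nutch_cmd" mergesegs "$merged_dir" -dir "$crawl_dir/segments"
}

# Possible use after the depth loop:
#   merge_depth_loop_segments bin/nutch crawl crawl/MERGEDsegments
```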

Nutch Dedup Question

2007-09-20 Thread karthik085

Hi, I am a little confused about what exactly dedup does. a. Does dedup delete duplicate documents from Index and Segments? b. Is there a way that we could delete duplicated documents for two segments? Let me know. Thanks.

Indexing Process

2007-09-20 Thread Jeff Maki

Hello everyone, I'm not going to post my config files as not to spam you all, but I have a general question: I'm trying to index the pages of a website (obviously), and I've created a special page with a link to all the pages I want to index. I then pointed nutch to this special link page. I set

cached page not showing images

2007-09-20 Thread Joseph M.

I am having a problem with cached pages. images are not showing in them. how can I make images show in them? I am new to Nutch and having difficulties. please help me to show images in cached page.

Re: Nutch Dedup Question

2007-09-20 Thread Andrzej Bialecki

karthik085 wrote:
> Hi, I am little confused about what exactly dedup does?
> a. Does dedup delete duplicate documents from Index and Segments?

Only from the index.

> b. Is there a way that we could delete duplicated documents for two segments?

bin/nutch mergesegs

-- Best regards, Andrzej

Re: Nutch Dedup Question

2007-09-20 Thread karthik085

Thanks - that's much clearer. Andrzej Bialecki wrote: karthik085 wrote: Hi, I am little confused about what exactly dedup does? a. Does dedup delete duplicate documents from Index and Segments? Only from the index. b. Is there a way that we could delete duplicated documents

Re: cached page not showing images

2007-09-20 Thread Susam Pal

See NUTCH-281. https://issues.apache.org/jira/browse/NUTCH-281 On 9/20/07, Joseph M. [EMAIL PROTECTED] wrote: I am having a problem with cached pages. images are not showing in them. how can I make images show in them? I am new to Nutch and having difficulties. please help me to show images

Changing HTTP/1.0 to HTTP/1.1

2007-09-20 Thread Joseph M.

Nutch uses an HTTP/1.0 GET request. If I change the Java program in HttpResponse.java to reqStr.append(" HTTP/1.1\r\n"); will it create any problem?

Newbie questions about filter, bandwidth, NTLM and threads

2007-09-20 Thread Bent Hugh

I have some newbie questions. - There are two filters, crawl-urlfilter.txt and regex-urlfilter.txt. Which one should be configured in which condition? - Is it possible to see how much bandwidth a Nutch crawl consumes? - Can the Nutch bot do NTLM authentication for websites in a domain? - Is there

Re: Indexing Process

2007-09-20 Thread Carl Cerecke

Look in nutch-default.xml The properties db.max.outlinks.per.page and http.content.limit might need to have their values increased. Cheers, Carl. Jeff Maki wrote: Hello everyone, I'm not going to post my config files as not to spam you all, but I have a general question: I'm trying to
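Before raising the two properties Carl mentions, it can help to see what they are currently set to. A minimal sketch, assuming the usual `<property><name>…</name><value>…</value>` layout of the Nutch configuration files; the `show_property` helper and the `conf/nutch-default.xml` path are illustrative assumptions:

```shell
#!/bin/sh
# Hypothetical helper: print the <name> line of a property and the
# two lines after it (normally the <value> and start of <description>).
#   $1 = configuration file
#   $2 = property name, matched as a fixed string
show_property() {
    config_file=$1
    property_name=$2
    grep -F -A 2 "<name>$property_name</name>" "$config_file"
}

# Possible use from the Nutch installation directory:
#   show_property conf/nutch-default.xml db.max.outlinks.per.page
#   show_property conf/nutch-default.xml http.content.limit
```

Changed values are normally placed in conf/nutch-site.xml, which overrides nutch-default.xml, rather than edited into the defaults file itself.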

Policy of merging patches

2007-09-20 Thread Bent Hugh

I was browsing Nutch JIRA. As per my observation, some patches are merged into trunk and some are merged into hudson - Nutch-Nightly. This is pretty confusing to me as a user. As a user, which branch should I check out if I want the latest Nutch with cutting-edge features and least open issues?


17 matches
