Re: Exception in thread main java.io.IOException: Job failed!

2012-02-23 Thread Markus Jelsma
Unfetched, unparsed or just a bad corrupt segment. Remove that segment and try again. Many thanks Remi. Finally, after a reboot of the computer (I sent my question just before leaving my desk), Nutch started to crawl (amazing :))) ). But now, during the crawl process, I got this:

Re: Exception in thread main java.io.IOException: Job failed!

2012-02-23 Thread Daniel Bourrion
Hi Markus, thanks for the help. (Hope I'm not boring everybody.) I've erased everything in crawl/. Launching my Nutch, I now get - CrawlDb update: 404 purging: false CrawlDb update: Merging segment data into db. Exception in thread main java.io.IOException: Job failed! at

Re: Exception in thread main java.io.IOException: Job failed!

2012-02-23 Thread remi tassing
Disk size issue? Access rights? On Thu, Feb 23, 2012 at 12:39 PM, Daniel Bourrion daniel.bourr...@univ-angers.fr wrote: Hi Markus, thanks for the help. (Hope I'm not boring everybody.) I've erased everything in crawl/. Launching my Nutch, I now get - CrawlDb update: 404 purging: false

Re: http.redirect.max

2012-02-23 Thread Lewis John Mcgibbney
Hi, Can you post your nutch-site.xml and I will give it a spin. Thank you, Lewis On Thu, Feb 23, 2012 at 5:07 AM, xuyuanme xuyua...@gmail.com wrote: Just checked the latest code in 1.4 but it's the same. See line 138 of the code at the link below:

Re: http.redirect.max

2012-02-23 Thread xuyuanme
Thanks! The config file can be downloaded here: http://dl.dropbox.com/u/6614015/temp/config.zip lewis john mcgibbney wrote: Hi, Can you post your nutch-site.xml and I will give it a spin. Thank you, Lewis On Thu, Feb 23, 2012 at 5:07 AM,

Re: index-basic and index-more cause multi-value on non-multi-value title field?

2012-02-23 Thread Lewis John Mcgibbney
Hi, For reference, this topic has now been picked up on the dev list. Thanks for bringing it to attention. Lewis On Mon, Feb 20, 2012 at 7:45 AM, shlomi java shlomij...@gmail.com wrote: As the method in MoreIndexingFilter is called '*resetTitle*', I would expect something like

Re: ParseSegment taking a long time to finish

2012-02-23 Thread Magnús Skúlason
Hi, I tried the parsechecker tool and as it turns out it hangs after printing out: Content Metadata: Vary=Accept-Encoding Date=Thu, 23 Feb 2012 15:27:43 GMT Content-Length=3992 Expires=Thu, 19 Nov 1981 08:52:00 GMT Content-Encoding=gzip Set-Cookie=Shoper4Shop=a3ojqpk5ep6opahejfpiv98hf6; path=/

Nutch data to Solr on HTTPS

2012-02-23 Thread Christopher Gross
I have my Solr set up on a secure port, and I think that is causing a problem for Nutch (nothing else changed). I don't see anything in the documentation regarding this. My Nutch version is 1.2, Solr is 3.4. Here's the line from my runbot.sh script: $NUTCH_HOME/bin/nutch solrindex

Re: Nutch data to Solr on HTTPS

2012-02-23 Thread Christopher Gross
Meant to include this...the output from the runbot.sh script. Not that it really says a whole lot... - Index (Step 5 of 8) - SolrIndexer: starting at 2012-02-23 18:18:20 java.io.IOException: Job failed! -- Chris On Thu, Feb 23, 2012 at 1:26 PM, Christopher Gross cogr...@gmail.com

Re: http.redirect.max

2012-02-23 Thread Lewis John Mcgibbney
I've checked working with redirects and everything seems to work fine for me. The site I checked, http://www.scotland.gov.uk, temp-redirects to http://home.scotland.gov.uk/home. Nutch gets this fine when I do some tweaking with the nutch-site.xml redirects property -1 (just to demonstrate, I

Re: Nutch data to Solr on HTTPS

2012-02-23 Thread Lewis John Mcgibbney
Yeah, I can confirm it was 1.4. On Thu, Feb 23, 2012 at 7:05 PM, Christopher Gross cogr...@gmail.com wrote: I tried using 1.4, but I couldn't get that to work at all. What is wrong with your configuration? If this is all that is preventing you from migrating to 1.4, I would rather get it sorted

Re: Nutch data to Solr on HTTPS

2012-02-23 Thread Christopher Gross
I was getting it to do parts of the crawl, but it was not pushing the data to Solr (that was before I moved it to https). I had worked on that for two weeks, and was frustrated and needed to make progress with other parts of the project, so I bailed on the newer Nutch and just rolled with 1.2,

Re: [nutchgora] - proposal to support distributed indexing

2012-02-23 Thread SUJIT PAL
Hi Lewis, Ok, thanks, I will attach the patch to NUTCH-945 after I am done with it, and update this thread as well... -sujit On Feb 23, 2012, at 3:43 AM, Lewis John Mcgibbney wrote: Hi Sujit, On Wed, Feb 22, 2012 at 6:16 PM, SUJIT PAL sujit@comcast.net wrote: Being able to