Unfetched, unparsed or just a bad corrupt segment. Remove that segment and try
again.
Many thanks Remi.
Finally, after a reboot of the computer (I sent my question just before
leaving my desk), Nutch started to crawl (amazing :))) )
But now, during the crawl process, I got this:
Hi Markus
Thx for help.
(Hope I'm not boring everybody)
I've erased everything in crawl/
Launching Nutch, I now get
-
CrawlDb update: 404 purging: false
CrawlDb update: Merging segment data into db.
Exception in thread main java.io.IOException: Job failed!
at
disk size issue?
access rights?
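
Both questions can be checked quickly before re-running the crawl. The following is only a minimal sketch, not part of Nutch: the function name `check_crawl_dir`, the 1 GiB threshold and the `crawl` path are assumptions chosen for illustration. It reports free space and write access for the crawl directory, or for its nearest existing parent if the directory was just deleted.

```python
import os
import shutil


def check_crawl_dir(path, min_free_bytes=1024 ** 3):
    """Report free space and write access for `path` (hypothetical helper)."""
    # If the directory does not exist yet, walk up to the nearest existing parent.
    probe = os.path.abspath(path)
    while not os.path.exists(probe):
        probe = os.path.dirname(probe)

    usage = shutil.disk_usage(probe)
    return {
        "checked_path": probe,
        "free_bytes": usage.free,
        # The threshold is an arbitrary assumption, not a Nutch requirement.
        "enough_space": usage.free >= min_free_bytes,
        # Creating files in a directory needs write and execute permission.
        "writable": os.access(probe, os.W_OK | os.X_OK),
    }


if __name__ == "__main__":
    print(check_crawl_dir("crawl"))
```

The same check may be worth running against whatever temporary directory the job uses, since that can sit on a different disk from crawl/.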
On Thu, Feb 23, 2012 at 12:39 PM, Daniel Bourrion
daniel.bourr...@univ-angers.fr wrote:
Hi Markus
Thx for help.
(Hope I'm not boring everybody)
I've erased everything in crawl/
Launching Nutch, I now get
-
CrawlDb update: 404 purging: false
Hi,
Can you post your nutch-site.xml and I will give it a spin.
Thank you
Lewis
On Thu, Feb 23, 2012 at 5:07 AM, xuyuanme xuyua...@gmail.com wrote:
Just checked the latest code in 1.4 but it's the same. See line 138 of the
code at the link below:
Thanks! The config file can be downloaded here:
http://dl.dropbox.com/u/6614015/temp/config.zip
lewis john mcgibbney wrote
Hi,
Can you post your nutch-site.xml and I will give it a spin.
Thank you
Lewis
On Thu, Feb 23, 2012 at 5:07 AM,
Hi,
For reference, this topic has now been picked up on the dev list.
Thanks for bringing it to attention.
Lewis
On Mon, Feb 20, 2012 at 7:45 AM, shlomi java shlomij...@gmail.com wrote:
As the method in MoreIndexingFilter is called '*resetTitle*', I would
expect something like
Hi,
I tried the parsechecker tool and as it turns out it hangs after printing out:
Content Metadata: Vary=Accept-Encoding Date=Thu, 23 Feb 2012 15:27:43
GMT Content-Length=3992 Expires=Thu, 19 Nov 1981 08:52:00 GMT
Content-Encoding=gzip
Set-Cookie=Shoper4Shop=a3ojqpk5ep6opahejfpiv98hf6; path=/
I have my Solr set up on a secure port -- and I think that is causing
a problem for nutch (nothing else changed.) I don't see anything in
the documentation regarding this.
My nutch version is 1.2, Solr is 3.4. Here's the line from my runbot.sh script:
$NUTCH_HOME/bin/nutch solrindex
Meant to include this...the output from the runbot.sh script. Not
that it really says a whole lot...
- Index (Step 5 of 8) -
SolrIndexer: starting at 2012-02-23 18:18:20
java.io.IOException: Job failed!
-- Chris
On Thu, Feb 23, 2012 at 1:26 PM, Christopher Gross cogr...@gmail.com
I've checked working with redirects and everything seems to work fine for
me.
The site I checked on
http://www.scotland.gov.uk
temp redirect to
http://home.scotland.gov.uk/home
Nutch handles this fine once I set the redirects property in nutch-site.xml
to -1 (just to demonstrate, I
Yeah I can confirm it was 1.4
On Thu, Feb 23, 2012 at 7:05 PM, Christopher Gross cogr...@gmail.com wrote:
I tried using 1.4, but I couldn't get that to work at all.
What is wrong with your configuration? If this is all that is preventing
you from migrating to 1.4, I would rather get it sorted
I was getting it to do parts of the crawl, but it was not pushing the
data to Solr (that was before I moved it to https). I had worked on
that for two weeks, and was frustrated and needed to make progress
with other parts of the project, so I bailed on the newer nutch and
just rolled with 1.2,
Hi Lewis,
Ok, thanks, I will attach the patch to NUTCH-945 after I am done with it, and
update this thread as well...
-sujit
On Feb 23, 2012, at 3:43 AM, Lewis John Mcgibbney wrote:
Hi Sujit,
On Wed, Feb 22, 2012 at 6:16 PM, SUJIT PAL sujit@comcast.net wrote:
Being able to