It seems that you haven't set the batch ID correctly. I see that the Crawler
class is not used in the launch script, so you can try the bin/nutch or
bin/crawl command to run Nutch again.
On Thu, May 15, 2014 at 9:10 AM, 基勇 252637...@qq.com wrote:
Can anyone help solve this problem?
Thanks
Hi,
I'm new to Nutch. I have crawled several sites using Nutch and it works,
with a few websites as exceptions. I've looked through hadoop.log but can't
find any suspicious errors for the sites that failed to crawl. The console
does not show any documents being added, as it does for any other successful
crawl, like this:
2014-05-15
I have a situation in which, ideally, I would like to combine data parsed
from two separate web pages into a single document, which would then be
indexed into Solr. I have looked at the options for passing two separate
documents to Solr and combining the data at query time, but none of the
Hi Folks,
Has anyone done this before?
Is email archiving something which we can do or not?
I've been playing around with Geronimo's JavaMail library and wondered if
we could use it as a protocol extension for the above protocols.
Any thoughts?
Lewis
--
*Lewis*
The title field needs to be set to multivalued. This is a Tika issue: Tika
may return multiple values for the title on PDFs.
On Thu, May 8, 2014 at 1:37 AM, BlackIce blackice...@gmail.com wrote:
Thanks
On Wed, May 7, 2014 at 4:07 AM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi BlackIce,
On
Hi
The dedup is now independent from any specific backend as you can see by
typing './nutch dedup'
*Usage: DeduplicationJob crawldb*
What it does is mark the duplicates within the crawldb; this is then used by
the indexer to delete the corresponding entries.
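To make the two-step idea concrete, here is a minimal Python sketch, not Nutch code. It assumes each crawldb entry carries a content signature, a score and a status, and that the entry kept per signature is the one with the highest score, then the shortest URL. The dictionary layout, the tie-break rule and the names `mark_duplicates` and `urls_to_delete` are all hypothetical and only illustrate "mark in the crawldb, delete at indexing time".

```python
from collections import defaultdict


def mark_duplicates(crawldb):
    """Mark all but one entry per content signature as 'duplicate'."""
    by_signature = defaultdict(list)
    for url, entry in crawldb.items():
        by_signature[entry["signature"]].append(url)
    for urls in by_signature.values():
        # Assumed tie-break: keep the highest score, then the shortest URL.
        keeper = max(urls, key=lambda u: (crawldb[u]["score"], -len(u)))
        for url in urls:
            if url != keeper:
                crawldb[url]["status"] = "duplicate"
    return crawldb


def urls_to_delete(crawldb):
    """URLs an indexing step would remove, based only on the marks."""
    return sorted(u for u, e in crawldb.items() if e["status"] == "duplicate")


# Hypothetical crawldb: two URLs share the same content signature.
crawldb = {
    "http://example.com/": {"signature": "abc", "score": 1.0, "status": "fetched"},
    "http://example.com/index.html": {"signature": "abc", "score": 1.0, "status": "fetched"},
    "http://example.com/about": {"signature": "def", "score": 0.5, "status": "fetched"},
}
mark_duplicates(crawldb)
print(urls_to_delete(crawldb))
```

The point of the split is that the marking step never talks to the index, which is why it can stay independent of the backend.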
I have updated the
Hi,
Usage: Generator crawldb segments_dir [-force] [-topN N] *[-numFetchers
numFetchers]* [-adddays numDays] [-noFilter] [-noNorm] [-maxNumSegments num]
Set -numFetchers 10 to use all your slaves. Of course, if all your URLs
belong to the same host they'll end up being processed by a single
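To illustrate why URLs from one host stay together, here is a rough Python sketch. It assumes fetch lists are split by hashing the host name modulo the number of fetchers; that is an assumption about the partitioning, and neither the `partition_by_host` function nor the CRC32 hash is taken from Nutch.

```python
from collections import defaultdict
from urllib.parse import urlparse
import zlib


def partition_by_host(urls, num_fetchers):
    """Assign each URL to a fetch list based on a hash of its host name."""
    partitions = defaultdict(list)
    for url in urls:
        host = urlparse(url).hostname or ""
        # crc32 is used only because it is stable across runs;
        # it is not claimed to be the hash function Nutch uses.
        index = zlib.crc32(host.encode("utf-8")) % num_fetchers
        partitions[index].append(url)
    return dict(partitions)


urls = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/c",
    "http://example.org/x",
]
parts = partition_by_host(urls, num_fetchers=10)
for index, members in sorted(parts.items()):
    print(index, members)
```

However large num_fetchers is, the three example.com URLs always land in the same list, so a single-host crawl cannot be spread across the slaves this way.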
I think patch-1651 https://issues.apache.org/jira/browse/NUTCH-1651 solved my
problem.
From: karvouni...@hotmail.com
To: user@nutch.apache.org
Subject: RE: Fetcher-Parser Nutch 2.2.1
Date: Mon, 12 May 2014 12:20:52 +0300
Thank you Talat in advance for helping me so much!
How can I get rid