Hi,
I have written two different plugins in Nutch. Both of them work
individually when tested using bin/nutch plugin.
Call the plugins A and B. I need to use plugin A inside B.
When I import plugin A in B, it gives an error that package A is not
found.
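For one Nutch plugin to see classes from another, the dependent plugin has to declare the dependency in its plugin.xml descriptor. A sketch, assuming the plugin ids are literally A and B (use the real ids from each plugin's descriptor; jar names are placeholders):

```xml
<!-- plugin.xml of plugin B (sketch; ids and jar names are placeholders) -->
<plugin id="B" name="Plugin B" version="1.0.0" provider-name="you">
   <runtime>
      <library name="B.jar">
         <export name="*"/>
      </library>
   </runtime>
   <requires>
      <import plugin="nutch-extensionpoints"/>
      <!-- makes plugin A's exported classes visible to B at runtime -->
      <import plugin="A"/>
   </requires>
</plugin>
```

Plugin A's own plugin.xml must also export its library (`<export name="*"/>`), and B's build.xml may additionally need A's jar on the compile classpath so the import resolves at build time.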
Hi,
In the file conf/crawl-urlfilter.txt, check whether you have
commented out the following line:
# skip file:, ftp:, mailto: urls
-^(file|ftp|mailto):
Also, mention the URLs you have given for crawling the local files.
Srinivas
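Concretely, after the change the relevant part of conf/crawl-urlfilter.txt would look like this (a sketch; the accept pattern for file: URLs is an example, not the only valid form):

```
# skip ftp: and mailto: urls, but no longer skip file:
-^(ftp|mailto):

# accept local file URLs
+^file://
```

Note that the protocol-file plugin must also be enabled via the plugin.includes property in conf/nutch-site.xml, or file: URLs will not be fetched at all.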
On Mon, Aug 25, 2008 at 4:18 PM, convoyer [EMAIL PROTECTED] wrote:
-^(ftp|mailto):
2) Also, under the urls folder I have a file which contains:
file:///c:/LocalSearch/localfiles/ and
http://www.apache.org
I am still unable to get the local files indexed.
Srinivas Gokavarapu wrote:
Hi,
In the file conf/crawl-urlfilter.txt
Hi,
First check whether you have the settings for crawling an intranet
configured correctly. Here is a link; check it out:
http://www.folge2.de/tp/search/1/crawling-the-local-filesystem-with-nutch
Also try one thing: index only one folder containing more than 32
files.
Regards,
Srinivas.
Hi,
I am crawling a large amount of data from the web. I started crawling and
got an error saying there is no disk space left. After checking, I found that
Nutch stores temporary data during crawling in the /tmp folder. I don't have much
space in the / directory, but I have more space in my /home2 directory
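One common way to move the temporary crawl data off /tmp is to point Hadoop's scratch directory somewhere with more space, since Nutch uses Hadoop's hadoop.tmp.dir for its intermediate data. A sketch for conf/hadoop-site.xml (the path is an example):

```xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home2/nutch/tmp</value>
  <description>Base for Hadoop's temporary directories.</description>
</property>
```

Make sure the directory exists and is writable by the user running the crawl before restarting it.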
Hi,
You should change the URL to file://C:/MyData/ and also, in
crawl-urlfilter.txt, change the file:// line to:
+^file://C:/MyData/*
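For file: URLs to be fetched at all, the protocol-file plugin also has to be enabled. A sketch of the plugin.includes override in conf/nutch-site.xml (the exact plugin list depends on your setup; the value below is only an illustrative combination):

```xml
<property>
  <name>plugin.includes</name>
  <value>protocol-file|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
</property>
```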
On Thu, Sep 25, 2008 at 11:42 PM, Manu Warikoo [EMAIL PROTECTED] wrote:
Hi,
I am running Nutch 0.9 and am attempting to use it to index files on my
Hi,
Check this link for crawling local pages in Nutch:
http://www.folge2.de/tp/search/1/crawling-the-local-filesystem-with-nutch
Follow the steps on that site and check once more.
On Fri, Sep 26, 2008 at 3:24 AM, Kevin MacDonald [EMAIL PROTECTED] wrote:
Manu,
The only way I was able to
Hi,
Can you post some of the URLs for which the parse text is missing?
On Tue, Oct 21, 2008 at 6:44 AM, John Mendenhall [EMAIL PROTECTED] wrote:
We are using Nutch version nutch-2008-07-22_04-01-29.
We have a crawldb with over 1 million URLs.
We have noticed that some of the URLs in search
Hi,
First check in logs/hadoop.log whether the page was fetched properly, and
also check whether the web page contains the query word. Then check the name
of the crawl folder: the folder should be named crawl. If you want to change
that, change the searcher.dir property in conf/nutch-default.xml.
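Rather than editing conf/nutch-default.xml directly, the usual approach is to override searcher.dir in conf/nutch-site.xml, which takes precedence. A sketch (the path is an example):

```xml
<property>
  <name>searcher.dir</name>
  <value>/home/nutch/mycrawl</value>
  <description>Path to the crawl directory the search server reads.</description>
</property>
```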
Hi,
I have faced a problem while tokenizing text using StandardAnalyzer. When I
try to tokenize the string internet,art,3d,avatar,portraits
using StandardAnalyzer, the tokens I get are:
internet
art,3d,avatar
portraits
I expected it to be 5 different words. Is this a bug in the analyzer?
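This grouping comes from the NUM rule in the classic StandardTokenizer grammar (pre-3.1 Lucene, later renamed ClassicTokenizer): alphanumeric parts joined by commas or dots are kept together as one token when alternating parts contain a digit, which is exactly what happens with art,3d,avatar. The plain-Java sketch below approximates that rule to show why those three parts merge while internet and portraits split off; it is an illustration of the idea, not the real JFlex grammar, and it always rejoins parts with a comma for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

public class ClassicNumDemo {
    static boolean hasDigit(String s) {
        return s.chars().anyMatch(Character::isDigit);
    }

    // parts[i..j] joined form a NUM-style compound token iff j > i and
    // either every odd-offset part or every even-offset part contains a
    // digit (an approximation of the ClassicTokenizer NUM alternatives).
    static boolean validNum(List<String> parts, int i, int j) {
        boolean oddOk = true, evenOk = true;
        for (int k = i; k <= j; k++) {
            boolean d = hasDigit(parts.get(k));
            if ((k - i) % 2 == 1 && !d) oddOk = false;
            if ((k - i) % 2 == 0 && !d) evenOk = false;
        }
        return oddOk || evenOk;
    }

    public static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        // runs of letters/digits possibly joined by '.' or ','
        for (String run : text.split("[^A-Za-z0-9.,]+")) {
            if (run.isEmpty()) continue;
            List<String> parts = new ArrayList<>();
            for (String p : run.split("[.,]+"))
                if (!p.isEmpty()) parts.add(p);
            int i = 0;
            while (i < parts.size()) {
                int j = parts.size() - 1;               // longest match wins
                while (j > i && !validNum(parts, i, j)) j--;
                out.add(String.join(",", parts.subList(i, j + 1)));
                i = j + 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ",
                tokenize("internet,art,3d,avatar,portraits")));
        // internet | art,3d,avatar | portraits
    }
}
```

So it is documented tokenizer behavior rather than a bug; if comma-separated words should always split, tokenize on commas first (or use an analyzer without this rule, such as WhitespaceAnalyzer plus your own splitting).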