reg: plugins

2008-05-22 Thread Srinivas Gokavarapu
Hi, I have written two different plugins in Nutch. Both of them work individually when tested using bin/nutch plugin. Call the plugins A and B. I need to use plugin A in B. When I import plugin A in B, it gives an error that package A is not found. I

Re: Unable to search LOCAL FILES

2008-08-26 Thread Srinivas Gokavarapu
Hi, In the file conf/crawl-urlfilter.txt, check whether you have commented out the following line or not. # skip file:, ftp:, mailto: urls -^(file|ftp|mailto): Also mention the URLs you have given to crawl the local files. Srinivas On Mon, Aug 25, 2008 at 4:18 PM, convoyer [EMAIL
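A minimal sketch of why that rule matters, assuming the filter file is read top to bottom, the first matching rule decides (`+` accepts, `-` rejects), and a URL that matches no rule is rejected. The helper names `parse_rules` and `accepts` are hypothetical and are not part of Nutch, and the catch-all `+.` line is only an illustrative stand-in for whatever accept rule the real file contains.

```python
import re

def parse_rules(text):
    """Parse filter lines of the form '+regex' or '-regex'; skip comments and blanks."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accepts(rules, url):
    """Return the verdict of the first rule whose pattern is found in the URL."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False  # assumption: no matching rule means the URL is rejected

# Rule quoted in the message: file: URLs are skipped.
DEFAULT_RULES = """
# skip file:, ftp:, mailto: urls
-^(file|ftp|mailto):
+.
"""

# Same rule with 'file' removed, so local files can pass the filter.
LOCAL_RULES = """
# skip ftp:, mailto: urls
-^(ftp|mailto):
+.
"""

local_url = "file:///c:/LocalSearch/localfiles/"
print(accepts(parse_rules(DEFAULT_RULES), local_url))
print(accepts(parse_rules(LOCAL_RULES), local_url))
```

With the quoted rule in place the local URL is rejected before it is ever fetched; with `file` removed from the alternation it passes, while http URLs are accepted in both cases.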

Re: Unable to search LOCAL FILES

2008-08-26 Thread Srinivas Gokavarapu
: -^(ftp|mailto): 2) Also under the urls folder I have a file which contains: file:///c:/LocalSearch/localfiles/ and http://www.apache.org I am still unable to get the local files indexed. Srinivas Gokavarapu wrote: Hi, In the file conf/crawl-urlfilter.txt

Re: can not deal too many files under one folder

2008-09-02 Thread Srinivas Gokavarapu
Hi, First check whether you have configured the settings for crawling an intranet correctly. Here is a link to check: http://www.folge2.de/tp/search/1/crawling-the-local-filesystem-with-nutch Also try one thing: index only one folder containing more than 32 files. Regards, Srinivas.

Re: Temporary storage during crawling

2008-09-16 Thread Srinivas Gokavarapu
Hi, I am crawling a large amount of data from the web. I started crawling and got an error saying there is no disk space. After checking, I found that Nutch stores temporary data in the /tmp folder during crawling. I don't have much space in the / directory, but I have more space on my /home2 directory

Fwd: Fw: Very Urgent..

2008-09-18 Thread Srinivas Gokavarapu
-- Forwarded message -- From: harshavardhan innamuri [EMAIL PROTECTED] Date: Thu, Sep 18, 2008 at 9:42 AM Subject: Fwd: Fw: Very Urgent.. To: ~badri ~ badrinath [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], AnuDeep 4 U.. [EMAIL PROTECTED],

Re: FW: Indexing Files on Local File System

2008-09-25 Thread Srinivas Gokavarapu
Hi, You should change the URL to file://C:/MyData/ and also, in crawl-urlfilter.txt, change the file:// line to +^file://C:/MyData/* On Thu, Sep 25, 2008 at 11:42 PM, Manu Warikoo [EMAIL PROTECTED] wrote: Hi, I am running Nutch 0.9 and am attempting to use it to index files on my

Re: Indexing Files on Local File System

2008-09-25 Thread Srinivas Gokavarapu
Hi, Check this link for crawling local pages in Nutch: http://www.folge2.de/tp/search/1/crawling-the-local-filesystem-with-nutch Follow the steps on this site and check again. On Fri, Sep 26, 2008 at 3:24 AM, Kevin MacDonald [EMAIL PROTECTED] wrote: Manu, The only way I was able to

Re: nutch parsetext missing for some urls

2008-10-21 Thread Srinivas Gokavarapu
Hi, Can you post some of the URLs for which parse text is missing? On Tue, Oct 21, 2008 at 6:44 AM, John Mendenhall [EMAIL PROTECTED] wrote: We are using nutch version nutch-2008-07-22_04-01-29. We have a crawldb with over 1 million urls. We have noticed some of the urls in search

Re: indexing after fetching

2009-02-17 Thread Srinivas Gokavarapu
Hi, First check in logs/hadoop.log whether the page was fetched properly, and also check whether the web page contains the query word. Check the name of the crawl folder. The crawl folder should be named crawl; if you want to change it, you can change it in conf/nutch-default.xml, searcher.dir

Problem with Standard analyzer

2010-04-28 Thread Srinivas Gokavarapu
Hi, I have faced a problem while tokenizing text using the standard analyzer. When I try to tokenize the string internet,art,3d,avatar,portraits using StandardAnalyzer, the tokens I get are internet art,3d,avatar portraits I expected it to be 5 different words. Is this a bug in the analyzer?