Re: Why Nutch is not crawling all links from web page

2010-04-05 Thread Susam Pal
On Mon, Apr 5, 2010 at 3:32 PM, Anil Kumar a...@nexusemp.com wrote: Hi, I'm using the Nutch crawler in my project. I am scraping data from a site whose pages contain multiple links that lead to other web pages. Nutch does not crawl all of the links. Help me to resolve this

Nutch segment merge is very slow

2010-04-05 Thread ashokkumar.raveendiran
Hi, I'm using the Nutch crawler in my project and have crawled more than 2GB of data using the Nutch runbot script. Up to 2GB, the segment merge finished within 24 hrs, but now it has taken more than 48 hrs and is still running. I have set depth to 16 and topN to 2500. I want to run the crawler every day as per my

Re: description and keywords

2010-04-05 Thread ramires
Hi, the metatag parser works great. When I dumped with readseg I saw the metatag.keywords field and its data. But I can't solve the query part. I put this in nutch-site.xml and deployed it. When I query a keyword there is no search result. Is there anything wrong with this setup? property

Re: Apache Lucene EuroCon Call For Participation: Prague, Czech Republic May 20 & 21, 2010

2010-04-05 Thread Grant Ingersoll
Just a reminder, just over one week left open on the CFP. Some great talks entered already. Keep it up! On Mar 24, 2010, at 8:03 PM, Grant Ingersoll wrote: Apache Lucene EuroCon Call For Participation - Prague, Czech Republic May 20 & 21, 2010 All submissions must be received by

Re: Nutch segment merge is very slow

2010-04-05 Thread Susam Pal
On Mon, Apr 5, 2010 at 5:27 PM, ashokkumar.raveendi...@wipro.com wrote: Hi, I'm using the Nutch crawler in my project and have crawled more than 2GB of data using the Nutch runbot script. Up to 2GB, the segment merge finished within 24 hrs, but now it has taken more than 48 hrs and is still running. I

RE: Nutch segment merge is very slow

2010-04-05 Thread ashokkumar.raveendiran
Hi, Thank you for your suggestion. I have 500+ internet URLs configured for crawling, and the crawl process is running in the Amazon cloud. I have already reduced my depth to 8 and topN to 1000, increased fetcher threads to 150, and limited crawling to 50 URLs per host using generate.max.per.host

Re: description and keywords

2010-04-05 Thread Julien Nioche
Hi The query-basic plugin is used to include these fields in the search e.g. in nutch-site.xml {code:xml} <property> <name>query.basic.description.boost</name> <value>2.0</value> </property> <property> <name>query.basic.keywords.boost</name> <value>2.0</value> </property> {code} The query filter included in
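
The two boost entries in the reply above share one name/value pattern, so they can be generated rather than typed by hand. The following is only a minimal sketch in Python, not part of Nutch: the helper names `make_property` and `boost_properties` are hypothetical, and the `configuration` root element is an assumption based on the usual layout of nutch-site.xml. Only the property names `query.basic.description.boost` and `query.basic.keywords.boost` and the value 2.0 come from the message itself.

```python
import xml.etree.ElementTree as ET


def make_property(name, value):
    """Build one <property> element with <name> and <value> children."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop


def boost_properties(boosts):
    """Return XML text holding one query.basic.<field>.boost property per field.

    `boosts` maps a field name (e.g. "keywords") to its boost factor.
    The <configuration> root is an assumption about the target file layout.
    """
    config = ET.Element("configuration")
    for field, boost in boosts.items():
        config.append(make_property("query.basic.%s.boost" % field, str(boost)))
    return ET.tostring(config, encoding="unicode")


if __name__ == "__main__":
    # Fields and boost values taken from the reply above.
    print(boost_properties({"description": 2.0, "keywords": 2.0}))
```

The output would still need to be merged into an existing nutch-site.xml by hand; whether the boosts take effect depends on the query-basic plugin being enabled, as the reply describes.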

Re: Nutch segment merge is very slow

2010-04-05 Thread Andrzej Bialecki
On 2010-04-05 16:54, ashokkumar.raveendi...@wipro.com wrote: Hi, Thank you for your suggestion. I have 500+ internet URLs configured for crawling, and the crawl process is running in the Amazon cloud. I have already reduced my depth to 8 and topN to 1000, and also increased fetcher threads

KeepWord filter in Nutch

2010-04-05 Thread MilleBii
I would like to use a keepword filter, like in Solr, but I'm not using Solr... what options do I have apart from rebuilding one from scratch in Nutch? -- -MilleBii-

RE: Nutch segment merge is very slow

2010-04-05 Thread Arkadi.Kosmynin
Hi, -Original Message- From: Susam Pal [mailto:susam@gmail.com] Sent: Tuesday, 6 April 2010 12:18 AM To: nutch-user@lucene.apache.org Subject: Re: Nutch segment merge is very slow On Mon, Apr 5, 2010 at 5:27 PM, ashokkumar.raveendi...@wipro.com wrote: Hi I'm using