how to upgrade a java application with nutch?

2009-10-01 Thread Jaime Martín
Hi! I have a Java application that I would like to upgrade with Nutch. What jars should I add to my application's lib directory to make it possible to use Nutch features from some of my app pages and business logic classes? I've tried with the nutch-1.0.jar generated by the war target without success. I wonder what is

Re: how to upgrade a java application with nutch?

2009-10-01 Thread Paul Tomblin
2009/10/1 Jaime Martín james...@gmail.com Hi! I have a Java application that I would like to upgrade with Nutch. What jars should I add to my application's lib directory to make it possible to use Nutch features from some of my app pages and business logic classes? I've tried with nutch-1.0.jar

Nutch randomly skipping locations during crawl

2009-10-01 Thread tsmori
This is strange. I manage the webservers for a large university library. On our site we have a staff directory where each user has a location for information. The URLs take the form of: http://mydomain.edu/staff/userid I've added the staff URL to the urls seed file. But even with a crawl set to

Re: how to upgrade a java application with nutch?

2009-10-01 Thread Andrzej Bialecki
Jaime Martín wrote: Hi! I have a Java application that I would like to upgrade with Nutch. What jars should I add to my application's lib directory to make it possible to use Nutch features from some of my app pages and business logic classes? I've tried with the nutch-1.0.jar generated by the war target without

Re: Nutch randomly skipping locations during crawl

2009-10-01 Thread Andrzej Bialecki
tsmori wrote: This is strange. I manage the webservers for a large university library. On our site we have a staff directory where each user has a location for information. The URLs take the form of: http://mydomain.edu/staff/userid I've added the staff URL to the urls seed file. But even with

Re: how to upgrade a java application with nutch?

2009-10-01 Thread Jaime Martín
Thank you for the info. That's really a problem. I have a Java project, and for some of its new features I would like to use Nutch. As I need to customise Nutch, my idea was the following:
- 1st: change what is needed for my requirements in my downloaded Nutch and generate a Nutch library
- 2nd: add that

RE: Nutch randomly skipping locations during crawl

2009-10-01 Thread BELLINI ADAM
Yes, also check whether some userids contain characters like ?, @, *, ! or =; these are filtered out by default: -[...@=]
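
In Nutch's regex URL filter configuration, a rule starting with "-" excludes every URL its pattern matches. The Java sketch below is a hypothetical, stand-alone approximation of that one exclusion rule, built only from the characters BELLINI lists (?, @, *, !, =); it is not Nutch's filter plugin and ignores every other rule in the file.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-alone approximation of one "-" exclusion rule; NOT Nutch's
// regex URL filter plugin. The characters come from the list in the message.
class SpecialCharFilterSketch {
    // Character class matching any of ? @ * ! = anywhere in the URL.
    private static final Pattern SPECIAL_CHARS = Pattern.compile("[?@*!=]");

    // True when this single rule would exclude the URL.
    static boolean isRejected(String url) {
        return SPECIAL_CHARS.matcher(url).find();
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
                "http://mydomain.edu/staff/jsmith",      // plain userid -> KEEP
                "http://mydomain.edu/staff/jsmith!",     // contains '!' -> DROP
                "http://mydomain.edu/staff/?id=jsmith"); // contains '?' and '=' -> DROP
        for (String url : urls) {
            System.out.println((isRejected(url) ? "DROP " : "KEEP ") + url);
        }
    }
}
```

Running a handful of real staff URLs through a check like this is a quick way to see whether the skipped userids share one of these characters.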

Re: how to upgrade a java application with nutch?

2009-10-01 Thread Ken Krugler
Hi Jaime, Depending on what exactly you're trying to do, there are some other projects that offer crawler functionality which could be easier to embed. The two I know about are:
- Droids (http://incubator.apache.org/droids/), though I haven't really used it.
- Bixo

RE: how to upgrade a java application with nutch?

2009-10-01 Thread Fuad Efendi
Hi Jaime, You don't have to embed it; try (simplified) Nutch + Solr (Nutch has a plugin for Solr), and use the SolrJ client for Solr from your application. This is very easy. -Fuad http://www.linkedin.com/in/liferay
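
Fuad's suggestion keeps the crawler outside the application: Nutch crawls and sends documents to Solr, and the application only queries Solr. He recommends the SolrJ client; to stay dependency-free, the sketch below swaps in plain HTTP and only builds a query URL for Solr's select handler using the standard q, start, rows and wt request parameters. The base URL (localhost, port 8983, /solr/select) is an assumption about a default local install, not something stated in the thread.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch only: builds a plain HTTP query URL for Solr's select handler.
// This swaps in raw HTTP for the SolrJ client Fuad recommends.
class SolrQueryUrlSketch {
    // Assumed location of a default local Solr install (hypothetical).
    static final String DEFAULT_BASE = "http://localhost:8983/solr/select";

    // q = query text, start/rows = paging window, wt = response format.
    static String buildSelectUrl(String base, String query, int start, int rows) {
        return base
                + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&start=" + start
                + "&rows=" + rows
                + "&wt=json";
    }

    public static void main(String[] args) {
        // The application would then issue an HTTP GET for this URL against the
        // index that the crawl populated, and parse the JSON response.
        // Prints: http://localhost:8983/solr/select?q=library+hours&start=0&rows=10&wt=json
        System.out.println(buildSelectUrl(DEFAULT_BASE, "library hours", 0, 10));
    }
}
```

With SolrJ the same request would be expressed through its query objects instead of a hand-built URL; the division of labour between crawler and application stays the same.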

Re: R: Using Nutch for only retrieving HTML

2009-10-01 Thread Andrzej Bialecki
BELLINI ADAM wrote: Hi, but how do I dump the content? I tried this command: ./bin/nutch readseg -dump crawl/segments/20090903121951/content/ toto and it said: Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
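
The failing command points readseg at the content subdirectory of a segment. Assuming, as the SegmentReader usage line (readseg -dump <segment_dir> <output>) suggests, that -dump expects the segment directory itself and locates its subdirectories on its own, a path ending in content would make it look for a nonexistent directory below that. The Java helper below is hypothetical and not part of Nutch: it only trims one of the standard Nutch 1.x segment subdirectory names from the end of a path.

```java
import java.util.Set;

// Hypothetical helper, not part of Nutch: if a path ends in one of the standard
// Nutch 1.x segment subdirectories, return its parent (the segment directory).
class SegmentPathSketch {
    private static final Set<String> SEGMENT_SUBDIRS = Set.of(
            "content", "crawl_fetch", "crawl_generate",
            "crawl_parse", "parse_data", "parse_text");

    static String toSegmentDir(String path) {
        // Drop a single trailing slash so the last path element can be inspected.
        String trimmed = path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
        int slash = trimmed.lastIndexOf('/');
        String last = trimmed.substring(slash + 1);
        if (slash > 0 && SEGMENT_SUBDIRS.contains(last)) {
            return trimmed.substring(0, slash); // parent = segment directory
        }
        return trimmed; // already looks like a segment directory
    }

    public static void main(String[] args) {
        String given = "crawl/segments/20090903121951/content/";
        // Prints: ./bin/nutch readseg -dump crawl/segments/20090903121951 toto
        System.out.println("./bin/nutch readseg -dump " + toSegmentDir(given) + " toto");
    }
}
```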

RE: Nutch randomly skipping locations during crawl

2009-10-01 Thread tsmori
Both good ideas. Unfortunately, the content for each user is the same. It's a static php file that simply calls information out of our LDAP. It's very strange because I cannot see any difference between the user files/directories that are fetched and those that aren't. In checking both the crawl

Re: Nutch randomly skipping locations during crawl

2009-10-01 Thread Andrzej Bialecki
tsmori wrote: Both good ideas. Unfortunately, the content for each user is the same. It's a static php file that simply calls information out of our LDAP. It's very strange because I cannot see any difference between the user files/directories that are fetched and those that aren't. In checking

Re: Something wrong with nutch.wiki

2009-10-01 Thread Kirby Bohling
2009/9/29 Ольга Пескова opesk...@mail.ru: Hello! Please check the url: http://wiki.apache.org/nutch/ I can't find any content there. Just as a point of reference, I got the FrontPage to pull up just prior to sending this e-mail. I'm not sure what is wrong with your connection to it, but I

Re: Something wrong with nutch.wiki

2009-10-01 Thread Paul Tomblin
2009/10/1 Kirby Bohling kirby.bohl...@gmail.com: 2009/9/29 Ольга Пескова opesk...@mail.ru: Hello! Please check the url: http://wiki.apache.org/nutch/ I can't find any content there. Just as a point of reference, I got the FrontPage to pull up just prior to sending this e-mail.  I'm not

Fetcher problems with stable version of nutch-1.0 ?

2009-10-01 Thread Vijay
Hi all, I am trying to use nutch to crawl and index a list of about 50K URLs with depth=1. I am running indexing with the command: nutch-1.0/bin/nutch crawl urls/ -depth 1 -topN 10 with appropriate changes to the configuration files. I find that the fetching always terminates
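
In the one-step crawl command, -depth sets the number of generate/fetch rounds and -topN caps how many URLs are selected in each round, highest-scoring first. With -depth 1 -topN 10, that would allow at most 10 of the roughly 50K seed URLs to be fetched. The toy Java sketch below models only that per-round selection with hypothetical URLs and scores; Nutch's actual generator is a MapReduce job with further limits that are not modelled here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of per-round topN selection; not Nutch's generator.
class TopNSketch {
    // Pick at most topN URLs, highest score first (ties broken by URL for determinism).
    static List<String> selectTopN(Map<String, Float> scores, int topN) {
        List<Map.Entry<String, Float>> entries = new ArrayList<>(scores.entrySet());
        entries.sort((a, b) -> {
            int byScore = Float.compare(b.getValue(), a.getValue()); // descending score
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        List<String> selected = new ArrayList<>();
        for (int i = 0; i < Math.min(topN, entries.size()); i++) {
            selected.add(entries.get(i).getKey());
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Float> seeds = new HashMap<>();
        for (int i = 0; i < 50_000; i++) {
            seeds.put("http://example.org/page" + i, 1.0f); // hypothetical URLs, equal scores
        }
        List<String> round = selectTopN(seeds, 10); // one round, as with -depth 1 -topN 10
        // Prints: selected 10 of 50000 URLs
        System.out.println("selected " + round.size() + " of " + seeds.size() + " URLs");
    }
}
```

Under this model, covering every seed in a single round needs a topN at least as large as the seed list.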

RE: Something wrong with nutch.wiki

2009-10-01 Thread Brian Tingle
FWIW, I often have problems getting to wiki.apache.org. I could not get there this morning, and had to read what I needed out of the Google cache.