Re: Nutch 1.0 with tomcat6 and Firefox does not find all files on Fedora 12

2010-03-11, Hannu Väisänen
On Wed, Feb 24, 2010 at 03:42:20PM +0200, Sami Siren wrote: > Hannu, > > Do you use the same set of QueryFilters both in the webapp and when > running from the shell? > > Perhaps your filter is not executed when running from the CLI? You can > verify how your query is transformed by running bin/nutch > org.a

Re: form-based authentication? Any progress

2010-03-11, conficio
Andrzej Bialecki wrote: > > I was involved in a project to implement this (as a proprietary plugin). > ... > So, if you target 10 sites, you can make it work. If you target 10,000 > sites all using slightly different methods, then forget it. > > > -- > Best regards, > Andrzej Bialecki <

Re: Where are new linked entries added

2010-03-11, Andrzej Bialecki
On 2010-03-11 15:53, nikinch wrote: Hi everyone, I've been using Nutch for a while now and I've hit a snag. I'm trying to find where new linked pages are added to the segment as a specific entry. To make myself clear, I've been through the fetch class and the crawlDBFilter and reducer. B

Re: Proxy Authentication

2010-03-11, Susam Pal
On Thu, Mar 11, 2010 at 8:24 PM, Graziano Aliberti wrote: > Hi everyone, > > I'm trying to use Nutch 1.0 on a system behind a Squid proxy. When > I try to fetch my website list, the log file shows that > authentication failed... > > I've configured my nutch-site.xml file wit

Proxy Authentication

2010-03-11, Graziano Aliberti
Hi everyone, I'm trying to use Nutch 1.0 on a system behind a Squid proxy. When I try to fetch my website list, the log file shows that authentication failed... I've configured my nutch-site.xml file with all the properties needed for proxy authentication, but my error is "http

Where are new linked entries added

2010-03-11, nikinch
Hi everyone, I've been using Nutch for a while now and I've hit a snag. I'm trying to find where new linked pages are added to the segment as a specific entry. To make myself clear, I've been through the fetch class and the crawlDBFilter and reducer. But I'm looking for the initial entry w

Creating new linked entries in crawlDB

2010-03-11, nikinch
Hi everyone, I'm not sure exactly where to post this question; sorry for the double post. I've been using Nutch for a while now and I've hit a snag. I'm trying to find where new linked pages are added to the segment as a specific entry. To make myself clear, I've been through the fetch c