Re: Plugin path in Nutch web

2005-12-08 Thread Arun Kaundal
Make sure that you have specified the correct name/path for the plugins directory (plugin.folders). Also, both the plugins and the Nutch source must be from the same version. Another solution is to download the latest trunk version of Nutch from SVN. On 12/8/05, Nguyen Ngoc Giang <[EMAIL PROTECTED]> wrote: > > Hi everyone,
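A quick way to act on the advice above is to check where a plugin.folders value actually points from inside the JVM that runs the web application. The sketch below is a hypothetical diagnostic helper, not part of Nutch. It resolves a relative value the way java.io.File does (against the JVM's working directory, which for a servlet container is often not the Nutch directory). Nutch's own lookup may differ, for example by also consulting the classpath. "plugins" is Nutch's default value; the "usable" test (a directory with at least one subdirectory) is an assumption.

```java
import java.io.File;

// Hypothetical diagnostic (not part of Nutch): where does a plugin.folders value point?
public class PluginFolderCheck {

    // Relative values are resolved against baseDir; absolute values are used as-is.
    static File resolve(String configured, File baseDir) {
        File f = new File(configured);
        return f.isAbsolute() ? f : new File(baseDir, configured);
    }

    // Assumption: a usable plugin folder is a directory containing at least one
    // subdirectory (one per plugin).
    static boolean looksUsable(File dir) {
        if (!dir.isDirectory()) {
            return false;
        }
        File[] children = dir.listFiles(File::isDirectory);
        return children != null && children.length > 0;
    }

    public static void main(String[] args) {
        String configured = args.length > 0 ? args[0] : "plugins"; // Nutch's default value
        File cwd = new File(System.getProperty("user.dir"));        // JVM working directory
        File resolved = resolve(configured, cwd);
        System.out.println("plugin folder: " + resolved.getAbsolutePath()
                + " usable=" + looksUsable(resolved));
    }
}
```

If the printed path is not the Nutch plugins directory, configuring plugin.folders with an absolute path avoids depending on the container's working directory.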

Problem with fetching segment

2005-12-08 Thread Håvard W. Kongsgård
I have followed the media-style.com quick tutorial, but when I try to fetch my segment, the fetch is killed! I have tried setting the system clock 30 days ahead, and no anti-virus is running on the systems. Systems: SUSE 9.2 and SUSE 10. # bin/nutch fetch segments/20060109014654/ 060109 014714 parsing file

Re: Luke and Indexes

2005-12-08 Thread Bryan Woliner
Thank you very much for the helpful answers. Most of the pages that didn't make it into the index were indeed due to protocol errors (mostly exceeding http.max.delay). One quick side note. When I was looking at the Nutch wiki page for bin/nutch segread, I noticed an error on the page and wasn't su

After mergesegs

2005-12-08 Thread Goldschmidt, Dave
Hi, just wanted to be sure - after I merge segments via the "mergesegs" tool, I need to use the "updatedb" tool before dropping the new indexes in, correct? And, as just posted, I need to shut down and restart Tomcat, too, yes? Thanks, DaveG

How to refresh the application context - to use the merged index

2005-12-08 Thread K.A.Hussain Ali
Hi all, while crawling with Nutch I do segment merging and indexing, but the search doesn't look into the new merged segment unless I restart the server. Is there any way to refresh the application context so that it uses the new merged index without stopping the server? Is there any way to do t

Re: Crawling listing (pagination) pages.

2005-12-08 Thread Jack Tang
Hi, I am facing the same problem. However, my crawl only focuses on a few websites, and I recognize the pagination URLs using a regexp and inject them in every fetch cycle. /Jack On 12/8/05, K.A.Hussain Ali <[EMAIL PROTECTED]> wrote: > HI all, > > Do Nutch crawl pages in any listing pages( pages with
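The approach Jack describes (recognize pagination URLs with a regexp, then inject them each fetch cycle) can be sketched as follows. The pattern is a hypothetical example that matches query parameters such as "?page=3" or "&start=20"; real sites use their own conventions, so it has to be adapted per site. The sketch only selects the URLs; feeding them into Nutch's inject step is left to the existing tools.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch: select pagination URLs with a regular expression so they can be injected.
public class PaginationUrls {

    // Hypothetical pattern: "page", "start" or "offset" query parameters with a number.
    static final Pattern PAGINATION = Pattern.compile("[?&](page|start|offset)=\\d+");

    // Returns the URLs that look like pagination links, preserving their order.
    static List<String> select(List<String> urls) {
        List<String> result = new ArrayList<>();
        for (String url : urls) {
            if (PAGINATION.matcher(url).find()) {
                result.add(url);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> outlinks = List.of(
                "http://example.com/list?page=2",
                "http://example.com/about.html",
                "http://example.com/search?q=nutch&start=20");
        // One URL per line, e.g. redirected into a plain-text URL list for injection.
        for (String url : select(outlinks)) {
            System.out.println(url);
        }
    }
}
```

Injecting only the matched pages keeps the overall crawl depth unchanged while still reaching the pages listed by the pagination.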

Crawling listing (pagination) pages.

2005-12-08 Thread K.A.Hussain Ali
Hi all, does Nutch crawl the pages behind listing pages (pages with pagination, as in search engines)? While crawling with Nutch, I need to get the pages displayed by the pagination without increasing the depth of the whole crawl. Does Nutch provide any plugin for this?

Re: Too many open file error -while searching using Nutch

2005-12-08 Thread Stefan Groschupf
Hi, browse the mail archive; this has been discussed many times. To summarize, you need to merge the segments, since you simply have too many indexes. You can also raise your OS's limit on open files (ulimit). HTH Stefan On 08.12.2005 at 11:30, K.A.Hussain Ali wrote: HI all, I get an error li

Too many open file error -while searching using Nutch

2005-12-08 Thread K.A.Hussain Ali
Hi all, I get a "Too many open files" error when I try to search my segments, which number in the hundreds. Is there any way to solve this issue? Doesn't Nutch close the segments after the search? Kindly send your suggestions for this issue. Thanks in advance. Regards -Hussai

Crawling - dynamically generate web pages with paginations

2005-12-08 Thread K.A.Hussain Ali
Hi all, I need to crawl a page whose pagination goes deeper than the depth I specify. Could I crawl all the pages listed in the pagination, or is there any way in Nutch to change the depth if the page has pagination? Any ideas would help greatly. Thanks in advance regards -

Plugin path in Nutch web

2005-12-08 Thread Nguyen Ngoc Giang
Hi everyone, I'm writing a JSP program to allow crawling via the web. My JSP script follows nutch.tools.CrawlTool, which tries to create the database, inject the database, fetch and index. I have difficulty identifying the plugins. Creating the database is fine, because it doesn't require any plugin. But

how to

2005-12-08 Thread Riku | http://kukusky.8800.org
How do I make Nutch support Chinese? -- My Web: http://kukusky.8800.org MSN: [EMAIL PROTECTED] E-mail: [EMAIL PROTECTED]

Re: Luke and Indexes

2005-12-08 Thread Andrzej Bialecki
Bryan Woliner wrote: I have a couple very basic questions about Luke and indexes in general. Answers to any of these questions are much appreciated: 1. In the Luke overview tab, what does "Index version" refer to? It's the time (as in System.currentTimeMillis()) when the index was last mo
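Given Andrzej's answer that Luke's "Index version" is a System.currentTimeMillis() value, it can be turned into a readable date by treating it as milliseconds since the Unix epoch. This is a minimal sketch under that assumption; the sample value is hypothetical, and since Lucene may also bump the version on each change, the result is best read as approximate.

```java
import java.time.Instant;

// Sketch: read Luke's "Index version" as epoch milliseconds (per the reply above).
public class IndexVersionDate {

    static Instant toInstant(long indexVersion) {
        return Instant.ofEpochMilli(indexVersion);
    }

    public static void main(String[] args) {
        // Hypothetical value, as it might be copied from Luke's overview tab.
        long version = args.length > 0 ? Long.parseLong(args[0]) : 1134000000000L;
        // For the default value this prints: index last modified around 2005-12-08T00:00:00Z
        System.out.println("index last modified around " + toInstant(version));
    }
}
```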