Re: Plugins: directory not found: plugins

2006-02-07 Thread
Hi, do you mean I should create a dir called build and move the plugins dir into it? That doesn't seem to work either. 2006/2/7, Saravanaraj Duraisamy [EMAIL PROTECTED]: Add build\plugins to your classpath On 2/7/06, 盖世豪侠 [EMAIL PROTECTED] wrote: I tried to run nutch from the command line and I've add

Re: Categorizing content

2006-02-07 Thread
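color as background) and spammy pages or for sites with 3+ AdSense ads or other particulars and score appropriately. Has anyone experimented with this? -- 《盖世豪侠》 drew rave reviews and kept TVB's ratings high; pleased as it was, TVB still did not give him major roles. Stephen Chow (周星驰) was never going to stay a small fish in the pond: with his comic talent already on display, he would not accept being sidelined, so he moved into film and showed what he could do on the big screen. TVB had gained a thousand-li horse and then lost it, and could only regret it.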

Re: Categorizing content

2006-02-07 Thread
with this? -- Keep Discovering ... ... http://www.jroller.com/page/jmars

Re: Which version of rss does parse-rss plugin support?

2006-02-05 Thread
, or the California Institute of Technology. -Original Message- From: 盖世豪侠 [mailto:[EMAIL PROTECTED] Sent: Saturday, February 04, 2006 11:40 PM To: nutch-user@lucene.apache.org Subject: Re: Which version of rss does parse-rss plugin support? Hi Chris How do I change the plugin.xml

Re: Which version of rss does parse-rss plugin support?

2006-02-04 Thread
On 2/3/06 7:16 AM, 盖世豪侠 [EMAIL PROTECTED] wrote: Hi *Chris,* RSS 1.0 files have an .rdf extension. So will the parser automatically recognize such a file as an RSS file? On 06-2-3, Chris Mattmann [EMAIL PROTECTED] wrote: Hi there, parse-rss is based on commons-feedparser (http

Does anybody here do some efforts about RSS/Blog search?

2006-02-04 Thread
Using nutch or lucene. See if we can exchange some ideas.

How to crawl only a specific type of files?

2006-02-03 Thread
one rss file to another. If we want to index rss files, we have to index many html/htm files first.

Which version of rss does parse-rss plugin support?

2006-02-03 Thread
I see the test file is of version 0.91. Does the plugin support higher versions like 1.0 or 2.0?

Re: Which version of rss does parse-rss plugin support?

2006-02-03 Thread
website: ...commons-feedparser supports all versions of RSS (0.9, 0.91, 0.92, 1.0, and 2.0), Atom 0.5 (and future versions) as well as easy ad hoc extension and RSS 1.0 modules capability... Hope that helps. Thanks, Chris On 2/3/06 6:46 AM, 盖世豪侠 [EMAIL PROTECTED] wrote: I see the test
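The formats in that list do not share one root element, which is why the file extension alone (.rss, .rdf, .xml) says little about the version. A minimal sketch, in Python, of telling the families apart by root element. This is only an illustration and not how parse-rss or commons-feedparser detects feeds; the function name `feed_flavor` is hypothetical, and the Atom check assumes the Atom 1.0 namespace rather than the 0.5 draft mentioned above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs: RDF syntax (root of RDF-based RSS) and Atom 1.0.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ATOM_NS = "http://www.w3.org/2005/Atom"

def feed_flavor(xml_text):
    """Classify a feed by its root element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        # RSS 0.91, 0.92 and 2.0 use <rss version="...">.
        return "RSS " + root.get("version", "unknown")
    if root.tag == "{%s}RDF" % RDF_NS:
        # RSS 0.90 and 1.0 are RDF documents with an <rdf:RDF> root.
        return "RDF-based RSS (0.90 or 1.0)"
    if root.tag == "{%s}feed" % ATOM_NS:
        return "Atom 1.0"
    return "unknown"

if __name__ == "__main__":
    print(feed_flavor('<rss version="0.91"><channel/></rss>'))
```

A file with an .rdf extension would therefore be recognized by its `<rdf:RDF>` root, not by its name.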

How many data have you got?

2006-01-31 Thread
When I performed a whole-web crawl test according to the tutorial, I got Number of pages: 36668 Number of links: 46721. Then how many have you got?

Re: puzzle about regx ofurl pattern

2006-01-30 Thread
Thank you. On 06-1-30, Steve Betts [EMAIL PROTECTED] wrote: Actually, the ^ means start of line. This character is used as a negative indicator only within the context of sets, e.g., [^0-9]. Thanks, Steve Betts [EMAIL PROTECTED] 937-477-1797 -Original Message- From: 盖世豪侠 [mailto
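The two meanings of ^ described in that reply can be checked directly. A small sketch in Python, whose `re` module treats ^ the same way as Java regular expressions on this point; the URLs and the helper name `first_non_digit` are made up for illustration.

```python
import re

# Outside a set, '^' anchors the match to the start of the string
# (or of each line when re.MULTILINE is set).
starts_with_http = re.compile(r"^http://")

# As the first character inside [...], '^' negates the set:
# [^0-9] matches any single character that is not a digit.
non_digit = re.compile(r"[^0-9]")

def first_non_digit(text):
    """Return the first non-digit character in text, or None if there is none."""
    match = non_digit.search(text)
    return match.group(0) if match else None

if __name__ == "__main__":
    print(bool(starts_with_http.search("http://example.org/")))      # anchored at start
    print(bool(starts_with_http.search("see http://example.org/")))  # not at start
    print(first_non_digit("2006-01-30"))
```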

Differences between intranet crawl and whole-web crawl

2006-01-30 Thread
Is there any difference between the two situations: 1) use several entry urls in a flat file and url patterns in crawl-urlfilter.txt when doing an intranet crawl 2) inject only a few urls and use url patterns in regex-urlfilter.txt when doing a whole-web crawl

How to restrict the URL patterns of the internet crawl

2006-01-28 Thread
How to restrict the URL patterns of the internet crawl as that of the intranet crawl?
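As far as I understand the URL filter files, each rule is a line starting with + (accept) or - (reject) followed by a regular expression, the first rule that matches decides, and a URL that matches no rule is dropped. A rough sketch of that behaviour in Python, assuming those semantics; it is not Nutch's own code, and the rules, the example.org host, and the names `parse_rules` and `accepts` are hypothetical.

```python
import re

# Hypothetical rules in the "+regex / -regex" style; not copied from any real config.
RULES_TEXT = r"""
# skip common binary suffixes
-\.(gif|jpg|png|zip|gz)$
# accept hosts under example.org
+^http://([a-z0-9]*\.)*example\.org/
# reject everything else
-.
"""

def parse_rules(text):
    """Turn rule lines into (allow, compiled_pattern) pairs, skipping comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: %r" % line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accepts(rules, url):
    """First matching rule decides; a URL matching no rule is rejected."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False

if __name__ == "__main__":
    rules = parse_rules(RULES_TEXT)
    for url in ("http://www.example.org/feed.rss", "http://other.test/index.html"):
        print(url, accepts(rules, url))
```

Under these assumptions, restricting a whole-web crawl the same way as an intranet crawl comes down to putting the same accept rule for the target domain, followed by a catch-all reject, in the filter file that the whole-web tools read.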