Hi
Do you mean I should create a directory called build and move the plugins directory into it?
That doesn't seem to work either.
2006/2/7, Saravanaraj Duraisamy [EMAIL PROTECTED]:
Add build\plugins
to your classpath
On 2/7/06, 盖世豪侠 [EMAIL PROTECTED] wrote:
I am trying to run Nutch from the command line and I've added
color
as background) and spammy pages or for sites with 3+
adsense ads or other particulars and score
appropriately.
Has anyone experimented with this?
--
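《盖世豪侠》 drew rave reviews and kept TVB's ratings high; TVB was delighted, yet still did not give him leading roles. Stephen Chow was never going to stay in a small pond: once his comic talent had shown itself, he was unwilling to be sidelined, so he moved into film and showed his flair on the big screen. Having gained a prize steed and then lost it, TVB was naturally left with regrets.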
with this?
--
Keep Discovering ... ...
http://www.jroller.com/page/jmars
, or the California Institute of Technology.
-Original Message-
From: 盖世豪侠 [mailto:[EMAIL PROTECTED]
Sent: Saturday, February 04, 2006 11:40 PM
To: nutch-user@lucene.apache.org
Subject: Re: Which version of rss does parse-rss plugin support?
Hi Chris
How do I change the plugin.xml
On 2/3/06 7:16 AM, 盖世豪侠 [EMAIL PROTECTED] wrote:
Hi Chris,
RSS 1.0 files have the extension .rdf. So will the parser recognize them
automatically as RSS files?
On 06-2-3, Chris Mattmann [EMAIL PROTECTED] wrote:
Hi there,
parse-rss is based on commons-feedparser
(http
Using nutch or lucene.
See if we can exchange some ideas.
one rss file to another. If we want to index rss files, we have to index
many html/htm files first.
I see the test file is of version 0.91.
Does the plugin support higher versions like 1.0 or 2.0?
website:
...commons-feedparser supports all versions of RSS (0.9, 0.91, 0.92, 1.0,
and 2.0), Atom 0.5 (and future versions) as well as easy ad hoc extension
and RSS 1.0 modules capability...
Hope that helps.
Thanks,
Chris
On 2/3/06 6:46 AM, 盖世豪侠 [EMAIL PROTECTED] wrote:
I see the test
When I performed a whole-web crawl test according to the tutorial, I got
Number of pages: 36668
Number of links: 46721.
Then how many have you got?
Thank you.
On 06-1-30, Steve Betts [EMAIL PROTECTED] wrote:
Actually, the ^ means start of line. This character is used as a negative
indicator only within the context of sets, eg, [^0-9].
Thanks,
Steve Betts
[EMAIL PROTECTED]
937-477-1797
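
The two meanings of ^ that Steve describes can be checked with a short sketch. It uses Python's re module purely for illustration (Nutch's URL filters use Java regular expressions, but ^ behaves the same way in both for these two cases), and the sample URLs and strings are made up:

```python
import re

# Outside brackets, ^ anchors the pattern to the start of the input.
starts_with_http = re.compile(r"^http://")

# As the first character inside brackets, ^ negates the set:
# [^0-9] matches any single character that is NOT a digit.
non_digit = re.compile(r"[^0-9]")

# Hypothetical sample inputs, not taken from the thread.
anchored_hit = bool(starts_with_http.search("http://example.com/page"))
anchored_miss = bool(starts_with_http.search("see http://example.com/page"))
non_digits = non_digit.findall("a1b22c")

print(anchored_hit)   # True
print(anchored_miss)  # False
print(non_digits)     # ['a', 'b', 'c']
```

So a filter pattern such as ^http:// only matches URLs that begin with that prefix; it does not mean "not http://".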
-Original Message-
From: 盖世豪侠 [mailto
Is there any difference between these two situations:
1) using several entry URLs in a flat file and URL patterns in
crawl-urlfilter.txt when doing an intranet crawl
2) injecting only a few URLs and using URL patterns in regex-urlfilter.txt when
doing a whole-web crawl
How can I restrict the URL patterns of a whole-web crawl the way I can for an
intranet crawl?