Re: [Nutch-dev] RSS-fetcher and index individual-how can I realize this function

Doug Cutting Tue, 06 Feb 2007 10:00:59 -0800

Doğacan Güney wrote:
> OK, then should I go forward with this and implement something? This
> should be pretty easy, though I am not sure what to use as keys for a
> Parse[].
> 
> I mean, when getParse returned a single Parse, ParseSegment output it
> as <url, Parse>. But if getParse returns an array, what will be the key
> for each element?


Perhaps Parser#getParse could return a Map<String,Parse>, where the keys 
are URLs?
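
A minimal sketch of what a URL-keyed result might look like. This is not Nutch code: ItemParse is a hypothetical stand-in for Nutch's Parse, and parseFeed and its (URL, text) input pairs are assumptions made only to show the shape of the return value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for Nutch's Parse; only the extracted text is modeled.
final class ItemParse {
    final String text;

    ItemParse(String text) {
        this.text = text;
    }
}

final class FeedParseSketch {
    // Hypothetical helper: each input row is {itemUrl, itemText}.
    // Returns one entry per feed item, keyed by URL, in feed order.
    static Map<String, ItemParse> parseFeed(String[][] items) {
        Map<String, ItemParse> parses = new LinkedHashMap<>();
        for (String[] item : items) {
            parses.put(item[0], new ItemParse(item[1]));
        }
        return parses;
    }
}
```

With a map, a caller such as ParseSegment could emit one <key, Parse> pair per entry instead of a single <url, Parse> pair.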

> Something like <url#i, Parse[i]> may work, but this may cause problems
> in dedup (for example, assume we fetched the same RSS feed twice and
> indexed the results in different indexes. The two versions' url#0 may be
> different items, but since they have the same key, dedup will delete the
> older one).

If the feed contains unique ids for items, then that can be used to 
qualify the URL.  Otherwise one could use the hash of the link of the item.
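
The key choice above can be sketched as follows. The class and method names are hypothetical, the "#" separator simply follows the url#i notation quoted earlier, and MD5 is an assumed choice of hash; the item id is used verbatim, so any escaping it might need is left out.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ItemKeys {
    // Qualifies the feed URL with the item's unique id when the feed
    // provides one; otherwise falls back to a hash of the item's link.
    static String itemKey(String feedUrl, String itemId, String itemLink) {
        String qualifier = (itemId != null && !itemId.isEmpty())
                ? itemId
                : md5Hex(itemLink);
        return feedUrl + "#" + qualifier;
    }

    // Hex-encoded MD5 of the string, zero-padded to 32 characters.
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Because the key depends on the item rather than on its position in the feed, two fetches of the same feed give the same key only to the same item, which avoids the url#0 collision described in the quoted text.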

Since the target of the link must still be indexed separately from the 
item itself, how much use is all this?  If the RSS document is 
considered a single page that changes frequently, and its items' links are 
considered ordinary outlinks, isn't much the same effect achieved?

Doug

