Can anyone tell me where and how a Nutch document is built in nutch-0.8.1?

For example, one HTML page is one document, but I want to split a document
into several.

On 1/27/07, kauu <[EMAIL PROTECTED]> wrote:

That's right.

I think we should do something when Nutch fetches a page successfully:
check whether it is an RSS feed and, if so, create as many pages as there
are items. I don't know whether that would work.
On the other hand, we could do something at the segment level, as you
suggest.
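
To make the first idea concrete, here is a rough sketch outside Nutch (plain
Python, standard library only), assuming an RSS 2.0 feed whose root element
is <rss>. The names split_feed, FEED and the dict keys are made up for
illustration and are not Nutch APIs; whether the same logic can be hooked
into Nutch 0.8.1 is exactly the open question in this thread.

```python
import xml.etree.ElementTree as ET


def split_feed(feed_xml):
    """Return one dict per <item> of an RSS 2.0 feed (hypothetical helper).

    Each dict stands in for a separate "page": it is keyed by the item's
    <link> and carries the item's own title and description, so nothing
    has to be fetched again.
    """
    root = ET.fromstring(feed_xml)
    if root.tag != "rss":
        # Not an RSS 2.0 feed (assumption: other feed types are ignored).
        return []

    docs = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link:
            continue  # no URL to identify the document by
        docs.append({
            "url": link,
            "title": (item.findtext("title") or "").strip(),
            "content": (item.findtext("description") or "").strip(),
        })
    return docs


# Made-up sample feed; the URLs are placeholders.
FEED = """<rss version="2.0"><channel>
<title>example feed</title>
<item><title>First</title><description>Body one</description>
<link>http://example.com/1 </link></item>
<item><title>Second</title><description>Body two</description>
<link>http://example.com/2</link></item>
</channel></rss>"""

docs = split_feed(FEED)
for doc in docs:
    print(doc["url"], "|", doc["title"])
```

Each returned dict would then have to be written out as its own page and
indexed separately; that part depends on Nutch internals and is not shown.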


I don't know whether we can write a plugin to get this functionality.

Can anyone give me a hint?

On 1/26/07, Gal Nitzan <[EMAIL PROTECTED]> wrote:
>
> Hi Kauu,
>
> The functionality you require doesn't exist in the current parse-rss
> plugin. I need the same functionality but it doesn't exist and I believe
> it's not a simple task.
>
> The functionality required is basically to create a page in a segment
> for each item and to add its URL to the crawldb.
>
> Since the data already exists in the item element there is no reason to
> "fetch" the page (item). After that the only thing left is to index it.
>
> Any thoughts on how to achieve that goal?
>
> Gal.
>
> -----Original Message-----
> From: kauu [mailto:[EMAIL PROTECTED]
> Sent: Friday, January 26, 2007 4:17 AM
> To: nutch-dev@lucene.apache.org
> Subject: parse-rss make them items as different pages
>
> I want to crawl RSS feeds, parse them, and index them, so that when I
> search the content each hit looks like an individual page.
>
> I don't know whether I have explained this clearly.
>
> <item>
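>     <title>European blizzard strikes late, causing flight delays and traffic chaos (photos)</title>
>     <description>Blizzard sweeps across Europe, causing numerous flight delays.
> On January 24, several airliners waited at Stuttgart Airport in Germany to have ice and snow removed from their fuselages. On January 24, workers cleared snow from the runways at Munich Airport in southern Germany.
> According to reports, the late-arriving blizzard swept for two days in a row across...
>     </description>
>     <link>http://news.sohu.com/20070125/n247833568.shtml</link>
>     <category>Sohu Focus Photo News</category>
>     <author>[EMAIL PROTECTED]</author>
>     <pubDate>Thu, 25 Jan 2007 11:29:11 +0800</pubDate>
>     <comments>http://comment.news.sohu.com/comment/topic.jsp?id=247833847</comments>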
> </item>
>
> This is one item in an RSS file.
>
> I want Nutch to treat an item like an individual page, so that when I
> search for something in this item, Nutch returns it as a hit.
>
> Can anyone tell me how to do this?
> Any reply will be appreciated.
>
> --
> www.babatu.com
>



--
www.babatu.com




_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
