[ 
https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471952
 ] 

nutch.newbie commented on NUTCH-444:
------------------------------------

Renaud :

Thanks for moving the discussion here. First, to answer your question: yes, it's 
based on a MIME type detection problem. The goal of the trial was to see if one 
could make a feed-only search site, i.e. just feeds, but I didn't succeed. I will 
give it a go over the weekend.

Dogcan:

Yes, one could just replace feedparser with Rome or StAX and submit it back 
here, or use it internally. My point in raising this was to see how others feel 
about it, and whether others have run into problems and what their experience 
was. Gal pointed out Rome (at least it is still being developed) and StAX, and 
you mentioned that you are doing something with Rome. I just wanted to know 
what others think and what their experience is, that's all. Yes, you are 
correct, I posted it in the wrong place, NUTCH-443. But NUTCH-443 started off 
as someone having trouble with RSS, and in my view it is important to discuss 
the issue, because we are using feedparser, which is not going to solve the 
original issue if one tries to create an RSS-only search engine. Otherwise 
NUTCH-443 would not have surfaced in the first place.

I am looking forward to the day when I can use Nutch as just an RSS feed 
search engine, so Dogcan, I am very interested in your Rome implementation. 
Maybe you can post the code here so that I can participate. 

> Possibly use a different library to parse RSS feed for improved performance 
> and compatibility
> ---------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-444
>                 URL: https://issues.apache.org/jira/browse/NUTCH-444
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 0.9.0
>            Reporter: Renaud Richardet
>            Priority: Minor
>             Fix For: 0.9.0
>
>
> As discussed by Nutch Newbie, Gal, and Chris on NUTCH-443, the current 
> library (feedparser) has the following issues:
> - OutOfMemory when parsing > 100k feeds, since it has to convert the feed to 
> jdom first
> - no support for Atom 1.0
> - there has been no development in the last year
> Alternatives are:
> - Rome 
> - Informa
> - custom implementation based on Stax
> - ??

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
