[ 
https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494735
 ] 

Doğacan Güney commented on NUTCH-444:
-------------------------------------

Now that NUTCH-443 is in, how does everyone feel about this one? We have been 
using ROME in our system for a while now, and we are very happy with it. Its 
biggest advantage over feedparser (besides being actively developed) is that 
it supports modules, meaning it can also parse MediaRSS, iTunes podcast, etc., 
which makes it a better building block for a podcast or video search engine. 

We can also go with the transparency interface, but I am worried that the 
interface is going to be huge if it also has to support video thumbnails 
(from MediaRSS), enclosures, and all the other extra data that comes from RSS 
extensions. That's why I think just choosing a library (*cough* ROME *cough* :) 
is better.

> Possibly use a different library to parse RSS feed for improved performance 
> and compatibility
> ---------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-444
>                 URL: https://issues.apache.org/jira/browse/NUTCH-444
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 0.9.0
>            Reporter: Renaud Richardet
>         Assigned To: Chris A. Mattmann
>            Priority: Minor
>             Fix For: 1.0.0
>
>         Attachments: parse-feed-v2.tar.bz2, parse-feed.tar.bz2
>
>
> As discussed by Nutch Newbie, Gal, and Chris on NUTCH-443, the current 
> library (feedparser) has the following issues:
> - OutOfMemory when parsing > 100k feeds, since it has to convert the feed to 
> JDOM first
> - no support for Atom 1.0
> - there has been no development in the last year
> Alternatives are:
> - Rome 
> - Informa
> - custom implementation based on StAX
> - ??
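
To make the "custom implementation based on StAX" alternative concrete, here is a 
minimal sketch of pulling entry titles out of a feed with the JDK's streaming API 
(javax.xml.stream), so the feed is never converted to a full JDOM tree. The class 
name StaxFeedTitles and the method itemTitles are hypothetical and are not part of 
Nutch, feedparser, or ROME. The sketch matches elements by local name only and 
assumes plain-text titles, so it is an illustration of the approach rather than a 
complete feed parser.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, for illustration only.
class StaxFeedTitles {

    // Returns the titles of RSS <item> or Atom <entry> elements, reading the
    // document as a stream of events instead of building a tree in memory.
    static List<String> itemTitles(Reader in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Feeds come from untrusted sources, so DTD processing is disabled.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(in);

        List<String> titles = new ArrayList<>();
        boolean inEntry = false;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("item") || name.equals("entry")) {
                        inEntry = true;
                    } else if (inEntry && name.equals("title")) {
                        // Assumption: the title contains text only. getElementText()
                        // throws if it meets a child element (e.g. XHTML titles).
                        titles.add(reader.getElementText().trim());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("item") || name.equals("entry")) {
                        inEntry = false;
                    }
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }
}
```

Because only the current event is held in memory, the parser's own footprint does 
not grow with the size of the feed. The cost is that every extension (MediaRSS, 
enclosures, and so on) would have to be handled by hand, which is the trade-off 
against picking an existing library.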

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
