[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471952 ]
nutch.newbie commented on NUTCH-444:
------------------------------------

Renaud: Thanks for moving the discussion here. First, to answer your question: yes, it is based on the MIME type detection problem. The goal of the trial was to see whether one could build a feed-only search site, i.e. one that indexes just feeds, but I didn't succeed. I will give it another go over the weekend.

Dogcan: Yes, one could just replace feedparser with Rome or StAX and either submit the change back here or use it internally. My point in raising it was to hear how others see it, and whether others have run into the same problem and what their experience was. Gal pointed out Rome (at least it is still being developed) and StAX, and you mentioned that you are doing something with Rome, so I just wanted to know what others think and what their experience has been, that's all.

Yes, you are correct that I posted this in the wrong place, NUTCH-443. But NUTCH-443 started off as someone having trouble with RSS, and in my view it is important to discuss the issue, because the library we are using (feedparser) is not going to solve the original problem if one tries to build an RSS-only search engine; otherwise NUTCH-443 would not have surfaced in the first place. I am looking forward to the day when I can use Nutch to run an RSS-only search engine, so, Dogcan, I am very interested in your Rome implementation. Maybe you can post the code here so that I can participate.
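The OutOfMemory problem listed in the issue description comes from feedparser converting the whole feed to JDOM before anything else happens. A pull parser such as StAX (`javax.xml.stream`, part of the JDK) reads the feed one event at a time and never builds a tree of the whole document. The following is only a rough sketch of that "custom implementation based on StAX" option, not Nutch code: the class name `StaxFeedSketch`, the method `itemTitles`, and the choice to collect only the titles of RSS `<item>` and Atom `<entry>` elements are hypothetical, for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch: pull item titles out of an RSS 2.0 or Atom feed with StAX. */
public class StaxFeedSketch {

    /**
     * Returns the title of every RSS <item> or Atom <entry>.
     * No document tree is built; only the current parse event is examined.
     * Assumes titles are plain text (an Atom title with type="xhtml" child
     * elements would make getElementText() throw).
     */
    public static List<String> itemTitles(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Disable DTD processing so entities declared in untrusted feeds are not resolved.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        List<String> titles = new ArrayList<>();
        boolean inItem = false; // true while inside <item> or <entry>
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName(); // namespace-agnostic match
                    if (name.equals("item") || name.equals("entry")) {
                        inItem = true;
                    } else if (inItem && name.equals("title")) {
                        // Reads the text content and stops at the matching </title>.
                        titles.add(reader.getElementText().trim());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("item") || name.equals("entry")) {
                        inItem = false;
                    }
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    public static void main(String[] args) throws XMLStreamException {
        String rss = "<rss version=\"2.0\"><channel><title>Feed</title>"
                + "<item><title>First</title></item>"
                + "<item><title>Second</title></item>"
                + "</channel></rss>";
        List<String> titles = itemTitles(
                new ByteArrayInputStream(rss.getBytes(StandardCharsets.UTF_8)));
        System.out.println(titles); // prints [First, Second]
    }
}
```

The trade-off, roughly, is that a hand-written parser like this has to cope with the RSS 0.9x/1.0/2.0 and Atom variants on its own, which a library such as Rome handles for you.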
> Possibly use a different library to parse RSS feed for improved performance
> and compatibility
> ---------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-444
>                 URL: https://issues.apache.org/jira/browse/NUTCH-444
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 0.9.0
>            Reporter: Renaud Richardet
>            Priority: Minor
>             Fix For: 0.9.0
>
>
> As discussed by Nutch Newbie, Gal, and Chris on NUTCH-443, the current
> library (feedparser) has the following issues:
> - OutOfMemory when parsing > 100k feeds, since it has to convert the feed to
> jdom first
> - no support for Atom 1.0
> - there has been no development in the last year
> Alternatives are:
> - Rome
> - Informa
> - custom implementation based on Stax
> - ??

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers