[jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-11-15 Thread Renaud Richardet (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542819 ] Renaud Richardet commented on NUTCH-444: Hi, I am travelling and will be offline until January 2008. Thanks

[jira] Updated: (NUTCH-540) some problem about the Nutch cache

2007-08-09 Thread Renaud Richardet (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renaud Richardet updated NUTCH-540: --- Priority: Major (was: Blocker) Could you please attach log files and error messages? Thanks

[jira] Updated: (NUTCH-369) StringUtil.resolveEncodingAlias is unuseful.

2007-02-24 Thread Renaud Richardet (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renaud Richardet updated NUTCH-369: --- Attachment: patch.diff unified diff against head. - fixes encoding, as described by King

[jira] Updated: (NUTCH-369) StringUtil.resolveEncodingAlias is unuseful.

2007-02-24 Thread Renaud Richardet (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renaud Richardet updated NUTCH-369: --- Attachment: remover.diff just FYI, you can further filter which element neko should keep and

[jira] Commented: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-02-13 Thread Renaud Richardet (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12472733 ] Renaud Richardet commented on NUTCH-443: hi All, Glad to see that this patch is moving forward :-) I have

[jira] Updated: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-02-09 Thread Renaud Richardet (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renaud Richardet updated NUTCH-443: --- Attachment: NUTCH-443-draft-v4.patch Hi Dogacan, Thanks for merging the patches, good

[jira] Commented: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-02-09 Thread Renaud Richardet (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471878 ] Renaud Richardet commented on NUTCH-443: Nutch Newbie, Gal, Chris It's great that you discuss alternative

[jira] Updated: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-02-08 Thread Renaud Richardet (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renaud Richardet updated NUTCH-443: --- Attachment: parsers.diff Great, here's my work-in-progress(not finished, not tested) for

[jira] Created: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-02-07 Thread Renaud Richardet (JIRA)
allow parsers to return multiple Parse object, this will speed up the rss parser Key: NUTCH-443 URL: https://issues.apache.org/jira/browse/NUTCH-443 Project: Nutch

[jira] Updated: (NUTCH-412) plugin to parse the feed-url (rss/atom) of a blog

2006-12-03 Thread Renaud Richardet (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-412?page=all ] Renaud Richardet updated NUTCH-412: --- Attachment: plugin_parse-feedUrl2.diff plugin to parse the feed-url (rss/atom) of a blog -

[jira] Created: (NUTCH-412) plugin to parse the feed-url (rss/atom) of a blog

2006-12-02 Thread Renaud Richardet (JIRA)
plugin to parse the feed-url (rss/atom) of a blog - Key: NUTCH-412 URL: http://issues.apache.org/jira/browse/NUTCH-412 Project: Nutch Issue Type: New Feature Affects Versions: 0.9.0

[jira] Updated: (NUTCH-412) plugin to parse the feed-url (rss/atom) of a blog

2006-12-02 Thread Renaud Richardet (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-412?page=all ] Renaud Richardet updated NUTCH-412: --- Attachment: plugin_parse-feedUrl.diff unified diff against head (Rev: 481445) plugin to parse the feed-url (rss/atom) of a blog

[jira] Created: (NUTCH-359) extraction of links will fail for whole page if one single link cannot be parsed

2006-08-24 Thread Renaud Richardet (JIRA)
extraction of links will fail for whole page if one single link cannot be parsed Key: NUTCH-359 URL: http://issues.apache.org/jira/browse/NUTCH-359 Project: Nutch

[jira] Updated: (NUTCH-346) Improve readability of logs/hadoop.log

2006-08-21 Thread Renaud Richardet (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-346?page=all ] Renaud Richardet updated NUTCH-346: --- Attachment: log4j_plugins.diff OK, here we go. This patch should be good for 0.8 and trunk. Improve readability of logs/hadoop.log

[jira] Created: (NUTCH-346) Improve readability of logs/hadoop.log

2006-08-09 Thread Renaud Richardet (JIRA)
Improve readability of logs/hadoop.log -- Key: NUTCH-346 URL: http://issues.apache.org/jira/browse/NUTCH-346 Project: Nutch Issue Type: Improvement Affects Versions: 0.9.0 Environment: ubuntu

[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-08-08 Thread Renaud Richardet (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12426579 ] Renaud Richardet commented on NUTCH-266: KuroSaka, yes you can download the hadoop jar, release 0.5.0 from the project website:

[jira] Commented: (NUTCH-330) command line tool to search a Lucene index

2006-08-08 Thread Renaud Richardet (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-330?page=comments#action_12426629 ] Renaud Richardet commented on NUTCH-330: This bug is obsolete; I just found out that Nutch already allows searching from the command line via bin/nutch

[jira] Updated: (NUTCH-266) hadoop bug when doing updatedb

2006-08-07 Thread Renaud Richardet (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-266?page=all ] Renaud Richardet updated NUTCH-266: --- Attachment: patch_hadoop-0.5.0.diff Now that Hadoop 0.5 has been released, here's the patch to use hadoop-0.5.0.jar in Nutch-0.8.x HTH, Renaud hadoop

[jira] Updated: (NUTCH-266) hadoop bug when doing updatedb

2006-08-02 Thread Renaud Richardet (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-266?page=all ] Renaud Richardet updated NUTCH-266: --- Attachment: patch.diff Thank you Sami, We had a similar problem with Win XP and were able to fix it by using hadoop-nightly.jar. However, because of

[jira] Updated: (NUTCH-208) http: proxy exception list:

2006-07-31 Thread Renaud Richardet (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-208?page=all ] Renaud Richardet updated NUTCH-208: --- Attachment: proxy_exception_list-0.8.diff I updated the patch to 0.8 and corrected a small typo (if (!.equals(input[i].trim())){ ). The proxy exception

[jira] Created: (NUTCH-330) command line tool to search a Lucene index

2006-07-25 Thread Renaud Richardet (JIRA)
command line tool to search a Lucene index -- Key: NUTCH-330 URL: http://issues.apache.org/jira/browse/NUTCH-330 Project: Nutch Issue Type: Improvement Components: searcher Affects

[jira] Updated: (NUTCH-330) command line tool to search a Lucene index

2006-07-25 Thread Renaud Richardet (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-330?page=all ] Renaud Richardet updated NUTCH-330: --- Attachment: clSearch.diff unified diff against head command line tool to search a Lucene index --

[jira] Updated: (NUTCH-330) command line tool to search a Lucene index

2006-07-25 Thread Renaud Richardet (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-330?page=all ] Renaud Richardet updated NUTCH-330: --- Attachment: clSearch.diff forgot the echo in sh... command line tool to search a Lucene index --