[jira] Commented: (NUTCH-835) document deduplication (exact duplicates) failed using MD5Signature

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884624#action_12884624 ] Julien Nioche commented on NUTCH-835: - This patch has been marked for 1.2 but has been

[Nutchbase] WebPage class is a generated code?

2010-07-02 Thread Andrzej Bialecki
Hi, (This question is mostly to Dogacan Enis, but I encourage anyone familiar with the code to join the threads with [Nutchbase] - the sooner the better ;) ). I'm looking at src/gora/webpage.avsc and WebPage.java & friends... presumably the java code was autogenerated from avsc using Gora?

Re: [Nutchbase] WebPage class is a generated code?

2010-07-02 Thread Julien Nioche
(This question is mostly to Dogacan Enis, but I encourage anyone familiar with the code to join the threads with [Nutchbase] - the sooner the better ;) ). I'm looking at src/gora/webpage.avsc and WebPage.java & friends... presumably the java code was autogenerated from avsc using Gora? If

[jira] Created: (NUTCH-840) Port tests from parse-html to parse-tika

2010-07-02 Thread Julien Nioche (JIRA)
Port tests from parse-html to parse-tika Key: NUTCH-840 URL: https://issues.apache.org/jira/browse/NUTCH-840 Project: Nutch Issue Type: Task Components: parser Affects Versions: 1.1

Re: Nutch 2.0 : Design issue

2010-07-02 Thread Julien Nioche
On 2 July 2010 12:22, Andrzej Bialecki a...@getopt.org wrote: On 2010-07-02 12:42, Julien Nioche wrote: Hi guys, You've probably seen that there has been some progress on 2.0 lately. We've updated the nutchbase svn branch with the latest developments done on Dogacan's Github i.e. using

[Nutch Wiki] Trivial Update of PluginCentral by AlexMc

2010-07-02 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The PluginCentral page has been changed by AlexMc. The comment on this change is: adding a couple of external tutorials relating to plugins (more welcome!!!).

[jira] Updated: (NUTCH-840) Port tests from parse-html to parse-tika

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-840: Attachment: NUTCH-840.patch Patch which adds the HTML tests to the Tika Parser The tests currently

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884671#action_12884671 ] Julien Nioche commented on NUTCH-837: - I think we can also get rid of : * docs/ * WAR

Re: [Nutchbase] WebPage class is a generated code?

2010-07-02 Thread Mattmann, Chris A (388J)
Hey Guys, Since they are generated, +1 to: * adding a filepattern to svn:ignore to ignore them * updating build.xml to autogenerate Cheers, Chris On 7/2/10 3:24 AM, Julien Nioche lists.digitalpeb...@gmail.com wrote: (This question is mostly to Dogacan Enis, but I encourage anyone

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884691#action_12884691 ] Chris A. Mattmann commented on NUTCH-837: - Hey Julien: How are we going to replace

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884712#action_12884712 ] Chris A. Mattmann commented on NUTCH-837: - I'm not sure I agree :) The Nutch

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884718#action_12884718 ] Chris A. Mattmann commented on NUTCH-837: - Hey Julien, Yep that's the point. Solr

[jira] Updated: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-837: Attachment: NUTCH-837.patch Updated patch against r959954 (after NUTCH-836). Remove

[jira] Updated: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-837: Attachment: (was: NUTCH-837.patch) Remove search servers and Lucene dependencies

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884729#action_12884729 ] Andrzej Bialecki commented on NUTCH-837: - bq. So, I think we should still have a

[jira] Created: (NUTCH-841) Nutch 2.0 webapp

2010-07-02 Thread Chris A. Mattmann (JIRA)
Nutch 2.0 webapp Key: NUTCH-841 URL: https://issues.apache.org/jira/browse/NUTCH-841 Project: Nutch Issue Type: Improvement Components: web gui Environment: Nutch 2.0 Reporter: Chris A.

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884734#action_12884734 ] Julien Nioche commented on NUTCH-837: - :-) Remove search servers and Lucene

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884731#action_12884731 ] Chris A. Mattmann commented on NUTCH-837: - Okey dok, I created NUTCH-841 to track

[jira] Resolved: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-837. - Resolution: Fixed Committed in r960064. Thanks for review! Remove search servers and

[Nutch Wiki] Update of WritingPluginExample-0.9 by Ramprasad Ramachandran

2010-07-02 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The WritingPluginExample-0.9 page has been changed by Ramprasad Ramachandran. http://wiki.apache.org/nutch/WritingPluginExample-0.9?action=diff&rev1=11&rev2=12

Build failed in Hudson: Nutch-trunk #1196

2010-07-02 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/1196/changes Changes: [ab] NUTCH-837 Remove search servers and Lucene dependencies. [ab] NUTCH-836 Remove deprecated parse plugins. [jnioche] NUTCH-836 : Remove deprecated parse plugins --