Build failed in Hudson: Nutch-trunk #1196

2010-07-02 Thread Apache Hudson Server
See Changes: [ab] NUTCH-837 Remove search servers and Lucene dependencies. [ab] NUTCH-836 Remove deprecated parse plugins. [jnioche] NUTCH-836 : Remove deprecated parse plugins -- [...t

[Nutch Wiki] Update of "WritingPluginExample-0.9" by Ramprasad Ramachandran

2010-07-02 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "WritingPluginExample-0.9" page has been changed by Ramprasad Ramachandran. http://wiki.apache.org/nutch/WritingPluginExample-0.9?action=diff&rev1=11&rev2=12

[jira] Resolved: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-837. - Resolution: Fixed Committed in r960064. Thanks for review! > Remove search servers and Lu

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884739#action_12884739 ] Julien Nioche commented on NUTCH-837: - Comments on the latest patch : * default.propert

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884731#action_12884731 ] Chris A. Mattmann commented on NUTCH-837: - Okey dok, I created NUTCH-841 to track it

[jira] Issue Comment Edited: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884729#action_12884729 ] Andrzej Bialecki edited comment on NUTCH-837 at 7/2/10 11:55 AM:

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884734#action_12884734 ] Julien Nioche commented on NUTCH-837: - :-) > Remove search servers and Lucene dependenc

[jira] Created: (NUTCH-841) Nutch 2.0 webapp

2010-07-02 Thread Chris A. Mattmann (JIRA)
Nutch 2.0 webapp Key: NUTCH-841 URL: https://issues.apache.org/jira/browse/NUTCH-841 Project: Nutch Issue Type: Improvement Components: web gui Environment: Nutch 2.0 Reporter: Chris A. Mattman

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884729#action_12884729 ] Andrzej Bialecki commented on NUTCH-837: - bq. So, I think we should still have a Nu

[jira] Updated: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-837: Attachment: (was: NUTCH-837.patch) > Remove search servers and Lucene dependencies > --

[jira] Updated: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-837: Attachment: NUTCH-837.patch Updated patch against r959954 (after NUTCH-836). > Remove searc

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884718#action_12884718 ] Chris A. Mattmann commented on NUTCH-837: - Hey Julien, Yep that's the point. Solr !

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884715#action_12884715 ] Julien Nioche commented on NUTCH-837: - Thanks for your comments Chris {quote} The Nutch

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884712#action_12884712 ] Chris A. Mattmann commented on NUTCH-837: - I'm not sure I agree :) The Nutch webap

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884700#action_12884700 ] Julien Nioche commented on NUTCH-837: - Hi Chris, My position on this is that we simply

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884691#action_12884691 ] Chris A. Mattmann commented on NUTCH-837: - Hey Julien: How are we going to replace

Re: [Nutchbase] WebPage class is a generated code?

2010-07-02 Thread Mattmann, Chris A (388J)
Hey Guys, Since they are generated, +1 to: * adding a filepattern to svn:ignore to ignore them * updating build.xml to autogenerate Cheers, Chris On 7/2/10 3:24 AM, "Julien Nioche" wrote: (This question is mostly to Dogacan & Enis, but I encourage anyone familiar with the code to

[jira] Commented: (NUTCH-837) Remove search servers and Lucene dependencies

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884671#action_12884671 ] Julien Nioche commented on NUTCH-837: - I think we can also get rid of : * docs/ * WAR

[jira] Updated: (NUTCH-840) Port tests from parse-html to parse-tika

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-840: Attachment: NUTCH-840.patch Patch which adds the HTML tests to the Tika Parser. The tests currently

[Nutch Wiki] Trivial Update of "PluginCentral" by AlexMc

2010-07-02 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "PluginCentral" page has been changed by AlexMc. The comment on this change is: adding a couple of external tutorials relating to plugins (more welcome!!!). http://wiki.apache.org/nu

Re: Nutch 2.0 : Design issue

2010-07-02 Thread Julien Nioche
On 2 July 2010 12:22, Andrzej Bialecki wrote: > On 2010-07-02 12:42, Julien Nioche wrote: > >> Hi guys, >> >> You've probably seen that there has been some progress on 2.0 lately. >> We've >> updated the nutchbase svn branch with the latest developments done on >> Dogacan's Github i.e. using GORA

Re: Nutch 1.0 partially content indexing by Nutch

2010-07-02 Thread HBFan
I am also looking for a way to exclude certain content from within a html page that is being parsed. I am trying to do it from within the Parse Filter, but I am not sure how to do it. Did you figure out anything? Does anyone else know how this would work? Thanks

Re: Nutch 2.0 : Design issue

2010-07-02 Thread Andrzej Bialecki
On 2010-07-02 12:42, Julien Nioche wrote: Hi guys, You've probably seen that there has been some progress on 2.0 lately. We've updated the nutchbase svn branch with the latest developments done on Dogacan's Github i.e. using GORA as a storage layer. One of the main issues [1] I raised after usin

[jira] Closed: (NUTCH-836) Remove deprecated parse plugins

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-836. --- Resolution: Fixed Committed revision 959948. Thanks Andrzej for reviewing it > Remove deprecated par

Re: [Nutchbase] WebPage class is a generated code?

2010-07-02 Thread Andrzej Bialecki
On 2010-07-02 12:24, Julien Nioche wrote: (This question is mostly to Dogacan & Enis, but I encourage anyone familiar with the code to join the threads with [Nutchbase] - the sooner the better ;) ). I'm looking at src/gora/webpage.avsc and WebPage.java & friends... presumably the java code was

Nutch 2.0 : Design issue

2010-07-02 Thread Julien Nioche
Hi guys, You've probably seen that there has been some progress on 2.0 lately. We've updated the nutchbase svn branch with the latest developments done on Dogacan's Github i.e. using GORA as a storage layer. One of the main issues [1] I raised after using nutchbase was that : NutchBase currently

[jira] Commented: (NUTCH-835) document deduplication (exact duplicates) failed using MD5Signature

2010-07-02 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884630#action_12884630 ] Andrzej Bialecki commented on NUTCH-835: - Sorry, I should've been more precise - I

[jira] Created: (NUTCH-840) Port tests from parse-html to parse-tika

2010-07-02 Thread Julien Nioche (JIRA)
Port tests from parse-html to parse-tika Key: NUTCH-840 URL: https://issues.apache.org/jira/browse/NUTCH-840 Project: Nutch Issue Type: Task Components: parser Affects Versions: 1.1

Re: [Nutchbase] WebPage class is a generated code?

2010-07-02 Thread Julien Nioche
> > (This question is mostly to Dogacan & Enis, but I encourage anyone familiar > with the code to join the threads with [Nutchbase] - the sooner the better > ;) ). > > I'm looking at src/gora/webpage.avsc and WebPage.java & friends... > presumably the java code was autogenerated from avsc using Go

[Nutchbase] WebPage class is a generated code?

2010-07-02 Thread Andrzej Bialecki
Hi, (This question is mostly to Dogacan & Enis, but I encourage anyone familiar with the code to join the threads with [Nutchbase] - the sooner the better ;) ). I'm looking at src/gora/webpage.avsc and WebPage.java & friends... presumably the java code was autogenerated from avsc using Gora?

[jira] Commented: (NUTCH-835) document deduplication (exact duplicates) failed using MD5Signature

2010-07-02 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884624#action_12884624 ] Julien Nioche commented on NUTCH-835: - This patch has been marked for 1.2 but has been c