[jira] [Commented] (NUTCH-882) Design a Host table in GORA

2011-10-31 Thread Julien Nioche (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140008#comment-13140008 ] Julien Nioche commented on NUTCH-882: - nope, go ahead Design a Host

[jira] [Commented] (NUTCH-1185) Decrease solr.commit.size

2011-10-31 Thread Julien Nioche (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140047#comment-13140047 ] Julien Nioche commented on NUTCH-1185: -- Or we could catch the exceptions (OOME or

[jira] [Commented] (NUTCH-1185) Decrease solr.commit.size

2011-10-31 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140058#comment-13140058 ] Markus Jelsma commented on NUTCH-1185: -- Would that be feasible? It's not easy to

[jira] [Updated] (NUTCH-902) Add all necessary files and configuration so that nutch can be used with different backends out-of-the-box

2011-10-31 Thread Lewis John McGibbney (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-902: --- Attachment: NUTCH-902-v2.patch Revised patch to incorporate additional comments.

[jira] [Resolved] (NUTCH-902) Add all necessary files and configuration so that nutch can be used with different backends out-of-the-box

2011-10-31 Thread Lewis John McGibbney (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-902. Resolution: Fixed Fix Version/s: nutchgora Committed @ revision 1195403 in

[jira] [Updated] (NUTCH-1156) building errors with gora-hbase as a backend; update ivy.xml to use correct dependancies

2011-10-31 Thread Ferdy (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy updated NUTCH-1156: - Attachment: NUTCH-1156-v4.patch New patch that applicable for current branch. Is it ok if I commit this?

[jira] [Commented] (NUTCH-1156) building errors with gora-hbase as a backend; update ivy.xml to use correct dependancies

2011-10-31 Thread Lewis John McGibbney (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140068#comment-13140068 ] Lewis John McGibbney commented on NUTCH-1156: - I forgot that we had reopened

[jira] [Closed] (NUTCH-1156) building errors with gora-hbase as a backend; update ivy.xml to use correct dependancies

2011-10-31 Thread Ferdy (Closed) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy closed NUTCH-1156. Resolution: Fixed Committed. (Btw removed tabs in exclusion block) building errors with

[jira] [Commented] (NUTCH-1154) Upgrade to Tika 0.10

2011-10-31 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140119#comment-13140119 ] Hudson commented on NUTCH-1154: --- Integrated in nutch-trunk-maven #3 (See

[jira] [Commented] (NUTCH-1097) application/xhtml+xml should be enabled in plugin.xml of parse-html; allow multiple mimetypes for plugin.xml

2011-10-31 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140120#comment-13140120 ] Hudson commented on NUTCH-1097: --- Integrated in nutch-trunk-maven #3 (See

[jira] [Commented] (NUTCH-865) Format source code in unique style

2011-10-31 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140121#comment-13140121 ] Hudson commented on NUTCH-865: -- Integrated in nutch-trunk-maven #3 (See

[jira] [Commented] (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a ?

2011-10-31 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140122#comment-13140122 ] Hudson commented on NUTCH-797: -- Integrated in nutch-trunk-maven #3 (See

Re: Nutch Maven build

2011-10-31 Thread Markus Jelsma
Can't those deps be excluded just like in our ivy.xml? On Monday 31 October 2011 14:05:35 lewis john mcgibbney wrote: Hi Everyone, I just ran the maven build on our trunk code, the output of which can be seen here [1]. There are some unresolved dependencies which I think fail the build. I

[jira] [Created] (NUTCH-1186) FreeGenerator always normalizes

2011-10-31 Thread Markus Jelsma (Created) (JIRA)
FreeGenerator always normalizes --- Key: NUTCH-1186 URL: https://issues.apache.org/jira/browse/NUTCH-1186 Project: Nutch Issue Type: Bug Components: generator Affects Versions: 1.3

[jira] [Issue Comment Edited] (NUTCH-1098) better url-normalizer basic

2011-10-31 Thread Ferdy (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140167#comment-13140167 ] Ferdy edited comment on NUTCH-1098 at 10/31/11 1:49 PM: +1 for

[jira] [Commented] (NUTCH-1098) better url-normalizer basic

2011-10-31 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140170#comment-13140170 ] Markus Jelsma commented on NUTCH-1098: -- Path prefixes can be overcome by using -p1

[jira] [Commented] (NUTCH-1184) Fetcher to parse and follow Nth degree outlinks

2011-10-31 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140181#comment-13140181 ] Markus Jelsma commented on NUTCH-1184: -- Additional Todo's: * add setOutlinks to

[jira] [Commented] (NUTCH-1104) Port issues from 1.x to trunk

2011-10-31 Thread Ferdy (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140192#comment-13140192 ] Ferdy commented on NUTCH-1104: -- What do you suggest, can I create subcases for each issue I

[jira] [Commented] (NUTCH-1104) Port issues from 1.x to trunk

2011-10-31 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140197#comment-13140197 ] Markus Jelsma commented on NUTCH-1104: -- Yes. This is just an umbrella issue. Thanks

Re: Nutch Maven build

2011-10-31 Thread lewis john mcgibbney
I suppose they probably could, Markus. I'm not going to have time to check them today but will try and experiment locally tomorrow maybe. I have a feeling that they're required rather than to be excluded. Or maybe I'm reading the console output incorrectly. On Mon, Oct 31, 2011 at 1:09 PM,

Re: Nutch Maven build

2011-10-31 Thread Mattmann, Chris A (388J)
Guys, I'm working on this right now as part of the RC, so hopefully should have a working Maven build soonish... Cheers, Chris On Oct 31, 2011, at 8:12 AM, lewis john mcgibbney wrote: I suppose they probably could Markus. I'm not going to have time to check them today but will try and

[jira] [Commented] (NUTCH-1174) Outlinks are not properly normalized

2011-10-31 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140222#comment-13140222 ] Markus Jelsma commented on NUTCH-1174: -- Dupes are not flushed and this seems as it

[jira] [Updated] (NUTCH-1184) Fetcher to parse and follow Nth degree outlinks

2011-10-31 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1184: - Attachment: NUTCH-1184-1.5-3.patch New patch fixes the todo's and incorporates NUTCH-1174.

Re: Nutch Maven build

2011-10-31 Thread Julien Nioche
Guys, I have probably missed a discussion on this lately but I really don't remember that we'd decided to move from ANT+IVY. We've had numerous discussions on this in the past, all leading to the conclusion that maintaining two systems is a bad idea. Have I missed something? Jul PS: If we had

[jira] [Updated] (NUTCH-1184) Fetcher to parse and follow Nth degree outlinks

2011-10-31 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1184: - Attachment: NUTCH-1184-1.5-4.patch New patch does not initialize maxOutlinkDepth in fetcher.

Re: Nutch Maven build

2011-10-31 Thread Markus Jelsma
This was the thing, isn't it? https://issues.apache.org/jira/browse/NUTCH-995 On Monday 31 October 2011 16:28:18 Julien Nioche wrote: Guys, I have probably missed a discussion on this lately but I really don't remember that we'd decided to move from ANT+IVY. We've had numerous discussions

Re: Nutch Maven build

2011-10-31 Thread lewis john mcgibbney
Hi Julien, You are correct, as far as I'm aware we're sticking to the Ant/Ivy config. I think it works excellently for Nutch, especially for testing. I was under the impression that to publish Nutch artefacts to maven repo we need to have a working pom.xml? Is this correct? This was all I was

ANT+MAVEN (was: Nutch Maven build)

2011-10-31 Thread Julien Nioche
Hi Chris Yah, I'm trying to get it working so that we have a working pom that we can release to Maven Central (we need a POM to be working in order to run the mvn release plugin which is what I run to publish to repository.apache.org and then sync to Central). yep Nutch is an interesting

Re: Nutch Maven build

2011-10-31 Thread Julien Nioche
Could you try and generate the pom.xml (see NUTCH-995, https://issues.apache.org/jira/browse/NUTCH-995) and see if you are getting the same issues? On 31 October 2011 16:41, lewis john mcgibbney lewis.mcgibb...@gmail.com wrote: I think it's static Julien. This is where my concerns are. On Mon,

Re: ANT+MAVEN (was: Nutch Maven build)

2011-10-31 Thread Ken Krugler
Hi Julien, On Oct 31, 2011, at 4:42pm, Julien Nioche wrote: Hi Chris Yah, I'm trying to get it working so that we have a working pom that we can release to Maven Central (we need a POM to be working in order to run the mvn release plugin which is what I run to publish to

[jira] [Updated] (NUTCH-1172) AbstractNuchTest should have a generic testdir instead of specific 'inject' dir

2011-10-31 Thread Ferdy (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy updated NUTCH-1172: - Description: This is a very trivial issue but nevertheless important for the goal to have clarified tests. This

[jira] [Commented] (NUTCH-1148) Nutchgora job jar functionalilty is broken: PluginManifestParser cannot load plugins from contextClassLoader.

2011-10-31 Thread Ferdy (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140319#comment-13140319 ] Ferdy commented on NUTCH-1148: -- I can verify that running Nutch for several weeks with this

Re: Nutch Maven build

2011-10-31 Thread lewis john mcgibbney
OK I'm getting similar problems to what Gabriele was getting when he was working with you guys on this one. I need more time to properly look through the correspondence, as well as the code in more detail before I am able to move on with this one. BUILD FAILED

Re: Nutch Maven build

2011-10-31 Thread lewis john mcgibbney
I'm also a bit hesitant to be mucking around with this using ant tasks such as deploy and release etc. It's not my intention to start shipping this stuff off to maven central. Can you please give a brief run-down of how this is meant to work? I get it at a conceptual level, however have never seen it

Re: ANT+MAVEN (was: Nutch Maven build)

2011-10-31 Thread Mattmann, Chris A (388J)
Hey Julien, I've used Maven + Ant tasks :-) In fact, I really love it b/c Ant tasks in Maven are absolutely essential to save the pain of writing Maven plugins (which I've done in OODT!, ugh) I don't have any experience with the other way around but my guess is that it would be equally

[jira] [Resolved] (NUTCH-1175) Update ivy.xml to use correct dependancies with gora-cassandra as a backend

2011-10-31 Thread Lewis John McGibbney (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1175. - Resolution: Fixed Fixed as this was a duplicated task, as addressed in

[jira] [Closed] (NUTCH-1175) Update ivy.xml to use correct dependancies with gora-cassandra as a backend

2011-10-31 Thread Lewis John McGibbney (Closed) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-1175. --- Update ivy.xml to use correct dependancies with gora-cassandra as a backend

[jira] [Commented] (NUTCH-1138) remove LogUtil from trunk and nutch gora

2011-10-31 Thread Lewis John McGibbney (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140669#comment-13140669 ] Lewis John McGibbney commented on NUTCH-1138: - OK so this patch for trunk

Indexer to use webgraph inlinks

2011-10-31 Thread Markus Jelsma
Hi, Any pointers on how to use the WebGraph's inlink DB instead of the old inlinkdb for the indexer [1]? Anything to pay extra attention to or keep an eye on? [1]: https://issues.apache.org/jira/browse/NUTCH-1181 Thanks

[jira] [Commented] (NUTCH-902) Add all necessary files and configuration so that nutch can be used with different backends out-of-the-box

2011-10-31 Thread Enis Soztutar (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140684#comment-13140684 ] Enis Soztutar commented on NUTCH-902: - Patch looks good, but can you please test with

[Nutch Wiki] Trivial Update of CommandLineOptions by MarkusJelsma

2011-10-31 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The CommandLineOptions page has been changed by MarkusJelsma: http://wiki.apache.org/nutch/CommandLineOptions?action=diff&rev1=39&rev2=40 Comment: update to 1.4 - = Nutch 1.3 Command Line