[jira] [Updated] (NUTCH-1304) GeneratorMapper.java doesn't return when skipping and already generated mark
[ https://issues.apache.org/jira/browse/NUTCH-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dan Rosher updated NUTCH-1304:
Attachment: NUTCH-1304.patch
GeneratorMapper.java doesn't return when skipping and already generated mark
Key: NUTCH-1304
URL: https://issues.apache.org/jira/browse/NUTCH-1304
Project: Nutch
Issue Type: Bug
Components: generator
Affects Versions: nutchgora
Reporter: Dan Rosher
Priority: Minor
Fix For: nutchgora
Attachments: NUTCH-1304.patch
GeneratorMapper.java doesn't return when skipping an already generated mark.
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (NUTCH-1304) GeneratorMapper.java doesn't return when skipping and already generated mark
GeneratorMapper.java doesn't return when skipping and already generated mark
Key: NUTCH-1304
URL: https://issues.apache.org/jira/browse/NUTCH-1304
Project: Nutch
Issue Type: Bug
Components: generator
Affects Versions: nutchgora
Reporter: Dan Rosher
Priority: Minor
Fix For: nutchgora
Attachments: NUTCH-1304.patch
GeneratorMapper.java doesn't return when skipping an already generated mark.
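The kind of bug reported in NUTCH-1304 can be illustrated with a minimal sketch. This is not the GeneratorMapper source and not the attached patch: the class `SkipSketch`, its two methods and the `alreadyGenerated` flag are hypothetical names chosen only to show why a skip branch without a `return` still lets the record through.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the Nutch GeneratorMapper code. */
class SkipSketch {
    /** URLs that this toy mapper would pass on to the next stage. */
    final List<String> emitted = new ArrayList<>();

    /** Buggy shape: the skip branch is entered but execution falls through. */
    void mapWithoutReturn(String url, boolean alreadyGenerated) {
        if (alreadyGenerated) {
            // The intent is to skip, but without a return the line below still runs.
        }
        emitted.add(url);
    }

    /** Fixed shape: return as soon as the record is known to be skipped. */
    void mapWithReturn(String url, boolean alreadyGenerated) {
        if (alreadyGenerated) {
            return; // nothing is emitted for an already generated record
        }
        emitted.add(url);
    }

    public static void main(String[] args) {
        SkipSketch buggy = new SkipSketch();
        buggy.mapWithoutReturn("http://example.com/", true);
        SkipSketch fixed = new SkipSketch();
        fixed.mapWithReturn("http://example.com/", true);
        System.out.println("without return: " + buggy.emitted.size() + " emitted");
        System.out.println("with return: " + fixed.emitted.size() + " emitted");
    }
}
```

In the sketch, the only difference between the two methods is the early `return`, which is the behaviour the issue title asks for.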
[jira] [Commented] (NUTCH-1304) GeneratorMapper.java doesn't return when skipping and already generated mark
[ https://issues.apache.org/jira/browse/NUTCH-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225159#comment-13225159 ]
Lewis John McGibbney commented on NUTCH-1304:
+1 for commit. I'll wait until this afternoon to hear back from anyone else before doing so. Thanks Dan.
[jira] [Commented] (NUTCH-1300) Indexer to normalize URL's
[ https://issues.apache.org/jira/browse/NUTCH-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225174#comment-13225174 ]
Sebastian Nagel commented on NUTCH-1300:
+1
* effective fix for a serious problem: long running continuous crawls require adjustments of the normalization rules quite often
* tested (with 1.4): costs (time spent for extra normalization) are ok compared to the benefit
Two suggestions:
# Does a URLNormalizer scope "index" make sense? E.g., if only outlinks are normalized and default rules are empty, the scope "index" may use the same rules as scope "outlink".
# Wouldn't commandline options for solrindex be nice? Most other tools (generate, updatedb, invertlinks) have options such as -filter / -norm / -noNorm.
Indexer to normalize URL's
Key: NUTCH-1300
URL: https://issues.apache.org/jira/browse/NUTCH-1300
Project: Nutch
Issue Type: New Feature
Components: indexer
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
Fix For: 1.5
Attachments: NUTCH-1300-1.5-1.patch
Indexers should be able to normalize URLs. This is useful when a new normalizer is applied to the entire CrawlDB. Without it, some or all records in a segment cannot be indexed at all.
[jira] [Updated] (NUTCH-1305) Domain(blacklist)URLFilter to trim entries
[ https://issues.apache.org/jira/browse/NUTCH-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1305:
Attachment: NUTCH-1305-1.5-1.patch
Patch for 1.5. Fixes the issue.
Domain(blacklist)URLFilter to trim entries
Key: NUTCH-1305
URL: https://issues.apache.org/jira/browse/NUTCH-1305
Project: Nutch
Issue Type: Bug
Affects Versions: 1.4
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
Fix For: 1.5
Attachments: NUTCH-1305-1.5-1.patch
Both filters should handle entries with trailing whitespace.
[jira] [Created] (NUTCH-1305) Domain(blacklist)URLFilter to trim entries
Domain(blacklist)URLFilter to trim entries
Key: NUTCH-1305
URL: https://issues.apache.org/jira/browse/NUTCH-1305
Project: Nutch
Issue Type: Bug
Affects Versions: 1.4
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
Fix For: 1.5
Attachments: NUTCH-1305-1.5-1.patch
Both filters should handle entries with trailing whitespace.
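What "handle entries with trailing whitespace" means in practice can be sketched as follows. This is a minimal illustration, not the code of the Nutch filters or of the attached patch: `DomainListSketch` and `parseDomains` are hypothetical names, and skipping blank lines and `#` comment lines is an assumption about the list format.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical illustration only; not the Nutch domain filter plugins. */
class DomainListSketch {
    /** Parses one entry per line, trimming surrounding whitespace from each. */
    static Set<String> parseDomains(String fileContents) {
        Set<String> domains = new LinkedHashSet<>();
        for (String line : fileContents.split("\n")) {
            String entry = line.trim(); // "example.com  " becomes "example.com"
            // Assumption: blank lines and lines starting with '#' carry no entry.
            if (entry.isEmpty() || entry.startsWith("#")) {
                continue;
            }
            domains.add(entry);
        }
        return domains;
    }

    public static void main(String[] args) {
        Set<String> domains = parseDomains("example.com  \nexample.org\t\n\n# comment\n");
        System.out.println(domains.contains("example.com"));
    }
}
```

Without the `trim()` call, a lookup for `example.com` would miss the stored entry `"example.com  "`, so the host would not be matched by the list.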
[jira] [Commented] (NUTCH-1305) Domain(blacklist)URLFilter to trim entries
[ https://issues.apache.org/jira/browse/NUTCH-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225206#comment-13225206 ]
Lewis John McGibbney commented on NUTCH-1305:
+1
[jira] [Resolved] (NUTCH-1305) Domain(blacklist)URLFilter to trim entries
[ https://issues.apache.org/jira/browse/NUTCH-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-1305.
Resolution: Fixed
Committed for 1.5 in rev. 1298394.
[jira] [Commented] (NUTCH-1305) Domain(blacklist)URLFilter to trim entries
[ https://issues.apache.org/jira/browse/NUTCH-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225209#comment-13225209 ]
Markus Jelsma commented on NUTCH-1305:
Thanks Lewis.
[jira] [Created] (NUTCH-1306) Commit after finished writing to solr index
Commit after finished writing to solr index
Key: NUTCH-1306
URL: https://issues.apache.org/jira/browse/NUTCH-1306
Project: Nutch
Issue Type: Improvement
Components: indexer
Affects Versions: nutchgora
Reporter: Dan Rosher
Priority: Trivial
Fix For: nutchgora
Attachments: NUTCH-1306.patch
Commit after finishing writing to the Solr index; otherwise it is a bit confusing not to see the number of docs we expect in Solr.
[jira] [Updated] (NUTCH-1306) Commit after finished writing to solr index
[ https://issues.apache.org/jira/browse/NUTCH-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dan Rosher updated NUTCH-1306:
Attachment: NUTCH-1306.patch
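The confusion described in NUTCH-1306 comes from the fact that documents sent to Solr only become searchable after a commit. The toy model below shows that effect; it does not use SolrJ and is not the attached patch. `CommitSketch`, `add`, `commit`, `visibleCount` and `indexAll` are hypothetical names for an in-memory stand-in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory stand-in for an index with explicit commits. */
class CommitSketch {
    private final List<String> pending = new ArrayList<>();
    private final List<String> visible = new ArrayList<>();

    /** Buffers a document; it is not yet visible to searches. */
    void add(String doc) {
        pending.add(doc);
    }

    /** Makes all buffered documents visible. */
    void commit() {
        visible.addAll(pending);
        pending.clear();
    }

    /** Number of documents a search would currently see. */
    int visibleCount() {
        return visible.size();
    }

    /** Writes all documents and optionally commits once at the end. */
    static int indexAll(CommitSketch index, List<String> docs, boolean commitAtEnd) {
        for (String doc : docs) {
            index.add(doc);
        }
        if (commitAtEnd) {
            index.commit();
        }
        return index.visibleCount();
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a", "b", "c");
        System.out.println("no commit: " + indexAll(new CommitSketch(), docs, false));
        System.out.println("commit at end: " + indexAll(new CommitSketch(), docs, true));
    }
}
```

A single commit after the last write, as the issue title suggests, is enough for the expected document count to show up once the job has finished.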
NutchGora release, and Nutch 1.x trunk release
Hey Guys,
I've got some cycles this weekend -- anyone up for a 1.5 release off trunk (stable), and a NutchGora branch release? I suggested this before [1] regarding NutchGora. I'm inclined to say let's do the following:
1. NutchGora: apache-nutch-2.0 - release 2.x series based on this branch
2. Nutch: apache-nutch-1.x - stable trunk branch
Then, when the time comes, we can try and create a:
3. Nutch: apache-nutch-3.x - merge of 1.x and 2.x feature branches
Would this make sense? Anyways we don't have to decide anything now that we can't undo later, but are folks OK with me doing an RC for NutchGora and for 1.x this weekend?
Cheers,
Chris
[1] http://s.apache.org/GD2
++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++
[jira] [Commented] (NUTCH-1305) Domain(blacklist)URLFilter to trim entries
[ https://issues.apache.org/jira/browse/NUTCH-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225232#comment-13225232 ]
Hudson commented on NUTCH-1305:
Integrated in nutch-trunk-maven #187 (See [https://builds.apache.org/job/nutch-trunk-maven/187/])
NUTCH-1305 Domain(blacklist)URLFilter to trim entries (Revision 1298394)
Result = SUCCESS
markus :
Files :
* /nutch/trunk/src/plugin/urlfilter-domainblacklist/src/java/org/apache/nutch/urlfilter/domainblacklist/DomainBlacklistURLFilter.java
Re: NutchGora release, and Nutch 1.x trunk release
+1
1.5 has, again, many fixes and improvements, just as 1.4 had over 1.3. But I'd like to integrate Tika 1.1 after its pending release.
Cheers
On Thursday 08 March 2012 15:38:15 Mattmann, Chris A (388J) wrote:
> anyone up for a 1.5 release off trunk (stable), and a NutchGora branch release?
--
Markus Jelsma - CTO - Openindex
Re: NutchGora release, and Nutch 1.x trunk release
Yeah, I agree, Chris and Markus. On the Nutchgora note, I would like to see Gora 0.2 released beforehand, as we have a blocking issue, NUTCH-1205, with Ivy retrieving alien Gora 0.2-SNAPSHOT dependencies from repository.apache.org. We should be able to overcome this issue by releasing Gora 0.2 to Maven Central, then just pulling those dependencies with Ivy in Nutchgora rather than messing about with chain/multiple/snapshot resolvers in the Ivy configuration.
My 2 cents
On Thu, Mar 8, 2012 at 3:03 PM, Markus Jelsma markus.jel...@openindex.io wrote:
> +1 1.5 has, again, many fixes and improvements, just as 1.4 had over 1.3. But i'd like to integrate Tika 1.1 after its pending release.
--
Lewis
[jira] [Created] (NUTCH-1307) Improve formatting of ant targets for clearer project help
Improve formatting of ant targets for clearer project help
Key: NUTCH-1307
URL: https://issues.apache.org/jira/browse/NUTCH-1307
Project: Nutch
Issue Type: New Feature
Components: build
Affects Versions: nutchgora, 1.5
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
Priority: Trivial
Fix For: nutchgora, 1.5
This is a trivial formatting issue. I will submit a patch shortly and fix it.
Re: NutchGora release, and Nutch 1.x trunk release
Hey Guys,
OK, sounds good. Looks like we need to wait for the Tika 1.1 release (seems to be going well so far), and then try and push Gora 0.2 (which I know Lewis is pushing, and which I'm happy to RM once we're ready there).
So, maybe I'll shoot for next weekend or the weekend after to push Nutch 1.5 and 2.0 RCs.
Cheers,
Chris
On Mar 8, 2012, at 7:23 AM, Lewis John Mcgibbney wrote:
> On the Nutchgora note, I would like to see Gora 0.2. released before hand, as we have a blocking issue NUTCH-1205 with Ivy retrieving alien Gora 0.2-SNAPSHOT dependencies from repository.apache.org.
Re: NutchGora release, and Nutch 1.x trunk release
+1 for pushing Gora 0.2 prior to the Nutchgora 2.0 RC.
For Nutchgora, besides NUTCH-1205 the only thing I'm a bit concerned about is NUTCH-1253. This seems like a blocker to me, and I think it only affects Nutch trunk (though I'm not sure).
On Thu, Mar 8, 2012 at 4:32 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote:
> So, maybe I'll shoot for next weekend or the weekend after to push Nutch 1.5 and 2.0 RCs.
[jira] [Updated] (NUTCH-1307) Improve formatting of ant targets for clearer project help
[ https://issues.apache.org/jira/browse/NUTCH-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated NUTCH-1307:
Attachment: NUTCH-1307-trunk.patch
            NUTCH-1307-nutchgora.patch
Trivial patches. When running
{code}
$ ant -projecthelp
{code}
(from $NUTCH_HOME) this gives nicer output.
[jira] [Closed] (NUTCH-1307) Improve formatting of ant targets for clearer project help
[ https://issues.apache.org/jira/browse/NUTCH-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney closed NUTCH-1307.
[jira] [Resolved] (NUTCH-1307) Improve formatting of ant targets for clearer project help
[ https://issues.apache.org/jira/browse/NUTCH-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney resolved NUTCH-1307.
Resolution: Fixed
Committed @ revision 1298437 in Nutchgora branch
Committed @ revision 1298438 in trunk
[jira] [Resolved] (NUTCH-1304) GeneratorMapper.java doesn't return when skipping and already generated mark
[ https://issues.apache.org/jira/browse/NUTCH-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney resolved NUTCH-1304.
Resolution: Fixed
Committed @ revision 1298444 in Nutchgora branch
Thank you again Dan for the patch.
[jira] [Commented] (NUTCH-1304) GeneratorMapper.java doesn't return when skipping and already generated mark
[ https://issues.apache.org/jira/browse/NUTCH-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225270#comment-13225270 ]
Lewis John McGibbney commented on NUTCH-1304:
Please close this one off when you have time, Dan. Thank you.
[jira] [Commented] (NUTCH-1307) Improve formatting of ant targets for clearer project help
[ https://issues.apache.org/jira/browse/NUTCH-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225299#comment-13225299 ]
Hudson commented on NUTCH-1307:
Integrated in nutch-trunk-maven #188 (See [https://builds.apache.org/job/nutch-trunk-maven/188/])
commit to address NUTCH-1307 and update to CHANGES.txt (Revision 1298438)
Result = SUCCESS
lewismc :
Files :
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/build.xml
[jira] [Commented] (NUTCH-728) Improve nutch release packaging
[ https://issues.apache.org/jira/browse/NUTCH-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225389#comment-13225389 ]
Lewis John McGibbney commented on NUTCH-728:
Looking at this, then at what we have available on our mirrors, I don't really see the need at the moment (unless it would make the release process easier) of including this code. Chris already provides us with src.tar.gz with every release? I suppose this one's really down to the release manager's opinion.
Improve nutch release packaging
Key: NUTCH-728
URL: https://issues.apache.org/jira/browse/NUTCH-728
Project: Nutch
Issue Type: Improvement
Reporter: Sami Siren
Attachments: NUTCH-728-nutchgora.patch, NUTCH-728-v2.patch, NUTCH-728.patch
see the discussion from http://www.lucidimagination.com/search/document/aa4d52cbd9af026a/discuss_contents_of_nutch_release_artifact
[jira] [Commented] (NUTCH-882) Design a Host table in GORA
[ https://issues.apache.org/jira/browse/NUTCH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225486#comment-13225486 ]
Mathijs Homminga commented on NUTCH-882:
Status: I have updated the patches to match the current HEAD (nutchgora). Also added a HostDbUpdateJob which populates the host db from an existing web table (needed to fix an issue in GORA for this: https://issues.apache.org/jira/browse/GORA-105). I'm currently finishing some work on the NutchContext and will post the patch sometime next week.
Design a Host table in GORA
Key: NUTCH-882
URL: https://issues.apache.org/jira/browse/NUTCH-882
Project: Nutch
Issue Type: New Feature
Affects Versions: nutchgora
Reporter: Julien Nioche
Assignee: Julien Nioche
Fix For: nutchgora
Attachments: NUTCH-882-v1.patch, hostdb.patch
Having a separate GORA table for storing information about hosts (and domains?) would be very useful for:
* customising the behaviour of the fetching on a host basis e.g. number of threads, min time between threads etc...
* storing stats
* keeping metadata and possibly propagating them to the webpages
* keeping a copy of the robots.txt and possibly using that later to filter the webtable
* storing sitemap files and updating the webtable accordingly
I'll try to come up with a GORA schema for such a host table but any comments are of course already welcome
[jira] [Commented] (NUTCH-1278) Fetch Improvement in threads per host
[ https://issues.apache.org/jira/browse/NUTCH-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225545#comment-13225545 ]
Ferdy Galema commented on NUTCH-1278:
I noticed you used the diff command this time, but failed to include the new file in the patch. When you want the diff command to include new files, you simply add them to svn first. In the case of HostsUtil, this would be:
svn add src/java/org/apache/nutch/util/HostsUtil.java
When you execute the diff command afterwards, you will notice that it includes the new file. Now you can simply upload this patch file only instead of a zip. Good luck.
Fetch Improvement in threads per host
Key: NUTCH-1278
URL: https://issues.apache.org/jira/browse/NUTCH-1278
Project: Nutch
Issue Type: New Feature
Components: fetcher
Affects Versions: 1.4
Reporter: behnam nikbakht
Attachments: NUTCH-1278-v.2.zip, NUTCH-1278.zip
The value of maxThreads is equal to fetcher.threads.per.host and is constant for every host. It would be possible to use dynamic values for every host, influenced by the number of blocked requests. This means that if the number of blocked requests for one host increases, then we must decrease this value and increase http.timeout.
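The rule proposed in the NUTCH-1278 description (fewer threads and a longer timeout for a host once its blocked requests go up) could look roughly like the sketch below. It is not taken from the attached zip files: `HostLimitsSketch`, its fields, the step of one thread and the doubling of the timeout are all assumptions made for illustration.

```java
/** Hypothetical per-host limits; names and step sizes are illustrative assumptions. */
class HostLimitsSketch {
    int maxThreads; // starts at the configured threads-per-host value
    int timeoutMs;  // starts at the configured HTTP timeout
    private int lastBlocked = 0;

    HostLimitsSketch(int maxThreads, int timeoutMs) {
        this.maxThreads = maxThreads;
        this.timeoutMs = timeoutMs;
    }

    /** Adjusts the limits when the blocked-request count for this host has grown. */
    void update(int blockedRequests) {
        if (blockedRequests > lastBlocked) {
            maxThreads = Math.max(1, maxThreads - 1); // never drop below one thread
            timeoutMs = timeoutMs * 2;                // assumed back-off factor
        }
        lastBlocked = blockedRequests;
    }

    public static void main(String[] args) {
        HostLimitsSketch host = new HostLimitsSketch(4, 10000);
        host.update(3);
        System.out.println(host.maxThreads + " threads, " + host.timeoutMs + " ms");
    }
}
```

A real implementation would also need a way to relax the limits again when a host recovers, which the issue description does not cover.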
[jira] [Updated] (NUTCH-841) Nutch 2.0 webapp
[ https://issues.apache.org/jira/browse/NUTCH-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ferdy Galema updated NUTCH-841:
Priority: Major (was: Blocker)
Nutch 2.0 webapp
Key: NUTCH-841
URL: https://issues.apache.org/jira/browse/NUTCH-841
Project: Nutch
Issue Type: Improvement
Components: web gui
Environment: Nutch 2.0
Reporter: Chris A. Mattmann
Assignee: Chris A. Mattmann
Fix For: nutchgora
In light of the conversation on NUTCH-837, we are removing the old Nutch webapp and will replace it with a 2.0 one that works with GORA + Solr.
[jira] [Commented] (NUTCH-841) Nutch 2.0 webapp
[ https://issues.apache.org/jira/browse/NUTCH-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225551#comment-13225551 ]
Chris A. Mattmann commented on NUTCH-841:
Yep not a blocker!