Unsubscribe me please
Can someone please remove me from the mailing list? Your help is much appreciated. Thanks.
[jira] [Commented] (NUTCH-1323) AjaxNormalizer
[ https://issues.apache.org/jira/browse/NUTCH-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13274529#comment-13274529 ]

behnam nikbakht commented on NUTCH-1323:

Yes, it works correctly. Thank you.

AjaxNormalizer

Key: NUTCH-1323
URL: https://issues.apache.org/jira/browse/NUTCH-1323
Project: Nutch
Issue Type: New Feature
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Fix For: 1.6
Attachments: NUTCH-1323-1.6-1.patch

A two-way normalizer for Nutch able to deal with AJAX URLs, converting them to _escaped_fragment_ URLs and back to AJAX URLs.
https://developers.google.com/webmasters/ajax-crawling/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
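For readers unfamiliar with the scheme the issue refers to, the two-way conversion can be sketched roughly as follows. This is a minimal illustration and not the code in the attached patch: the class and method names are hypothetical, only a handful of ASCII characters are escaped, non-ASCII characters are left as-is, and the _escaped_fragment_ parameter is assumed to be the last one in the query string.

```java
public class AjaxUrlSketch {
    private static final String HASH_BANG = "#!";
    private static final String ESCAPED = "_escaped_fragment_=";

    /** Converts an AJAX URL containing "#!" to its _escaped_fragment_ form. */
    public static String toEscapedFragment(String url) {
        int pos = url.indexOf(HASH_BANG);
        if (pos < 0) {
            return url; // not an AJAX URL, leave it untouched
        }
        String base = url.substring(0, pos);
        String fragment = url.substring(pos + HASH_BANG.length());
        // Append to an existing query string if there is one.
        String separator = base.contains("?") ? "&" : "?";
        return base + separator + ESCAPED + escape(fragment);
    }

    /** Converts an _escaped_fragment_ URL back to the "#!" form. */
    public static String toAjaxUrl(String url) {
        int pos = url.indexOf(ESCAPED);
        if (pos < 1) {
            return url;
        }
        char separator = url.charAt(pos - 1);
        if (separator != '?' && separator != '&') {
            return url;
        }
        String base = url.substring(0, pos - 1);
        // Assumption: everything after the parameter name is the fragment.
        String fragment = url.substring(pos + ESCAPED.length());
        return base + HASH_BANG + unescape(fragment);
    }

    /** Simplified escaping: '%', '#', '&', '+' and control/space characters only. */
    private static String escape(String fragment) {
        StringBuilder sb = new StringBuilder();
        for (char c : fragment.toCharArray()) {
            if (c == '%' || c == '#' || c == '&' || c == '+' || c <= ' ') {
                sb.append(String.format("%%%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Decodes %XX sequences one byte at a time; invalid sequences are kept as-is. */
    private static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    sb.append((char) (hi * 16 + lo));
                    i += 2;
                    continue;
                }
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ajax = "http://example.com/page?a=b#!k=v&x=1";
        String escaped = toEscapedFragment(ajax);
        System.out.println(escaped);
        System.out.println(toAjaxUrl(escaped));
    }
}
```

The fragment is escaped before it is appended so that a '&' inside it is not mistaken for a query-parameter separator, which is what makes the round trip back to the "#!" form possible.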
[jira] [Commented] (NUTCH-1367) Port ParserChecker to Nutchgora
[ https://issues.apache.org/jira/browse/NUTCH-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13274590#comment-13274590 ]

Ferdy Galema commented on NUTCH-1367:

Hey Lewis,
This tool is already present in Nutchgora.

Port ParserChecker to Nutchgora

Key: NUTCH-1367
URL: https://issues.apache.org/jira/browse/NUTCH-1367
Project: Nutch
Issue Type: New Feature
Components: parser
Affects Versions: nutchgora
Reporter: Lewis John McGibbney
Fix For: 2.1

This is such a great tool. It has come in handy so many times I would go blue in the face if I had to try and count, e.g.
for (int i = 0; i < infinity; i++)
I think you get the idea.
[jira] [Closed] (NUTCH-1366) speed up indexing by eliminating the indexreducer
[ https://issues.apache.org/jira/browse/NUTCH-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdy Galema closed NUTCH-1366.

Resolution: Fixed

committed

speed up indexing by eliminating the indexreducer

Key: NUTCH-1366
URL: https://issues.apache.org/jira/browse/NUTCH-1366
Project: Nutch
Issue Type: Improvement
Components: indexer
Reporter: Ferdy Galema
Fix For: nutchgora
Attachments: NUTCH-1366.patch

Currently the indexer in Nutchgora consists of both mappers and reducers. But the reduce code does not actually iterate over any (grouped/sorted) values. It simply indexes individual key/value (String/Webpage) pairs. Therefore, by moving this indexing code to the mapper we can eliminate the reduce step, making the indexing job much faster. (No more unnecessary spilling to disk/network and no CPU wasted on sorting.)

Note this is not (directly) applicable to trunk because trunk uses a quite different approach: different types of input are combined into a single value in the reducer. Although I think it is possible to implement a similar optimization, I am not sure how to do this. So if anyone wants this for trunk too, feel free to implement a similar patch.
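The reasoning in the issue description can be illustrated with a small plain-Java simulation. This is only a sketch under stated assumptions: it does not use Hadoop or Nutch classes, the class and method names are hypothetical, and toDocument is a stand-in for the real indexing call. It shows that when the reduce step handles every key/value pair individually, grouping and sorting first does not change what gets indexed, so the work can be done directly in the map step. In an actual Hadoop job, a map-only run is typically configured with job.setNumReduceTasks(0); whether the committed patch does exactly that is not stated here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapOnlyIndexingSketch {

    /** Hypothetical stand-in for turning one key/value pair into an index document. */
    public static String toDocument(String key, String page) {
        return key + " => " + page;
    }

    /** Map, then shuffle (group and sort by key), then index each pair in the reducer. */
    public static List<String> indexWithReducer(Map<String, String> pages) {
        // The shuffle groups values by key and sorts the keys; this is the costly part.
        TreeMap<String, List<String>> shuffled = new TreeMap<>();
        for (Map.Entry<String, String> e : pages.entrySet()) {
            shuffled.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        List<String> index = new ArrayList<>();
        for (Map.Entry<String, List<String>> group : shuffled.entrySet()) {
            // The reducer never combines values; it indexes them one by one.
            for (String page : group.getValue()) {
                index.add(toDocument(group.getKey(), page));
            }
        }
        return index;
    }

    /** Map-only variant: index each pair as soon as the mapper sees it. */
    public static List<String> indexMapOnly(Map<String, String> pages) {
        List<String> index = new ArrayList<>();
        for (Map.Entry<String, String> e : pages.entrySet()) {
            index.add(toDocument(e.getKey(), e.getValue()));
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("http://example.com/b", "page B");
        pages.put("http://example.com/a", "page A");
        System.out.println(indexWithReducer(pages));
        System.out.println(indexMapOnly(pages));
    }
}
```

Both variants index the same documents; only the order may differ, which does not matter when each pair is indexed independently.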
[jira] [Commented] (NUTCH-1366) speed up indexing by eliminating the indexreducer
Hudson commented on NUTCH-1366:

speed up indexing by eliminating the indexreducer

Integrated in Nutch-nutchgora #253 (See https://builds.apache.org/job/Nutch-nutchgora/253/)
NUTCH-1366 speed up indexing by eliminating the indexreducer (Revision 1338217)

Result = SUCCESS
ferdy :
Files :
* /nutch/branches/nutchgora/CHANGES.txt
* /nutch/branches/nutchgora/src/java/org/apache/nutch/indexer/IndexUtil.java
* /nutch/branches/nutchgora/src/java/org/apache/nutch/indexer/IndexerJob.java
* /nutch/branches/nutchgora/src/java/org/apache/nutch/indexer/IndexerReducer.java