[jira] [Commented] (NUTCH-1946) Upgrade to Gora 0.6.1
[ https://issues.apache.org/jira/browse/NUTCH-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350199#comment-14350199 ]

Lewis John McGibbney commented on NUTCH-1946:
---------------------------------------------

My current understanding is that Gora requires advanced shim tests. Right now I am not able to run any Hadoop-based application off of the Hadoop shims layers. I cannot run Hadoop 2 apps. This defeats the purpose of shims for Gora.

-- *Lewis*

Upgrade to Gora 0.6.1
---------------------

Key: NUTCH-1946
URL: https://issues.apache.org/jira/browse/NUTCH-1946
Project: Nutch
Issue Type: Improvement
Components: storage
Affects Versions: 2.3.1
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
Fix For: 2.3.1
Attachments: NUTCH-1946.patch, NUTCH-1946_Gora_fixes.patch, NUTCH-1946v2.patch, NUTCH-1946v3.patch

Apache Gora was released recently. We should upgrade before pushing Nutch 2.3.1 as it will come in very handy for the new Docker containers.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
Fwd: Google Summer of Code 2015 Mentor Registration
Nutch PMC,

Please acknowledge my request to become a mentor for Google Summer of Code 2015 projects for Apache Nutch. My Melange username is lewismc.

-- Forwarded message --
From: Ulrich Stärk <u...@apache.org>
Date: Fri, Mar 6, 2015 at 11:32 AM
Subject: Google Summer of Code 2015 Mentor Registration
To: ment...@community.apache.org

Dear PMCs,

I'm happy to announce that the ASF has made it onto the list of 137 accepted organizations for Google Summer of Code 2015! [1,2]

It is now time for the mentors to sign up, so please pass this email on to your community and podlings. If you aren’t already subscribed to ment...@community.apache.org you should do so now else you might miss important information.

Mentor signup requires two steps: mentor signup in Melange and PMC acknowledgement.

If you want to mentor a project in this year's SoC you will have to

1. Be an Apache committer.
2. Register with Melange and set up a profile [3].
3. Add your username (formerly known as link_id) to [4]. This is NOT your email address but your Melange username. You can find it at the top of any page once you are logged in.
4. Request an acknowledgement from the PMC for which you want to mentor projects. Use the below template and do not forget to copy ment...@community.apache.org.
5. Once a PMC member acknowledges the request to mentor, and only then, go to [5] and send a connection request.

PMCs, read carefully please. We request that each mentor is acknowledged by a PMC member. This is to ensure the mentor is in good standing with the community. When you receive a request for acknowledgement, please ACK it and cc ment...@community.apache.org

Lastly, it is not yet too late to record your ideas in Jira (see my previous emails for details). Students will now begin to explore ideas so if you haven’t already done so, record your ideas immediately!

Cheers,
Uli

mentor request email template:

to: private@<project>.apache.org
cc: ment...@community.apache.org
subject: GSoC 2015 mentor request for <mentor name>

<project> PMC,

please acknowledge my request to become a mentor for Google Summer of Code 2015 projects for Apache <project>. My Melange username is <username>.

<custom content>

[1] http://www.google-melange.com/gsoc/org/list/public/google/gsoc2015
[2] http://www.google-melange.com/gsoc/org2/google/gsoc2015/apache
[3] http://www.google-melange.com/gsoc/homepage/google/gsoc2015
[4] https://svn.apache.org/repos/private/committers/GsocLinkId.txt
[5] http://www.google-melange.com/gsoc/connection/start/user/google/gsoc2015/apache

-- *Lewis*
[GitHub] nutch pull request: Fix for NUTCH-1954: FilenameTooLong error appe...
GitHub user chrismattmann opened a pull request:

    https://github.com/apache/nutch/pull/11

    Fix for NUTCH-1954: FilenameTooLong error appears in CommonCrawlDumper

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chrismattmann/nutch NUTCH-1954

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nutch/pull/11.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11

commit eb6684decec6c767db2339288ed846022471e56f
Author: Chris Mattmann <mattm...@apache.org>
Date: 2015-03-07T04:44:12Z

    Fix for NUTCH-1954: FilenameTooLong error appears in CommonCrawlDumper

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
[jira] [Commented] (NUTCH-1954) FilenameTooLong error appears in CommonCrawlDumper
[ https://issues.apache.org/jira/browse/NUTCH-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351396#comment-14351396 ]

ASF GitHub Bot commented on NUTCH-1954:
---------------------------------------

GitHub user chrismattmann opened a pull request:

    https://github.com/apache/nutch/pull/10

    Fix for NUTCH-1954: FilenameTooLong error appears in CommonCrawlDumper

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chrismattmann/nutch HEAD

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nutch/pull/10.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10

commit 70047ee4b0b09b2bbf344a3b96f8ab043c98678f
Author: Chris Mattmann <mattm...@apache.org>
Date: 2014-11-17T21:35:15Z

    Fix for NUTCH-1890: add copyfields and default text field catch all to schema.xml

commit f9102d636051347aacb0706d408ef161bbcd29eb
Author: Chris Mattmann <mattm...@apache.org>
Date: 2014-11-18T01:54:32Z

    Merge https://github.com/apache/nutch into merge-nutch-master-nov17

FilenameTooLong error appears in CommonCrawlDumper
--------------------------------------------------

Key: NUTCH-1954
URL: https://issues.apache.org/jira/browse/NUTCH-1954
Project: Nutch
Issue Type: Bug
Components: commoncrawl
Reporter: Chris A. Mattmann
Assignee: Chris A. Mattmann
Fix For: 1.10

The issue from NUTCH-1950 is appearing in the CommonCrawlDumper tool as well (FilenameTooLong). I'm going to apply that fix here as well (based on MD5/message digest).
[jira] [Commented] (NUTCH-1954) FilenameTooLong error appears in CommonCrawlDumper
[ https://issues.apache.org/jira/browse/NUTCH-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351397#comment-14351397 ]

ASF GitHub Bot commented on NUTCH-1954:
---------------------------------------

GitHub user chrismattmann opened a pull request:

    https://github.com/apache/nutch/pull/11

    Fix for NUTCH-1954: FilenameTooLong error appears in CommonCrawlDumper

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chrismattmann/nutch NUTCH-1954

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nutch/pull/11.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11

commit eb6684decec6c767db2339288ed846022471e56f
Author: Chris Mattmann <mattm...@apache.org>
Date: 2015-03-07T04:44:12Z

    Fix for NUTCH-1954: FilenameTooLong error appears in CommonCrawlDumper
[GitHub] nutch pull request: Fix for NUTCH-1954: FilenameTooLong error appe...
GitHub user chrismattmann opened a pull request:

    https://github.com/apache/nutch/pull/10

    Fix for NUTCH-1954: FilenameTooLong error appears in CommonCrawlDumper

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chrismattmann/nutch HEAD

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nutch/pull/10.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10

commit 70047ee4b0b09b2bbf344a3b96f8ab043c98678f
Author: Chris Mattmann <mattm...@apache.org>
Date: 2014-11-17T21:35:15Z

    Fix for NUTCH-1890: add copyfields and default text field catch all to schema.xml

commit f9102d636051347aacb0706d408ef161bbcd29eb
Author: Chris Mattmann <mattm...@apache.org>
Date: 2014-11-18T01:54:32Z

    Merge https://github.com/apache/nutch into merge-nutch-master-nov17
[jira] [Commented] (NUTCH-1954) FilenameTooLong error appears in CommonCrawlDumper
[ https://issues.apache.org/jira/browse/NUTCH-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351431#comment-14351431 ]

Hudson commented on NUTCH-1954:
-------------------------------

SUCCESS: Integrated in Nutch-trunk #3005 (See [https://builds.apache.org/job/Nutch-trunk/3005/])

Fix for NUTCH-1954: FilenameTooLong error appears in CommonCrawlDumper. This closes #11 (mattmann: http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1664794)
* /nutch/trunk/CHANGES.txt

Fix for NUTCH-1954: FilenameTooLong error appears in CommonCrawlDumper. This closes #10 #11 (mattmann: http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1664793)
* /nutch/trunk/CHANGES.txt

Fix for NUTCH-1954: FilenameTooLong error appears in CommonCrawlDumper. (mattmann: http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1664792)
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/tools/CommonCrawlDataDumper.java
[jira] [Commented] (NUTCH-1954) FilenameTooLong error appears in CommonCrawlDumper
[ https://issues.apache.org/jira/browse/NUTCH-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351402#comment-14351402 ]

ASF GitHub Bot commented on NUTCH-1954:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/nutch/pull/11
[GitHub] nutch pull request: Fix for NUTCH-1954: FilenameTooLong error appe...
Github user asfgit closed the pull request at:

    https://github.com/apache/nutch/pull/11
[jira] [Commented] (NUTCH-1954) FilenameTooLong error appears in CommonCrawlDumper
[ https://issues.apache.org/jira/browse/NUTCH-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351400#comment-14351400 ]

ASF GitHub Bot commented on NUTCH-1954:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/nutch/pull/10
[GitHub] nutch pull request: Fix for NUTCH-1954: FilenameTooLong error appe...
Github user asfgit closed the pull request at:

    https://github.com/apache/nutch/pull/10
[jira] [Work started] (NUTCH-1954) FilenameTooLong error appears in CommonCrawlDumper
[ https://issues.apache.org/jira/browse/NUTCH-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on NUTCH-1954 started by Chris A. Mattmann.
[jira] [Commented] (NUTCH-1954) FilenameTooLong error appears in CommonCrawlDumper
[ https://issues.apache.org/jira/browse/NUTCH-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351387#comment-14351387 ]

Chris A. Mattmann commented on NUTCH-1954:
------------------------------------------

Here's the error I got running it on the NSF ACADIS/Polar data set from my class:

{noformat}
[mattmann@nsfpolardata local]$ ./bin/nutch commoncrawldump -outputDir out -segment /home/mattmann/polar-data/apache-nutch-1.9/bin/AcadisCrawl2/segments/
java.io.FileNotFoundException: out/redirect.html?link=http%3a%2f%2fdataportal.ucar.edu%2fmetadata%2fcadis%2fTerrestrial_Ecosystems%2fArctic_Ecosystem_Changes%2fBarrow_Atqasuk_ITEX_Detailed_Microclimate%2f1998-20XX%2520Barrow%2520Atqasuk%2520ITEX%2520Detailed%2520Microclimate%2520metadata.doc (File name too long)
	at java.io.FileOutputStream.open(Native Method)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:171)
	at org.apache.nutch.tools.CommonCrawlDataDumper.dump(CommonCrawlDataDumper.java:372)
	at org.apache.nutch.tools.CommonCrawlDataDumper.main(CommonCrawlDataDumper.java:235)
{noformat}
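The trace above shows the dumper using the URL-encoded link itself as the output file name, which exceeds the file system's name length limit. The issue description says the fix is "based on MD5/message digest". The following is only a minimal sketch of that general idea, not the code committed to CommonCrawlDataDumper: the class name DigestFileName, the method md5FileName, and the choice to keep a file extension are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Nutch: derives a fixed-length
// file name from an arbitrarily long URL by hashing it with MD5.
class DigestFileName {

    // Returns the 32-character lowercase hex MD5 of the URL, followed by
    // "." and the extension when the extension is non-empty.
    static String md5FileName(String url, String extension) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return extension.isEmpty() ? hex.toString() : hex + "." + extension;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Build an artificially long name, similar in spirit to the failing one.
        StringBuilder longUrl = new StringBuilder("redirect.html?link=");
        for (int i = 0; i < 300; i++) {
            longUrl.append("%2f");
        }
        String name = md5FileName(longUrl.toString(), "doc");
        // The result length does not depend on the input length.
        System.out.println(name + " (" + name.length() + " chars)");
    }
}
```

Because the digest has a fixed size, the resulting name stays well under common file name limits regardless of how long the URL is. The trade-off is that the name is no longer human-readable, so a tool using this approach would need to record the original URL elsewhere if it must be recoverable.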
[jira] [Created] (NUTCH-1954) FilenameTooLong error appears in CommonCrawlDumper
Chris A. Mattmann created NUTCH-1954:
-------------------------------------

Summary: FilenameTooLong error appears in CommonCrawlDumper
Key: NUTCH-1954
URL: https://issues.apache.org/jira/browse/NUTCH-1954
Project: Nutch
Issue Type: Bug
Components: commoncrawl
Reporter: Chris A. Mattmann
Assignee: Chris A. Mattmann
Fix For: 1.10

The issue from NUTCH-1950 is appearing in the CommonCrawlDumper tool as well (FilenameTooLong). I'm going to apply that fix here as well (based on MD5/message digest).
[jira] [Resolved] (NUTCH-1954) FilenameTooLong error appears in CommonCrawlDumper
[ https://issues.apache.org/jira/browse/NUTCH-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann resolved NUTCH-1954.
--------------------------------------

Resolution: Fixed

- fixed in r1664792, r1664793 and r1664794. Had to tickle Git to make it close #10 and #11, sorry for the extra commits! Tested locally in NSF polar ville, works great.