[jira] [Commented] (NUTCH-1946) Upgrade to Gora 0.6.1

2015-03-06 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14350199#comment-14350199
 ] 

Lewis John McGibbney commented on NUTCH-1946:
-

My current understanding is that Gora requires advanced shim tests.
Right now I am not able to run any Hadoop-based application off of the
Hadoop shim layers.
I cannot run Hadoop 2 apps.
This defeats the purpose of shims for Gora.




-- 
*Lewis*


 Upgrade to Gora 0.6.1
 -

 Key: NUTCH-1946
 URL: https://issues.apache.org/jira/browse/NUTCH-1946
 Project: Nutch
  Issue Type: Improvement
  Components: storage
Affects Versions: 2.3.1
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.3.1

 Attachments: NUTCH-1946.patch, NUTCH-1946_Gora_fixes.patch, 
 NUTCH-1946v2.patch, NUTCH-1946v3.patch


 Apache Gora 0.6.1 was released recently.
 We should upgrade before pushing Nutch 2.3.1, as it will come in very handy 
 for the new Docker containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Fwd: Google Summer of Code 2015 Mentor Registration

2015-03-06 Thread Lewis John Mcgibbney
Nutch PMC,
Please acknowledge my request to become a mentor for Google Summer of Code
2015 projects for Apache Nutch.

My Melange username is lewismc.


-- Forwarded message --
From: Ulrich Stärk u...@apache.org
Date: Fri, Mar 6, 2015 at 11:32 AM
Subject: Google Summer of Code 2015 Mentor Registration
To: ment...@community.apache.org


Dear PMCs,

I'm happy to announce that the ASF has made it onto the list of 137 accepted
organizations for Google Summer of Code 2015! [1,2]

It is now time for the mentors to sign up, so please pass this email on to
your community and podlings. If you aren’t already subscribed to
ment...@community.apache.org you should do so now, else you might miss
important information.

Mentor signup requires two steps: mentor signup in Melange and PMC
acknowledgement.

If you want to mentor a project in this year's SoC you will have to

1. Be an Apache committer.
2. Register with Melange and set up a profile [3].
3. Add your username (formerly known as link_id) to [4]. This is NOT your
email address but your Melange username. You can find it at the top of any
page once you are logged in.
4. Request an acknowledgement from the PMC for which you want to mentor
projects. Use the below template and do not forget to copy
ment...@community.apache.org.
5. Once a PMC member acknowledges the request to mentor, and only then, go
to [5] and send a connection request.

PMCs, read carefully please.

We request that each mentor is acknowledged by a PMC member. This is to
ensure the mentor is in good standing with the community. When you receive a
request for acknowledgement, please ACK it and cc
ment...@community.apache.org

Lastly, it is not yet too late to record your ideas in Jira (see my
previous emails for details). Students will now begin to explore ideas so if
you haven’t already done so, record your ideas immediately!

Cheers,

Uli

mentor request email template:

to: private@<project>.apache.org
cc: ment...@community.apache.org
subject: GSoC 2015 mentor request for <mentor name>

<project> PMC,

please acknowledge my request to become a mentor for Google Summer of Code
2015 projects for Apache <project>.

My Melange username is <username>.

<custom content>



[1] http://www.google-melange.com/gsoc/org/list/public/google/gsoc2015
[2] http://www.google-melange.com/gsoc/org2/google/gsoc2015/apache
[3] http://www.google-melange.com/gsoc/homepage/google/gsoc2015
[4] https://svn.apache.org/repos/private/committers/GsocLinkId.txt
[5] http://www.google-melange.com/gsoc/connection/start/user/google/gsoc2015/apache



-- 
*Lewis*


[GitHub] nutch pull request: Fix for NUTCH-1954: FilenameTooLong error appe...

2015-03-06 Thread chrismattmann
GitHub user chrismattmann opened a pull request:

https://github.com/apache/nutch/pull/11

Fix for NUTCH-1954: FilenameTooLong error appears in CommonCrawlDumper



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chrismattmann/nutch NUTCH-1954

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/11.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11


commit eb6684decec6c767db2339288ed846022471e56f
Author: Chris Mattmann mattm...@apache.org
Date:   2015-03-07T04:44:12Z

Fix for NUTCH-1954: FilenameTooLong error appears in CommonCrawlDumper




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (NUTCH-1954) FilenameTooLong error appears in CommonCrawlDumper

2015-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351396#comment-14351396
 ] 

ASF GitHub Bot commented on NUTCH-1954:
---

GitHub user chrismattmann opened a pull request:

https://github.com/apache/nutch/pull/10

Fix for NUTCH-1954: FilenameTooLong error appears in CommonCrawlDumper



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chrismattmann/nutch HEAD

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/10.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10


commit 70047ee4b0b09b2bbf344a3b96f8ab043c98678f
Author: Chris Mattmann mattm...@apache.org
Date:   2014-11-17T21:35:15Z

Fix for NUTCH-1890: add copyfields and default text field catch all to 
schema.xml

commit f9102d636051347aacb0706d408ef161bbcd29eb
Author: Chris Mattmann mattm...@apache.org
Date:   2014-11-18T01:54:32Z

Merge https://github.com/apache/nutch into merge-nutch-master-nov17




 FilenameTooLong error appears in CommonCrawlDumper
 --

 Key: NUTCH-1954
 URL: https://issues.apache.org/jira/browse/NUTCH-1954
 Project: Nutch
  Issue Type: Bug
  Components: commoncrawl
Reporter: Chris A. Mattmann
Assignee: Chris A. Mattmann
 Fix For: 1.10


 The issue from NUTCH-1950 is appearing in the CommonCrawlDumper tool as well 
 (FilenameTooLong). I'm going to apply that fix here as well (based on 
 MD5/message digest).





[jira] [Commented] (NUTCH-1954) FilenameTooLong error appears in CommonCrawlDumper

2015-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351397#comment-14351397
 ] 

ASF GitHub Bot commented on NUTCH-1954:
---

GitHub user chrismattmann opened a pull request:

https://github.com/apache/nutch/pull/11

Fix for NUTCH-1954: FilenameTooLong error appears in CommonCrawlDumper



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chrismattmann/nutch NUTCH-1954

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/11.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11


commit eb6684decec6c767db2339288ed846022471e56f
Author: Chris Mattmann mattm...@apache.org
Date:   2015-03-07T04:44:12Z

Fix for NUTCH-1954: FilenameTooLong error appears in CommonCrawlDumper






[GitHub] nutch pull request: Fix for NUTCH-1954: FilenameTooLong error appe...

2015-03-06 Thread chrismattmann
GitHub user chrismattmann opened a pull request:

https://github.com/apache/nutch/pull/10

Fix for NUTCH-1954: FilenameTooLong error appears in CommonCrawlDumper



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chrismattmann/nutch HEAD

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/10.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10


commit 70047ee4b0b09b2bbf344a3b96f8ab043c98678f
Author: Chris Mattmann mattm...@apache.org
Date:   2014-11-17T21:35:15Z

Fix for NUTCH-1890: add copyfields and default text field catch all to 
schema.xml

commit f9102d636051347aacb0706d408ef161bbcd29eb
Author: Chris Mattmann mattm...@apache.org
Date:   2014-11-18T01:54:32Z

Merge https://github.com/apache/nutch into merge-nutch-master-nov17






[jira] [Commented] (NUTCH-1954) FilenameTooLong error appears in CommonCrawlDumper

2015-03-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351431#comment-14351431
 ] 

Hudson commented on NUTCH-1954:
---

SUCCESS: Integrated in Nutch-trunk #3005 (See 
[https://builds.apache.org/job/Nutch-trunk/3005/])
Fix for NUTCH-1954: FilenameTooLong error appears in CommonCrawlDumper. This 
closes #11 (mattmann: 
http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1664794)
* /nutch/trunk/CHANGES.txt
Fix for NUTCH-1954: FilenameTooLong error appears in CommonCrawlDumper. This 
closes #10 #11 (mattmann: 
http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1664793)
* /nutch/trunk/CHANGES.txt
Fix for NUTCH-1954: FilenameTooLong error appears in CommonCrawlDumper. 
(mattmann: http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1664792)
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/tools/CommonCrawlDataDumper.java




[jira] [Commented] (NUTCH-1954) FilenameTooLong error appears in CommonCrawlDumper

2015-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351402#comment-14351402
 ] 

ASF GitHub Bot commented on NUTCH-1954:
---

Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/11




[GitHub] nutch pull request: Fix for NUTCH-1954: FilenameTooLong error appe...

2015-03-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/11




[jira] [Commented] (NUTCH-1954) FilenameTooLong error appears in CommonCrawlDumper

2015-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351400#comment-14351400
 ] 

ASF GitHub Bot commented on NUTCH-1954:
---

Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/10




[GitHub] nutch pull request: Fix for NUTCH-1954: FilenameTooLong error appe...

2015-03-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/10




[jira] [Work started] (NUTCH-1954) FilenameTooLong error appears in CommonCrawlDumper

2015-03-06 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-1954 started by Chris A. Mattmann.



[jira] [Commented] (NUTCH-1954) FilenameTooLong error appears in CommonCrawlDumper

2015-03-06 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351387#comment-14351387
 ] 

Chris A. Mattmann commented on NUTCH-1954:
--

Here's the error I got running it on the NSF ACADIS/Polar data set from my 
class:

{noformat}
[mattmann@nsfpolardata local]$ ./bin/nutch commoncrawldump -outputDir out 
-segment /home/mattmann/polar-data/apache-nutch-1.9/bin/AcadisCrawl2/segments/
java.io.FileNotFoundException: 
out/redirect.html?link=http%3a%2f%2fdataportal.ucar.edu%2fmetadata%2fcadis%2fTerrestrial_Ecosystems%2fArctic_Ecosystem_Changes%2fBarrow_Atqasuk_ITEX_Detailed_Microclimate%2f1998-20XX%2520Barrow%2520Atqasuk%2520ITEX%2520Detailed%2520Microclimate%2520metadata.doc
 (File name too long)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
at java.io.FileOutputStream.<init>(FileOutputStream.java:171)
at 
org.apache.nutch.tools.CommonCrawlDataDumper.dump(CommonCrawlDataDumper.java:372)
at 
org.apache.nutch.tools.CommonCrawlDataDumper.main(CommonCrawlDataDumper.java:235)
{noformat}
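
The trace above fails because the output file name is derived from the URL, and many file systems cap a single name at 255 bytes. The fix this issue describes is based on an MD5/message digest. A minimal sketch of that idea follows; it is not the committed CommonCrawlDataDumper code, and the class name HashedFileNames, the methods md5Hex and outputName, and the rule for keeping a short extension are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of Nutch.
class HashedFileNames {

    /** Returns the MD5 digest of the given URL as 32 lowercase hex characters. */
    static String md5Hex(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Builds a fixed-length output name: the digest, plus the trailing
     * extension when it is short and purely alphanumeric (assumed rule).
     */
    static String outputName(String url) {
        int dot = url.lastIndexOf('.');
        String ext = (dot >= 0) ? url.substring(dot) : "";
        if (!ext.matches("\\.[A-Za-z0-9]{1,5}")) {
            ext = "";
        }
        return md5Hex(url) + ext;
    }

    public static void main(String[] args) {
        // Build a name well beyond 255 characters (made-up example input).
        StringBuilder longName = new StringBuilder("redirect.html?link=http%3a%2f%2fexample.org");
        for (int i = 0; i < 60; i++) {
            longName.append("%2fsegment");
        }
        longName.append("%2fmetadata.doc");

        String name = outputName(longName.toString());
        System.out.println(longName.length() + " characters in, " + name.length() + " characters out");
        System.out.println(name);
    }
}
```

Because the digest has a fixed length, the resulting name stays at 32 characters plus the optional extension regardless of how long the URL is. The original URL is not recoverable from the name, so a real tool would need to record the mapping elsewhere if it matters.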




[jira] [Created] (NUTCH-1954) FilenameTooLong error appears in CommonCrawlDumper

2015-03-06 Thread Chris A. Mattmann (JIRA)
Chris A. Mattmann created NUTCH-1954:


 Summary: FilenameTooLong error appears in CommonCrawlDumper
 Key: NUTCH-1954
 URL: https://issues.apache.org/jira/browse/NUTCH-1954
 Project: Nutch
  Issue Type: Bug
  Components: commoncrawl
Reporter: Chris A. Mattmann
Assignee: Chris A. Mattmann
 Fix For: 1.10


The issue from NUTCH-1950 is appearing in the CommonCrawlDumper tool as well 
(FilenameTooLong). I'm going to apply that fix here as well (based on 
MD5/message digest).






[jira] [Resolved] (NUTCH-1954) FilenameTooLong error appears in CommonCrawlDumper

2015-03-06 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann resolved NUTCH-1954.
--
Resolution: Fixed

- fixed in r1664792, r1664793 and r1664794. Had to tickle Git to make it close 
#10 and #11, sorry for the extra commits! Tested locally in NSF polar ville, 
works great.
