[jira] [Commented] (NUTCH-1927) Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing

2015-04-17 Thread Mattmann, Chris A (388J) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500604#comment-14500604
 ] 

Mattmann, Chris A (388J) commented on NUTCH-1927:
-

+1 please commit! Thanks seb 



> Create a whitelist of IPs/hostnames to allow skipping of RobotRules parsing
> ---
>
> Key: NUTCH-1927
> URL: https://issues.apache.org/jira/browse/NUTCH-1927
> Project: Nutch
>  Issue Type: New Feature
>  Components: fetcher
>Reporter: Chris A. Mattmann
>Assignee: Chris A. Mattmann
>  Labels: available, patch
> Fix For: 1.10
>
> Attachments: NUTCH-1927.2015-04-16.patch, 
> NUTCH-1927.2015-04-17.patch, NUTCH-1927.Mattmann.041115.patch.txt, 
> NUTCH-1927.Mattmann.041215.patch.txt, NUTCH-1927.Mattmann.041415.patch.txt, 
> test_NUTCH-1927.2015-04-17.txt
>
>
> Based on discussion on the dev list, to use Nutch for some security research 
> valid use cases (DDoS; DNS and other testing), I am going to create a patch 
> that allows a whitelist:
> {code:xml}
> <property>
>   <name>robot.rules.whitelist</name>
>   <value>132.54.99.22,hostname.apache.org,foo.jpl.nasa.gov</value>
>   <description>Comma separated list of hostnames or IP addresses to ignore 
> robot rules parsing for.
>   </description>
> </property>
> {code}
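
To make the intent of the whitelist concrete, here is a minimal sketch in Python of how a comma-separated value like the one above could be matched against a URL's host. It is not taken from the attached patches: the function names `parse_whitelist` and `skips_robot_rules` are hypothetical, and exact, case-insensitive host matching is an assumption about the intended behaviour.

```python
from urllib.parse import urlsplit


def parse_whitelist(value):
    """Split a comma-separated whitelist into a set of lowercase entries."""
    # Empty entries (e.g. from a trailing comma) are dropped.
    return {entry.strip().lower() for entry in value.split(",") if entry.strip()}


def skips_robot_rules(url, whitelist):
    """Return True if the URL's host is an exact entry in the whitelist.

    Assumption: matching is by exact hostname or IP string; no DNS
    resolution or wildcard handling is attempted in this sketch.
    """
    host = urlsplit(url).hostname  # lowercased, port removed; None if absent
    return host is not None and host in whitelist


# Same example value as in the issue description.
whitelist = parse_whitelist("132.54.99.22,hostname.apache.org,foo.jpl.nasa.gov")

print(skips_robot_rules("http://132.54.99.22/index.html", whitelist))    # True
print(skips_robot_rules("http://Hostname.Apache.org:8080/a", whitelist)) # True
print(skips_robot_rules("http://example.com/", whitelist))               # False
```

A set keeps each lookup cheap, and anything not listed falls through to normal robots.txt handling, so an empty whitelist changes nothing.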



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NUTCH-1832) Make Nutch work without an indexer

2014-09-04 Thread Mattmann, Chris A (388J) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121477#comment-14121477
 ] 

Mattmann, Chris A (388J) commented on NUTCH-1832:
-

Will reply in more detail soon but will look into enabling plugin back then



> Make Nutch work without an indexer
> --
>
> Key: NUTCH-1832
> URL: https://issues.apache.org/jira/browse/NUTCH-1832
> Project: Nutch
>  Issue Type: Bug
>Affects Versions: 1.9
>Reporter: Chris A. Mattmann
>Assignee: Chris A. Mattmann
> Fix For: 1.10
>
> Attachments: NUTCH-1832.Mattmann.090314.patch.2.txt, 
> NUTCH-1832.Mattmann.090314.patch.txt
>
>
> Nutch used to work out of the box, without requiring an indexing backend. As 
> of 1.9, that's not the case anymore (it's possible even before that). Thanks 
> to [~markus17] for pointing out that this is due to the indexing-solr plugin 
> being enabled by default. We should disable it by default, so that the 
> regression is removed.
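
As a rough illustration of why a default-enabled plugin matters, the sketch below assumes plugins are switched on by a single include pattern that each plugin name must match in full. The pattern values and the plugin list are illustrative only (the Solr plugin name is spelled as in the issue text) and are not copied from any Nutch release; `enabled_plugins` is a hypothetical helper.

```python
import re


def enabled_plugins(include_pattern, available):
    """Return the plugins whose full name matches the include pattern."""
    pattern = re.compile(include_pattern)
    return [name for name in available if pattern.fullmatch(name)]


# Illustrative plugin names, not an authoritative list.
available = ["protocol-http", "parse-html", "index-basic", "indexing-solr"]

# Hypothetical defaults with and without the Solr indexing plugin.
with_solr = "protocol-http|parse-(html|tika)|index-(basic|anchor)|indexing-solr"
without_solr = "protocol-http|parse-(html|tika)|index-(basic|anchor)"

print(enabled_plugins(with_solr, available))
print(enabled_plugins(without_solr, available))
```

With the second pattern the Solr plugin is simply never loaded, which is the out-of-the-box behaviour the issue asks to restore.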





Re: Important : Bunch of Spam Created under Nutch Wiki!!

2013-04-01 Thread Mattmann, Chris A (388J)
Thanks Kiran!

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: kiran chitturi 
Reply-To: "dev@nutch.apache.org" 
Date: Monday, April 1, 2013 12:30 PM
To: "dev@nutch.apache.org" 
Subject: Re: Important : Bunch of Spam Created under Nutch Wiki!!

>I have posted the information on the JIRA issue page [0]. Let's hope the
>issue will be taken care of soon.
>
>
>
>
>[0] - https://issues.apache.org/jira/browse/INFRA-6081
>
>
>
>On Mon, Apr 1, 2013 at 3:27 PM, Lewis John Mcgibbney
> wrote:
>
>Hi Kiran,
>
>
>On Mon, Apr 1, 2013 at 6:53 AM,  wrote:
>
>
>Re: Important : Bunch of Spam Created under Nutch Wiki!!
>
>22926 by: kiran chitturi
>
>
>
>
>
>Hi guys,
>
>
>Do you know what is the destination for commit mails ? Can I give
>'dev@nutch.apache.org' ?
>
>
>
>
>
>
>No, we should put commit emails to the static archive here
>http://mail-archives.apache.org/mod_mbox/nutch-commits/
> 
>
>
>
>Thanks for sorting this out Kiran, we are truly getting hounded with spam
>just now.
>
>Best
>Lewis
>
>-- 
>Kiran Chitturi



Re: Important : Bunch of Spam Created under Nutch Wiki!!

2013-04-01 Thread Mattmann, Chris A (388J)
Hi Kiran,

I would give comm...@nutch.apache.org. Please add ChrisMattmann
as a username.

Cheers,
Chris

-Original Message-
From: kiran chitturi 
Reply-To: "dev@nutch.apache.org" 
Date: Monday, April 1, 2013 6:52 AM
To: "dev@nutch.apache.org" 
Subject: Re: Important : Bunch of Spam Created under Nutch Wiki!!

>Hi guys,
>
>
>Do you know what is the destination for commit mails ? Can I give
>'dev@nutch.apache.org' ?
>
>
>I am planning on giving the below information so far for creating a moin
>wiki [1] 
>
>
>Wiki Name: Nutch
>Usernames: LewisJohnMcgibbney, kiranchitturi, SebastianNagel, JulienNioche
>Destination for Commit mails: dev@nutch.apache.org
>
>
>Please let me know if any of the information is incorrect or needed any
>modifications.
>
>
>[1] - http://wiki.apache.org/general/OurWikiFarm#per_wiki_access_control_-_tighten_your_wiki_just_a_little.2C_benefit_just_a_lot
>
>
>
>
>On Sat, Mar 30, 2013 at 4:29 PM, Mattmann, Chris A (388J)
> wrote:
>
>Hey Kiran,
>
>I think here:
>
>http://wiki.apache.org/general/OurWikiFarm#per_wiki_access_control_-_tighten_your_wiki_just_a_little.2C_benefit_just_a_lot
>
>
>Cheers,
>Chris
>
>-Original Message-
>From: kiran chitturi 
>Reply-To: "dev@nutch.apache.org" 
>
>Date: Saturday, March 30, 2013 12:55 PM
>To: "dev@nutch.apache.org" 
>
>Subject: Re: Important : Bunch of Spam Created under Nutch Wiki!!
>
>
>>Does anyone know what details we need to provide for the new wiki
>>controls ?
>>
>>
>>
>>I have posted a JIRA [0] to control our spam but the infrabot is asking
>>more information [1]
>>
>>[0] - https://issues.apache.org/jira/browse/INFRA-6081
>>[1] -  http://www.apache.org/dev/infra-contact#what-we-need-to-know
>>
>>
>>
>>On Thu, Mar 28, 2013 at 3:18 PM, Mattmann, Chris A (388J)
>> wrote:
>>
>>Hi Kiran,
>>
>>Yes, my recommendation:
>>
>>1. Jump into #asfinfra on Freenode, find Joe, or Gavin or Daniel,
>>ask for help. If you don't have IRC, email infrastruct...@apache.org
>>and/or file a https://issues.apache.org/jira/browse/INFRA ticket
>>
>>2. Request that they enable ContributorsGroup-only ACLs ASAP
>>
>>I know that many Apache wikis (MoinMoin) are being attacked...
>>
>>Cheers,
>>Chris
>>
>>
>>-Original Message-
>>From: kiran chitturi 
>>Reply-To: "dev@nutch.apache.org" 
>>Date: Thursday, March 28, 2013 12:15 PM
>>To: "dev@nutch.apache.org" 
>>Subject: Fwd: Important : Bunch of Spam Created under Nutch Wiki!!
>>
>>>Thanks to Ken (check message below) for reporting our insecure wiki. I
>>>have checked it and anyone can

Re: Nutch Wiki

2013-03-30 Thread Mattmann, Chris A (388J)
Seconded!

-Original Message-
From: Lewis John Mcgibbney 
Reply-To: "dev@nutch.apache.org" 
Date: Saturday, March 30, 2013 3:07 PM
To: "dev@nutch.apache.org" 
Subject: Nutch Wiki

>@Kiran & Others who have been updating the wiki,
>
>Great work on the command line options and elsewhere where you guys have
>been cleaning up and writing better documentation for Nutch.
>
>This is a crucial part of the workload and is greatly appreciated.
>
>Have a great weekend.
>Lewis
>
>-- 
>Lewis
>
>
>
>
>



Re: Important : Bunch of Spam Created under Nutch Wiki!!

2013-03-30 Thread Mattmann, Chris A (388J)
Hey Kiran,

I think here:

http://wiki.apache.org/general/OurWikiFarm#per_wiki_access_control_-_tighten_your_wiki_just_a_little.2C_benefit_just_a_lot


Cheers,
Chris

-Original Message-
From: kiran chitturi 
Reply-To: "dev@nutch.apache.org" 
Date: Saturday, March 30, 2013 12:55 PM
To: "dev@nutch.apache.org" 
Subject: Re: Important : Bunch of Spam Created under Nutch Wiki!!

>Does anyone know what details we need to provide for the new wiki
>controls ? 
>
>
>
>I have posted a JIRA [0] to control our spam but the infrabot is asking
>more information [1]
>
>[0] - https://issues.apache.org/jira/browse/INFRA-6081
>[1] -  http://www.apache.org/dev/infra-contact#what-we-need-to-know
>
>
>
>On Thu, Mar 28, 2013 at 3:18 PM, Mattmann, Chris A (388J)
> wrote:
>
>Hi Kiran,
>
>Yes, my recommendation:
>
>1. Jump into #asfinfra on Freenode, find Joe, or Gavin or Daniel,
>ask for help. If you don't have IRC, email infrastruct...@apache.org
>and/or file a https://issues.apache.org/jira/browse/INFRA ticket
>
>2. Request that they enable ContributorsGroup-only ACLs ASAP
>
>I know that many Apache wikis (MoinMoin) are being attacked...
>
>Cheers,
>Chris
>
>
>-Original Message-
>From: kiran chitturi 
>Reply-To: "dev@nutch.apache.org" 
>Date: Thursday, March 28, 2013 12:15 PM
>To: "dev@nutch.apache.org" 
>Subject: Fwd: Important : Bunch of Spam Created under Nutch Wiki!!
>
>>Thanks to Ken (check message below) for reporting our insecure wiki. I
>>have checked it and anyone can create a fake account and edit any of our
>>wiki pages or create new ones.
>>
>>
>>When I first registered to the wiki, all the pages are immutable and
>>Lewis had to add me to Contributors group to make changes to the wiki.
>>
>>
>>Probably, the setting was hacked for now and that is the reason we are
>>facing a lot of spam.
>>
>>
>>Can we contact the infra@apache and request them to lock down the wiki as
>>the other groups did ?
>>
>>
>>
>>
>>-- Forwarded message --
>>From: Ken Krugler 
>>Date: Thu, Mar 28, 2013 at 1:35 PM
>>Subject: Re: Important : Bunch of Spam Created under Nutch Wiki!!
>>To: dev@nutch.apache.org
>>
>>
>>Hi Kiran,
>>
>>On Mar 28, 2013, at 2:03am, kiran chitturi wrote:
>>
>>
>>Thank you Ken for the information. I think the access is already
>>restricted to Contributors Only. Someone can please confirm, if it is
>>not.
>>
>>
>>
>>
>>
>>It's not, as far as I know. I just created a fake account, logged in with
>>it, and edited the front page.
>>
>>
>>If anyone needs to edit wiki, they would need to ask someone to get
>>access to wiki pages.
>>
>>
>>Do you know if Solr still got hit by spam after locking down the wiki ?
>>
>>
>>
>>
>>
>>
>>I think that change helped cut down most of the spam, but I don't monitor
>>the Solr list that closely, sorry.
>>
>>
>>-- Ken
>>
>>
>>
>>
>>
>>
>>On Thu, Mar 28, 2013 at 1:40 AM, Ken Krugler
>> wrote:
>>
>>
>>
>>On Mar 27, 2013, at 6:54pm, kiran chitturi wrote:
>>
>>
>>Thank you Binoy for reporting.
>>
>>
>>We have been monitoring the pages and deleting them when we get time but
>>there are more coming up. Today, I have

Re: Important : Bunch of Spam Created under Nutch Wiki!!

2013-03-28 Thread Mattmann, Chris A (388J)
Hi Kiran, 

Yes, my recommendation:

1. Jump into #asfinfra on Freenode, find Joe, or Gavin or Daniel,
ask for help. If you don't have IRC, email infrastruct...@apache.org
and/or file a https://issues.apache.org/jira/browse/INFRA ticket

2. Request that they enable ContributorsGroup-only ACLs ASAP

I know that many Apache wikis (MoinMoin) are being attacked...

Cheers,
Chris


-Original Message-
From: kiran chitturi 
Reply-To: "dev@nutch.apache.org" 
Date: Thursday, March 28, 2013 12:15 PM
To: "dev@nutch.apache.org" 
Subject: Fwd: Important : Bunch of Spam Created under Nutch Wiki!!

>Thanks to Ken (check message below) for reporting our insecure wiki. I
>have checked it and anyone can create a fake account and edit any of our
>wiki pages or create new ones.
>
>
>When I first registered to the wiki, all the pages are immutable and
>Lewis had to add me to Contributors group to make changes to the wiki.
>
>
>Probably, the setting was hacked for now and that is the reason we are
>facing a lot of spam.
>
>
>Can we contact the infra@apache and request them to lock down the wiki as
>the other groups did ?
>
>
>
>
>-- Forwarded message --
>From: Ken Krugler 
>Date: Thu, Mar 28, 2013 at 1:35 PM
>Subject: Re: Important : Bunch of Spam Created under Nutch Wiki!!
>To: dev@nutch.apache.org
>
>
>Hi Kiran,
>
>On Mar 28, 2013, at 2:03am, kiran chitturi wrote:
>
>
>Thank you Ken for the information. I think the access is already
>restricted to Contributors Only. Someone can please confirm, if it is
>not. 
>
>
>
>
>
>It's not, as far as I know. I just created a fake account, logged in with
>it, and edited the front page.
>
>
>If anyone needs to edit wiki, they would need to ask someone to get
>access to wiki pages.
>
>
>Do you know if Solr still got hit by spam after locking down the wiki ?
>
>
>
>
>
>
>I think that change helped cut down most of the spam, but I don't monitor
>the Solr list that closely, sorry.
>
>
>-- Ken
>
>
>
>
>
>
>On Thu, Mar 28, 2013 at 1:40 AM, Ken Krugler
> wrote:
>
>
>
>On Mar 27, 2013, at 6:54pm, kiran chitturi wrote:
>
>
>Thank you Binoy for reporting.
>
>
>We have been monitoring the pages and deleting them when we get time but
>there are more coming up. Today, I have seen a spam editing on the home
>page of Nutch wiki. It has inserted spam links under tutorials.
>
>
>We need to find a permanent solution to this. I wonder if any other
>list-servs are facing the same issue.
>
>
>
>
>
>
>Yes - Solr recently had to lock down editing on their wiki:
>
>
>
>The wiki at http://wiki.apache.org/solr/ has come under attack by
>spammers more frequently of late, so the PMC has decided to lock it down
> in an attempt to reduce the work involved in tracking and removing spam.
>
>From now on, only people who appear on
>http://wiki.apache.org/solr/ContributorsGroup will be able to
>create/modify/delete wiki pages.
>
>Please request either on the solr-u...@lucene.apache.org or on
>d...@lucene.apache.org to have your wiki username added to the
>ContributorsGroup
> page - this is a one-time step.
>
>
>
>
>So I think you need to make a request to Infra to lock down the wiki,
>then add people (generally in response to explicit requests) to the
>ContributorsGroup page.
>
>
>-- Ken
>
>
>
>
>
>
>On Thu, Mar 28, 2013 at 12:49 AM, Binoy d
> wrote:
>
>I am quite surprised looking at the notification I am getting for new
>pages for Nutch Wiki
>Example :
>http://wiki.apache.org/nutch/KarlPuent
>
>I see at least 25-35 emails regarding such notification.
>
>All of the links I got are  rooted under
>http://wiki.apache.org/nutch/ 
>
>
>Is someone looking into this? If needed, I can gladly forward emails to
>the person cleaning it up as I am not sure if every one has access to
>delete the pages.
>
>Regards,
>b
>
>-- Forwarded message --
>From: Apache Wiki 
>Date: Wed, Mar 27, 2013 at 9:32 PM
>Subject: [Nutch Wiki] Trivial Update of "EdwinaBro" by EdwinaBro
>To: Apache Wiki 
>
>
>Dear Wiki user,
>
>You have subscribed to a wiki page or wiki category on "Nutch Wiki" for
>change notification.
>
>The "EdwinaBro" page has been changed by EdwinaBro:
>http://wiki.apache.org/nutch/EdwinaBro
>
>New page:
>I am 24 years old and my name is Edwina Brownlee. I life in Corjolens
>(Switzerland).
>Take a look at my web-site ... [[http://modform.org/SolomonKr|Continue
>]]
>
>
>
>
>
>
>
>
>
>-- 
>Kiran Chi

Re: [Nutch Wiki] Trivial Update of "PGOSimone" by PGOSimone

2013-03-25 Thread Mattmann, Chris A (388J)
Hey Julien,

I heard on #asfinfra that many of our MoinMoin wikis have been attacked recently 
by SPAM.

I think we may want to contact infra and ask for specific ContributorsGroup 
only Nutch wiki access.

http://wiki.apache.org/general/OurWikiFarm

Cheers,
Chris


From: Julien Nioche
Reply-To: "dev@nutch.apache.org"
Date: Monday, March 25, 2013 1:55 AM
To: "dev@nutch.apache.org"
Subject: Re: [Nutch Wiki] Trivial Update of "PGOSimone" by PGOSimone

I thought we had to have a login / password to modify the Wiki. If so how come 
we got so much spam lately?

Julien

On 25 March 2013 04:26, Apache Wiki wrote:
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "PGOSimone" page has been changed by PGOSimone:
http://wiki.apache.org/nutch/PGOSimone
[..snip..]


--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble


Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

2013-03-24 Thread Mattmann, Chris A (388J)
Cool thanks!

From: kiran chitturi
Reply-To: "dev@nutch.apache.org"
Date: Saturday, March 23, 2013 1:36 PM
To: "dev@nutch.apache.org"
Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

Thank you Chris for your interest.

I would love to share my thesis and the work but I am still in experimenting 
stage and I will share with you soon once I have a decent UI running with 
functionalities.

Regards,
Kiran.


On Sat, Mar 23, 2013 at 2:33 PM, Mattmann, Chris A (388J) wrote:
That is so awesome Kiran.

Great job and I would love a link to your thesis (or even seeing the work in 
progress)
if you are willing to share and have the time.

Good plane reading material for me and congrats again. Looking forward to 
working
with you.

Cheers,
Chris


From: kiran chitturi
Reply-To: "dev@nutch.apache.org"
Date: Saturday, March 23, 2013 9:54 AM
To: "dev@nutch.apache.org"
Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

Thanks Chris!

I am planning to graduate with Masters degree in Computer Science from Virginia 
Tech University and my advisor is Dr.Fox.

My thesis work mostly relates to building search engine for the 10TB crises 
event data that we have collected over last three years. The data is collected 
using Internet Archive crawler (Archive-it) and I am indexing data using 
LucidWorks Big Data Software. The process also involves finding more metadata 
and clustering. All of this work is related to 'Crisis, Tragedy and Recovery 
Network Project (CTRnet)' (www.ctrnet.net)

My thesis, library work and Nutch are all closely related. It has been a great 
learning experience so far :)




On Sat, Mar 23, 2013 at 12:23 PM, Mattmann, Chris A (388J) wrote:
Hi Kiran,

Awesome that works fine for me! Happy to have you contribute, and whether you 
are a formal mentor or not,
if we get a GSoC 2013 student for this you can help me, Lewis, (and others) 
shepherd it in!

Thanks man and congrats on graduating soon! Where are you graduating from and 
in what subject?

Cheers,
Chris

From: kiran chitturi
Reply-To: "dev@nutch.apache.org"
Date: Saturday, March 23, 2013 8:51 AM
To: "dev@nutch.apache.org"
Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

I am very much interested in the Apache Wicket project but I wouldn't be able 
to be a student since i am finishing my graduation and looking for full-time 
jobs. I have discussed with Lewis previously about this, and it wouldn't be 
ideal for me to be a GSoc 2013 student as I can't devote my full-time work on 
this.

However, I will be very happy to work on this in my free time. This is 
something I am interested in for long time and I would try to contribute in 
anyway possible.

Thank you,
Kiran.






On Sat, Mar 23, 2013 at 11:23 AM, Mattmann, Chris A (388J) wrote:
Hi Kiran,

Great, yes the REST services need work for sure. They haven't been worked on in 
a while.

I'm privy to Apache CXF, but I haven't done anything with it, and Andrzej did 
an awesome job
using Restlet, so we've got Restlet for now.

If you are interested in documenting the services, then awesome! Do you want to 
be a GSoC 2013 student,
and are you interested in this project?

Cheers,
Chris


From: kiran chitturi
Reply-To: "dev@nutch.apache.org"
Date: Friday, March 22, 2013 9:19 PM
To: "dev@nutch.apache.org"
Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

Hi Chris,

I was just thinking about that this evening. First, to start with this I want 
to do well documentation of the Nutch REST API.

What is the status of Rest API ? Does it need any fixes and working examples ?

Hopefully my start would be helpful and it be soon.

Thanks for opening up the issue.

Regards,
kIran.






On Fri, Mar 22, 2013 at 11:43 PM, Mattmann, Chris A (388J) wrote:
Hey Guys,

I posted:

https://issues.apache.org/jira/browse/NUTCH-841


As a potential GSOC 2013 summer project. I'm willing to mentor it, 

Re: Google Summer of Code 2013 - Giraph implementation of Nutch LinkRank Algorithm

2013-03-24 Thread Mattmann, Chris A (388J)
Super +1 -- sounds awesome Lewis.

Cheers,
Chris


On 3/24/13 12:38 PM, "Lewis John Mcgibbney" 
wrote:

>Hi All,
>
>After some discussion and drumming up of interest within the Giraph
>community, I've logged a Google Summer of Code issue [0] for this topic.
>We are looking for interested students to come forward and participate in
>the effort.
>I logged this over in Giraph as there was no GSoC effort already going on
>there, we already have an issue for the Wicket-based User Interface
>implementation in Nutch.
>I would be very happy if people (users and developers) could chime in on
>the thread so we can get the project started with the right direction and
>intention in mind.
>I propose this for Nutch TRUNK.
>
>Thanks for now
>
>Best
>
>Lewis
>
>[0] https://issues.apache.org/jira/browse/GIRAPH-584
>
>-- 
>*Lewis*



Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

2013-03-24 Thread Mattmann, Chris A (388J)
Hi Evert,

Thanks. Velocity would be fine, but the big issue is that I don't know
Velocity, and I know Wicket.

The great part about Wicket is that it's pure XHTML + Java code. No
config, no anything in-between.
So if you understand the component model of widgets behind the scenes, and
understand HTML, JS and CSS,
you can easily maintain a Wicket web app.

And as a Nutch PMC member there's one person here at least (me) who's
willing to maintain and steward
such a web app. So we're in business!

Cheers,
Chris


From:  Evert Wagenaar 
Reply-To:  "dev@nutch.apache.org" , Evert Wagenaar

Date:  Sunday, March 24, 2013 12:46 AM
To:  "dev@nutch.apache.org" 
Subject:  Re: GSOC 2013 project: Apache-Wicket based Nutch webapp


I agree as well. The JSP version has become a
mess and is currently almost not supportable anymore. Would Velocity be
a good alternative? It is very good with Solr Facets and also fits into
any CMS.

 


Evert Wagenaar
evert.wagen...@me.com
+31 653 606 293






From: kiran chitturi 
To: dev@nutch.apache.org
Sent: Saturday, March 23, 2013 9:36 PM
Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp


Thank you Chris for your interest.

I would love to share my thesis and the work but I am still in
experimenting stage and I will share with you soon once I have a decent UI
running with functionalities.

Regards,
Kiran.


On Sat, Mar 23, 2013 at 2:33 PM, Mattmann, Chris A (388J)
 wrote:

That is so awesome Kiran.

Great job and I would love a link to your thesis (or even seeing the work
in progress) 
if you are willing to share and have the time.

Good plane reading material for me and congrats again. Looking forward to
working
with you.

Cheers,
Chris


From: kiran chitturi 
Reply-To: "dev@nutch.apache.org" 

Date: Saturday, March 23, 2013 9:54 AM

To: "dev@nutch.apache.org" 
Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp




Thanks Chris!

I am planning to graduate with Masters degree in Computer Science from
Virginia Tech University and my advisor is Dr.Fox.

My thesis work mostly relates to building search engine for the 10TB
crises event data that we have collected over last three years. The data
is collected using Internet Archive crawler (Archive-it) and I am indexing
data using LucidWorks Big Data Software.
 The process also involves finding more metadata and clustering. All of
this work is related to 'Crisis, Tragedy and Recovery Network Project
(CTRnet)' (www.ctrnet.net)

My thesis, library work and Nutch are all closely related. It has been a
great learning experience so far :)




On Sat, Mar 23, 2013 at 12:23 PM, Mattmann, Chris A (388J)
 wrote:

Hi Kiran,

Awesome that works fine for me! Happy to have you contribute, and whether
you are a formal mentor or not,
if we get a GSoC 2013 student for this you can help me, Lewis, (and
others) shepherd it in!

Thanks man and congrats on graduating soon! Where are you graduating from
and in what subject?

Cheers,
Chris

From: kiran chitturi 
Reply-To: "dev@nutch.apache.org" 

Date: Saturday, March 23, 2013 8:51 AM

To: "dev@nutch.apache.org" 
Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp




I am very much interested in the Apache Wicket project but I wouldn't be
able to be a student since i am finishing my graduation and looking for
full-time jobs. I have discussed with Lewis previously about this, and it
wouldn't be ideal for me
 to be a GSoc 2013 student as I can't devote my full-time work on this.

However, I will be very happy to work on this in my free time. This is
something I am interested in for long time and I would try to contribute
in anyway possible.

Thank you,
Kiran.







On Sat, Mar 23, 2013 at 11:23 AM, Mattmann, Chris A (388J)
 wrote:

Hi Kiran,

Great, yes the REST services need work for sure. They haven't been worked
on in a while.

I'm privy to Apache CXF, but I haven't done anything with it, and Andrzej
did an awesome job
using Restlet, so we've got Restlet for now.

If you are interested in documenting the services, then awesome! Do you
want to be a GSoC 2013 student,
and are you interested in this project?

Cheers,
Chris


From: kiran chitturi 
Reply-To: "dev@nutch.apache.org" 
Date: Friday, March 22, 2013 9:19 PM
To: "dev@nutch.apache.org" 
Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp


Hi Chris,

I was just thinking about that this evening. First, to start with this I
want to do well documentation of the Nutch REST API.

What is the status of Rest API ? Does it need any fixes and working
examples ?

Hopefully my start would be helpful and it be soon.

Thanks for opening up the issue.

Regards,
kIran.







On Fri, Mar 22, 2013 at 11:43 PM, Mattmann, Chris A (388J)
 wrote:

Hey Guys,


Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

2013-03-23 Thread Mattmann, Chris A (388J)
That is so awesome Kiran.

Great job and I would love a link to your thesis (or even seeing the work in 
progress)
if you are willing to share and have the time.

Good plane reading material for me and congrats again. Looking forward to 
working
with you.

Cheers,
Chris


From: kiran chitturi
Reply-To: "dev@nutch.apache.org"
Date: Saturday, March 23, 2013 9:54 AM
To: "dev@nutch.apache.org"
Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

Thanks Chris!

I am planning to graduate with Masters degree in Computer Science from Virginia 
Tech University and my advisor is Dr.Fox.

My thesis work mostly relates to building search engine for the 10TB crises 
event data that we have collected over last three years. The data is collected 
using Internet Archive crawler (Archive-it) and I am indexing data using 
LucidWorks Big Data Software. The process also involves finding more metadata 
and clustering. All of this work is related to 'Crisis, Tragedy and Recovery 
Network Project (CTRnet)' (www.ctrnet.net)

My thesis, library work and Nutch are all closely related. It has been a great 
learning experience so far :)




On Sat, Mar 23, 2013 at 12:23 PM, Mattmann, Chris A (388J) wrote:
Hi Kiran,

Awesome that works fine for me! Happy to have you contribute, and whether you 
are a formal mentor or not,
if we get a GSoC 2013 student for this you can help me, Lewis, (and others) 
shepherd it in!

Thanks man and congrats on graduating soon! Where are you graduating from and 
in what subject?

Cheers,
Chris

From: kiran chitturi
Reply-To: "dev@nutch.apache.org"
Date: Saturday, March 23, 2013 8:51 AM
To: "dev@nutch.apache.org"
Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

I am very much interested in the Apache Wicket project, but I wouldn't be able 
to be a student since I am finishing my degree and looking for full-time 
jobs. I have discussed this with Lewis previously, and it wouldn't be 
ideal for me to be a GSoC 2013 student as I can't devote full-time work to 
this.

However, I will be very happy to work on this in my free time. This is 
something I have been interested in for a long time, and I will try to contribute in 
any way possible.

Thank you,
Kiran.






On Sat, Mar 23, 2013 at 11:23 AM, Mattmann, Chris A (388J) 
mailto:chris.a.mattm...@jpl.nasa.gov>> wrote:
Hi Kiran,

Great, yes the REST services need work for sure. They haven't been worked on in 
a while.

I'm privy to Apache CXF, but I haven't done anything with it, and Andrzej did 
an awesome job
using Restlet, so we've got Restlet for now.

If you are interested in documenting the services, then awesome! Do you want to 
be a GSoC 2013 student,
and are you interested in this project?

Cheers,
Chris


From: kiran chitturi 
mailto:chitturikira...@gmail.com>>
Reply-To: "dev@nutch.apache.org<mailto:dev@nutch.apache.org>" 
mailto:dev@nutch.apache.org>>
Date: Friday, March 22, 2013 9:19 PM
To: "dev@nutch.apache.org<mailto:dev@nutch.apache.org>" 
mailto:dev@nutch.apache.org>>
Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

Hi Chris,

I was just thinking about that this evening. To start with, I want 
to write good documentation of the Nutch REST API.

What is the status of the REST API? Does it need any fixes and working examples?

Hopefully my start will be helpful, and it will be soon.

Thanks for opening up the issue.

Regards,
Kiran.






On Fri, Mar 22, 2013 at 11:43 PM, Mattmann, Chris A (388J) 
mailto:chris.a.mattm...@jpl.nasa.gov>> wrote:
Hey Guys,

I posted:

https://issues.apache.org/jira/browse/NUTCH-841


As a potential GSOC 2013 summer project. I'm willing to mentor it, since I
love Wicket, and I'm willing to maintain the result as a Nutch committer.

If NUTCH-841 doesn't get selected, I'll start implementing it this summer
if no one beats me to it.

Cheers,
Chris




--
Kiran Chitturi





Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

2013-03-23 Thread Mattmann, Chris A (388J)
Hi Kiran,

Awesome that works fine for me! Happy to have you contribute, and whether you 
are a formal mentor or not,
if we get a GSoC 2013 student for this you can help me, Lewis, (and others) 
shepherd it in!

Thanks man and congrats on graduating soon! Where are you graduating from and 
in what subject?

Cheers,
Chris

From: kiran chitturi 
mailto:chitturikira...@gmail.com>>
Reply-To: "dev@nutch.apache.org<mailto:dev@nutch.apache.org>" 
mailto:dev@nutch.apache.org>>
Date: Saturday, March 23, 2013 8:51 AM
To: "dev@nutch.apache.org<mailto:dev@nutch.apache.org>" 
mailto:dev@nutch.apache.org>>
Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

I am very much interested in the Apache Wicket project, but I wouldn't be able 
to be a student since I am finishing my degree and looking for full-time 
jobs. I have discussed this with Lewis previously, and it wouldn't be 
ideal for me to be a GSoC 2013 student as I can't devote full-time work to 
this.

However, I will be very happy to work on this in my free time. This is 
something I have been interested in for a long time, and I will try to contribute in 
any way possible.

Thank you,
Kiran.






On Sat, Mar 23, 2013 at 11:23 AM, Mattmann, Chris A (388J) 
mailto:chris.a.mattm...@jpl.nasa.gov>> wrote:
Hi Kiran,

Great, yes the REST services need work for sure. They haven't been worked on in 
a while.

I'm privy to Apache CXF, but I haven't done anything with it, and Andrzej did 
an awesome job
using Restlet, so we've got Restlet for now.

If you are interested in documenting the services, then awesome! Do you want to 
be a GSoC 2013 student,
and are you interested in this project?

Cheers,
Chris


From: kiran chitturi 
mailto:chitturikira...@gmail.com>>
Reply-To: "dev@nutch.apache.org<mailto:dev@nutch.apache.org>" 
mailto:dev@nutch.apache.org>>
Date: Friday, March 22, 2013 9:19 PM
To: "dev@nutch.apache.org<mailto:dev@nutch.apache.org>" 
mailto:dev@nutch.apache.org>>
Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

Hi Chris,

I was just thinking about that this evening. To start with, I want 
to write good documentation of the Nutch REST API.

What is the status of the REST API? Does it need any fixes and working examples?

Hopefully my start will be helpful, and it will be soon.

Thanks for opening up the issue.

Regards,
Kiran.






On Fri, Mar 22, 2013 at 11:43 PM, Mattmann, Chris A (388J) 
mailto:chris.a.mattm...@jpl.nasa.gov>> wrote:
Hey Guys,

I posted:

https://issues.apache.org/jira/browse/NUTCH-841


As a potential GSOC 2013 summer project. I'm willing to mentor it, since I
love Wicket, and I'm willing to maintain the result as a Nutch committer.

If NUTCH-841 doesn't get selected, I'll start implementing it this summer
if no one beats me to it.

Cheers,
Chris




--
Kiran Chitturi





Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

2013-03-23 Thread Mattmann, Chris A (388J)
Hi Kiran,

Great, yes the REST services need work for sure. They haven't been worked on in 
a while.

I'm privy to Apache CXF, but I haven't done anything with it, and Andrzej did 
an awesome job
using Restlet, so we've got Restlet for now.

If you are interested in documenting the services, then awesome! Do you want to 
be a GSoC 2013 student,
and are you interested in this project?

Cheers,
Chris


From: kiran chitturi 
mailto:chitturikira...@gmail.com>>
Reply-To: "dev@nutch.apache.org<mailto:dev@nutch.apache.org>" 
mailto:dev@nutch.apache.org>>
Date: Friday, March 22, 2013 9:19 PM
To: "dev@nutch.apache.org<mailto:dev@nutch.apache.org>" 
mailto:dev@nutch.apache.org>>
Subject: Re: GSOC 2013 project: Apache-Wicket based Nutch webapp

Hi Chris,

I was just thinking about that this evening. To start with, I want 
to write good documentation of the Nutch REST API.

What is the status of the REST API? Does it need any fixes and working examples?

Hopefully my start will be helpful, and it will be soon.

Thanks for opening up the issue.

Regards,
Kiran.






On Fri, Mar 22, 2013 at 11:43 PM, Mattmann, Chris A (388J) 
mailto:chris.a.mattm...@jpl.nasa.gov>> wrote:
Hey Guys,

I posted:

https://issues.apache.org/jira/browse/NUTCH-841


As a potential GSOC 2013 summer project. I'm willing to mentor it, since I
love Wicket, and I'm willing to maintain the result as a Nutch committer.

If NUTCH-841 doesn't get selected, I'll start implementing it this summer
if no one beats me to it.

Cheers,
Chris




--
Kiran Chitturi





GSOC 2013 project: Apache-Wicket based Nutch webapp

2013-03-22 Thread Mattmann, Chris A (388J)
Hey Guys,

I posted:

https://issues.apache.org/jira/browse/NUTCH-841


As a potential GSOC 2013 summer project. I'm willing to mentor it, since I
love Wicket, and I'm willing to maintain the result as a Nutch committer.

If NUTCH-841 doesn't get selected, I'll start implementing it this summer
if no one beats me to it.

Cheers,
Chris



FW: GSoC 2013

2013-03-18 Thread Mattmann, Chris A (388J)
[Apologies for cross post]

Guys, to play in the GSoC 2013 spec, we just need to tag issues in JIRA
with the gsoc2013 tag.

I'll try and come up with a few projects soon :)

Cheers,
Chris


On 3/15/13 11:15 AM, "Luciano Resende"  wrote:

>On Fri, Mar 15, 2013 at 11:01 AM, Manish Agrawal 
>wrote:
>> Hi
>>
>> I am Manish Agrawal, a 3rd year student of Mathematics and computing
>> department from IIT Delhi.
>>
>> I want to participate in GSoC 2013 through one of the ASF projects. I
>>would
>> be really thankful if you could please suggest me how should I proceed
>>for
>> the same.
>>
>> Hoping for a reply.
>>
>> Thanks
>> Manish Agrawal
>
>Google is sponsoring GSoC 2013, and Apache Software Foundation is
>planning to participate again.
>More information about Apache Participation in GSoC is available at :
>http://community.apache.org/gsoc.html.
>
>The proper way to find a project idea would be to identify an Apache
>Project in the area of your interest and start discussions with them
>via the project mailing list.
>
>The projects are starting to create their project ideas, and you can
>start browsing them at
>https://issues.apache.org/jira/secure/IssueNavigator!executeAdvanced.jspa?jqlQuery=labels+=+gsoc2013&runQuery=true&clear=true
>
>
>-- 
>Luciano Resende
>http://people.apache.org/~lresende
>http://twitter.com/lresende1975
>http://lresende.blogspot.com/



FW: [OPENING] Google Summer of Code Applications

2013-03-10 Thread Mattmann, Chris A (388J)
FYI

On 3/10/13 5:10 PM, "Lewis John Mcgibbney" 
wrote:

>I just told a huge lie.
>I got my dates mixed up...
>Students have from between April 22nd and May 3rd to get proposals in.
>Sorry about the mix up.
>
>Lewis
>
>On Sun, Mar 10, 2013 at 5:09 PM, Lewis John Mcgibbney <
>lewis.mcgibb...@gmail.com> wrote:
>
>> Hi All,
>>
>> We have from the 18th until the 29th to submit this year's GSoC
>> proposals[0].
>>
>> Just a gentle reminder for any potential guys wanting to formally
>>apply...
>>
>> The idea would be to sort out any discrepancies just now and to develop
>> your proposal to a comprehensive standard.
>>
>> I am interested in mentoring another project this year, so can work with
>> folks who wish to progress with proposals.
>>
>> Thanks
>>
>> Lewis
>>
>> [0] http://www.google-melange.com/gsoc/events/google/gsoc2013
>>
>> --
>> *Lewis*
>>
>
>
>
>-- 
>*Lewis*



Re: [ANNOUNCEMENT] Welcome Kiran Chitturi as Apache Nutch PMC and Committer

2013-03-10 Thread Mattmann, Chris A (388J)
This is great to hear Kiran, welcome to the team!

Cheers,
Chris


From: Julien Nioche 
mailto:lists.digitalpeb...@gmail.com>>
Reply-To: "dev@nutch.apache.org" 
mailto:dev@nutch.apache.org>>
Date: Sunday, March 10, 2013 2:15 PM
To: "dev@nutch.apache.org" 
mailto:dev@nutch.apache.org>>
Subject: Re: [ANNOUNCEMENT] Welcome Kiran Chitturi as Apache Nutch PMC and 
Committer

Great to hear about your use of Nutch at your library and welcome on board 
Kiran!

Julien

On 10 March 2013 01:27, kiran chitturi 
mailto:chitturikira...@gmail.com>> wrote:
Thanks a lot guys for inviting me and for the wishes.

I am a graduate student at Virginia Tech University doing my Master's in 
Computer Science. I have been using Apache Nutch for the last year as part 
of my assistantship with our University Library.

The Digital Libraries and Archives division of our libraries was using the Google 
Mini search engine for their website, which hosts 600k files, but Google Mini was 
no longer supported and we wanted to try building a search engine using open source 
technologies.

That is when I started my journey with Nutch, and we were able to successfully 
achieve our goals using Nutch and Solr. The library was pleased with the 
project and they are now more interested in working with open source software 
whenever possible.

I liked working with the Nutch community and it has been a great learning 
experience for me. I would like to learn and contribute back even after my 
graduation.

A few things that I have in mind right now, other than committing patches, are 
to improve our documentation (Wiki), to help users as best I can, and to start 
the Apache Wicket UI work soon for 2.x in Nutch.

Regards,
Kiran.




On Sat, Mar 9, 2013 at 4:06 PM, Tejas Patil 
mailto:tejas.patil...@gmail.com>> wrote:
Welcome aboard Kiran :)


On Sat, Mar 9, 2013 at 12:56 PM, lewis john mcgibbney 
mailto:lewi...@apache.org>> wrote:
Hi All,

Over the last while we have been aware of Kiran's ongoing contribution to the 
Nutch community.
It is with great pleasure that we invite Kiran to join the Nutch PMC and also 
take up Committer role.
@Kiran, please feel free to say a bit about yourself and introduce what brought 
you to Apache Nutch.
Have a great weekend.
Best
Lewis




--
Kiran Chitturi



--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble


Re: Review board giving issue

2013-03-07 Thread Mattmann, Chris A (388J)
Hi Tejas,

Yeah I was having some issue at the time, but will try and see if it is working 
tomorrow. If it's still not working we can contact infra@

Cheers,
Chris


From: Tejas Patil mailto:tejas.patil...@gmail.com>>
Reply-To: "dev@nutch.apache.org" 
mailto:dev@nutch.apache.org>>
Date: Tuesday, March 5, 2013 9:07 PM
To: "dev@nutch.apache.org" 
mailto:dev@nutch.apache.org>>
Subject: Review board giving issue

Hi all,

I am trying to use Review Board to upload a patch for a Jira issue and it is giving 
me the same issue as I had before [0]. Below are the steps that I follow:
1. Generate a patch file using the "svn diff" command.
2. On the Review Board page, I select the repository as "Nutch".
3. Repository as "https://svn.apache.org/repos/asf/nutch/trunk" (the patch is 
for 1.x).
4. Attach the diff file.

I used to follow the same steps at work and it worked out well.
But over here I get this error message:
The file 
'https://svn.apache.org/repos/asf/nutch/trunk/src/plugin/lib-http/src/test/org/apache/nutch/protocol/http/api/TestRobotRulesParser.java'
 (r1453161) could not be found in the repository

There was a review request in the nutch group last month [1], after the thread 
[0]. So I have a feeling that there is something weird with my account or I am 
doing something wrong. Can anyone help me here?

[0] : 
http://mail-archives.apache.org/mod_mbox/nutch-dev/201301.mbox/%3cfa2d97dfc830824e9040174e7f89744925085...@ap-embx-sp40.res.ad.jpl%3E

[1] : https://reviews.apache.org/r/9119/


Re: [DISCUSS] Google Summer of Code

2013-03-04 Thread Mattmann, Chris A (388J)
Hey Markus:

https://issues.apache.org/jira/browse/NUTCH-1539


Will submit the code soon.

Cheers,
Chris

On 3/4/13 1:43 PM, "Markus Jelsma"  wrote:

>Ah yes! Please open an issue and if you can attach anything that matters
>such as a description of the algorithm, how it should work with
>Nutch/MapReduce or even code/tests.
>
>If there's code i may be able to patch it up for trunk rather quickly and
>see how it performs.
>
>Cheers,
>Markus
>
> 
> 
>-Original message-
>> From:Mattmann, Chris A (388J) 
>> Sent: Mon 04-Mar-2013 22:27
>> To: dev@nutch.apache.org
>> Subject: Re: [DISCUSS] Google Summer of Code
>> 
>> Hey Markus,
>> 
>> Yep, my student implemented HITS (on the fly) ranking, and classification (I
>> think).
>> 
>> It's been sitting on my HD for 2 years :(
>> 
>> So if someone can pick it up it would be a nice GSoC project.
>> 
>> Glad to hear there is interest.
>> 
>> Cheers,
>> Chris
>> 
>> On 3/4/13 1:21 PM, "Markus Jelsma"  wrote:
>> 
>> >Chris!
>> >
>> >Do you mean automatic classification of hub and authority pages? If so,
>> >we're more than interested in that. This is still an issue for our site
>> >search platform and one that we haven't given much more attention than some
>> >research and prototypes.
>> >
>> >Cheers
>> >
>> > 
>> > 
>> >-Original message-
>> >> From:Mattmann, Chris A (388J) 
>> >> Sent: Mon 04-Mar-2013 22:02
>> >> To: dev@nutch.apache.org
>> >> Subject: Re: [DISCUSS] Google Summer of Code
>> >> 
>> >> Hey Lewis,
>> >> 
>> >> Great job starting this thread. +1 Giraph is welcome here.
>> >>Multi-project GSoCs always do well.
>> >> 
>> >> One thing I had in mind was taking an implementation of Hubs and
>> >>Authorities developed for
>> >> Nutch 1.3 a few years back in my USC class and then having someone
>> >>integrate it into the
>> >> current Nutch 1.x branch to start.
>> >> 
>> >> If folks are interested I can create a JIRA.
>> >> 
>> >> Cheers,
>> >> Chris
>> >> 
>> >> 
>> >> From: Lewis John Mcgibbney > >> >
>> >> Reply-To: "dev@nutch.apache.org  "
>> >>mailto:dev@nutch.apache.org> >
>> >> Date: Monday, March 4, 2013 12:23 PM
>> >> To: "dev@nutch.apache.org  "
>> >>mailto:dev@nutch.apache.org> >
>> >> Subject: [DISCUSS] Google Summer of Code
>> >> 
>> >> Hi All,
>> >> 
>> >> I thought I would ask the question as to who (if anyone) is intending
>> >>on engaging as a mentor (or student if you are one) within this year's
>> >>GSoC project.
>> >> There are plenty of projects we could do within Nutch.
>> >> Obvious ones that come to mind are
>> >> - Wicket webapp for Nutch 2.x
>> >> - Integration of Giraph with Nutch
>> >> We already have one proposal which I would consider mentoring over on
>> >>Apache Gora, but I will certainly not back down from any proposals in
>> >>Nutch.
>> >> Would the Giraph project be welcomed here? If so I can head over to
>> >>user@ Giraph in an attempt to attract interest.
>> >> Of course this is a discussion based on what folks want to do and the
>> >>list above should be added to.
>> >> Thanks for now
>> >> Lewis
>> >> 
>> >> -- 
>> >> Lewis
>> >> 
>> 
>> 



Re: [DISCUSS] Google Summer of Code

2013-03-04 Thread Mattmann, Chris A (388J)
Hey Markus,

Yep, my student implemented HITS (on the fly) ranking, and classification (I
think).

It's been sitting on my HD for 2 years :(

So if someone can pick it up it would be a nice GSoC project.

Glad to hear there is interest.

Cheers,
Chris

On 3/4/13 1:21 PM, "Markus Jelsma"  wrote:

>Chris!
>
>Do you mean automatic classification of hub and authority pages? If so,
>we're more than interested in that. This is still an issue for our site
>search platform and one that we haven't given much more attention than some
>research and prototypes.
>
>Cheers
>
> 
> 
>-Original message-
>> From:Mattmann, Chris A (388J) 
>> Sent: Mon 04-Mar-2013 22:02
>> To: dev@nutch.apache.org
>> Subject: Re: [DISCUSS] Google Summer of Code
>> 
>> Hey Lewis,
>> 
>> Great job starting this thread. +1 Giraph is welcome here.
>>Multi-project GSoCs always do well.
>> 
>> One thing I had in mind was taking an implementation of Hubs and
>>Authorities developed for
>> Nutch 1.3 a few years back in my USC class and then having someone
>>integrate it into the
>> current Nutch 1.x branch to start.
>> 
>> If folks are interested I can create a JIRA.
>> 
>> Cheers,
>> Chris
>> 
>> 
>> From: Lewis John Mcgibbney > >
>> Reply-To: "dev@nutch.apache.org  "
>>mailto:dev@nutch.apache.org> >
>> Date: Monday, March 4, 2013 12:23 PM
>> To: "dev@nutch.apache.org  "
>>mailto:dev@nutch.apache.org> >
>> Subject: [DISCUSS] Google Summer of Code
>> 
>> Hi All,
>> 
>> I thought I would ask the question as to who (if anyone) is intending
>>on engaging as a mentor (or student if you are one) within this year's
>>GSoC project.
>> There are plenty of projects we could do within Nutch.
>> Obvious ones that come to mind are
>> - Wicket webapp for Nutch 2.x
>> - Integration of Giraph with Nutch
>> We already have one proposal which I would consider mentoring over on
>>Apache Gora, but I will certainly not back down from any proposals in
>>Nutch.
>> Would the Giraph project be welcomed here? If so I can head over to
>>user@ Giraph in an attempt to attract interest.
>> Of course this is a discussion based on what folks want to do and the
>>list above should be added to.
>> Thanks for now
>> Lewis
>> 
>> -- 
>> Lewis
>> 



Re: [DISCUSS] Google Summer of Code

2013-03-04 Thread Mattmann, Chris A (388J)
Hey Lewis,

Great job starting this thread. +1 Giraph is welcome here. Multi-project GSoCs 
always do well.

One thing I had in mind was taking an implementation of Hubs and Authorities 
developed for
Nutch 1.3 a few years back in my USC class and then having someone integrate it 
into the
current Nutch 1.x branch to start.
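
For anyone on the list not familiar with Hubs and Authorities (HITS), here is a minimal in-memory sketch of the iteration in Python. It is only an illustration under stated assumptions: it is not the class implementation mentioned above, it does not use any Nutch or MapReduce API, and the names hits and normalize and the dict-of-outlinks input are hypothetical.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (all-zero input is left at zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=20):
    """Return (hubs, authorities) for a link graph.

    links maps each page to the pages it links to (hypothetical input format).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hubs = {p: 1.0 for p in pages}
    auths = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking to a page.
        new_auths = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auths[dst] += hubs[src]
        # Hub score: sum of authority scores of the pages a page links to.
        new_hubs = {p: sum(new_auths[d] for d in links.get(p, ())) for p in pages}
        auths = normalize(new_auths)
        hubs = normalize(new_hubs)

    return hubs, auths


if __name__ == "__main__":
    graph = {"a": ["c"], "b": ["c"], "c": []}
    hub_scores, auth_scores = hits(graph)
    print(max(auth_scores, key=auth_scores.get))
```

In this toy graph the page linked to by both others ends up with the highest authority score, and the two linking pages share the hub score. A Nutch integration would have to compute the same sums over the link database rather than in memory.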

If folks are interested I can create a JIRA.

Cheers,
Chris


From: Lewis John Mcgibbney 
mailto:lewis.mcgibb...@gmail.com>>
Reply-To: "dev@nutch.apache.org" 
mailto:dev@nutch.apache.org>>
Date: Monday, March 4, 2013 12:23 PM
To: "dev@nutch.apache.org" 
mailto:dev@nutch.apache.org>>
Subject: [DISCUSS] Google Summer of Code

Hi All,

I thought I would ask the question as to who (if anyone) is intending on 
engaging as a mentor (or student if you are one) within this year's GSoC project.
There are plenty of projects we could do within Nutch.
Obvious ones that come to mind are
- Wicket webapp for Nutch 2.x
- Integration of Giraph with Nutch
We already have one proposal which I would consider mentoring over on Apache 
Gora, but I will certainly not back down from any proposals in Nutch.
Would the Giraph project be welcomed here? If so I can head over to user@ 
Giraph in an attempt to attract interest.
Of course this is a discussion based on what folks want to do and the list 
above should be added to.
Thanks for now
Lewis

--
Lewis


Re: Nutch JAVA Application

2013-02-12 Thread Mattmann, Chris A (388J)
Hi Shann,

Thank you for reaching out! If your goal is to get your project integrated
into Apache Nutch proper, then I would recommend simply:

0. File some JIRA issues in Apache Nutch
http://issues.apache.org/jira/browse/NUTCH Small incremental patches and
issues are preferred, and this will let people know what your plan is so
you can get committers' and PMC members' attention.

1. svn co http://svn.apache.org/repos/asf/nutch/branches/2.x/
2. cd 2.x
3. Edit files
4. svn status (make sure the files you edited look correct)
5. svn diff > NUTCH-xxx.sleduc.yyMMdd.patch.txt for each issue you created
6. Attach patches from #5 to issues from #0
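
The patch-naming convention in the svn diff step above can be sketched as a small Python helper. This is only an illustration: the function names patch_filename and svn_diff_command are hypothetical, the issue number 1234 is a made-up placeholder, and the helper only builds the command string, it does not run svn.

```python
from datetime import date


def patch_filename(issue_number, username, created):
    """Build a name following NUTCH-xxx.<user>.yyMMdd.patch.txt."""
    stamp = created.strftime("%y%m%d")  # two-digit year, month, day
    return f"NUTCH-{issue_number}.{username}.{stamp}.patch.txt"


def svn_diff_command(issue_number, username, created):
    """Return the shell command line that writes the diff to the patch file."""
    return f"svn diff > {patch_filename(issue_number, username, created)}"


if __name__ == "__main__":
    # 1234 is a placeholder issue number, not a real JIRA issue reference.
    print(svn_diff_command(1234, "sleduc", date(2013, 2, 12)))
```

Run the resulting command from the root of the 2.x checkout so that the paths in the patch are relative to the branch root.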

Otherwise, if you go off onto GitHub and work there, it's going to be harder to
get your patch accepted, since it will represent a large change, when instead
you can effect the change here at the ASF incrementally, making sure your
code gets in.

ALv2 is the license to use, BTW, either way you decide.

Cheers,
Chris


On 2/12/13 12:25 PM, "Shann"  wrote:

>Hi,
>As part of my internship, we must develop a specialized search engine using
>Nutch, Solr, HBase, and Tika.
>
>I began to develop a Java crawler application with the Nutch 2.x branch.
>
>The inject, generate, fetch, parse, updatedb, and solrindex functions are based
>on actually executing Nutch via a shell command from the Java application.
>
>As an advocate of free software, I therefore propose to give you access to
>my git project.
> 
>Since it uses Nutch in the background, under what license should I release my
>application?
>
>
>
>--
>View this message in context:
>http://lucene.472066.n3.nabble.com/Nutch-JAVA-Application-tp4040050.html
>Sent from the Nutch - Dev mailing list archive at Nabble.com.



FW: [GSoC Mentors] Google Summer of Code 2013

2013-02-11 Thread Mattmann, Chris A (388J)
[Sorry for cross posting]

Guys,

FYI please note that you can participate as a mentor from a PMC via Apache as 
they are a GSoC org. ComDev will coordinate our participation but start 
thinking about what projects we may want to do.

Cheers,
Chris

From: Carol Smith mailto:car...@google.com>>
Date: Monday, February 11, 2013 11:02 AM
To: Google Summer of Code Mentors List 
mailto:google-summer-of-code-mentors-l...@googlegroups.com>>
Subject: [GSoC Mentors] Google Summer of Code 2013

Hi GSoC mentors and org admins,

We've announced that we're doing Google Summer of Code 2013 [1]. Yay!

If you would like to help spread the word about GSoC, we have presentations 
[2], logos [3], and flyers [4] for you to use. Please host meetups, tell your 
friends and colleagues about the program, go to conferences, talk to people 
about the program, and just generally do all the awesome word-of-mouth stuff 
you do every year to promote the program.

The GSoC calendar, FAQ, and events timeline have all been updated with this 
year's important dates, so please refer to those for the milestones for this 
year's program. NB: the normal timeline for the program has been modified for 
this year. You'll probably want to examine the dates closely to make sure you 
know when important things are happening.

Please consider translating the presentations and/or flyers into your native 
language and submitting them directly to me to post on the wiki. Localization 
for our material is integral to reaching the widest possible audience around 
the world. If you decide to translate a flyer, please fill out our form to 
request a thank you gift for your effort. [5]

If you decide to host a meetup, please email me to let me know the date, time, 
and location so I can put it on the GSoC calendar. Also, remember to take 
pictures at your meetup and write up a blog post for our blog using our 
provided template for formatting [6]. If you need promotional items for your 
attendees, please fill out our form [7] to request some; we're happy to send 
some along. We can provide up to about 25 pens, notebooks, or stickers and/or a 
few t-shirts. Please keep in mind, though, that shipping restrictions and 
timeline vary country-to-country; request items early to make sure they get 
there on time! If you have questions about hosting meetups, please see the 
section in our FAQ [8].

Please consider applying to participate as an organization again this year or 
maybe joining as a mentor for your favorite organization if they are selected 
this year.

We rely on you for your help for the success of this program, so thank you in 
advance for all the work you do!

[1] - 
http://google-opensource.blogspot.com/2013/02/flip-bits-not-burgers-google-summer-of.html
[2] - http://code.google.com/p/google-summer-of-code/wiki/ProgramPresentations
[3] - http://code.google.com/p/google-summer-of-code/wiki/GsocLogos
[4] - http://code.google.com/p/google-summer-of-code/wiki/GsocFlyers
[5] - http://goo.gl/gEHDO
[6] - http://goo.gl/wbZrt
[7] - http://goo.gl/0BsR8
[8] - http://goo.gl/2NGfp

Cheers,
Carol

--
You received this message because you are subscribed to the Google Groups 
"Google Summer of Code Mentors List" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to 
google-summer-of-code-mentors-list+unsubscr...@googlegroups.com.
To post to this group, send email to 
google-summer-of-code-mentors-l...@googlegroups.com.
Visit this group at 
http://groups.google.com/group/google-summer-of-code-mentors-list?hl=en-US.
For more options, visit https://groups.google.com/groups/opt_out.




Re: [DISCUSS] Nutch Policy/Opinion on Review Board

2013-01-31 Thread Mattmann, Chris A (388J)
I love it and will use it, but I don't think it needs to be a policy. To each their 
own :)

Thanks buddy

Sent from my iPhone

On Jan 31, 2013, at 3:58 PM, "Lewis John Mcgibbney"  
wrote:

> Hi All,
> 
> I thought I would create this thread as the Review Board platform has
> been floating around now for a bit and I wonder if we can leverage it
> to improve/streamline the efficiency of Nutch community contributions.
> 
> So I thought I'd leave this thread nice and short.
> 
> 1) I am new to Review Board. I don't know much about it. I haven't
> used it before.
> 2) I am interested to see if we can make contributing, and
> particularly reviewing, a more open and transparent process.
> 3) I want to hear what you guys think.
> 
> Some links which may be of interest [0][1][2]
> 
> Ta
> Lewis
> 
> [0] https://blogs.apache.org/infra/entry/reviewboard_instance_running_at_the
> [1] https://reviews.apache.org
> [2] http://www.reviewboard.org/
> 
> -- 
> Lewis


Re: review board

2013-01-26 Thread Mattmann, Chris A (388J)
Hey Tejas,

Yeah I think this has to do with something in the repo URL on the RB server 
side. I would file an INFRA ticket, or jump on #asfinfra on IRC and ask one of 
the guys for help there.

Cheers,
Chris

From: Tejas Patil mailto:tejas.patil...@gmail.com>>
Reply-To: "dev@nutch.apache.org" 
mailto:dev@nutch.apache.org>>
Date: Friday, January 25, 2013 10:28 PM
To: "dev@nutch.apache.org" 
mailto:dev@nutch.apache.org>>
Subject: review board

Hi,

Has anyone recently faced an issue with Review Board while uploading a patch?
I created a patch for a change and tried to upload it via web UI of review 
board. It says:
The file 
'https://svn.apache.org/repos/asf/nutch/trunk/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java'
 (r1438860) could not be found in the repository

This is quite similar to the description given in [0]. HttpBase.java exists at the link 
given. My patch involves a few changes to it.

I think what I did is right, but still want to confirm. I generated the patch 
file using "svn diff" command. I am using svn, version 1.7.5. The patch was for 
nutch trunk. For uploading, I obtained the base directory from "svn info" 
command.

Meanwhile I am googling this issue; it would be great if someone can point 
out the problem here.

[0] : https://issues.apache.org/jira/browse/INFRA-5046

Thanks,
Tejas Patil



Re: 1.8 in Jira

2012-12-21 Thread Mattmann, Chris A (388J)
woot yep ;)

On 12/21/12 2:55 AM, "Markus Jelsma"  wrote:

>forget it, i meant 1.7 but it's there already!
> 
>-Original message-
>> From:Markus Jelsma 
>> Sent: Fri 21-Dec-2012 11:54
>> To:  
>> Subject: 1.8 in Jira
>> 
>> Anyone here with rights to add 1.8 to Jira?
>> Thanks
>> 



Re: [ANNOUNCE] Apache Nutch 1.6 Released

2012-12-10 Thread Mattmann, Chris A (388J)
Hear, hear, excellent work!

Cheers,
Chris

From: Julien Nioche 
mailto:lists.digitalpeb...@gmail.com>>
Reply-To: "dev@nutch.apache.org" 
mailto:dev@nutch.apache.org>>
Date: Saturday, December 8, 2012 10:34 PM
To: "dev@nutch.apache.org" 
mailto:dev@nutch.apache.org>>
Subject: Re: [ANNOUNCE] Apache Nutch 1.6 Released

Great stuff! Thanks Lewis

On 8 December 2012 21:50, Lewis John Mcgibbney 
mailto:lewis.mcgibb...@gmail.com>> wrote:
Hi All,

The Apache Nutch PMC are extremely pleased to announce the release of
Apache Nutch v1.6. This release includes over 20 bug fixes, as many
improvements, as well as new functionality including a new
HostNormalizer, the ability to dynamically set fetchInterval by
MIME-type and functional enhancements to the Indexer API including the
normalization of URLs and the deletion of robots noIndex documents.
Other notable improvements include the upgrade of key dependencies to
Tika 1.2 and Automaton 1.11-8.

A full PMC statement can be found here [0]

The release can be found on official Apache mirrors [1] as well as
sources in Maven Central [2]

Thank you

Lewis
On Behalf of the Nutch PMC

[0] http://s.apache.org/NFp
[1] http://www.apache.org/dyn/closer.cgi/nutch/
[2] http://search.maven.org/#artifactdetails|org.apache.nutch|nutch|1.6|jar

--
Lewis



--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble



Re: Strategy for Assigning Issues by Version

2012-11-29 Thread Mattmann, Chris A (388J)
+50 :)

On Nov 29, 2012, at 8:32 AM, Lewis John Mcgibbney wrote:

> So in summary,
> 
> We retain the legacy behavior and bump them ALL to 1.7
> 
> In the 1.7 development drive (if and when we can) we make an effort to act on 
> patched issues in an attempt to pick the low hanging fruit so to speak... if 
> such a thing exists.
> 
> best
> 
> Lewis
> 
> On Thu, Nov 29, 2012 at 3:56 PM, Julien Nioche 
>  wrote:
> Good idea! I suspect that most of them will be dating from a looong time ago 
> and it won't be such a straightforward task to apply them, however this would 
> be a good way of sorting them
> 
> 
> 
> Additionally, may I suggest (and please shoot me down here if I sound
> cheeky) that we make it a priority in the next development drive, to
> harness the issues which are marked as patch submitted? It seems to be
> a waste for such issues to be stagnating. I am conscious that this
> comment may sound wide of me, this is not the intention, I do think
> however that it would be nice to work our way towards Nutch releases
> in a more strategic manner than we have been doing. Hopefully this
> proposal is a step in the right direction.
> 
> 
> 
> -- 
> 
> Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
> 
> 
> 
> 
> -- 
> Lewis 
> 



Re: Strategy for Assigning Issues by Version

2012-11-29 Thread Mattmann, Chris A (388J)
Hey Lewis,

On Nov 29, 2012, at 5:54 AM, Lewis John Mcgibbney wrote:

> Hi All,
> 
> Right now I found myself facing a bit of a dilemma w.r.t "bumping on"
> the issues for the next Nutch release.
> 
> Currently due to legacy workflows, we have some 120 issues assigned
> for 1.6... however ALL issues have been addressed for 1.6 meaning that
> the 120 issues are for > 1.6 however not necessarily for 1.7.

I would just set them for 1.7. I just use N+1 as the next release whether or 
not we actually plan to solve them for 1.7. Then when 1.7 comes along you 
can bump those 1.7s that we didn't get to, to 1.8, etc.

> 
> A suggestion from myself, can I mark these issues as no fix version?
> This means that we can carve/manufacture the next development drive to
> what developers want to fix and to what features requests we receive
> from the community rather than sitting with a constant pile of issues
> which are always for the next development drive.

Marking them as no fix version destroys pretty important reporting that I like
to use which is pulling up a list of all the upcoming issues of relevance set
for the next release. Without setting a Fix version you have to use the other
JIRA search tools to search by things other than next version.

> 
> Additionally, may I suggest (and please shoot me down here if I sound
> cheeky) that we make it a priority in the next development drive, to
> harness the issues which are marked as patch submitted? It seems to be
> a waste for such issues to be stagnating. I am conscious that this
> comment may sound wide of me, this is not the intention, I do think
> however that it would be nice to work our way towards Nutch releases
> in a more strategic manner than we have been doing. Hopefully this
> proposal is a step in the right direction.

+50. That was one of my keys to success when I had more time. I would look
for issues sitting with patches and just commit them. If I can wrangle some
Nutch time over Christmas, I'll do a bunch of this as well. :)

> 
> Thanks for any feedback. The issue at the top I suppose is the most
> important one in the short term.

Cheers my friend.

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: [VOTE] Apache Nutch 1.6 Release Candidate

2012-11-29 Thread Mattmann, Chris A (388J)
Thanks guys.

I should review this today.

Cheers,
Chris

On Nov 29, 2012, at 5:31 AM, Lewis John Mcgibbney wrote:

> Hi,
> 
> On Wed, Nov 28, 2012 at 10:11 AM, Julien Nioche wrote:
> 
>>   - CHANGES.txt contains dates in both MM/DD/YYYY and DD/MM/YYYY formats.
>>   Shall we write the month in text form e.g. 7th July 2012 from now on?
> 
> Done
> 
>>   - Don't we need to have signatures as part of the RC?
>> 
> 
> Done, thanks for the attention to detail Julien.
> 
> Best
> 
> Lewis



Re: [DISCUSS] trunk release?

2012-11-22 Thread Mattmann, Chris A (388J)
"Release early, release often" :)

I'd say I'd be happy to try and spin it, but you'd beat me to it so I just 
will say I'll be happy to test the RC and voice my VOTE when you roll
it Lewis :)

Happy Thanksgiving (even though you're not in the States yet)!

Cheers,
Chris

On Nov 22, 2012, at 7:15 AM, Lewis John Mcgibbney wrote:

> Hi All,
> 
> A while ago I asked if it was time to get another release of trunk...
> Markus expressed the valid opinion that there were some issues with
> recently committed material that had maybe not been given the chance
> to mature enough and that could do with more testing.
> 
> So far in trunk (since 1.5.1), we've resolved some 45 issues [0], but
> we have some critical issues open [1] which could do with some
> attention as well.
> None of these issues are mine therefore I don't know how those of us
> feel (with patches available) about integrating these issues
> prior/post 1.6 release... or indeed whether a 1.6 release is welcomed
> at the moment? The codebase seems to be stable and getting better so
> from my perspective I would back a 1.X release.
> 
> All the best for now
> 
> Lewis
> 
> [0] http://tinyurl.com/cf3vcpr
> [1] http://tinyurl.com/d4omnrc
> 
> -- 
> Lewis



Re: [ANNOUNCE] Apache Nutch 2.1 Released

2012-10-05 Thread Mattmann, Chris A (388J)
Great job everyone!

Cheers,
Chris

On Oct 5, 2012, at 9:29 AM, Julien Nioche wrote:

> Thanks Lewis and well done everyone!
> Enjoy your week end
> 
> Julien
> 
> On 5 October 2012 16:12, lewis john mcgibbney  wrote:
> Good Afternoon Everyone,
> 
> The Apache Nutch PMC are very pleased to announce the release of
> Apache Nutch v2.1. This release continues to provide Nutch users with
> a simplified Nutch distribution building on the 2.x development drive
> which is growing in popularity amongst the community. As well as
> addressing ~20 bugs this release also offers improved properties for
> better Solr configuration, upgrades to various Gora dependencies and
> the introduction of the option to build indexes in elastic search,
> amongst various others.
> 
> A full PMC Announcement can be seen here [0]
> 
> Thank you, and have a great weekend, on behalf of the Nutch community.
> 
> Lewis
> 
> [0] http://nutch.apache.org/#05+October+2012+-+Apache+Nutch+v2.1+Released
> 
> 
> 
> -- 
> 
> Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
> 





Re: [PING] [VOTE] Apache Nutch 2.1 Release Candidate Available

2012-10-04 Thread Mattmann, Chris A (388J)
Thanks for your VOTE!

Cheers,
Chris

On Oct 4, 2012, at 1:08 AM, 
  wrote:

> A bit late but my two cents. I have done a couple of installs on Ubuntu 12.04 
> using MySQL for the backend and have noticed a couple of the improvements and 
> no regressions so +1 for releasing from my end.
> 
> -Original Message-
> From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com] 
> Sent: Monday, October 01, 2012 9:18 PM
> To: dev@nutch.apache.org; u...@nutch.apache.org
> Subject: [PING] [VOTE] Apache Nutch 2.1 Release Candidate Available
> 
> Hi All,
> 
> Anyone else for this VOTE?
> 
> Sorry to be a pest!
> 
> Thanks
> 
> Lewis
> 
> On Fri, Sep 21, 2012 at 4:07 PM, Lewis John Mcgibbney wrote:
>> Hi Everyone,
>> 
>> A candidate for Apache Nutch 2.1 is available at:
>> 
>> http://people.apache.org/~lewismc/apache-nutch-2.1
>> 
>> The release candidate is a src.zip and src.tar.gz ONLY archive of the 
>> sources in:
>> 
>> http://svn.apache.org/repos/asf/nutch/tags/release-2.1/
>> 
>> We release Nutch 2.1 in this fashion due to the inclusion of Apache 
>> Gora and the likelihood that users will regularly recompile the code 
>> to suit dynamic requirements.
>> 
>> Further, a staged Maven repository of the 2.1 jar, sources.jar and 
>> javadoc.jar is available here:
>> 
>> https://repository.apache.org/content/repositories/orgapachenutch-020/
>> 
>> Please vote on releasing this package as Apache Nutch 2.1.
>> The vote is open for the next 72 hours and passes if a majority of at 
>> least three +1 Nutch PMC votes are cast.
>> 
>> [ ] +1 Release this package as Apache Nutch 2.1
>> [ ] -1 Do not release this package because...
>> 
>> Many thanks, and here's to plenty more.
>> 
>> Kind Regards,
>> Lewis
>> 
>> P.S. Here's my +1.
>> 
>> --
>> Lewis
> 
> 
> 
> --
> Lewis





Re: [VOTE] Apache Nutch 2.1 Release Candidate Available

2012-10-01 Thread Mattmann, Chris A (388J)
+1 from me:

SIGS check out:

[chipotle:~/tmp/apache-nutch-2.1] mattmann% $HOME/bin/verify_gpg_sigs 
Verifying Signature for file apache-nutch-2.1-src.tar.gz.asc
gpg: Signature made Fri Sep 21 15:59:21 2012 BST using RSA key ID C601BCA7
gpg: Good signature from "Lewis John McGibbney (CODE SIGNING KEY)"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:  There is no indication that the signature belongs to the owner.
Primary key fingerprint: 2A23 D53F 8D27 5CB6 91E1  89C1 F45E 7970 C601 BCA7
Verifying Signature for file apache-nutch-2.1-src.zip.asc
gpg: Signature made Fri Sep 21 15:59:42 2012 BST using RSA key ID C601BCA7
gpg: Good signature from "Lewis John McGibbney (CODE SIGNING KEY)"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:  There is no indication that the signature belongs to the owner.
Primary key fingerprint: 2A23 D53F 8D27 5CB6 91E1  89C1 F45E 7970 C601 BCA7
[chipotle:~/tmp/apache-nutch-2.1] mattmann% 

MD5s check out:

[chipotle:~/tmp/apache-nutch-2.1] mattmann% $HOME/bin/verify_md5_checksums 
md5sum: stat '*.bz2': No such file or directory
apache-nutch-2.1-src.tar.gz: OK
apache-nutch-2.1-src.zip: OK
[chipotle:~/tmp/apache-nutch-2.1] mattmann% 
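
For readers who want to repeat a checksum check like the one above without a helper script, here is a minimal Java sketch. It is not the `verify_md5_checksums` script used in this message (its contents are not shown here); the class name `Md5Check` and its methods are hypothetical, and only standard JDK calls are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Check {
    /** Returns the lowercase hex MD5 digest of everything read from the stream. */
    public static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** True when the file's digest equals the expected hex string (case-insensitive). */
    public static boolean matches(Path artifact, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        try (InputStream in = Files.newInputStream(artifact)) {
            return md5Hex(in).equalsIgnoreCase(expectedHex.trim());
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the artifact; args[1]: expected MD5 hex from the .md5 file
        Path artifact = Paths.get(args[0]);
        String verdict = matches(artifact, args[1]) ? "OK" : "FAILED";
        System.out.println(artifact.getFileName() + ": " + verdict);
    }
}
```

A matching MD5 only shows the download is intact; the GPG signature check earlier in the message is what ties the artifact to the release manager's key.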

I built the code using ant runtime and it checked out fine:

...snip

runtime:
[mkdir] Created dir: 
/Users/mattmann/tmp/apache-nutch-2.1/apache-nutch-2.1/runtime
[mkdir] Created dir: 
/Users/mattmann/tmp/apache-nutch-2.1/apache-nutch-2.1/runtime/local
[mkdir] Created dir: 
/Users/mattmann/tmp/apache-nutch-2.1/apache-nutch-2.1/runtime/deploy
 [copy] Copying 1 file to 
/Users/mattmann/tmp/apache-nutch-2.1/apache-nutch-2.1/runtime/deploy
 [copy] Copying 1 file to 
/Users/mattmann/tmp/apache-nutch-2.1/apache-nutch-2.1/runtime/deploy/bin
 [copy] Copying 1 file to 
/Users/mattmann/tmp/apache-nutch-2.1/apache-nutch-2.1/runtime/local/lib
 [copy] Copying 1 file to 
/Users/mattmann/tmp/apache-nutch-2.1/apache-nutch-2.1/runtime/local/lib/native
 [copy] Copying 26 files to 
/Users/mattmann/tmp/apache-nutch-2.1/apache-nutch-2.1/runtime/local/conf
 [copy] Copying 1 file to 
/Users/mattmann/tmp/apache-nutch-2.1/apache-nutch-2.1/runtime/local/bin
 [copy] Copying 89 files to 
/Users/mattmann/tmp/apache-nutch-2.1/apache-nutch-2.1/runtime/local/lib
 [copy] Copying 97 files to 
/Users/mattmann/tmp/apache-nutch-2.1/apache-nutch-2.1/runtime/local/plugins
 [copy] Copied 2 empty directories to 2 empty directories under 
/Users/mattmann/tmp/apache-nutch-2.1/apache-nutch-2.1/runtime/local/test

BUILD SUCCESSFUL
Total time: 1 minute 24 seconds
[chipotle:~/tmp/apache-nutch-2.1/apache-nutch-2.1] mattmann% 

Looks great and great work!

Cheers,
Chris


On Sep 21, 2012, at 4:07 PM, Lewis John Mcgibbney wrote:

> Hi Everyone,
> 
> A candidate for Apache Nutch 2.1 is available at:
> 
> http://people.apache.org/~lewismc/apache-nutch-2.1
> 
> The release candidate is a src.zip and src.tar.gz ONLY
> archive of the sources in:
> 
> http://svn.apache.org/repos/asf/nutch/tags/release-2.1/
> 
> We release Nutch 2.1 in this fashion due to the inclusion of
> Apache Gora and the likelihood that users will regularly recompile
> the code to suit dynamic requirements.
> 
> Further, a staged Maven repository of the 2.1 jar, sources.jar and
> javadoc.jar is available here:
> 
> https://repository.apache.org/content/repositories/orgapachenutch-020/
> 
> Please vote on releasing this package as Apache Nutch 2.1.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Nutch PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Nutch 2.1
> [ ] -1 Do not release this package because...
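
The pass condition stated in the vote call can be written down as a one-line check. This is only a sketch under an assumption: "a majority of at least three +1 Nutch PMC votes" is read here as at least three binding +1 votes and more +1 than -1 votes. The class and method names are hypothetical.

```java
public class ReleaseVote {
    /**
     * Assumed reading of the rule quoted above: at least three binding +1
     * votes, and more +1 votes than -1 votes.
     */
    public static boolean passes(int bindingPlusOnes, int bindingMinusOnes) {
        return bindingPlusOnes >= 3 && bindingPlusOnes > bindingMinusOnes;
    }

    public static void main(String[] args) {
        System.out.println(passes(3, 0)); // three +1, no -1
        System.out.println(passes(2, 0)); // too few +1 votes
    }
}
```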
> 
> Many thanks, and here's to plenty more.
> 
> Kind Regards,
> Lewis
> 
> P.S. Here's my +1.
> 
> -- 
> Lewis





Re: Status of 2.1 release

2012-09-21 Thread Mattmann, Chris A (388J)
Take care dude! I'll give trunk a shot...

Cheers,
Chris

On Sep 21, 2012, at 7:34 AM, Lewis John Mcgibbney wrote:

> Hi All,
> 
> Basically thank god it was brought to our attention that
> gora-cassandra 0.2.1 is buggy and needs some work before it is ready
> to be integrated into a stable Nutch 2.x release.
> 
> For the time being I've committed a revert for gora-cassandra v0.2 to
> the 2.1 branch and to 2.x branch (the latter of which can continue
> development regardless).
> 
> I'll run the RC for 2.1 just now.
> 
> @Markus,
> What are your thoughts on trunk?
> 
> @Chris,
> 
> Depending on outcome of discussion on trunk, do you want to spin an RC?
> 
> Have a great weekend everyone.
> 
> Lewis
> 
> -- 
> Lewis





Re: svn commit: r1387363 - in /nutch/branches/2.1: CHANGES.txt build.xml pom.xml

2012-09-18 Thread Mattmann, Chris A (388J)
Lewis you beat me to it, you ROCK!

Cheers,
Chris

On Sep 18, 2012, at 5:11 PM, 
  wrote:

> Author: lewismc
> Date: Tue Sep 18 21:11:06 2012
> New Revision: 1387363
> 
> URL: http://svn.apache.org/viewvc?rev=1387363&view=rev
> Log:
> forward port of NUTCH-1415
> 
> Modified:
>nutch/branches/2.1/CHANGES.txt
>nutch/branches/2.1/build.xml
>nutch/branches/2.1/pom.xml
> 
> Modified: nutch/branches/2.1/CHANGES.txt
> URL: 
> http://svn.apache.org/viewvc/nutch/branches/2.1/CHANGES.txt?rev=1387363&r1=1387362&r2=1387363&view=diff
> ==
> --- nutch/branches/2.1/CHANGES.txt (original)
> +++ nutch/branches/2.1/CHANGES.txt Tue Sep 18 21:11:06 2012
> @@ -3,6 +3,8 @@ Nutch Change Log
> Release 2.1 (19/09/2012) ddmm
> Full Jira Report - 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=10680&version=12321040
> 
> +* NUTCH-1415 release packages to contain top level folder apache-nutch-x.x 
> (snagel)
> +
> * NUTCH-1432 property storage.schema does not work anymore, should be 
> storage.schema.webpage and storage.schema.host (lewismc)
> 
> * NUTCH-1468 Redirects that are external links not adhering to 
> db.ignore.external.links (Matt MacDonald via ferdy)
> 
> Modified: nutch/branches/2.1/build.xml
> URL: 
> http://svn.apache.org/viewvc/nutch/branches/2.1/build.xml?rev=1387363&r1=1387362&r2=1387363&view=diff
> ==
> --- nutch/branches/2.1/build.xml (original)
> +++ nutch/branches/2.1/build.xml Tue Sep 18 21:11:06 2012
> @@ -700,14 +700,13 @@
>   
>   
>  -  destfile="${src.dist.version.dir}.tar.gz" 
> basedir="${src.dist.version.dir}">
> -  
> - 
> - 
> -
> +  destfile="${src.dist.version.dir}.tar.gz">
> +   prefix="${final.name}">
> +
> +
>   
> -  
> -
> +   prefix="${final.name}">
> +
>   
> 
>   
> @@ -717,13 +716,13 @@
>   
>   
>  -  destfile="${bin.dist.version.dir}.tar.gz" 
> basedir="${bin.dist.version.dir}">
> -  
> - 
> -
> +  destfile="${bin.dist.version.dir}.tar.gz">
> +   prefix="${final.name}">
> +
> +
>   
> -  
> -
> +   prefix="${final.name}">
> +
>   
> 
>   
> @@ -733,14 +732,13 @@
>   
>   
> -   destfile="${src.dist.version.dir}.zip" basedir="${src.dist.version.dir}">
> -   
> -   
> -   
> -   
> + destfile="${src.dist.version.dir}.zip">
> +prefix="${final.name}">
> +   
> +   
>
> -   
> -   
> +prefix="${final.name}">
> +   
>
>
>   
> @@ -750,13 +748,13 @@
>   
>   
> -   destfile="${bin.dist.version.dir}.zip" basedir="${bin.dist.version.dir}">
> -   
> -   
> -   
> + destfile="${bin.dist.version.dir}.zip">
> +prefix="${final.name}">
> +   
> +   
>
> -   
> -   
> +prefix="${final.name}">
> +   
>
>
>   
> 
> Modified: nutch/branches/2.1/pom.xml
> URL: 
> http://svn.apache.org/viewvc/nutch/branches/2.1/pom.xml?rev=1387363&r1=1387362&r2=1387363&view=diff
> ==
> --- nutch/branches/2.1/pom.xml (original)
> +++ nutch/branches/2.1/pom.xml Tue Sep 18 21:11:06 2012
> @@ -22,7 +22,7 @@
>   org.apache.nutch
>   nutch
>   jar
> -  2.0
> +  2.1
>   Apache Nutch
>   http://nutch.apache.org
>   
> @@ -109,6 +109,12 @@
> 
> 
> 
> +org.elasticsearch
> +elasticsearch
> +0.19.4
> +true
> +
> +
> org.apache.solr
> solr-solrj
> 3.4.0
> @@ -165,7 +171,7 @@
> 
> org.apache.gora
> gora-core
> -0.2
> +0.2.1
> true
> 
> 
> 
> 
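
The build.xml changes in this commit add `prefix="${final.name}"` so that every file in the release archives sits under one top-level folder (NUTCH-1415). The effect can be illustrated outside Ant with a small Java sketch using the JDK's zip classes. This is not the Ant implementation; the class `PrefixedZip` and the sample file names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class PrefixedZip {
    /** Writes each (relative path, content) pair under "prefix/" and returns the zip bytes. */
    public static byte[] zipWithPrefix(String prefix, Map<String, String> files)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                // The prefix becomes the single top-level folder on extraction.
                zip.putNextEntry(new ZipEntry(prefix + "/" + file.getKey()));
                zip.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("build.xml", "<project/>");
        files.put("conf/nutch-site.xml", "<configuration/>");
        byte[] zip = zipWithPrefix("apache-nutch-2.1", files);
        System.out.println(zip.length + " bytes written");
    }
}
```

Unpacking such an archive creates one `apache-nutch-2.1/` directory instead of scattering files into the current directory.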





Re: Nutch 2.1 Release???

2012-09-15 Thread Mattmann, Chris A (388J)
Awesome Lewis. I'll try and roll a 2.1 RC by mid next week if no one
beats me to it.

Cheers,
Chris

On Sep 15, 2012, at 2:18 PM, Lewis John Mcgibbney wrote:

> Actually when I look at it now we're at nearly 30 tickets for trunk as well.
> 
> Up to you guys
> 
> @Chris
> Nice one. Fire in my friend. If you can do RM role it would be great.
> 
> Best
> 
> Lewis
> 
> On Sat, Sep 15, 2012 at 6:07 PM, Mattmann, Chris A (388J) wrote:
>> +1 I'd be happy to help!
>> 
>> Cheers,
>> Chris
>> 
>> On Sep 15, 2012, at 9:24 AM, Lewis John Mcgibbney wrote:
>> 
>>> Hi Everyone,
>>> 
>>> Without me slevering on, this suggestion speaks for itself.
>>> 
>>> We have resolved 32 issues, including pulling in upgrades on the Gora
>>> dependency. It would be nice to push these improvements in a stable
>>> release to the Nutch community.
>>> 
>>> Any thoughts.
>>> 
>>> Best
>>> 
>>> Lewis
>>> 
>>> --
>>> Lewis
>> 
>> 
>> ++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattm...@nasa.gov
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++
>> 
> 
> 
> 
> -- 
> Lewis





Re: Nutch 2.1 Release???

2012-09-15 Thread Mattmann, Chris A (388J)
+1 I'd be happy to help!

Cheers,
Chris

On Sep 15, 2012, at 9:24 AM, Lewis John Mcgibbney wrote:

> Hi Everyone,
> 
> Without me slevering on, this suggestion speaks for itself.
> 
> We have resolved 32 issues, including pulling in upgrades on the Gora
> dependency. It would be nice to push these improvements in a stable
> release to the Nutch community.
> 
> Any thoughts.
> 
> Best
> 
> Lewis
> 
> -- 
> Lewis





Re: Nutch talk accepted at ApacheCon Europe

2012-09-13 Thread Mattmann, Chris A (388J)
Great to hear, Julien, nice!

Cheers,
Chris

On Sep 13, 2012, at 3:39 AM, Julien Nioche wrote:

> Hi, 
> 
> I'd just like to mention that I will be giving a talk about Nutch at the 
> Apache Conference Europe (Sinsheim, Germany 5–8 November 2012). The Apache 
> Conference should be a good opportunity for the Nutch community (committers 
> as well as users) to get together and I hope to see many of you there. Early 
> Bird tickets are available until the 1st October.
> 
> The talk itself will be an overview of Nutch and will be part of the 
> Lucene/SOLR Ecosystem track. If you have an interesting use case using Nutch 
> or have something in particular that you'd like me to talk about, please do 
> get in touch and I'll try to blend that in the presentation.
> 
> I look forward to seeing you in Sinsheim.
> 
> Julien
> 
> -- 
> 
> Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
> 





Re: Happy 10th Birthday Nutch!

2012-08-22 Thread Mattmann, Chris A (388J)
Awesome, Jerome! I need to get a Nutch hat!

Cheers,
Chris

On Aug 21, 2012, at 3:59 PM, Markus Jelsma wrote:

> Hehehe, nice! 
> 
> Cheers
> 
> -Original message-
>> From:Jérôme Charron 
>> Sent: Tue 21-Aug-2012 23:58
>> To: dev@nutch.apache.org
>> Cc: u...@nutch.apache.org
>> Subject: Re: Happy 10th Birthday Nutch!
>> 
>> Oups! Sorry...
>> This one should be OK: http://statigr.am/p/254365383887354210_4414285
>> ;)
>> 
>> 
>> On Tue, Aug 21, 2012 at 11:40 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
>> Hi Jérôme,
>> 
>> It asks for a login.
>> 
>> Cheers
>> 
>> 
>> 
>> -Original message-
>>> From: Jérôme Charron <jerome.char...@gmail.com>
>>> Sent: Tue 21-Aug-2012 22:22
>>> To: u...@nutch.apache.org
>>> Cc: dev@nutch.apache.org
>>> Subject: Re: Happy 10th Birthday Nutch!
>>> 
>>> My small contribution to Nutch birthday...
>>> http://statigr.am/viewer.php#/detail/254365383887354210_4414285
>>> 
>>> Cheers,
>>> Jérôme
>>> 
>>> On Fri, Aug 10, 2012 at 1:44 AM, Mattmann, Chris A (388J)
>>> <chris.a.mattm...@jpl.nasa.gov> wrote:
>>> Super cool. Proud to have been around since 2005 (7 of them!)
>>> 
>>> :)
>>> 
>>> Cheers,
>>> Chris
>>> 
>>> On Aug 9, 2012, at 1:31 PM, Lewis John Mcgibbney wrote:
>>> 
>>>> Nice one Julien
>>>> 
>>>> I'm going to update the site with this as it's a pretty huge milestone
>>>> @Apache and a lot of projects and current developers owe a lot to the
>>>> great work done by all you guys over the years.
>>>> 
>>>> Thank you for sharing.
>>>> 
>>>> Lewis
>>>> 
>>>> On Thu, Aug 9, 2012 at 8:56 AM, Julien Nioche
>>>> <lists.digitalpeb...@gmail.com> wrote:
>>>>> Doug Cutting on twitter :
>>>>> https://twitter.com/cutting/status/233415059798372353
>>>>> 
>>>>> *RT @StefanGroschupf: Happy 10th birthday#Nutch! Registered at sourceforce
>>>>> august 2002. Turned out to be quite a game changer. #Hadoop
>>>>> *
>>>>> Happy birthday Nutch and thanks to all contributors past and present!
>>>>> 
>>>>> Julien
>>>>> 
>>>>> --
>>>>> 
>>>>> Open Source Solutions for Text Engineering
>>>>> 
>>>>> http://digitalpebble.blogspot.com/
>>>>> http://www.digitalpebble.com
>>>>> http://twitter.com/digitalpebble
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Lewis
>>> 
>>> 
>>> ++
>>> Chris Mattmann, Ph.D.
>>> Senior Computer Scientist
>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>> Office: 171-266B, Mailstop: 171-246
>>> Email: chris.a.mattm...@nasa.gov
>>> WWW:   http://sunset.usc.edu/~mattmann/
>>> ++
>>> Adjunct Assistant 

Re: Happy 10th Birthday Nutch!

2012-08-09 Thread Mattmann, Chris A (388J)
Super cool. Proud to have been around since 2005 (7 of them!)

:)

Cheers,
Chris

On Aug 9, 2012, at 1:31 PM, Lewis John Mcgibbney wrote:

> Nice one Julien
> 
> I'm going to update the site with this as it's a pretty huge milestone
> @Apache and a lot of projects and current developers owe a lot to the
> great work done by all you guys over the years.
> 
> Thank you for sharing.
> 
> Lewis
> 
> On Thu, Aug 9, 2012 at 8:56 AM, Julien Nioche wrote:
>> Doug Cutting on twitter :
>> https://twitter.com/cutting/status/233415059798372353
>> 
>> *RT @StefanGroschupf: Happy 10th birthday#Nutch! Registered at sourceforce
>> august 2002. Turned out to be quite a game changer. #Hadoop
>> *
>> Happy birthday Nutch and thanks to all contributors past and present!
>> 
>> Julien
>> 
>> --
>> 
>> Open Source Solutions for Text Engineering
>> 
>> http://digitalpebble.blogspot.com/
>> http://www.digitalpebble.com
>> http://twitter.com/digitalpebble
> 
> 
> 
> -- 
> Lewis





Fwd: Call for Papers for ApacheCon Europe 2012 now open!

2012-07-19 Thread Mattmann, Chris A (388J)
FYI...

Begin forwarded message:

> From: Nick Burch 
> Date: July 19, 2012 1:14:57 PM CDT
> To: 
> Subject: Call for Papers for ApacheCon Europe 2012 now open!
> Reply-To: 
> 
> Hi All
> 
> We're pleased to announce that the Call for Papers for ApacheCon Europe 2012 
> is finally open!
> 
> (For those who don't already know, ApacheCon Europe will be taking place 
> between the 5th and the 9th of November this year, in Sinsheim, Germany.)
> 
> If you'd like to submit a talk proposal, please visit the conference website 
> at  and sign up for a new account. Once you've 
> signed up, use your dashboard to enter your speaker bio, then submit your 
> talk proposal(s). There's more information on the CFP page on the conference 
> website.
> 
> We welcome talk proposals from all projects, from right across the breadth of 
> projects at the foundation! To make things easier for talk selection and 
> scheduling, we'd ask that you tag your proposal with the track that it most 
> closely fits within. The details of the tracks, and what projects they expect 
> to cover, are available at .
> 
> (If your project/group of projects was intending to submit a track, and 
> missed the deadline, then please get in touch with us on 
>  straight away, so we can work out if it's 
> possible to squeeze you in...)
> 
> The CFP will close on Friday 3rd August, so you've a little over two weeks to 
> send in your talk proposal. Don't put it off! We'll look forward to seeing 
> some great ones shortly!
> 
> Thanks
> Nick
> (On behalf of the Conferences committee)





Re: Apache Nutch being used at National Snow and Ice Data Center: ESIP Federation

2012-07-18 Thread Mattmann, Chris A (388J)
Hi Ian,

On Jul 18, 2012, at 10:01 AM, Ian Truslove wrote:

> Chris: message received - I signed up :)

Thanks for doing this!

> 
> As part of Ruth's Libre project (http://nsidc.org/libre/) we are using
> Nutch to find various types of XML data.  We're targeting our search at
> geospatial data, and more specifically cryospheric data, but the tools
> will remain more broadly applicable.  Specifically we are looking for ESIP
> data casts, collection casts, service casts, and ESIP Discovery OpenSearch
> services (all the specs are in
> http://wiki.esipfed.org/index.php/Discovery_Cluster).  These XML documents
> and services are characterizable through fairly simple means such as XML
> namespaces.
> 
> We are currently developing against the Nutch 1.4 tarball distribution
> (SVN HEAD was moving quicker than our configuration could keep up with)
> and plugging into a standalone Solr instance.
> 
> What we have done to date is do some basic configuration work, set the
> code up to play nice(-ish) with Eclipse, our internal SVN, and our
> CI/deployment system, and write some plugins to help us find our various
> XML docs.  We wrote a pair to extract and index the full raw XML content
> of the source document, extending the HtmlParseFilter and IndexingFilter
> respectively.  XML (and of course HTML too) are just wrapped within a
> CDATA section (and CDATA sections within the document are just removed),
> and indexed as a big text blob in Solr.  We can do naive text matching and
> are having success extracting the URLs of the data feeds we're after.
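
The wrapping step described in the paragraph above can be sketched in a few lines of Java. This is an illustration only, not the NSIDC plugin code: the class `RawXmlField` is hypothetical, it does not use the Nutch `HtmlParseFilter` or `IndexingFilter` interfaces, and it assumes "removed" means that embedded CDATA sections are dropped together with their content.

```java
import java.util.regex.Pattern;

public class RawXmlField {
    // Matches a whole CDATA section, content included (DOTALL lets it span lines).
    private static final Pattern CDATA =
            Pattern.compile("<!\\[CDATA\\[.*?\\]\\]>", Pattern.DOTALL);

    /** Drops embedded CDATA sections, then wraps the remaining markup in one CDATA section. */
    public static String wrapRaw(String document) {
        String stripped = CDATA.matcher(document).replaceAll("");
        // A stray "]]>" would close the wrapper early, so split it across two sections.
        String safe = stripped.replace("]]>", "]]]]><![CDATA[>");
        return "<![CDATA[" + safe + "]]>";
    }

    public static void main(String[] args) {
        String doc = "<feed><title>Sea ice</title><note><![CDATA[a < b]]></note></feed>";
        System.out.println(wrapRaw(doc));
        // prints <![CDATA[<feed><title>Sea ice</title><note></note></feed>]]>
    }
}
```

Because the wrapped document is stored as one opaque text field, namespace URIs and feed URLs stay searchable with plain text matching, which fits the naive matching the message describes.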
> 
> We also wrote a pair of plugins to keep track of the original index date
> of a document (the overarching use case is to determine documents that are
> newly found).  We used the ScoringFilter and IndexingFilter for those.
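
The idea of keeping the original index date, so that newly found documents can be told apart from re-crawled ones, can be sketched independently of Nutch. The class below is a hypothetical in-memory stand-in; the real plugins use Nutch's ScoringFilter and IndexingFilter extension points, which are not reproduced here.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

public class FirstSeenIndex {
    private final Map<String, Instant> firstSeen = new HashMap<>();

    /** Stores the date only on first sight of a URL; always returns the original date. */
    public Instant record(String url, Instant now) {
        return firstSeen.computeIfAbsent(url, key -> now);
    }

    /** True when the URL is known and was first seen strictly after the cutoff. */
    public boolean isNewSince(String url, Instant cutoff) {
        Instant seen = firstSeen.get(url);
        return seen != null && seen.isAfter(cutoff);
    }

    public static void main(String[] args) {
        FirstSeenIndex index = new FirstSeenIndex();
        Instant first = Instant.parse("2012-07-01T00:00:00Z");
        index.record("http://example.org/feed.xml", first);
        // A later re-crawl does not overwrite the original date.
        Instant kept = index.record("http://example.org/feed.xml", Instant.parse("2012-07-18T00:00:00Z"));
        System.out.println(kept.equals(first));
    }
}
```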
> 
> Planned work includes extracting data from the XML before indexing and
> using Solr fields more effectively, indexing GCMD keywords, simple spatial
> subsetting, and tweaking the ranking algorithms to do a broad search to
> identify good sites for deep data searches.
> 
> Thanks for the interest - it's been a fun project to work on so far, and
> I'm sure we'd be happy to talk more or provide more details.

Super awesome! 

Well if you get around to it, feel free to:

1. File issues in our JIRA tracker at https://issues.apache.org/jira/browse/NUTCH
describing your changes, keeping each change as small, incremental, and easily
revertible as possible.
2. Create patch files and attach them to the issues you created in step 1.
3. Work with a committer here in Nutch to get your patches contributed. Unit
tests and code that conforms to the rest of the Nutch style (e.g., no tabs)
are good helpers. Doug Cutting used to say that if he could apply a few of
your patches without modification, you were well on track towards getting
your code included in the project.

Thanks much! Any questions, let me or any of the rest of the Nutch devs that 
hang out here know.

Cheers,
Chris




Re: Apache Nutch being used at National Snow and Ice Data Center: ESIP Federation

2012-07-17 Thread Mattmann, Chris A (388J)
Hi Markus,

Great question. I am CC'ing Ruth Duerr and Ian Truslove at NSIDC -- maybe they
can provide more information?

Ruth, Ian, please consider subscribing to dev@nutch.apache.org and/or 
u...@nutch.apache.org by sending blank emails to:

dev-subscr...@nutch.apache.org
user-subscr...@nutch.apache.org

to follow along in the conversation.

Thanks all!

Cheers,
Chris

On Jul 17, 2012, at 5:27 PM, Markus Jelsma wrote:

> Cool!
> 
> What are they exactly doing with Apache Nutch? And, more interesting, what 
> non-standard stuff do they use?
> 
> Cheers
> 
> -Original message-
>> From:Mattmann, Chris A (388J) 
>> Sent: Tue 17-Jul-2012 21:29
>> To: dev@nutch.apache.org
>> Subject: Apache Nutch being used at National Snow and Ice Data Center: ESIP 
>> Federation
>> 
>> Hey Folks,
>> 
>> Ruth Duerr is presenting at today's ESIP Federation and Discovery Hackathon:
>> 
>> http://commons.esipfed.org/node/424
>> 
>> The U.S. National Snow and Ice Data Center (NSIDC) is deploying Apache Nutch 
>> and Solr to support discovery of datasets (called "casting").
>> 
>> Really interesting stuff, and worth contacting Ruth and NSIDC if you're 
>> interested.
>> I'm strongly suggesting to the NSIDC folks that they try to contribute any 
>> updates or plugins they are making to the software upstream here to the ASF.
>> 
>> Thanks!
>> 
>> Cheers,
>> Chris
>> 
>> 
>> 





Apache Nutch being used at National Snow and Ice Data Center: ESIP Federation

2012-07-17 Thread Mattmann, Chris A (388J)
Hey Folks,

Ruth Duerr is presenting at today's ESIP Federation and Discovery Hackathon:

http://commons.esipfed.org/node/424

The U.S. National Snow and Ice Data Center (NSIDC) is deploying Apache Nutch 
and Solr to support discovery of datasets (called "casting").

Really interesting stuff, and worth contacting Ruth and NSIDC if you're 
interested.
I'm strongly suggesting to the NSIDC folks that they try to contribute any 
updates or plugins they are making to the software upstream here to the ASF.

Thanks!

Cheers,
Chris




Re: [DONE] Renamed branch nutchgora into 2.x

2012-07-10 Thread Mattmann, Chris A (388J)
Thanks Juls!

Cheers,
Chris

On Jul 10, 2012, at 1:50 AM, Julien Nioche wrote:

> Guys,
> 
> The nutchgora branch is now called 2.x. Run the following command to update
> your copy of the SVN repo
> 
> svn switch https://svn.apache.org/repos/asf/nutch/branches/2.x
> 
> or create a brand new copy with
> 
> svn co https://svn.apache.org/repos/asf/nutch/branches/2.x nutch-2.x
> 
> Thanks
> 
> Julien
> 
> 
> On 9 July 2012 20:12, Sebastian Nagel  wrote:
> 
>> +1 (it's just a name, mainly in svn and jira)
>> 
>> Sebastian
>> 
>> On 07/09/2012 12:37 PM, Julien Nioche wrote:
>>> Guys,
>>> 
>>> Now that we've released 2.0, wouldn't it be better to rename the
>>> 'nutchgora' branch into something like 'branch-2.x'? Any thoughts on
>> this?
>>> 
>>> Julien
>>> 
>> 
>> 
>> 
> 
> 
> -- 
> Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble





Re: [ANNOUNCEMENT] Apache Nutch v1.5.1 Released

2012-07-10 Thread Mattmann, Chris A (388J)
Congrats, all!

Cheers,
Chris

On Jul 10, 2012, at 8:03 AM, Julien Nioche wrote:

> Great Job Lewis! Thanks a lot
> 
> On 10 July 2012 15:40, lewis john mcgibbney  wrote:
> Good Afternoon Everyone,
> 
> The Apache Nutch PMC are very pleased to announce the release of
> Apache Nutch v1.5.1. This release is a maintenance release of the
> popular mainstream
> 1.5.X series of the Apache Nutch web search software project.
> 
> Please see the list of changes
> 
> http://www.apache.org/dist/nutch/1.5.1/CHANGES.txt
> 
> made in this version for a full breakdown. A full PMC release
> statement can be found below
> 
> http://nutch.apache.org/#10+July+2012+-+Apache+Nutch+v1.5.1+Released
> 
> Nutch v1.5.1 is available in source and binary (zip and tar.gz) from the
> following download page: http://www.apache.org/dyn/closer.cgi/nutch/1.5.1
> 
> When downloading from a mirror site, please remember to verify the
> downloads using signatures found on the Apache site:
> 
> http://www.apache.org/dist/nutch/KEYS
> 
> For more information on Apache Nutch, visit the project home page:
> http://nutch.apache.org
> 
> Thank you very much
> 
> Lewis John McGibbney (on behalf of the Apache Nutch community)
> 
> 
> 
> -- 
> 
> Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
> 





Re: [PROPOSAL] Rename branch nutchgora into 2.x

2012-07-09 Thread Mattmann, Chris A (388J)
+1 from me.

Cheers,
Chris

On Jul 9, 2012, at 3:37 AM, Julien Nioche wrote:

> Guys, 
> 
> Now that we've released 2.0, wouldn't it be better to rename the 'nutchgora' 
> branch into something like 'branch-2.x'? Any thoughts on this?
> 
> Julien
> 
> -- 
> 
> Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
> 





Re: [VOTE] Apache Nutch 2.0 Release Candidate #3

2012-07-07 Thread Mattmann, Chris A (388J)
Thanks for your hard work here, Lewis!

Cheers,
Chris

On Jul 7, 2012, at 3:44 PM, Lewis John Mcgibbney wrote:

> Hi Julien,
> 
> Believe it or not I've just spent around 45 mins waiting on committing
> the site... broadband in Paris is nothing short of utterly abysmal to
> say the very best. Please see my comments below
> 
> On Sat, Jul 7, 2012 at 9:58 PM, Julien Nioche
>  wrote:
>> Looks like you've released 2.0. If so can you make an announcement to the
>> mailing list + update the website.
> 
> Done
> 
>> It's not really something that should go
>> unnoticed. I know about the press release but surely it does not mean that
>> NOTHING should be said about the release then.
> 
> Quite right.
> 
>> 
>> I see a 1.5 on a mirror (http://apache.mirrors.timporter.net/nutch/) with
>> the same release date as 2.0. Shouldn't it be 1.5.1? Can you please clarify?
> 
> This relates to the message on private@ the other night and concerns
> the rearranging (cleaning up) of the dist/nutch directory on
> people.apache.org to accommodate the additional 2.0 directory. The 1.5
> artifacts are identical to the ones we VOTE'd on, same goes with
> 2.0's. The mirror will confusingly display that these have been
> mirrored at the same time, which of course is the case, but they were
> certainly not released in parallel.
> 
> OK so now concerning 1.5.1, we have still to VOTE on the rc#3 so I've
> gently put out a ping for this on dev@ and user@
> 
> I hope this answers all and I can only really apologise and say thanks
> to everyone who has made time and effort to VOTE over the last few
> months. There has been a very encouraging amount of work done within
> the dev community and it's been very rewarding to see us getting Nutch
> moving at a really steady pace.
> 
> All for now
> 
> Have a great weekend
> 
> Lewis
> 
> -- 
> Lewis





Re: [VOTE] Apache Nutch 1.5.1 RC#3

2012-07-07 Thread Mattmann, Chris A (388J)
Hi Lewis,

+1 from me!

Checksums check out:

[chipotle:~/tmp/nutch-1.5.1] mattmann% $HOME/bin/verify_md5_checksums 
md5sum: stat '*.bz2': No such file or directory
apache-nutch-1.5.1-bin.tar.gz: OK
apache-nutch-1.5.1-src.tar.gz: OK
apache-nutch-1.5.1-bin.zip: OK
apache-nutch-1.5.1-src.zip: OK

SIGS check out:

[chipotle:~/tmp/nutch-1.5.1] mattmann% $HOME/bin/verify_gpg_sigs 
Verifying Signature for file apache-nutch-1.5.1-bin.tar.gz.asc
gpg: Signature made Tue Jul  3 11:31:31 2012 PDT using RSA key ID C601BCA7
gpg: Good signature from "Lewis John McGibbney (CODE SIGNING KEY) 
"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:  There is no indication that the signature belongs to the owner.
Primary key fingerprint: 2A23 D53F 8D27 5CB6 91E1  89C1 F45E 7970 C601 BCA7
Verifying Signature for file apache-nutch-1.5.1-bin.zip.asc
gpg: Signature made Tue Jul  3 11:32:16 2012 PDT using RSA key ID C601BCA7
gpg: Good signature from "Lewis John McGibbney (CODE SIGNING KEY) 
"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:  There is no indication that the signature belongs to the owner.
Primary key fingerprint: 2A23 D53F 8D27 5CB6 91E1  89C1 F45E 7970 C601 BCA7
Verifying Signature for file apache-nutch-1.5.1-src.tar.gz.asc
gpg: Signature made Tue Jul  3 11:31:58 2012 PDT using RSA key ID C601BCA7
gpg: Good signature from "Lewis John McGibbney (CODE SIGNING KEY) 
"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:  There is no indication that the signature belongs to the owner.
Primary key fingerprint: 2A23 D53F 8D27 5CB6 91E1  89C1 F45E 7970 C601 BCA7
Verifying Signature for file apache-nutch-1.5.1-src.zip.asc
gpg: Signature made Tue Jul  3 11:32:33 2012 PDT using RSA key ID C601BCA7
gpg: Good signature from "Lewis John McGibbney (CODE SIGNING KEY) 
"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:  There is no indication that the signature belongs to the owner.
Primary key fingerprint: 2A23 D53F 8D27 5CB6 91E1  89C1 F45E 7970 C601 BCA7
[chipotle:~/tmp/nutch-1.5.1] mattmann% 
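
For anyone who wants to repeat the checksum part of this check without the local helper script, here is a minimal sketch in Python. The contents of $HOME/bin/verify_md5_checksums are not shown in this thread, so this is only an assumption about what such a check does: hash each artifact and compare the result with the published digest. The function names (md5_of, verify_md5, verify_all) are hypothetical, and the expected digests have to be supplied by the caller from the release's checksum files.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(artifact, expected_hex):
    """Compare the computed digest with the published one, ignoring case and whitespace."""
    return md5_of(artifact) == expected_hex.strip().lower()


def verify_all(directory, expected):
    """Map each artifact name in `expected` to True (OK) or False (mismatch)."""
    return {name: verify_md5(Path(directory) / name, digest)
            for name, digest in expected.items()}
```

Calling verify_all on the download directory with a dict of artifact names and published digests gives one True/False entry per file. It does not replace the GPG signature check, which is a separate step.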

Builds fine!


runtime:
[mkdir] Created dir: 
/Users/mattmann/tmp/nutch-1.5.1/apache-nutch-1.5.1/runtime
[mkdir] Created dir: 
/Users/mattmann/tmp/nutch-1.5.1/apache-nutch-1.5.1/runtime/local
[mkdir] Created dir: 
/Users/mattmann/tmp/nutch-1.5.1/apache-nutch-1.5.1/runtime/deploy
 [copy] Copying 1 file to 
/Users/mattmann/tmp/nutch-1.5.1/apache-nutch-1.5.1/runtime/deploy
 [copy] Copying 1 file to 
/Users/mattmann/tmp/nutch-1.5.1/apache-nutch-1.5.1/runtime/deploy/bin
 [copy] Copying 1 file to 
/Users/mattmann/tmp/nutch-1.5.1/apache-nutch-1.5.1/runtime/local/lib
 [copy] Copying 1 file to 
/Users/mattmann/tmp/nutch-1.5.1/apache-nutch-1.5.1/runtime/local/lib/native
 [copy] Copying 21 files to 
/Users/mattmann/tmp/nutch-1.5.1/apache-nutch-1.5.1/runtime/local/conf
 [copy] Copying 1 file to 
/Users/mattmann/tmp/nutch-1.5.1/apache-nutch-1.5.1/runtime/local/bin
 [copy] Copying 48 files to 
/Users/mattmann/tmp/nutch-1.5.1/apache-nutch-1.5.1/runtime/local/lib
 [copy] Copying 123 files to 
/Users/mattmann/tmp/nutch-1.5.1/apache-nutch-1.5.1/runtime/local/plugins
 [copy] Copied 2 empty directories to 2 empty directories under 
/Users/mattmann/tmp/nutch-1.5.1/apache-nutch-1.5.1/runtime/local/test

BUILD SUCCESSFUL
Total time: 1 minute 28 seconds
[chipotle:~/tmp/nutch-1.5.1/apache-nutch-1.5.1] mattmann% 

Cheers,
Chris


On Jul 3, 2012, at 11:42 AM, Lewis John Mcgibbney wrote:

> Hi Everyone,
> 
> A candidate for the Apache Nutch 1.5.1 RC#3 is available at:
> 
> http://people.apache.org/~lewismc/apache-nutch-1.5.1-rc3
> 
> The release candidate is a src.zip, src.tar.gz, bin-zip and bin-tar.gz
> archive of the sources in:
> 
> http://svn.apache.org/repos/asf/nutch/tags/release-1.5.1-rc3/
> 
> This Release Candidate (and subsequent release) is a bug fix of the
> recently released Apache Nutch 1.5 and CHANGES.txt can be seen below
> 
> http://people.apache.org/~lewismc/apache-nutch-1.5.1-rc3/CHANGES.txt
> 
> Further, a staged Maven repository of the 1.5.1 jar, sources.jar and
> javadoc.jar is available here:
> 
> https://repository.apache.org/content/repositories/orgapachenutch-023
> 
> Please vote on releasing this package as Apache Nutch 1.5.1.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Nutch PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Nutch 1.5.1
> [ ] -1 Do not release this package because...
> 
> Many thanks, and here's to plenty more.
> 
> Kind Regards,
> Lewis
> 
> P.S. Here's my +1.
> 
> -- 
> Lewis



Re: [VOTE] Apache Nutch 2.0 Release Candidate #3

2012-07-06 Thread Mattmann, Chris A (388J)
OK, +1 from me :) 

ant runtime works:

job:
  [jar] Building jar: /Users/mattmann/tmp/nutch2/build/apache-nutch-2.0.job

runtime:
[mkdir] Created dir: /Users/mattmann/tmp/nutch2/runtime
[mkdir] Created dir: /Users/mattmann/tmp/nutch2/runtime/local
[mkdir] Created dir: /Users/mattmann/tmp/nutch2/runtime/deploy
 [copy] Copying 1 file to /Users/mattmann/tmp/nutch2/runtime/deploy
 [copy] Copying 1 file to /Users/mattmann/tmp/nutch2/runtime/deploy/bin
 [copy] Copying 1 file to /Users/mattmann/tmp/nutch2/runtime/local/lib
 [copy] Copying 1 file to 
/Users/mattmann/tmp/nutch2/runtime/local/lib/native
 [copy] Copying 25 files to /Users/mattmann/tmp/nutch2/runtime/local/conf
 [copy] Copying 1 file to /Users/mattmann/tmp/nutch2/runtime/local/bin
 [copy] Copying 84 files to /Users/mattmann/tmp/nutch2/runtime/local/lib
 [copy] Copying 97 files to /Users/mattmann/tmp/nutch2/runtime/local/plugins
 [copy] Copied 2 empty directories to 2 empty directories under 
/Users/mattmann/tmp/nutch2/runtime/local/test

BUILD SUCCESSFUL
Total time: 3 minutes 24 seconds
[chipotle:~/tmp/nutch2] mattmann% 

Good enough for me!

Cheers,
Chris

On Jul 3, 2012, at 11:24 AM, Mattmann, Chris A (388J) wrote:

> Hey Lewis,
> 
> I was running ant test -- sorry -- will try ant runtime now (any idea
> what's up with test?)
> 
> Cheers,
> Chris
> 
> On Jul 3, 2012, at 11:11 AM, Lewis John Mcgibbney wrote:
> 
>> What commands are you using?
>> 
>> I just grabbed the src-tar.gz from my local area with wget
>> extracted it to ~/Desktop
>> rm -r ~/.ivy2
>> cd ~/Desktop/$nutch_folder
>> ant runtime
>> 
>> runtime:
>>   [mkdir] Created dir: /home/lewismc/Desktop/nutch/runtime
>>   [mkdir] Created dir: /home/lewismc/Desktop/nutch/runtime/local
>>   [mkdir] Created dir: /home/lewismc/Desktop/nutch/runtime/deploy
>>[copy] Copying 1 file to /home/lewismc/Desktop/nutch/runtime/deploy
>>[copy] Copying 1 file to /home/lewismc/Desktop/nutch/runtime/deploy/bin
>>[copy] Copying 1 file to /home/lewismc/Desktop/nutch/runtime/local/lib
>>[copy] Copying 1 file to
>> /home/lewismc/Desktop/nutch/runtime/local/lib/native
>>[copy] Copying 25 files to /home/lewismc/Desktop/nutch/runtime/local/conf
>>[copy] Copying 1 file to /home/lewismc/Desktop/nutch/runtime/local/bin
>>[copy] Copying 84 files to /home/lewismc/Desktop/nutch/runtime/local/lib
>>[copy] Copying 97 files to
>> /home/lewismc/Desktop/nutch/runtime/local/plugins
>>[copy] Copied 2 empty directories to 2 empty directories under
>> /home/lewismc/Desktop/nutch/runtime/local/test
>> 
>> BUILD SUCCESSFUL
>> Total time: 2 minutes 40 seconds
>> 
>> This is with every dependency being downloaded to the Ivy cache
>> 
>> Lewis
>> 
>> On Tue, Jul 3, 2012 at 5:12 PM, Mattmann, Chris A (388J)
>>  wrote:
>>> Hey Julien,
>>> 
>>> I ran this command: rm -rf /Users/mattmann/.ivy2/
>>> 
>>> But it still failed with the below messages:
>>> 
>>> [ivy:resolve] :: problems summary ::
>>> [ivy:resolve]  WARNINGS
>>> [ivy:resolve]   [FAILED ] 
>>> org.apache.hadoop#hadoop-core;1.0.3!hadoop-core.jar: invalid sha1: 
>>> expected=d7d8610ba4aad504475e568fd3badb412a0beae9 
>>> computed=f8369ff1a71e1a8febbb8e9c3a54ffbb08048f19 (1598ms)
>>> [ivy:resolve]   [FAILED ] 
>>> org.apache.hadoop#hadoop-core;1.0.3!hadoop-core.jar:  (0ms)
>>> [ivy:resolve]    local: tried
>>> [ivy:resolve] 
>>> /Users/mattmann/.ivy2/local/org.apache.hadoop/hadoop-core/1.0.3/jars/hadoop-core.jar
>>> [ivy:resolve]    maven2: tried
>>> [ivy:resolve] 
>>> http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-core/1.0.3/hadoop-core-1.0.3.jar
>>> [ivy:resolve]   [FAILED ] org.hsqldb#hsqldb;2.2.8!hsqldb.jar: 
>>> invalid sha1: expected=8231a3ff71ba5889f9e2d01ce13503cbdd4038e9 
>>> computed=81a7e8d5d1802c7acbc8f8f81d3e4680a4b2441c (523ms)
>>> [ivy:resolve]   [FAILED ] org.hsqldb#hsqldb;2.2.8!hsqldb.jar:  
>>> (0ms)
>>> [ivy:resolve]    local: tried
>>> [ivy:resolve] 
>>> /Users/mattmann/.ivy2/local/org.hsqldb/hsqldb/2.2.8/jars/hsqldb.jar
>>> [ivy:resolve]    maven2: tried
>>> [ivy:resolve] 
>>> http://repo1.maven.org/maven2/org/hsqldb/hsqldb/2.2.8/hsqldb-2.2.8.jar
>>> [ivy:resolve]   [FAILED ] 
>>> org.apache.lucene#lucene-core;3.4.0!lucene-core.jar: invalid sha1: 
>>> expected=4426bf0764ec5fa634abca236b469d2519c74f65 
>>> computed=112d245439

Re: [VOTE] Apache Nutch 2.0 Release Candidate #3

2012-07-04 Thread Mattmann, Chris A (388J)
Hey Julien,

Well, mainly because the *that* is unknown. I'm not sure why it's not working 
for me, and it would be nice for it to be working for me :)

That being said, if it's working for others and there are at least 3 +1s and 
more +1s than my lone -1, then Lewis can surely decide to move forward.
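
The rule as stated in this thread (at least three +1s, and more +1s than -1s) can be sketched in a few lines of Python. This is only an illustration of the wording used here, not the text of any official policy. The function name vote_passes and its parameters are hypothetical, and it assumes the list contains binding PMC votes only.

```python
def vote_passes(votes, minimum_plus_ones=3):
    """Decide a release vote from binding votes given as +1, -1, or 0 (abstain).

    Passes when there are at least `minimum_plus_ones` +1 votes and the
    +1 votes outnumber the -1 votes.
    """
    plus = sum(1 for v in votes if v > 0)
    minus = sum(1 for v in votes if v < 0)
    return plus >= minimum_plus_ones and plus > minus
```

With three +1s and a lone -1, vote_passes([1, 1, 1, -1]) is true, which matches the situation described above.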

I'll try to test again today.

Cheers,
Chris

On Jul 4, 2012, at 12:50 AM, Julien Nioche wrote:

> Guys, 
> 
> If it is working fine for most people why don't we release and fix that later?
> 
> J
> 
> On 4 July 2012 07:18, Mattmann, Chris A (388J) 
>  wrote:
> Hi Lewis,
> 
> Odd, I don't get that.
> 
> I'll try futzing around again with it tomorrow -- what system are you on? 
> What is
> your Ant version and Java version?
> 
> Cheers,
> Chris
> 
> On Jul 3, 2012, at 11:49 AM, Lewis John Mcgibbney wrote:
> 
> > Hi Chris,
> >
> > I've no clue what's going on locally with you... em I just did
> >
> > ant test
> >
> > and I get
> >
> > copy-generated-lib:
> >
> > test:
> > [echo] Testing plugin: subcollection
> >[junit] Running org.apache.nutch.collection.TestSubcollection
> >[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.305 sec
> >
> > test:
> >
> > BUILD SUCCESSFUL
> > Total time: 12 minutes 28 seconds
> >
> >
> > On Tue, Jul 3, 2012 at 7:24 PM, Mattmann, Chris A (388J)
> >  wrote:
> >> Hey Lewis,
> >>
> >> I was running ant test -- sorry -- will try ant runtime now (any idea
> >> what's up with test?)
> >>
> >> Cheers,
> >> Chris
> 
> 
> 
> 
> 
> 
> -- 
> 
> Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
> 





Re: [VOTE] Apache Nutch 2.0 Release Candidate #3

2012-07-04 Thread Mattmann, Chris A (388J)
Thanks Lewis, here are mine:

[chipotle:~/tmp/nutch2/apache-nutch-2.0] mattmann% ant -version
Apache Ant(TM) version 1.8.2 compiled on May 17 2012
[chipotle:~/tmp/nutch2/apache-nutch-2.0] mattmann% java -version
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b03-424-10M3720)
Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03-424, mixed mode)
[chipotle:~/tmp/nutch2/apache-nutch-2.0] mattmann% 

[chipotle:~/tmp/nutch2/apache-nutch-2.0] mattmann% uname -a
Darwin chipotle.local 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun  7 16:32:41 
PDT 2011; root:xnu-1504.15.3~1/RELEASE_X86_64 x86_64
[chipotle:~/tmp/nutch2/apache-nutch-2.0] mattmann% 

I'll try one more time today with a fresh build and see where I get :/

Thanks!

Cheers,
Chris


On Jul 4, 2012, at 3:27 AM, Lewis John Mcgibbney wrote:

> Hi Chris,
> 
> lewismc@lewismc-HP-Mini-110-3100:~$ java -showversion
> java version "1.6.0_25"
> Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
> Java HotSpot(TM) Client VM (build 20.0-b11, mixed mode, sharing)
> 
> lewismc@lewismc-HP-Mini-110-3100:~$ ant -v
> Apache Ant(TM) version 1.8.2 compiled on August 19 2011
> Trying the default build file: build.xml
> Buildfile: build.xml does not exist!
> Build failed
> 
> Lewis
> 
> On Wed, Jul 4, 2012 at 7:18 AM, Mattmann, Chris A (388J)
>  wrote:
>> Hi Lewis,
>> 
>> Odd, I don't get that.
>> 
>> I'll try futzing around again with it tomorrow -- what system are you on? 
>> What is
>> your Ant version and Java version?
>> 
>> Cheers,
>> Chris
>> 
>> On Jul 3, 2012, at 11:49 AM, Lewis John Mcgibbney wrote:
>> 
>>> Hi Chris,
>>> 
>>> I've no clue what's going on locally with you... em I just did
>>> 
>>> ant test
>>> 
>>> and I get
>>> 
>>> copy-generated-lib:
>>> 
>>> test:
>>>[echo] Testing plugin: subcollection
>>>   [junit] Running org.apache.nutch.collection.TestSubcollection
>>>   [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.305 sec
>>> 
>>> test:
>>> 
>>> BUILD SUCCESSFUL
>>> Total time: 12 minutes 28 seconds
>>> 
>>> 
>>> On Tue, Jul 3, 2012 at 7:24 PM, Mattmann, Chris A (388J)
>>>  wrote:
>>>> Hey Lewis,
>>>> 
>>>> I was running ant test -- sorry -- will try ant runtime now (any idea
>>>> what's up with test?)
>>>> 
>>>> Cheers,
>>>> Chris
>> 
>> 
>> 
> 
> 
> 
> -- 
> Lewis





Re: [VOTE] Apache Nutch 2.0 Release Candidate #3

2012-07-03 Thread Mattmann, Chris A (388J)
Hi Lewis,

Odd, I don't get that.

I'll try futzing around again with it tomorrow -- what system are you on? 
What is your Ant version and Java version?

Cheers,
Chris

On Jul 3, 2012, at 11:49 AM, Lewis John Mcgibbney wrote:

> Hi Chris,
> 
> I've no clue what's going on locally with you... em I just did
> 
> ant test
> 
> and I get
> 
> copy-generated-lib:
> 
> test:
> [echo] Testing plugin: subcollection
>[junit] Running org.apache.nutch.collection.TestSubcollection
>[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.305 sec
> 
> test:
> 
> BUILD SUCCESSFUL
> Total time: 12 minutes 28 seconds
> 
> 
> On Tue, Jul 3, 2012 at 7:24 PM, Mattmann, Chris A (388J)
>  wrote:
>> Hey Lewis,
>> 
>> I was running ant test -- sorry -- will try ant runtime now (any idea
>> what's up with test?)
>> 
>> Cheers,
>> Chris





Re: [VOTE] Apache Nutch 2.0 Release Candidate #3

2012-07-03 Thread Mattmann, Chris A (388J)
Hey Lewis,

I was running ant test -- sorry -- will try ant runtime now (any idea
what's up with test?)

Cheers,
Chris

On Jul 3, 2012, at 11:11 AM, Lewis John Mcgibbney wrote:

> What commands are you using?
> 
> I just grabbed the src-tar.gz from my local area with wget
> extracted it to ~/Desktop
> rm -r ~/.ivy2
> cd ~/Desktop/$nutch_folder
> ant runtime
> 
> runtime:
>[mkdir] Created dir: /home/lewismc/Desktop/nutch/runtime
>[mkdir] Created dir: /home/lewismc/Desktop/nutch/runtime/local
>[mkdir] Created dir: /home/lewismc/Desktop/nutch/runtime/deploy
> [copy] Copying 1 file to /home/lewismc/Desktop/nutch/runtime/deploy
> [copy] Copying 1 file to /home/lewismc/Desktop/nutch/runtime/deploy/bin
> [copy] Copying 1 file to /home/lewismc/Desktop/nutch/runtime/local/lib
> [copy] Copying 1 file to
> /home/lewismc/Desktop/nutch/runtime/local/lib/native
> [copy] Copying 25 files to /home/lewismc/Desktop/nutch/runtime/local/conf
> [copy] Copying 1 file to /home/lewismc/Desktop/nutch/runtime/local/bin
> [copy] Copying 84 files to /home/lewismc/Desktop/nutch/runtime/local/lib
> [copy] Copying 97 files to
> /home/lewismc/Desktop/nutch/runtime/local/plugins
> [copy] Copied 2 empty directories to 2 empty directories under
> /home/lewismc/Desktop/nutch/runtime/local/test
> 
> BUILD SUCCESSFUL
> Total time: 2 minutes 40 seconds
> 
> This is with every dependency being downloaded to the Ivy cache
> 
> Lewis
> 
> On Tue, Jul 3, 2012 at 5:12 PM, Mattmann, Chris A (388J)
>  wrote:
>> Hey Julien,
>> 
>> I ran this command: rm -rf /Users/mattmann/.ivy2/
>> 
>> But it still failed with the below messages:
>> 
>> [ivy:resolve] :: problems summary ::
>> [ivy:resolve]  WARNINGS
>> [ivy:resolve]   [FAILED ] 
>> org.apache.hadoop#hadoop-core;1.0.3!hadoop-core.jar: invalid sha1: 
>> expected=d7d8610ba4aad504475e568fd3badb412a0beae9 
>> computed=f8369ff1a71e1a8febbb8e9c3a54ffbb08048f19 (1598ms)
>> [ivy:resolve]   [FAILED ] 
>> org.apache.hadoop#hadoop-core;1.0.3!hadoop-core.jar:  (0ms)
>> [ivy:resolve]    local: tried
>> [ivy:resolve] 
>> /Users/mattmann/.ivy2/local/org.apache.hadoop/hadoop-core/1.0.3/jars/hadoop-core.jar
>> [ivy:resolve]    maven2: tried
>> [ivy:resolve] 
>> http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-core/1.0.3/hadoop-core-1.0.3.jar
>> [ivy:resolve]   [FAILED ] org.hsqldb#hsqldb;2.2.8!hsqldb.jar: 
>> invalid sha1: expected=8231a3ff71ba5889f9e2d01ce13503cbdd4038e9 
>> computed=81a7e8d5d1802c7acbc8f8f81d3e4680a4b2441c (523ms)
>> [ivy:resolve]   [FAILED ] org.hsqldb#hsqldb;2.2.8!hsqldb.jar:  
>> (0ms)
>> [ivy:resolve]    local: tried
>> [ivy:resolve] 
>> /Users/mattmann/.ivy2/local/org.hsqldb/hsqldb/2.2.8/jars/hsqldb.jar
>> [ivy:resolve]    maven2: tried
>> [ivy:resolve] 
>> http://repo1.maven.org/maven2/org/hsqldb/hsqldb/2.2.8/hsqldb-2.2.8.jar
>> [ivy:resolve]   [FAILED ] 
>> org.apache.lucene#lucene-core;3.4.0!lucene-core.jar: invalid sha1: 
>> expected=4426bf0764ec5fa634abca236b469d2519c74f65 
>> computed=112d2454390cba8c7c35b34b8f7a821c6cec3f73 (775ms)
>> [ivy:resolve]   [FAILED ] 
>> org.apache.lucene#lucene-core;3.4.0!lucene-core.jar:  (0ms)
>> [ivy:resolve]    local: tried
>> [ivy:resolve] 
>> /Users/mattmann/.ivy2/local/org.apache.lucene/lucene-core/3.4.0/jars/lucene-core.jar
>> [ivy:resolve]    maven2: tried
>> [ivy:resolve] 
>> http://repo1.maven.org/maven2/org/apache/lucene/lucene-core/3.4.0/lucene-core-3.4.0.jar
>> [ivy:resolve]   [FAILED ] com.ibm.icu#icu4j;4.0.1!icu4j.jar: 
>> invalid sha1: expected=06362db7a2556bb58a04e991029196e2aad632d4 
>> computed=d9862ffbc6cd6241a03c06b5911bf22a079d2cda (1544ms)
>> [ivy:resolve]   [FAILED ] com.ibm.icu#icu4j;4.0.1!icu4j.jar:  
>> (0ms)
>> [ivy:resolve]    local: tried
>> [ivy:resolve] 
>> /Users/mattmann/.ivy2/local/com.ibm.icu/icu4j/4.0.1/jars/icu4j.jar
>> [ivy:resolve]    maven2: tried
>> [ivy:resolve] 
>> http://repo1.maven.org/maven2/com/ibm/icu/icu4j/4.0.1/icu4j-4.0.1.jar
>> [ivy:resolve]   [FAILED ] 
>> xerces#xercesImpl;2.9.1!xercesImpl.jar: invalid sha1: 
>> expected=7bc7e49ddfe4fb5f193ed37ecc96c12292c8ceb6 
>> computed=88931c057b31ba3ff7ac96e53817b25ff355c4a1 (393ms)
>> [ivy:resolve]   [FAILED ] 
>> xerces#xercesImpl;2.9.1!xercesImpl.jar:  (0ms)
>> [ivy:resolve]    local: tried
>> [ivy:resolve] 
>> /Users/mattmann/.ivy2/local/xerc

Re: [VOTE] Apache Nutch 2.0 Release Candidate #3

2012-07-03 Thread Mattmann, Chris A (388J)
Hey Julien,

I ran this command: rm -rf /Users/mattmann/.ivy2/

But it still failed with the below messages:

[ivy:resolve] :: problems summary ::
[ivy:resolve]  WARNINGS
[ivy:resolve]   [FAILED ] org.apache.hadoop#hadoop-core;1.0.3!hadoop-core.jar: invalid sha1: expected=d7d8610ba4aad504475e568fd3badb412a0beae9 computed=f8369ff1a71e1a8febbb8e9c3a54ffbb08048f19 (1598ms)
[ivy:resolve]   [FAILED ] org.apache.hadoop#hadoop-core;1.0.3!hadoop-core.jar:  (0ms)
[ivy:resolve]    local: tried
[ivy:resolve]     /Users/mattmann/.ivy2/local/org.apache.hadoop/hadoop-core/1.0.3/jars/hadoop-core.jar
[ivy:resolve]    maven2: tried
[ivy:resolve]     http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-core/1.0.3/hadoop-core-1.0.3.jar
[ivy:resolve]   [FAILED ] org.hsqldb#hsqldb;2.2.8!hsqldb.jar: invalid sha1: expected=8231a3ff71ba5889f9e2d01ce13503cbdd4038e9 computed=81a7e8d5d1802c7acbc8f8f81d3e4680a4b2441c (523ms)
[ivy:resolve]   [FAILED ] org.hsqldb#hsqldb;2.2.8!hsqldb.jar:  (0ms)
[ivy:resolve]    local: tried
[ivy:resolve]     /Users/mattmann/.ivy2/local/org.hsqldb/hsqldb/2.2.8/jars/hsqldb.jar
[ivy:resolve]    maven2: tried
[ivy:resolve]     http://repo1.maven.org/maven2/org/hsqldb/hsqldb/2.2.8/hsqldb-2.2.8.jar
[ivy:resolve]   [FAILED ] org.apache.lucene#lucene-core;3.4.0!lucene-core.jar: invalid sha1: expected=4426bf0764ec5fa634abca236b469d2519c74f65 computed=112d2454390cba8c7c35b34b8f7a821c6cec3f73 (775ms)
[ivy:resolve]   [FAILED ] org.apache.lucene#lucene-core;3.4.0!lucene-core.jar:  (0ms)
[ivy:resolve]    local: tried
[ivy:resolve]     /Users/mattmann/.ivy2/local/org.apache.lucene/lucene-core/3.4.0/jars/lucene-core.jar
[ivy:resolve]    maven2: tried
[ivy:resolve]     http://repo1.maven.org/maven2/org/apache/lucene/lucene-core/3.4.0/lucene-core-3.4.0.jar
[ivy:resolve]   [FAILED ] com.ibm.icu#icu4j;4.0.1!icu4j.jar: invalid sha1: expected=06362db7a2556bb58a04e991029196e2aad632d4 computed=d9862ffbc6cd6241a03c06b5911bf22a079d2cda (1544ms)
[ivy:resolve]   [FAILED ] com.ibm.icu#icu4j;4.0.1!icu4j.jar:  (0ms)
[ivy:resolve]    local: tried
[ivy:resolve]     /Users/mattmann/.ivy2/local/com.ibm.icu/icu4j/4.0.1/jars/icu4j.jar
[ivy:resolve]    maven2: tried
[ivy:resolve]     http://repo1.maven.org/maven2/com/ibm/icu/icu4j/4.0.1/icu4j-4.0.1.jar
[ivy:resolve]   [FAILED ] xerces#xercesImpl;2.9.1!xercesImpl.jar: invalid sha1: expected=7bc7e49ddfe4fb5f193ed37ecc96c12292c8ceb6 computed=88931c057b31ba3ff7ac96e53817b25ff355c4a1 (393ms)
[ivy:resolve]   [FAILED ] xerces#xercesImpl;2.9.1!xercesImpl.jar:  (0ms)
[ivy:resolve]    local: tried
[ivy:resolve]     /Users/mattmann/.ivy2/local/xerces/xercesImpl/2.9.1/jars/xercesImpl.jar
[ivy:resolve]    maven2: tried
[ivy:resolve]     http://repo1.maven.org/maven2/xerces/xercesImpl/2.9.1/xercesImpl-2.9.1.jar
[ivy:resolve]   [FAILED ] com.google.guava#guava;11.0.2!guava.jar: invalid sha1: expected=35a3c69e19d72743cac83778aecbee68680f63eb computed=1e8507869d7db99f60f8d949bc5ba2b5410ce2db (355ms)
[ivy:resolve]   [FAILED ] com.google.guava#guava;11.0.2!guava.jar:  (0ms)
[ivy:resolve]    local: tried
[ivy:resolve]     /Users/mattmann/.ivy2/local/com.google.guava/guava/11.0.2/jars/guava.jar
[ivy:resolve]    maven2: tried
[ivy:resolve]     http://repo1.maven.org/maven2/com/google/guava/guava/11.0.2/guava-11.0.2.jar
[ivy:resolve]   ::
[ivy:resolve]   ::  FAILED DOWNLOADS::
[ivy:resolve]   :: ^ see resolution messages for details  ^ ::
[ivy:resolve]   ::
[ivy:resolve]   :: org.apache.lucene#lucene-core;3.4.0!lucene-core.jar
[ivy:resolve]   :: org.apache.hadoop#hadoop-core;1.0.3!hadoop-core.jar
[ivy:resolve]   :: org.hsqldb#hsqldb;2.2.8!hsqldb.jar
[ivy:resolve]   :: com.ibm.icu#icu4j;4.0.1!icu4j.jar
[ivy:resolve]   :: xerces#xercesImpl;2.9.1!xercesImpl.jar
[ivy:resolve]   :: com.google.guava#guava;11.0.2!guava.jar
[ivy:resolve]   ::
[ivy:resolve] 
[ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS

BUILD FAILED
/Users/mattmann/tmp/nutch2/apache-nutch-2.0/build.xml:431: impossible to resolve dependencies:
resolve failed - see output for details

Total time: 1 minute 56 seconds
[chipotle:~/tmp/nutch2/apache-nutch-2.0] mattmann% 

Any ideas?
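
For anyone hitting the same "invalid sha1" failures: they mean the digest of the downloaded jar did not match the published one, which usually points at a damaged download or a stale cache entry rather than at the build itself. A minimal sketch for checking a single file by hand follows. The function name check_sha1 is hypothetical, and it assumes either sha1sum or shasum is installed.

```shell
# Hypothetical helper: compare a file's SHA-1 with an expected hex digest.
# Usage: check_sha1 <file> <expected-sha1>
check_sha1() {
  local file=$1 expected=$2 computed
  # Prefer GNU sha1sum; fall back to shasum (commonly present on Mac OS X).
  if command -v sha1sum >/dev/null 2>&1; then
    computed=$(sha1sum "$file" | awk '{print $1}')
  else
    computed=$(shasum -a 1 "$file" | awk '{print $1}')
  fi
  if [ "$computed" = "$expected" ]; then
    echo "OK $file"
  else
    echo "MISMATCH $file expected=$expected computed=$computed"
    return 1
  fi
}
```

Calling it with the path of a downloaded jar and the "expected" value from the log shows whether the local copy is the damaged one.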

Cheers,
Chris


On Jul 3, 2012, at 7:49 AM, Julien Nioche wrote:

> Hi Chris
> 
> 
> 
> [chipotle:~/tmp/nutch2] mattmann% $HOME/bin/verify_gpg_sigs
> Verifying Signature for file apache-nutch-2.0-src.tar.gz.asc
> gpg: Signature made Mon Jun 25 09:28:36 2012 PDT using RSA key ID C601BCA7
> gpg: Good signature from "Lewis John McGibbney (CODE SIGNING KEY)"
> gpg: WARNING: This key is

Re: [VOTE] Apache Nutch 2.0 Release Candidate #3

2012-07-03 Thread Mattmann, Chris A (388J)
Hey Julien,


On Jul 3, 2012, at 7:49 AM, Julien Nioche wrote:
[..snip..]
> 
> OK, so basically signatures and checksums are fine

+1, yep they are great.

> 
>  
> 
> Tried to build and test and got this:
> 
> [ivy:resolve]   ::
> [..snip...]
> 
> Try deleting your entire .ivy dir and re-run ant. Just did that on my machine 
> and Nutch compiles fine

OK will do now.
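
A cautious way to follow that advice is to move the Ivy directory aside instead of deleting it, so the old copy can be restored if needed. This is only a sketch: reset_ivy_dir is a hypothetical name, and the default path assumes Ivy's usual ~/.ivy2 location, which matches the paths in the log quoted earlier in this thread.

```shell
# Hypothetical helper: move an Ivy directory aside so the next build
# re-downloads every dependency. The old copy is kept as a timestamped backup.
# Usage: reset_ivy_dir [ivy-dir]   (defaults to ~/.ivy2)
reset_ivy_dir() {
  local ivy_dir=${1:-$HOME/.ivy2}
  local backup="$ivy_dir.bak.$(date +%Y%m%d%H%M%S)"
  if [ -d "$ivy_dir" ]; then
    mv "$ivy_dir" "$backup" && echo "moved $ivy_dir to $backup"
  else
    echo "nothing to do: $ivy_dir does not exist"
  fi
}
```

After running reset_ivy_dir, re-running ant from the source directory should fetch everything afresh.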

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: [VOTE] Apache Nutch 2.0 Release Candidate #3

2012-07-03 Thread Mattmann, Chris A (388J)
Hi Guys,

Unfortunately, -1 from me, please read on:

Release SIGS check out:

[chipotle:~/tmp/nutch2] mattmann% $HOME/bin/verify_gpg_sigs 
Verifying Signature for file apache-nutch-2.0-src.tar.gz.asc
gpg: Signature made Mon Jun 25 09:28:36 2012 PDT using RSA key ID C601BCA7
gpg: Good signature from "Lewis John McGibbney (CODE SIGNING KEY)"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:  There is no indication that the signature belongs to the owner.
Primary key fingerprint: 2A23 D53F 8D27 5CB6 91E1  89C1 F45E 7970 C601 BCA7
Verifying Signature for file apache-nutch-2.0-src.zip.asc
gpg: Signature made Mon Jun 25 09:28:18 2012 PDT using RSA key ID C601BCA7
gpg: Good signature from "Lewis John McGibbney (CODE SIGNING KEY)"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:  There is no indication that the signature belongs to the owner.
Primary key fingerprint: 2A23 D53F 8D27 5CB6 91E1  89C1 F45E 7970 C601 BCA7
[chipotle:~/tmp/nutch2] mattmann% 

Checksums check out:

[chipotle:~/tmp/nutch2] mattmann% $HOME/bin/verify_md5_checksums 
md5sum: stat '*.bz2': No such file or directory
apache-nutch-2.0-src.tar.gz: OK
apache-nutch-2.0-src.zip: OK
[chipotle:~/tmp/nutch2] mattmann% 
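
The verify_md5_checksums script itself is not shown in this thread. As a rough idea of what such a check might do, here is a sketch that compares every <artifact>.md5 file in a directory with a freshly computed digest. The name verify_md5_dir is hypothetical, md5sum is assumed to be installed, and the .md5 file is assumed to contain the digest as a 32-digit hex token somewhere in its text.

```shell
# Hypothetical helper: check every <artifact>.md5 file in a directory.
# Usage: verify_md5_dir <dir>
verify_md5_dir() {
  local dir=$1 sumfile artifact expected computed status=0
  for sumfile in "$dir"/*.md5; do
    [ -f "$sumfile" ] || continue   # the glob matched nothing
    artifact=${sumfile%.md5}
    # Take the first 32-hex-digit token, whatever the layout of the file.
    expected=$(grep -oiE '[0-9a-f]{32}' "$sumfile" | head -n 1 | tr 'A-F' 'a-f')
    computed=$(md5sum "$artifact" | awk '{print $1}')
    if [ "$expected" = "$computed" ]; then
      echo "$(basename "$artifact"): OK"
    else
      echo "$(basename "$artifact"): FAILED"
      status=1
    fi
  done
  return "$status"
}
```

Skipping unmatched globs inside the loop avoids the kind of "No such file or directory" noise seen for '*.bz2' above.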

Tried to build and test and got this:

[ivy:resolve]   ::
[ivy:resolve]   ::  UNRESOLVED DEPENDENCIES ::
[ivy:resolve]   ::
[ivy:resolve]   :: org.apache.gora#gora-sql;0.1.1-incubating: configuration not found in org.apache.gora#gora-sql;0.1.1-incubating: 'default'. It was required from org.apache.nutch#nutch;work...@chipotle.jpl.nasa.gov default
[ivy:resolve]   ::
[ivy:resolve]   ::
[ivy:resolve]   ::  FAILED DOWNLOADS::
[ivy:resolve]   :: ^ see resolution messages for details  ^ ::
[ivy:resolve]   ::
[ivy:resolve]   :: org.apache.hadoop#hadoop-core;1.0.3!hadoop-core.jar
[ivy:resolve]   :: org.hsqldb#hsqldb;2.2.8!hsqldb.jar
[ivy:resolve]   :: com.google.guava#guava;11.0.2!guava.jar
[ivy:resolve]   ::

Consistently. I tried deleting the relevant entries out of my Ivy cache but 
still nothing.
Any idea what I'm doing wrong here? I'm on Mac OS X 10.6.8.

Cheers,
Chris

On Jun 25, 2012, at 9:32 AM, Lewis John Mcgibbney wrote:

> Hi Everyone,
> 
> A candidate for the Apache Nutch 2.0 RC3 is available at:
> 
> http://people.apache.org/~lewismc/apache-nutch-2.0rc3
> 
> The release candidate is a src.zip and src.tar.gz ONLY
> archive of the sources in:
> 
> http://svn.apache.org/repos/asf/nutch/tags/release-2.0rc3
> 
> We release Nutch 2.0 in this fashion due to the inclusion of
> Apache Gora and the likelihood that users will regularly recompile
> the code to suit dynamic requirements.
> 
> Further, a staged Maven repository of the 2.0 jar, sources.jar and
> javadoc.jar is available here:
> 
> https://repository.apache.org/content/repositories/orgapachenutch-275
> 
> Please vote on releasing this package as Apache Nutch 2.0.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Nutch PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Nutch 2.0
> [ ] -1 Do not release this package because...
> 
> Many thanks, and here's to plenty more.
> 
> Kind Regards,
> Lewis
> 
> P.S. Here's my +1.
> 
> -- 
> Lewis





Re: [VOTE] Apache Nutch 2.0 Release Candidate #3

2012-07-02 Thread Mattmann, Chris A (388J)
I'll try to scope this by tomorrow...thanks Lewis.

Cheers,
Chris

On Jul 2, 2012, at 10:49 AM, Lewis John Mcgibbney wrote:

> Anyone else for this RC?
> 
> I've been slightly distracted with a number of things recently and only
> just getting round to following this one up so apologies about that.
> 
> Best
> 
> Lewis
> 
> On Wed, Jun 27, 2012 at 10:23 AM, Ferdy Galema  
> wrote:
>> +1 Crawling with HBaseStore works from injecting to indexing.
>> 
>> Great work Lewis.
>> 
>> On Mon, Jun 25, 2012 at 6:32 PM, Lewis John Mcgibbney
>>  wrote:
>>> 
>>> Hi Everyone,
>>> 
>>> A candidate for the Apache Nutch 2.0 RC3 is available at:
>>> 
>>> http://people.apache.org/~lewismc/apache-nutch-2.0rc3
>>> 
>>> The release candidate is a src.zip and src.tar.gz ONLY
>>> archive of the sources in:
>>> 
>>> http://svn.apache.org/repos/asf/nutch/tags/release-2.0rc3
>>> 
>>> We release Nutch 2.0 in this fashion due to the inclusion of
>>> Apache Gora and the likelihood that users will regularly recompile
>>> the code to suit dynamic requirements.
>>> 
>>> Further, a staged Maven repository of the 2.0 jar, sources.jar and
>>> javadoc.jar is available here:
>>> 
>>> https://repository.apache.org/content/repositories/orgapachenutch-275
>>> 
>>> Please vote on releasing this package as Apache Nutch 2.0.
>>> The vote is open for the next 72 hours and passes if a majority of at
>>> least three +1 Nutch PMC votes are cast.
>>> 
>>> [ ] +1 Release this package as Apache Nutch 2.0
>>> [ ] -1 Do not release this package because...
>>> 
>>> Many thanks, and here's to plenty more.
>>> 
>>> Kind Regards,
>>> Lewis
>>> 
>>> P.S. Here's my +1.
>>> 
>>> --
>>> Lewis
>> 
>> 
> 
> 
> 
> -- 
> Lewis





Re: o.a.n.metadata.Office still required?

2012-06-28 Thread Mattmann, Chris A (388J)
+1 to remove it... we can probably just standardize on the Tika metadata set
if we need it...

Cheers,
Chris

On Jun 28, 2012, at 2:37 PM, Lewis John Mcgibbney wrote:

> Hi,
> 
> Please correct me if I am wrong. Is the above class required by any of
> our parsers anymore?
> 
> Thanks
> 
> Lewis
> 
> -- 
> Lewis





Re: 1.5.1 release

2012-06-22 Thread Mattmann, Chris A (388J)
Hey Guys,

(sorry for the top post)

There's no reason to freeze trunk during releases. In fact, during the RC, once 
the branch (or tag for that matter)
is created, trunk can continue on, no need to stop. Heck, we can always just 
tag or branch from a specific 
revision too so it's not really a biggie.
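
To make the "branch from a specific revision" point concrete, here is a dry-run sketch that only prints the svn command and does not run it. The trunk and branches paths assume the standard layout under the repository root that appears elsewhere in these threads; the helper name, branch name and revision number used with it are hypothetical.

```shell
# Hypothetical helper: print (do not run) the svn command that would create
# a release branch from a fixed trunk revision.
# Usage: branch_from_revision <revision> <branch-name>
branch_from_revision() {
  local rev=$1 name=$2
  # Assumed layout: <repo>/trunk and <repo>/branches/<name>.
  local repo=http://svn.apache.org/repos/asf/nutch
  printf 'svn copy -r %s %s/trunk %s/branches/%s -m "Branch %s from trunk r%s"\n' \
    "$rev" "$repo" "$repo" "$name" "$name" "$rev"
}
```

Because the copy is pinned to a revision, later commits to trunk do not affect what ends up in the branch.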

Cheers,
Chris

On Jun 21, 2012, at 2:43 PM, Lewis John Mcgibbney wrote:

> Hi Markus,
> 
> On Thu, Jun 21, 2012 at 10:02 PM, Markus Jelsma
>  wrote:
>> It's still not clear to me what 1.5.1 is going to look like. Will it be 
>> current trunk incl. the script bugfix or just 1.5 plus the bugfix? I would 
>> vote for the latter as it makes more sense for a bugfix release.
> 
> I am easy on this one... I suggest we do it the normal way. Let's let
> folks chime in and see where we are on Saturday. It looks like 2.0 is
> going to be shifted with the new commits so do we wish to try and keep
> at least the minimal consistency between both releases?
> 
>> 
>> There is another debate behind this, in my opinion, about freezing trunk 
>> prior to releases and thus stopping active development. This has been an 
>> issue in the past. Is this something for another thread?
>> 
> 
> Yeah I must also agree that we should branch trunk, keep the branch
> for the release then run the RC's from the branch regardless of how
> trunk comes on. My only suggestion for  backporting patches from trunk
> to the release candidate branch is if it is a pretty critical bug fix
> as we've now discovered in 1.5!
> 
> Additionally there is another note here as well w.r.t release
> managers. We've relied on the excellent work done by Chris (and
> others) as RM's for a number of releases but during the release period
> (on occasion, more recently) as you mention trunk has frozen
> temporarily. Of course it is the aim to prevent this happening should
> the RC not progress as we would all like. Hopefully we are moving
> towards a more adaptable and sustainable RM process within Nutch where
> the RM responsibility can be undertaken/overseen by more than one
> individual over the entire duration of the process. I think (and hope)
> we can consider the slight struggle we've had for 1.5 as an exception.
> As far back as I can remember RC's have always been efficient and
> smooth and I personally am committed to ensuring we return to the high
> precedent set by previous RM's.
> We've also seen an alternative (and in my opinion an improved)
> publication of Nutch artifacts for 1.5. For reference I direct you to
> Julien's commentary [0] on this topic. Due to this, we've had to run
> additional RC's which has taken a bit longer than usual and I must
> personally apologise to everyone for at least one RC cock up which
> could have been avoided had I been more familiar with the Nutch
> specific release process.
> 
> I think I'm ranting here so I'm going to give it a bye now.
> 
> Lewis
> 
> [0] http://digitalpebble.blogspot.co.uk/2012/06/whats-new-in-nutch-15.html





Re: Nutch 1.5 Deploy Mode Doesn't Work like Nutch 1.4 Deploy Mode

2012-06-19 Thread Mattmann, Chris A (388J)
+1!

Cheers,
Chris

On Jun 19, 2012, at 2:26 AM, Julien Nioche wrote:

> Quite annoying that we did not spot this before releasing. What about a 1.5.1 
> soonish with this fix + couple smallish improvements e.g. upgrade to Hadoop 
> 1.0.3?
> 
> J.
> 
> -- Forwarded message --
> From: Julien Nioche 
> Date: 19 June 2012 08:56
> Subject: Re: Nutch 1.5 Deploy Mode Doesn't Work like Nutch 1.4 Deploy Mode
> To: u...@nutch.apache.org
> 
> 
> Alternatively modify the bin/nutch script to make it more robust
> 
> # NUTCH_JOB
> if [ -f ${NUTCH_HOME}/*nutch*.job ]; then
>   local=false
>   for f in $NUTCH_HOME/*nutch*.job; do
>     NUTCH_JOB=$f;
>   done
> fi
> 
> On 19 June 2012 00:09, sidbatra  wrote:
> This turns out to be a genuine bug with an easy fix.
> 
> build.xml is configured to generate a job file titled "apache-nutch-1.5.job"
> but the deploy binary is still looking for "nutch-1.5.job"
> 
> 
> Renaming "apache-nutch-1.5.job" to "nutch-1.5.job" fixes this bug in deploy
> mode.
> 
> 
> 
> 
> -- 
> 
> Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
> 
> 
> 
> 
> -- 
> 
> Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
> 
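
One caveat with the quoted snippet: `[ -f ${NUTCH_HOME}/*nutch*.job ]` passes every match to the test, so it errors out as soon as more than one job file is present. A sketch of a variant that tests each candidate separately is below. It is not the fix that was committed; find_nutch_job is a hypothetical name.

```shell
# Hypothetical helper: print the last *nutch*.job file found in a directory,
# or return non-zero when there is none.
# Usage: find_nutch_job <dir>
find_nutch_job() {
  local dir=$1 f job=""
  for f in "$dir"/*nutch*.job; do
    # When nothing matches, the pattern stays literal, so test each candidate.
    [ -f "$f" ] && job=$f
  done
  [ -n "$job" ] && printf '%s\n' "$job"
}

# Possible use inside a launcher script (not run here):
#   if NUTCH_JOB=$(find_nutch_job "$NUTCH_HOME"); then local=false; fi
```

Like the original loop, this keeps the last match, so both "nutch-1.5.job" and "apache-nutch-1.5.job" would be picked up.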





Re: VOTE Apache Nutch 2.0 RC1

2012-06-15 Thread Mattmann, Chris A (388J)
OK you are just making us all look bad now Juls ;)

Super fast!

Cheers,
Chris


On Jun 15, 2012, at 2:54 AM, Julien Nioche wrote:

> see https://issues.apache.org/jira/browse/NUTCH-1396
> 
> On 15 June 2012 10:43, Julien Nioche  wrote:
> Before you do, could you check that NutchGora passes ant test successfully. I 
> just tried and got an error related to the parse-tika tests. Am about to open 
> a JIRA to update to the latest version of Tika for NutchGora which should fix 
> the problem and put it at the same level as trunk
> 
> J
> 
> On 15 June 2012 10:01, Lewis John Mcgibbney wrote:
> 
> I'll push this in an hour or so guys.
> 
> Thanks for the input.
> 
> Lewis
> 
> 
> On Fri, Jun 15, 2012 at 9:39 AM, Julien Nioche 
>  wrote:
> +1
> 
> 
> On 15 June 2012 09:00, Ferdy Galema  wrote:
> Agree with only releasing src.
> 
> 
> On Thu, Jun 14, 2012 at 11:32 PM, Mattmann, Chris A (388J) 
>  wrote:
> Or just not ship a bin release at all. Src is the only thing we really VOTE 
> on legally though bin is provided for convenience purposes. Will type more on 
> this later...
> 
> Sent from my iPhone
> 
> On Jun 14, 2012, at 2:18 PM, "Lewis John Mcgibbney" 
>  wrote:
> 
>> Hi Julien,
>> 
>> Do you suggest with the binary release that we simply open up all gora-* 
>> deps and ship it with every jar available?
>> 
>> Lewis
>> 
>> On Thu, Jun 14, 2012 at 9:39 PM, Julien Nioche 
>>  wrote:
>> I disagree. You'd expect a binary release to work out of the box - which is 
>> not the case. Plus we'd have to spend more time explaining the workaround, 
>> answering the same questions over and over on the ML etc... Fixing this 
>> should not be a big deal (i.e. add the gora-x modules for the backends to 
>> the ivy deps file).
>> 
>> Julien
>> 
>> 
>> On 14 June 2012 20:27, Mattmann, Chris A (388J) 
>>  wrote:
>> Hey Guys,
>> 
>> I think the annoyance is probably something folks can live with as they have 
>> been
>> waiting for an "official" release of 2.x for years :)
>> 
>> My +1 to roll RC #2 with or without a solution to this and mark it as a 
>> TODO. "release early", "release often" :)
>> 
>> Cheers,
>> Chris
>> 
>> On Jun 14, 2012, at 10:04 AM, Lewis John Mcgibbney wrote:
>> 
>> > Aye this is no good at all. Depending on which backend you wish to use 
>> > with Gora, you will need to go and manually fetch the correct .jar's from 
>> > maven central.
>> >
>> > Does anyone else have either solution or a workaround before I push RC2 
>> > with just src dists?
>> >
>> > Thanks
>> >
>> > Lewis
>> >
>> > On Thu, Jun 14, 2012 at 4:52 PM, Sebastian Nagel 
>> >  wrote:
>> > > We only supply src distributions...
>> > > Does this principle apply to Nutch 2 as well?
>> > Maybe, yes.
>> > The situation with the current binary package is uncomfortable:
>> > I had to copy/link gora-hbase and hbase jars into lib/ to get nutch 
>> > running.
>> >
>> > 2012/6/13 Lewis John Mcgibbney 
>> > Hi Guys,
>> >
>> > Whilst updating the Nutch2Tutorial I got thinking that within Gora we 
>> > don't supply binary distributions of the code, this is because when using 
>> > Gora a user may wish/require to recompile the code to accommodate config 
>> > changes etc. We only supply src distributions...
>> >
>> > Does this principle apply to Nutch 2 as well? I mean, what if you're using 
>> > the gora-sql dependency, then you wish to switch to HBase and recompile, 
>> > is this possible within the binary distribution?
>> >
>> > Best
>> >
>> > Lewis
>> >
>> >
>> > On Wed, Jun 13, 2012 at 3:38 PM, Julien Nioche 
>> >  wrote:
>> > Ferdy
>> >
>> > The Nutch job jar is not present in the binary archive. This means 
>> > distributed running of jobs is not supported. I'm not sure if this is a 
>> > problem (since users can always build one themselves), merely pointing it 
>> > out. The recently released 1.5 also lacks this job jar, so at least no 
>> > difference there.
>> >
>> > The binary distrib corresponds to runtime/local and as such should NOT 
>> > have the job file there. This is now the norm since 1.5
>> >
>

Re: VOTE Apache Nutch 2.0 RC1

2012-06-14 Thread Mattmann, Chris A (388J)
Or just not ship a bin release at all. Src is the only thing we really VOTE on 
legally though bin is provided for convenience purposes. Will type more on this 
later...

Sent from my iPhone

On Jun 14, 2012, at 2:18 PM, "Lewis John Mcgibbney" wrote:

Hi Julien,

Do you suggest with the binary release that we simply open up all gora-* deps 
and ship it with every jar available?

Lewis

On Thu, Jun 14, 2012 at 9:39 PM, Julien Nioche wrote:
I disagree. You'd expect a binary release to work out of the box - which is not 
the case. Plus we'd have to spend more time explaining the workaround, 
answering the same questions over and over on the ML etc... Fixing this should 
not be a big deal (i.e. add the gora-x modules for the backends to the ivy deps 
file).

Julien


On 14 June 2012 20:27, Mattmann, Chris A (388J) wrote:
Hey Guys,

I think the annoyance is probably something folks can live with as they have 
been
waiting for an "official" release of 2.x for years :)

My +1 to roll RC #2 with or without a solution to this and mark it as a TODO. 
"release early", "release often" :)

Cheers,
Chris

On Jun 14, 2012, at 10:04 AM, Lewis John Mcgibbney wrote:

> Aye this is no good at all. Depending on which backend you wish to use with 
> Gora, you will need to go and manually fetch the correct .jar's from maven 
> central.
>
> Does anyone else have either solution or a workaround before I push RC2 with 
> just src dists?
>
> Thanks
>
> Lewis
>
> On Thu, Jun 14, 2012 at 4:52 PM, Sebastian Nagel wrote:
> > We only supply src distributions...
> > Does this principle apply to Nutch 2 as well?
> Maybe, yes.
> The situation with the current binary package is uncomfortable:
> I had to copy/link gora-hbase and hbase jars into lib/ to get nutch running.
>
> 2012/6/13 Lewis John Mcgibbney
> Hi Guys,
>
> Whilst updating the Nutch2Tutorial I got thinking that within Gora we don't 
> supply binary distributions of the code, this is because when using Gora a 
> user may wish/require to recompile the code to accommodate config changes etc. 
> We only supply src distributions...
>
> Does this principle apply to Nutch 2 as well? I mean, what if you're using the 
> gora-sql dependency, then you wish to switch to HBase and recompile, is this 
> possible within the binary distribution?
>
> Best
>
> Lewis
>
>
> On Wed, Jun 13, 2012 at 3:38 PM, Julien Nioche wrote:
> Ferdy
>
> The Nutch job jar is not present in the binary archive. This means 
> distributed running of jobs is not supported. I'm not sure if this is a 
> problem (since users can always build one themselves), merely pointing it 
> out. The recently released 1.5 also lacks this job jar, so at least no 
> difference there.
>
> The binary distrib corresponds to runtime/local and as such should NOT have 
> the job file there. This is now the norm since 1.5
>
> Will try and do some testing of the RC
>
> Thanks
>
> Julien
>
>
>
> --
>
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>
>
>
>
> --
> Lewis
>
>
>
>
>
> --
> Lewis
>






--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble




--
Lewis



Re: VOTE Apache Nutch 2.0 RC1

2012-06-14 Thread Mattmann, Chris A (388J)
Hey Guys,

I think the annoyance is probably something folks can live with as they have 
been
waiting for an "official" release of 2.x for years :)

My +1 to roll RC #2 with or without a solution to this and mark it as a TODO. 
"release early", "release often" :)

Cheers,
Chris

On Jun 14, 2012, at 10:04 AM, Lewis John Mcgibbney wrote:

> Aye this is no good at all. Depending on which backend you wish to use with 
> Gora, you will need to go and manually fetch the correct .jar's from maven 
> central.
> 
> Does anyone else have either solution or a workaround before I push RC2 with 
> just src dists?
> 
> Thanks
> 
> Lewis
> 
> On Thu, Jun 14, 2012 at 4:52 PM, Sebastian Nagel  
> wrote:
> > We only supply src distributions... 
> > Does this principle apply to Nutch 2 as well?
> Maybe, yes.
> The situation with the current binary package is uncomfortable:
> I had to copy/link gora-hbase and hbase jars into lib/ to get nutch running.
> 
> 2012/6/13 Lewis John Mcgibbney 
> Hi Guys,
> 
> Whilst updating the Nutch2Tutorial I got thinking that within Gora we don't 
> supply binary distributions of the code, this is because when using Gora a 
> user may wish/require to recompile the code to accommodate config changes etc. 
> We only supply src distributions... 
> 
> Does this principle apply to Nutch 2 as well? I mean, what if you're using the 
> gora-sql dependency, then you wish to switch to HBase and recompile, is this 
> possible within the binary distribution?
> 
> Best
> 
> Lewis
> 
> 
> On Wed, Jun 13, 2012 at 3:38 PM, Julien Nioche 
>  wrote:
> Ferdy
> 
> The Nutch job jar is not present in the binary archive. This means 
> distributed running of jobs is not supported. I'm not sure if this is a 
> problem (since users can always build one themselves), merely pointing it 
> out. The recently released 1.5 also lacks this job jar, so at least no 
> difference there.
> 
> The binary distrib corresponds to runtime/local and as such should NOT have 
> the job file there. This is now the norm since 1.5
> 
> Will try and do some testing of the RC
> 
> Thanks
> 
> Julien
> 
> 
> 
> -- 
> 
> Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
> 
> 
> 
> 
> -- 
> Lewis 
> 
> 
> 
> 
> 
> -- 
> Lewis 
> 





Re: Suitable Nutch 2.0 Project Description

2012-06-13 Thread Mattmann, Chris A (388J)
+1 to the description w/o experimental too (I agree with Ferdy).

You guys ROCK.

Cheers,
Chris

On Jun 13, 2012, at 5:29 AM, Lewis John Mcgibbney wrote:

> Hi,
> 
> Seeing as we have the ball rolling with the 2.0 RC. I thought I'd ask
> about a suitable project descriptor.
> 
> So far on trunk we have
> 
> ** Apache Nutch is an open source web-search software project.
> Stemming from Apache Lucene, it now builds on Apache Solr adding
> web-specifics, such as a crawler, a link-graph database and parsing
> support handled by Apache Tika for HTML and an array of other document
> formats.
> 
> This is merely a pot shot, but I was thinking for Nutch 2.0, something like
> 
> ** Apache Nutch 2.X is an experimental branch of the Apache Nutch open
> source web-search software project. It builds on Apache Gora for data
> persistence and Apache Solr for indexing adding web-specifics, such as
> a crawler, a link-graph database and parsing support handled by Apache
> Tika for HTML and an array of other document formats.
> 
> Although there are not many changes here I just wanted to run it by
> you folks...?
> 
> Thanks
> Lewis
> 
> -- 
> Lewis





Re: VOTE Apache Nutch 2.0 RC1

2012-06-12 Thread Mattmann, Chris A (388J)
Hey Guys,

#2 is probably reason enough for a respin. 

Lewis if you don't have time to do it before Thursday, I could probably
give it a whack. Let me know.

Cheers,
Chris

On Jun 12, 2012, at 3:33 PM, Sebastian Nagel wrote:

> Hi Lewis,
> 
> my first steps with 2.0 (to be continued, still struggling).
> 
> Two points (I'll try to give a final vote tomorrow):
> 
> 1 some guidance would be nice. README.txt points
> to http://wiki.apache.org/nutch/NutchTutorial which refers to 1.x
> (I'm using 
> http://sujitpal.blogspot.de/2012/01/exploring-nutch-gora-with-cassandra.html)
> 
> 2 the package contains your nutch-site.xml:
>   <name>http.agent.email</name>
>   <value>lewi...@apache.org</value>
> I guess that's not intended :)
> 
> Cheers,
> Sebastian
> 
> On 06/12/2012 10:16 PM, Lewis John Mcgibbney wrote:
>> Hi Everyone,
>> 
>> I appreciate that most of the core devs are using trunk, however I
>> would appeal to you guys to at least check out the artifacts and check
>> sigs, tests, license headers if possible. Although this does not fully
>> satisfy the requirements of a thoroughly reviewed RC, hopefully the
>> thorough stuff can be undertaken by those directly using the artifacts
>> and code in development/production.
>> 
>> Thanks very much in advance
>> 
>> Best
>> 
>> Lewis
>> 
>> On Fri, Jun 8, 2012 at 3:49 PM, lewis john mcgibbney  
>> wrote:
>>> Good Evening Everyone,
>>> 
>>> A candidate for the Apache Nutch 2.0 RC1 is available at:
>>> 
>>> http://people.apache.org/~lewismc/nutch-2.0
>>> 
>>> The release candidate is a src.zip, bin.zip, src.tar.gz and bin.tar.gz
>>> archive of the sources in:
>>> 
>>> http://svn.apache.org/repos/asf/nutch/tags/release-2.0rc1
>>> 
>>> Further, a staged Maven repository of the 2.0 jar, sources.jar and
>>> javadoc.jar is available here:
>>> 
>>> https://repository.apache.org/content/repositories/orgapachenutch-215
>>> 
>>> Please vote on releasing this package as Apache Nutch 2.0.
>>> The vote is open for the next 72 hours and passes if a majority of at
>>> least three +1 Nutch PMC votes are cast.
>>> 
>>> [ ] +1 Release this package as Apache Nutch 2.0
>>> [ ] -1 Do not release this package because...
>>> 
>>> Many thanks, and here's to plenty more.
>>> 
>>> Have a great weekend, Kind Regards,
>>> Lewis
>>> 
>>> P.S. Here's my +1.
>> 
>> 
>> 
> 





Re: VOTE Apache Nutch 2.0 RC1

2012-06-12 Thread Mattmann, Chris A (388J)
Hey Lewis,

I will get to this tonight, for sure.

Thanks!

Cheers,
Chris

On Jun 12, 2012, at 1:16 PM, Lewis John Mcgibbney wrote:

> Hi Everyone,
> 
> I appreciate that most of the core devs are using trunk, however I
> would appeal to you guys to at least check out the artifacts and check
> sigs, tests, license headers if possible. Although this does not fully
> satisfy the requirements of a thoroughly reviewed RC, hopefully the
> thorough stuff can be undertaken by those directly using the artifacts
> and code in development/production.
> 
> Thanks very much in advance
> 
> Best
> 
> Lewis
> 
> On Fri, Jun 8, 2012 at 3:49 PM, lewis john mcgibbney  
> wrote:
>> Good Evening Everyone,
>> 
>> A candidate for the Apache Nutch 2.0 RC1 is available at:
>> 
>> http://people.apache.org/~lewismc/nutch-2.0
>> 
>> The release candidate is a src.zip, bin.zip, src.tar.gz and bin.tar.gz
>> archive of the sources in:
>> 
>> http://svn.apache.org/repos/asf/nutch/tags/release-2.0rc1
>> 
>> Further, a staged Maven repository of the 2.0 jar, sources.jar and
>> javadoc.jar is available here:
>> 
>> https://repository.apache.org/content/repositories/orgapachenutch-215
>> 
>> Please vote on releasing this package as Apache Nutch 2.0.
>> The vote is open for the next 72 hours and passes if a majority of at
>> least three +1 Nutch PMC votes are cast.
>> 
>>  [ ] +1 Release this package as Apache Nutch 2.0
>>  [ ] -1 Do not release this package because...
>> 
>> Many thanks, and here's to plenty more.
>> 
>> Have a great weekend, Kind Regards,
>> Lewis
>> 
>> P.S. Here's my +1.
> 
> 
> 
> -- 
> Lewis





Re: [VOTE] Apache Nutch 1.5 release-1.5RC4

2012-06-07 Thread Mattmann, Chris A (388J)
Thanks Lewis you are full of win!

Cheers,
Chris

On Jun 7, 2012, at 4:58 AM, Lewis John Mcgibbney wrote:

> Hi Chris/Everyone,
> 
> Been full on recently so apologies for taking forever and a day to
> close the VOTE off.
> 
> On Sat, Jun 2, 2012 at 6:11 AM, Mattmann, Chris A (388J) wrote:
>> 
>> Minor nit: source package unzips into the current directory as opposed
>> to prior practice of having it unzip into apache-nutch-X.Y folder. No
>> biggie though. Thanks for stepping up and rocking the release process!
>> 
> 
> Yeah we can fix this next time around along with Sebastian's comment
> regarding docs directory. I suppose this may all be subject to change
> depending on whether Maven becomes our tool of choice...
> 
> Anyway, I'll get the release finished today and thanks again for the patience.
> 
> Lewis


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: [VOTE] Apache Nutch 1.5 release-1.5RC4

2012-06-01 Thread Mattmann, Chris A (388J)
Hey Lewis,

+1 from me!

SIGS check out:

[chipotle:nutch-dev/1.5-release/rc4] mattmann% ls
apache-nutch-1.5-bin.tar.gz  apache-nutch-1.5-bin.zip 
apache-nutch-1.5-src.tar.gz  apache-nutch-1.5-src.zip
apache-nutch-1.5-bin.tar.gz.asc  apache-nutch-1.5-bin.zip.asc 
apache-nutch-1.5-src.tar.gz.asc  apache-nutch-1.5-src.zip.asc
apache-nutch-1.5-bin.tar.gz.md5  apache-nutch-1.5-bin.zip.md5 
apache-nutch-1.5-src.tar.gz.md5  apache-nutch-1.5-src.zip.md5
apache-nutch-1.5-bin.tar.gz.sha  apache-nutch-1.5-bin.zip.sha 
apache-nutch-1.5-src.tar.gz.sha  apache-nutch-1.5-src.zip.sha
[chipotle:nutch-dev/1.5-release/rc4] mattmann% $HOME/bin/verify_gpg_sigs 
Verifying Signature for file apache-nutch-1.5-bin.tar.gz.asc
gpg: Signature made Thu May 31 13:24:55 2012 PDT using RSA key ID C601BCA7
gpg: Good signature from "Lewis John McGibbney (CODE SIGNING KEY)"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:  There is no indication that the signature belongs to the owner.
Primary key fingerprint: 2A23 D53F 8D27 5CB6 91E1  89C1 F45E 7970 C601 BCA7
Verifying Signature for file apache-nutch-1.5-bin.zip.asc
gpg: Signature made Thu May 31 13:25:57 2012 PDT using RSA key ID C601BCA7
gpg: Good signature from "Lewis John McGibbney (CODE SIGNING KEY)"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:  There is no indication that the signature belongs to the owner.
Primary key fingerprint: 2A23 D53F 8D27 5CB6 91E1  89C1 F45E 7970 C601 BCA7
Verifying Signature for file apache-nutch-1.5-src.tar.gz.asc
gpg: Signature made Thu May 31 13:25:34 2012 PDT using RSA key ID C601BCA7
gpg: Good signature from "Lewis John McGibbney (CODE SIGNING KEY)"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:  There is no indication that the signature belongs to the owner.
Primary key fingerprint: 2A23 D53F 8D27 5CB6 91E1  89C1 F45E 7970 C601 BCA7
Verifying Signature for file apache-nutch-1.5-src.zip.asc
gpg: Signature made Thu May 31 13:26:15 2012 PDT using RSA key ID C601BCA7
gpg: Good signature from "Lewis John McGibbney (CODE SIGNING KEY)"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:  There is no indication that the signature belongs to the owner.
Primary key fingerprint: 2A23 D53F 8D27 5CB6 91E1  89C1 F45E 7970 C601 BCA7
[chipotle:nutch-dev/1.5-release/rc4] mattmann% 

checksums check out:

[chipotle:nutch-dev/1.5-release/rc4] mattmann% $HOME/bin/verify_md5_checksums 
md5sum: stat '*.bz2': No such file or directory
apache-nutch-1.5-bin.tar.gz: OK
apache-nutch-1.5-src.tar.gz: OK
apache-nutch-1.5-bin.zip: OK
apache-nutch-1.5-src.zip: OK
[chipotle:nutch-dev/1.5-release/rc4] mattmann% 
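
For anyone without a local helper like the verify_md5_checksums script used
above, here is a minimal Python sketch of the same kind of check. It is not
that script. It assumes each artifact sits next to a "<name>.md5" sidecar whose
first whitespace-separated token is the hex digest; the function names are
made up for illustration.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(artifact: Path) -> bool:
    """Compare an artifact against its '<name>.md5' sidecar file.

    Assumption: the first whitespace-separated token in the sidecar
    is the expected hex digest.
    """
    sidecar = artifact.with_name(artifact.name + ".md5")
    tokens = sidecar.read_text().split()
    expected = tokens[0].lower() if tokens else ""
    return md5_of(artifact) == expected


def verify_directory(directory: Path) -> dict:
    """Check every artifact in a directory that has an .md5 sidecar."""
    results = {}
    for sidecar in sorted(directory.glob("*.md5")):
        artifact = sidecar.with_name(sidecar.name[: -len(".md5")])
        if artifact.exists():
            results[artifact.name] = verify_md5(artifact)
    return results


if __name__ == "__main__":
    # Print OK / FAILED per artifact in the current directory.
    for name, ok in verify_directory(Path(".")).items():
        print(f"{name}: {'OK' if ok else 'FAILED'}")
```

Note that the digest is computed over the artifact itself, not over the
.asc or .md5 file that accompanies it.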

Built source. All good!

runtime:
[mkdir] Created dir: 
/Users/mattmann/Desktop/Apache/nutch-dev/1.5-release/rc4/apache-nutch-1.5/runtime
[mkdir] Created dir: 
/Users/mattmann/Desktop/Apache/nutch-dev/1.5-release/rc4/apache-nutch-1.5/runtime/local
[mkdir] Created dir: 
/Users/mattmann/Desktop/Apache/nutch-dev/1.5-release/rc4/apache-nutch-1.5/runtime/deploy
 [copy] Copying 1 file to 
/Users/mattmann/Desktop/Apache/nutch-dev/1.5-release/rc4/apache-nutch-1.5/runtime/deploy
 [copy] Copying 1 file to 
/Users/mattmann/Desktop/Apache/nutch-dev/1.5-release/rc4/apache-nutch-1.5/runtime/deploy/bin
 [copy] Copying 1 file to 
/Users/mattmann/Desktop/Apache/nutch-dev/1.5-release/rc4/apache-nutch-1.5/runtime/local/lib
 [copy] Copying 1 file to 
/Users/mattmann/Desktop/Apache/nutch-dev/1.5-release/rc4/apache-nutch-1.5/runtime/local/lib/native
 [copy] Copying 21 files to 
/Users/mattmann/Desktop/Apache/nutch-dev/1.5-release/rc4/apache-nutch-1.5/runtime/local/conf
 [copy] Copying 1 file to 
/Users/mattmann/Desktop/Apache/nutch-dev/1.5-release/rc4/apache-nutch-1.5/runtime/local/bin
 [copy] Copying 48 files to 
/Users/mattmann/Desktop/Apache/nutch-dev/1.5-release/rc4/apache-nutch-1.5/runtime/local/lib
 [copy] Copying 123 files to 
/Users/mattmann/Desktop/Apache/nutch-dev/1.5-release/rc4/apache-nutch-1.5/runtime/local/plugins
 [copy] Copied 2 empty directories to 2 empty directories under 
/Users/mattmann/Desktop/Apache/nutch-dev/1.5-release/rc4/apache-nutch-1.5/runtime/local/test

BUILD SUCCESSFUL
Total time: 2 minutes 17 seconds
[chipotle:1.5-release/rc4/apache-nutch-1.5] mattmann% 

Minor nit: source package unzips into the current directory as opposed to
prior practice of having it unzip into apache-nutch-X.Y folder. No biggie
though. Thanks for stepping up and rocking the release process!
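
The packaging nit above (everything should unpack under one
apache-nutch-X.Y folder) could be checked before a vote with something like
the following Python sketch. The helper names are hypothetical, and the rule
it applies (exactly one top-level entry, with at least one member nested
beneath it) is only one reasonable reading of "unzips into a folder".

```python
import tarfile
import zipfile


def top_level_entries(names):
    """Return the set of first path components of archive member names."""
    roots = set()
    for name in names:
        parts = [p for p in name.split("/") if p and p != "."]
        if parts:
            roots.add(parts[0])
    return roots


def has_single_root(names, expected_root=None):
    """True if all members live under one top-level directory.

    Requires exactly one distinct first path component and at least one
    member nested below it, so a lone top-level file does not count.
    """
    roots = top_level_entries(names)
    nested = any(len([p for p in n.split("/") if p and p != "."]) > 1 for n in names)
    if len(roots) != 1 or not nested:
        return False
    return expected_root is None or roots == {expected_root}


def zip_member_names(path):
    """List member names of a .zip archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def tar_member_names(path):
    """List member names of a .tar or .tar.gz archive."""
    with tarfile.open(path) as archive:
        return archive.getnames()
```

For example, has_single_root(zip_member_names("apache-nutch-1.5-src.zip"),
"apache-nutch-1.5") would be expected to return False for an archive that
unpacks straight into the current directory.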

Cheers,
Chris

On May 31, 2012, at 1:37 PM, Lewis John Mcgibbney wrote:

> Good Evening Everyone,
> 
> A candidate for the Apache Nutch 1.5 RC4 is available at:
> 
> http://people.apache.org/~lewismc/apache-nutch-1.5-rc4/
> 
> The release candidate is a src.zip, bin.zip, src.tar.gz and bin.tar.gz
> archive of the sources in:
> 
> http://svn.apache.org/repos/asf/nutch/tags/release-1.5-rc4/
> 
> Furthe

Re: [VOTE] Apache Nutch release 1.5 RC3

2012-05-31 Thread Mattmann, Chris A (388J)
Hey Lewis,

Actually if the bits change, in the past, I've been pushed to generate a new
RC (as the SIG files, checksum, etc. will change too).

My +1 for a new RC to accommodate that. If you don't have time today
I would be happy to help.
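
To illustrate why changed bits mean new checksum files (and new signatures),
here is a small Python sketch. The byte strings are placeholders, not real
release contents, and the helper names are made up for this example.

```python
import hashlib


def digests(data: bytes) -> dict:
    """Return MD5 and SHA-1 hex digests for a byte string."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


def changed_digests(before: bytes, after: bytes) -> list:
    """Names of the digests that differ between two byte strings."""
    old, new = digests(before), digests(after)
    return [name for name in old if old[name] != new[name]]


if __name__ == "__main__":
    original = b"placeholder bytes standing in for a release archive"
    rebuilt = original + b"\n"  # a one-byte difference in the rebuilt archive
    print(changed_digests(original, rebuilt))
```

Any archive rebuilt without the runtime dir would differ by far more than one
byte, so previously published .md5, .sha and .asc files would no longer match.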

Cheers,
Chris (who now has more time *grin*)

On May 31, 2012, at 8:42 AM, Lewis John Mcgibbney wrote:

> If I were to change the artifacts to accommodate the removal of the
> runtime dir I don't think it would require a completely new RC.
> 
> I am happy to generate the same sources via the tag, sign, then push
> them pending the VOTE result.
> 
> Does this comply with release policy?
> 
> Thanks
> 
> Lewis
> 
> On Thu, May 31, 2012 at 3:49 PM, Mattmann, Chris A (388J) wrote:
>> okey dokey.
>> 
>> I will try and take the time to review the RC today. Thanks for pushing
>> this Lewis!
>> 
>> Cheers,
>> Chris
>> 
>> On May 31, 2012, at 7:36 AM, Julien Nioche wrote:
>> 
>>> Hi,
>>> 
>>> Depends on Lewis :-) Let's say I am +1 but if it is not too much hassle it 
>>> would be nice to fix it
>>> 
>>> J.
>>> 
>>> On 31 May 2012 15:24, Mattmann, Chris A (388J) wrote:
>>> Hey Guys,
>>> 
>>> Does this warrant a respin, or are you +1 Juls?
>>> 
>>> Cheers,
>>> Chris
>>> 
>>> On May 31, 2012, at 1:44 AM, Julien Nioche wrote:
>>> 
>>>> Hi Lewis,
>>>> 
>>>> Minor nitpick : the directory /runtime is not necessary as it is built 
>>>> with ANT. Removing it would massively reduce the size of the archive. 
>>>> Could we fix it for the final release?
>>>> 
>>>> All fine apart from this. The content of the src archive compiles fine, 
>>>> the pom on the Maven repo looks good.
>>>> 
>>>> Thanks a lot
>>>> 
>>>> Julien
>>>> 
>>>> 
>>>> On 30 May 2012 21:59, lewis john mcgibbney  wrote:
>>>> Good Evening Everyone,
>>>> 
>>>> A candidate for the Apache Nutch 1.5 RC3 is available at:
>>>> 
>>>> http://people.apache.org/~lewismc/apache-nutch-1.5-rc3/
>>>> 
>>>> The release candidate is a src.zip, bin.zip, src.tar.gz and bin.tar.gz
>>>> archive of the sources in:
>>>> 
>>>> http://svn.apache.org/repos/asf/nutch/tags/release-1.5-rc3/
>>>> 
>>>> Further, a staged Maven repository of the 1.5 sources.jar and
>>>> javadoc.jar is available here:
>>>> 
>>>> https://repository.apache.org/content/repositories/orgapachenutch-167/
>>>> 
>>>> Please vote on releasing this package as Apache Nutch 1.5.
>>>> The vote is open for the next 72 hours and passes if a majority of at
>>>> least three +1 Nutch PMC votes are cast.
>>>> 
>>>>  [ ] +1 Release this package as Apache Nutch 1.5
>>>>  [ ] -1 Do not release this package because...
>>>> 
>>>> Many Thanks and here's to plenty more.
>>>> 
>>>> Kind Regards,
>>>> Lewis
>>>> 
>>>> P.S. Here's my +1.
>>>> 
>>>> 
>>>> 
>>>> --
>>>> 
>>>> Open Source Solutions for Text Engineering
>>>> 
>>>> http://digitalpebble.blogspot.com/
>>>> http://www.digitalpebble.com
>>>> http://twitter.com/digitalpebble
>>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> --
>>> 
>>> Open Source Solutions for Text Engineering
>>> 
>>> http://digitalpebble.blogspot.com/
>>> http://www.digitalpebble.com
>>> http://twitter.com/digitalpebble
>>> 
>> 
>> 
>> 
> 
> 
> 
> -- 
> Lewis


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: [VOTE] Apache Nutch release 1.5 RC3

2012-05-31 Thread Mattmann, Chris A (388J)
okey dokey. 

I will try and take the time to review the RC today. Thanks for pushing
this Lewis!

Cheers,
Chris

On May 31, 2012, at 7:36 AM, Julien Nioche wrote:

> Hi, 
> 
> Depends on Lewis :-) Let's say I am +1 but if it is not too much hassle it 
> would be nice to fix it 
> 
> J.
> 
> On 31 May 2012 15:24, Mattmann, Chris A (388J) wrote:
> Hey Guys,
> 
> Does this warrant a respin, or are you +1 Juls?
> 
> Cheers,
> Chris
> 
> On May 31, 2012, at 1:44 AM, Julien Nioche wrote:
> 
> > Hi Lewis,
> >
> > Minor nitpick : the directory /runtime is not necessary as it is built with 
> > ANT. Removing it would massively reduce the size of the archive. Could we 
> > fix it for the final release?
> >
> > All fine apart from this. The content of the src archive compiles fine, the 
> > pom on the Maven repo looks good.
> >
> > Thanks a lot
> >
> > Julien
> >
> >
> > On 30 May 2012 21:59, lewis john mcgibbney  wrote:
> > Good Evening Everyone,
> >
> > A candidate for the Apache Nutch 1.5 RC3 is available at:
> >
> > http://people.apache.org/~lewismc/apache-nutch-1.5-rc3/
> >
> > The release candidate is a src.zip, bin.zip, src.tar.gz and bin.tar.gz
> > archive of the sources in:
> >
> > http://svn.apache.org/repos/asf/nutch/tags/release-1.5-rc3/
> >
> > Further, a staged Maven repository of the 1.5 sources.jar and
> > javadoc.jar is available here:
> >
> > https://repository.apache.org/content/repositories/orgapachenutch-167/
> >
> > Please vote on releasing this package as Apache Nutch 1.5.
> > The vote is open for the next 72 hours and passes if a majority of at
> > least three +1 Nutch PMC votes are cast.
> >
> >  [ ] +1 Release this package as Apache Nutch 1.5
> >  [ ] -1 Do not release this package because...
> >
> > Many Thanks and here's to plenty more.
> >
> > Kind Regards,
> > Lewis
> >
> > P.S. Here's my +1.
> >
> >
> >
> > --
> >
> > Open Source Solutions for Text Engineering
> >
> > http://digitalpebble.blogspot.com/
> > http://www.digitalpebble.com
> > http://twitter.com/digitalpebble
> >
> 
> 
> 
> 
> 
> 
> -- 
> 
> Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
> 


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: [VOTE] Apache Nutch release 1.5 RC3

2012-05-31 Thread Mattmann, Chris A (388J)
Hey Guys,

Does this warrant a respin, or are you +1 Juls?

Cheers,
Chris

On May 31, 2012, at 1:44 AM, Julien Nioche wrote:

> Hi Lewis,
> 
> Minor nitpick : the directory /runtime is not necessary as it is built with 
> ANT. Removing it would massively reduce the size of the archive. Could we fix 
> it for the final release?
> 
> All fine apart from this. The content of the src archive compiles fine, the 
> pom on the Maven repo looks good.
> 
> Thanks a lot 
> 
> Julien
> 
> 
> On 30 May 2012 21:59, lewis john mcgibbney  wrote:
> Good Evening Everyone,
> 
> A candidate for the Apache Nutch 1.5 RC3 is available at:
> 
> http://people.apache.org/~lewismc/apache-nutch-1.5-rc3/
> 
> The release candidate is a src.zip, bin.zip, src.tar.gz and bin.tar.gz
> archive of the sources in:
> 
> http://svn.apache.org/repos/asf/nutch/tags/release-1.5-rc3/
> 
> Further, a staged Maven repository of the 1.5 sources.jar and
> javadoc.jar is available here:
> 
> https://repository.apache.org/content/repositories/orgapachenutch-167/
> 
> Please vote on releasing this package as Apache Nutch 1.5.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Nutch PMC votes are cast.
> 
>  [ ] +1 Release this package as Apache Nutch 1.5
>  [ ] -1 Do not release this package because...
> 
> Many Thanks and here's to plenty more.
> 
> Kind Regards,
> Lewis
> 
> P.S. Here's my +1.
> 
> 
> 
> -- 
> 
> Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
> 


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: 1.5 RC2

2012-05-22 Thread Mattmann, Chris A (388J)
+1

Sent from my iPhone

On May 22, 2012, at 4:43 AM, "Lewis John Mcgibbney" <lewis.mcgibb...@gmail.com> wrote:

Hi,

As I say, I am able to stick time in tonight to roll this RC, however does 
anyone have a problem with me rolling the 2.0 RC tonight after the 1.5RC2?

I would like to get them out the way saving me time during this week if 
possible.

Thanks

Lewis

On Tue, May 22, 2012 at 10:35 AM, Lewis John Mcgibbney <lewis.mcgibb...@gmail.com> wrote:
OK doke this sounds fine to me then. I will make the relevant commits to the 
1.5 branch then work at it later this evening.

I'll make a new thread when the stuff is sorted out and we are ready to VOTE on 
the new RC.

Thanks

Lewis


On Tue, May 22, 2012 at 10:15 AM, Julien Nioche <lists.digitalpeb...@gmail.com> wrote:
Hi Lewis,

I am sure that Chris will have no problem with you doing the RC2. Chris? It 
would be a good thing to have more than one person who knows how to do it 
anyway :-)
Note that to generate a fresh pom.xml you need to

  *   get maven-ant-tasks-2.1.3.jar and put it in the ivy dir
  *   ant -lib ivy deploy

The resulting pom.xml file should reflect the content of the main ivy.xml. I 
have committed some minor changes to the pom template in trunk, this will need 
to be copied to the 1.5 branch as well. We recently discussed a move to Maven, 
another option would be to manage the dependencies with the Maven Ant task, 
which would save us the hassle of having to keep the ivy.xml and pom.xml in 
sync. We'll see
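
As a rough way to spot the ivy.xml / pom.xml drift mentioned above, the
following Python sketch compares dependency versions between the two files.
It assumes the usual Ivy layout (dependency elements with org, name and rev
attributes) and a POM that declares the standard Maven 4.0.0 namespace; the
function names are illustrative and this is not part of the Nutch build.

```python
import xml.etree.ElementTree as ET

# Assumption: the POM uses the standard Maven 4.0.0 namespace.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def ivy_versions(ivy_xml: str) -> dict:
    """Map (org, name) -> rev for every <dependency> in an ivy.xml string."""
    root = ET.fromstring(ivy_xml)
    return {
        (dep.get("org"), dep.get("name")): dep.get("rev")
        for dep in root.iter("dependency")
    }


def pom_versions(pom_xml: str) -> dict:
    """Map (groupId, artifactId) -> version for top-level POM dependencies."""
    root = ET.fromstring(pom_xml)
    versions = {}
    for dep in root.findall("m:dependencies/m:dependency", POM_NS):
        key = (
            dep.findtext("m:groupId", namespaces=POM_NS),
            dep.findtext("m:artifactId", namespaces=POM_NS),
        )
        versions[key] = dep.findtext("m:version", namespaces=POM_NS)
    return versions


def mismatches(ivy_xml: str, pom_xml: str) -> dict:
    """Return {(group, artifact): (ivy_rev, pom_version)} where they differ.

    A dependency missing from one side shows up with None on that side.
    """
    ivy, pom = ivy_versions(ivy_xml), pom_versions(pom_xml)
    keys = sorted(set(ivy) | set(pom), key=str)
    return {k: (ivy.get(k), pom.get(k)) for k in keys if ivy.get(k) != pom.get(k)}
```

An empty result from mismatches() would suggest the generated pom.xml reflects
ivy.xml, at least for directly declared dependencies.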

Thanks

Julien


--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble




--
Lewis




--
Lewis



Re: 1.5 RC2

2012-05-22 Thread Mattmann, Chris A (388J)
+1 happy for Lewis to try I've been swamped!

Sent from my iPhone

On May 22, 2012, at 2:16 AM, "Julien Nioche" <lists.digitalpeb...@gmail.com> wrote:

Hi Lewis,

I am sure that Chris will have no problem with you doing the RC2. Chris? It 
would be a good thing to have more than one person who knows how to do it 
anyway :-)
Note that to generate a fresh pom.xml you need to

  *   get maven-ant-tasks-2.1.3.jar and put it in the ivy dir
  *   ant -lib ivy deploy

The resulting pom.xml file should reflect the content of the main ivy.xml. I 
have committed some minor changes to the pom template in trunk, this will need 
to be copied to the 1.5 branch as well. We recently discussed a move to Maven, 
another option would be to manage the dependencies with the Maven Ant task, 
which would save us the hassle of having to keep the ivy.xml and pom.xml in 
sync. We'll see

Thanks

Julien


--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble



Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-05-09 Thread Mattmann, Chris A (388J)
Hey Julien,

On May 9, 2012, at 3:11 AM, Julien Nioche wrote:

> Hi Chris
> 
> Any chance you could do a RC2 for the trunk soonish? We've been a bit stuck 
> since mid April and it would be nice to move on. If not I can try and spin a 
> RC myself but it is likely to be hilarious :-)

Haha, no worries. I will try and get one going for this weekend. And I'm sure 
you'd do fine! :)

> 
> Re-Maven : I am not against moving to Maven at all : it would make it easier 
> to publish the artefacts + nice integration with Eclipse + most devs familiar 
> with it etc... not sure about the best way to deal with the plugins though - 
> treat them as modules? any thoughts on this?

Yeah this is something I would definitely like to explore for 1.6+ -- I think
we could just do Maven pom.xml files for each plugin and then do a
multi-aggregator core project that built core first, then all the plugins
post facto.

I will file an issue to explore this for 1.6.

Thanks!

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: Suitable naming for > Nutchgora branch?

2012-04-25 Thread Mattmann, Chris A (388J)
Great work Lewis, thanks!

Cheers,
Chris

On Apr 25, 2012, at 4:01 PM, Lewis John Mcgibbney wrote:

> Hi Everyone,
> 
> As you guys will have seen I've quickly polluted our dev list again 
> (sorry!!!) with set and classify for 2.1.
> 
> The open issues for 2.0 are ones which I think we could address within the 
> 2.0 release. This is merely my opinion, based upon the assertion that they 
> all contain patches which could be up for review. With the exception of 
> NUTCH-879 which is pretty alarming. I'll test shortly.
> 
> I'm now away to bed.
> 
> Best
> 
> Lewis
> 
> On Wed, Apr 25, 2012 at 3:06 PM, Mattmann, Chris A (388J) wrote:
> Hi Guys,
> 
> 


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: Suitable naming for > Nutchgora branch?

2012-04-25 Thread Mattmann, Chris A (388J)
Hi Guys,

Yep I think we've beat the dead horse here about the name :)

This is a good recent discussion/summary: http://s.apache.org/CoY
and I think it had some productive outcomes. I envision a world in
which we keep releasing the current 1.x series until we get up to 1.9,
and then hopefully in parallel release a set of 2.x (eventually release
2.9 if we get that far) and either 3.x is the merge of 1.x and 2.x, or 
1.x becomes 3.x and we leapfrog 2.x to 4.x, etc etc.

IOW, releasing from branches with active maintainers is absolutely
fine and encouraged within Apache. NutchGora right now has at least
Ferdy and Lewis (and you can count me in even though my support
for the moment is limited to RM'ing) so that's ~3; the trunk has Julien,
Markus, Lewis, myself and others so that's 4+ active peeps, so both
branches have plenty of people who care deeply about releasing Nutch
and kicking butt. So we're all good here.

Net: here's a productive next step for nutchgora. Let's simply release it.
There is nothing preventing us from doing that. If 3 +1s come in from
Nutch PMC members, we can release :) I'd be happy to RM it, as I 
stated in http://s.apache.org/CoY so let's move forward especially
now that there is a Gora 0.2 release (hat tip, Lewis).
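
The "3 +1s from Nutch PMC members" rule above, together with the "majority of
at least three +1 Nutch PMC votes" wording used in the [VOTE] mails, can be
sketched in Python as below. This is one reading of the rule (at least three
binding +1s and more binding +1s than binding -1s); the 72-hour window is not
modelled, and the voter names in the usage note are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int     # +1, 0 or -1
    binding: bool  # True when cast by a PMC member


def release_vote_passes(votes, minimum_plus_ones=3):
    """Apply the majority rule to the binding votes only.

    Assumed reading: at least `minimum_plus_ones` binding +1 votes,
    and strictly more binding +1 votes than binding -1 votes.
    """
    binding = [v for v in votes if v.binding]
    plus = sum(1 for v in binding if v.value > 0)
    minus = sum(1 for v in binding if v.value < 0)
    return plus >= minimum_plus_ones and plus > minus
```

For example, three binding +1s plus one non-binding -1 would pass under this
reading: release_vote_passes([Vote("pmc-a", 1, True), Vote("pmc-b", 1, True),
Vote("pmc-c", 1, True), Vote("user-x", -1, False)]).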

Cheers,
Chris

P.S. Yes, and by the way, self-flails, let's release Nutch 1.5 and get
on with that too! *grin*

On Apr 25, 2012, at 6:22 AM, Julien Nioche wrote:

> 
> I must say that since the move of Nutchgora from trunk to branch it's kind of 
> odd that it's still referred to as 2.x. (For now that's okay I guess).
> 
> Moving it from the trunk made a lot of sense and has been abundantly 
> discussed on this list. We had one stable version which is actively 
> maintained and currently used by most people (1.x) and an experimental one 
> largely untested and used by a minority (2.x). Hopefully when nutchgora (for 
> which 2.x is a better name indeed) has had a couple of releases and is used 
> by a larger number of people it will naturally find its place as trunk but 
> for now since most releases are based on 1.x I think the latter should remain 
> the trunk
> 
> Julien
> 
> On Wed, Apr 25, 2012 at 10:46 AM, Lewis John Mcgibbney wrote:
> Good Morning,
> 
> Does anyone have a differing opinion on naming next development track for 
> Nutchgora branch 2.1?
> 
> Before I set and classify most issues it would be good to know.
> 
> Thank you
> 
> Lewis
> 
> -- 
> Lewis 
> 
> 
> 
> 
> 
> 
> -- 
> 
> Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
> 


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-04-19 Thread Mattmann, Chris A (388J)
Hey Julien thanks for the help below. I will try running some of the ant tasks 
(sorry I'm a Maven wonk ;) ) and get this working hopefully this week. I have
a big proposal deadline on Friday but should come up for air after that
heading into the weekend and get this done.

Cheers,
Chris

On Apr 19, 2012, at 3:56 AM, Julien Nioche wrote:

> Hi Chris
> 
> 
> >
> > -1 the versions of the deps for hadoop, tika and possibly others are not 
> > correct in the pom.xml found in the src archive and on the mvn repository, 
> > which will be a problem for whoever tries to use the pom.xml file e.g. in 
> > Eclipse or more annoyingly declare Nutch as a dependency with Ivy / Maven. 
> > Did you regenerate the pom file from the ivy one?
> 
> I didn't regenerate it -- but will try and do so for RC #2.
> 
> Should have been done automatically when calling 'ant deploy' - if not might 
> be that the maven task jar is missing from lib 
>  
> 
> >
> > I remember that we mentioned delivering the content of runtime/local in the 
> > binary archive instead of having the sources + runtime/deploy as well.
> [..snip...]
> >  I don't think it would take much time to do that, so what about doing it 
> > now? We could rename the archive into apache-nutch-1.5-local-bin maybe to 
> > make the content clearer.
> 
> +1 to the above, but I think we can just have it be apache-nutch-1.5-bin -- 
> no need to rename it to local. We can just
> reference this ML thread for documentation in the future.
> 
> 
> I've committed in trunk revision 1327896 a new ant task which will generate a 
> binary package as described above. You'll probably need to modify the code 
> for the tar / zip as well but this should give you a starting point
>  
> Thanks
> 
> Julien
> 
> -- 
> 
> Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
> 


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: NUTCH-1129

2012-04-17 Thread Mattmann, Chris A (388J)
Hey Lewis,

On Apr 17, 2012, at 3:35 AM, Lewis John Mcgibbney wrote:

> 3) We previously discussed implementing the Any23 parser plugin as a tika 
> wrapper, therefore it would look very similar to parse-tika?

I think it would be super awesome to add the Any23 parsing functionality as a
Tika parser, and potentially an extension to the MIME repository to detect
microformats, etc. Then in Nutch, we could take advantage of the any23 parser
with the existing tika-parser interface.

Thoughts?

Thanks!

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-04-16 Thread Mattmann, Chris A (388J)
Hey Lewis,

Hmm, not sure on the MD5 and SHA -- they seem to validate for me
and seemed to work at least Sami (and Markus?). Guys, any idea what's
up with Lewis's verification step here? 

Lewis, you may try re-downloading and verifying them again, but wait
until RC #2 on that. I'll fix the NOTICE file for RC #2 as you mention below
and not sure why the extension was .tar.gz.tar.gz, I'll fix that too.
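
One possible reading of the transcript quoted below, offered as a sketch rather
than a diagnosis: when gpg --verify is given only a detached .asc file, it
looks for the signed data under the same name minus the .asc suffix, so a
doubled ".tar.tar.gz.asc" name may simply point at a file that is not there.
The quoted md5sum commands are also run on the .asc files rather than on the
archives. The Python helper below (names are made up) lists sidecar files
whose artifact is missing from a directory.

```python
from pathlib import Path

SIGNATURE_SUFFIXES = (".asc", ".sig")
CHECKSUM_SUFFIXES = (".md5", ".sha")


def artifact_for(sidecar: str) -> str:
    """Strip one signature/checksum suffix to get the artifact name.

    Names without a known suffix are returned unchanged.
    """
    for suffix in SIGNATURE_SUFFIXES + CHECKSUM_SUFFIXES:
        if sidecar.endswith(suffix):
            return sidecar[: -len(suffix)]
    return sidecar


def missing_artifacts(directory: Path) -> list:
    """Sidecar files in `directory` whose artifact is not present there."""
    names = {p.name for p in directory.iterdir()}
    return sorted(
        n for n in names if artifact_for(n) != n and artifact_for(n) not in names
    )
```

Running missing_artifacts(Path(".")) in the download directory before
verifying would flag a signature such as apache-nutch-1.5-bin.tar.tar.gz.asc
if only apache-nutch-1.5-bin.tar.gz had been downloaded.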

Cheers,
Chris

On Apr 16, 2012, at 3:12 AM, Lewis John Mcgibbney wrote:

> Hi Chris,
> 
> On Mon, Apr 16, 2012 at 6:43 AM, Mattmann, Chris A (388J) <
> chris.a.mattm...@jpl.nasa.gov> wrote:
> 
>> Hi Folks,
>> 
>> A candidate for the Nutch 1.5 release is available at:
>> 
>> http://people.apache.org/~mattmann/apache-nutch-1.5/rc1/
>> 
> 
> I used the KEYS file stored on SVN under the 1.5 tag (as below), and got
> the following when verifying the above RC (stored on your p.a.o area)
> 
> lewis@lewis-01:~/Desktop$ gpg --import KEYS
> gpg: key A7239D59: "Doug Cutting (Lucene guy)" not changed
> gpg: key 7C491924: public key "Piotr Kosiorowski" imported
> gpg: key 0B7E6CFA: public key "Sami Siren" imported
> gpg: key 57163A4D: public key "Dennis E. Kubes" imported
> gpg: key 24BCF054: public key "Chris A. Mattmann" imported
> gpg: Total number processed: 5
> gpg:   imported: 4
> gpg:  unchanged: 1
> gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
> gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
> 
> lewis@lewis-01:~/Desktop$ gpg --verify apache-nutch-1.5-bin.tar.tar.gz.asc
> gpg: no signed data
> gpg: can't hash datafile: file open error
> lewis@lewis-01:~/Desktop$ gpg --verify apache-nutch-1.5-bin.zip.asc
> gpg: Signature made Mon 16 Apr 2012 06:00:20 BST using DSA key ID B876884A
> gpg: Can't check signature: public key not found
> lewis@lewis-01:~/Desktop$ gpg --verify apache-nutch-1.5-src.tar.gz.asc
> gpg: Signature made Mon 16 Apr 2012 06:00:18 BST using DSA key ID B876884A
> gpg: Can't check signature: public key not found
> lewis@lewis-01:~/Desktop$ gpg --verify apache-nutch-1.5-src.zip.asc
> gpg: Signature made Mon 16 Apr 2012 06:00:22 BST using DSA key ID B876884A
> gpg: Can't check signature: public key not found
> lewis@lewis-01:~/Desktop$ md5sum apache-nutch-1.5-bin.tar.tar.gz.asc
> e32088205efd59ffc882c79add0bafae  apache-nutch-1.5-bin.tar.tar.gz.asc
> lewis@lewis-01:~/Desktop$ md5sum apache-nutch-1.5-bin.zip.asc
> ff7960b8540673a86756f6b3f53ffd79  apache-nutch-1.5-bin.zip.asc
> lewis@lewis-01:~/Desktop$ md5sum apache-nutch-1.5-src.tar.gz.asc
> 9da161bcd5ec0de3f702a12e6bfbf9e6  apache-nutch-1.5-src.tar.gz.asc
> lewis@lewis-01:~/Desktop$ md5sum apache-nutch-1.5-src.zip.asc
> 6750bbc93b028776fa888f988df3a614  apache-nutch-1.5-src.zip.asc
> 
> Some comments:
> 1) I don't think the tar should be appended twice for the
> apache-nutch-1.5-bin.tar.tar.gz artefact and accompanying sigs.
> 2) None of my other attempts to verify the other artefacts via gpg worked!
> 3) All attempts to verify via md5sum did not match the strings present in
> your p.a.o area!
> 4) Really really trivial, but in our NOTICE file, it stated a date of 2009.
> I should have picked this up a while ago when I updated the other dates in
> these files, this one seems to have slipped through the net.
> 
> 
>> The release candidate is a zip and tar.gz archive of the sources in:
>> 
>> http://svn.apache.org/repos/asf/nutch/tags/release-1.5/
>> 
> 
> Stuff in SVN tag looks OK apart from the stuff I mentioned above.
> 
> 
>> 
>> And a binary build suitable for deployment.
>> 
>> A staged Maven repository is available here:
>> 
>> https://repository.apache.org/content/repositories/orgapachenutch-054/
>> 
> 
> I've not got around to checking the gpg and md5sum verifications yet, as
> I'm waiting for someone to confirm that the above failed verifications are
> correct before I do so. I'm hoping that I've made a mistake somewhere.
> 
> 
>> 
>> [X ] -1 Do not release this package because...
>> 
> Because of the above, unless I discover that I've done something wrong
> then I can't VOTE yes. I'm open to discussion on this, if someone can
> display that I've taken a wrong turn somewhere then I might change my VOTE
> however for the time being I need to call this one down.
> 
> Thanks for spinning the RC Chris.
> 
> Lewis


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-04-16 Thread Mattmann, Chris A (388J)
Hey Sami,

Thanks. I'll fix the 4 license headers you mention below as part of RC #2.

Cheers,
Chris

On Apr 16, 2012, at 3:02 AM, Sami Siren wrote:

> On Mon, Apr 16, 2012 at 8:43 AM, Mattmann, Chris A (388J)
>  wrote:
>> Hi Folks,
>> 
>> A candidate for the Nutch 1.5 release is available at:
>> 
>>  http://people.apache.org/~mattmann/apache-nutch-1.5/rc1/
>> 
>> The release candidate is a zip and tar.gz archive of the sources in:
>> 
>>  http://svn.apache.org/repos/asf/nutch/tags/release-1.5/
>> 
>> And a binary build suitable for deployment.
>> 
>> A staged Maven repository is available here:
>> 
>> https://repository.apache.org/content/repositories/orgapachenutch-054/
>> 
>> Please vote on releasing this package as Apache Nutch 1.5.
>> The vote is open for the next 72 hours and passes if a majority of at
>> least three +1 Nutch PMC votes are cast.
>> 
>>  [ ] +1 Release this package as Apache Nutch 1.5
>>  [ ] -1 Do not release this package because...
>> 
> 
> The basics are good:
> md5 and sha1 checksums for apache-nutch-1.5-bin.tar.gz and
> apache-nutch-1.5-src.tar.gz  match
> "ant clean test" completes successfully for the source package
> completed a simple crawl with local mode and a small hadoop 1.0.2
> cluster by using the artifacts in the binary package
> 
> but it seems there are some license headers missing from source files:
> [rat:report]  
> ==/home/sam/nutch/apache-nutch-1.5/src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java
> [rat:report]  
> ==/home/sam/nutch/apache-nutch-1.5/src/plugin/creativecommons/src/web/web.xml
> [rat:report]  
> ==/home/sam/nutch/apache-nutch-1.5/src/plugin/protocol-httpclient/src/test/conf/httpclient-auth-test.xml
> [rat:report]  
> ==/home/sam/nutch/apache-nutch-1.5/src/plugin/protocol-httpclient/src/test/conf/nutch-site-test.xml
> 
> -1 because of missing license headers
> 
> --
> Sami Siren
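
The missing-header check reported above comes from the RAT report. As a
rough stand-in for illustration only (this is not how RAT works
internally), the sketch below lists files whose first few kilobytes lack
the opening phrase of the standard ASF source header. The marker string,
the suffix list and the function name files_missing_header are assumptions
made for this example.

```python
from pathlib import Path

# Assumed marker: the opening phrase of the standard ASF source header.
LICENSE_MARKER = "Licensed to the Apache Software Foundation"


def files_missing_header(root, suffixes=(".java", ".xml"), head_bytes=4096):
    """Return sorted paths under `root` whose leading bytes lack the marker."""
    missing = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            # Only the start of the file is inspected, since headers sit at the top.
            head = path.read_bytes()[:head_bytes].decode("utf-8", errors="replace")
            if LICENSE_MARKER not in head:
                missing.append(path)
    return missing
```

Running files_missing_header("apache-nutch-1.5/src") before cutting an RC
would give a quick list of candidates to compare against the RAT output.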





Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-04-16 Thread Mattmann, Chris A (388J)
Hi Julien,

On Apr 16, 2012, at 2:02 AM, Julien Nioche wrote:

> Thanks Chris, 
> 
> -1 the versions of the deps for hadoop, tika and possibly others are not 
> correct in the pom.xml found in the src archive and on the mvn repository, 
> which will be a problem for whoever tries to use the pom.xml file e.g. in 
> Eclipse or more annoyingly declare Nutch as a dependency with Ivy / Maven. 
> Did you regenerate the pom file from the ivy one?

I didn't regenerate it -- but will try and do so for RC #2.

> 
> I remember that we mentioned delivering the content of runtime/local in the 
> binary archive instead of having the sources + runtime/deploy as well. 
[..snip...]
>  I don't think it would take much time to do that, so what about doing it 
> now? We could rename the archive into apache-nutch-1.5-local-bin maybe to 
> make the content clearer.

+1 to the above, but I think we can just have it be apache-nutch-1.5-bin -- no
need to rename it to local. We can just reference this ML thread for
documentation in the future.

I'll include the above 2 things when I re-roll an RC #2, hopefully in the next
few days.

Cheers,
Chris




[VOTE] Apache Nutch 1.5 release rc #1

2012-04-15 Thread Mattmann, Chris A (388J)
Hi Folks,

A candidate for the Nutch 1.5 release is available at:

  http://people.apache.org/~mattmann/apache-nutch-1.5/rc1/

The release candidate is a zip and tar.gz archive of the sources in:

  http://svn.apache.org/repos/asf/nutch/tags/release-1.5/

And a binary build suitable for deployment. 

A staged Maven repository is available here:

https://repository.apache.org/content/repositories/orgapachenutch-054/

Please vote on releasing this package as Apache Nutch 1.5.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 Nutch PMC votes are cast.

  [ ] +1 Release this package as Apache Nutch 1.5
  [ ] -1 Do not release this package because...

Thanks!

Cheers,
Chris

P.S. Here's my +1.
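
As an illustration of the pass condition stated above, here is a small
sketch under one reading of it: at least three +1 PMC votes, and more +1
than -1 votes. That reading, and the function name vote_passes, are
assumptions made for this example rather than an official tally procedure.

```python
def vote_passes(plus_ones, minus_ones, minimum_plus=3):
    """Return True if a release vote passes under the assumed rule.

    Assumed rule: at least `minimum_plus` binding +1 votes, and strictly
    more +1 votes than -1 votes (a majority of the votes cast).
    """
    return plus_ones >= minimum_plus and plus_ones > minus_ones
```

Under this reading, three +1 votes with no -1 votes pass, while three +1
votes against three -1 votes do not.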




Re: Nutch 1.x trunk release

2012-04-10 Thread Mattmann, Chris A (388J)
Hey Julien,

Yeah my weekend flew by -- this and the SIS RC are the top items on my
opensource TODO :)

Hopefully this week...

Cheers,
Chris

On Apr 10, 2012, at 8:07 AM, Julien Nioche wrote:

> Hi guys, 
> 
> Chris - any idea if / when you'll have the time to do an RC for trunk?
> 
> Thanks
> 
> Julien
> 
> On 3 April 2012 15:30, Mattmann, Chris A (388J) 
>  wrote:
> Thanks Lewis!
> 
> Cheers,
> Chris
> 
> P.S. Hopefully by this weekend...
> 
> On Apr 3, 2012, at 7:23 AM, Lewis John Mcgibbney wrote:
> 
> > Hi,
> >
> > On Tue, Apr 3, 2012 at 3:12 PM, Markus Jelsma  
> > wrote:
> >
> >
> > Seems fine. Only updating KEYS is no longer necessary.
> >
> > Now sorted.
> >
> > Thanks whenever you can get round to this Chris.
> >
> > Best
> >
> > Lewis
> 
> 
> 
> 
> 
> 
> -- 
> 
> Open Source Solutions for Text Engineering
> 
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
> 





Re: NutchGora release, and Nutch 1.x trunk release

2012-04-03 Thread Mattmann, Chris A (388J)
Thanks Lewis!

Cheers,
Chris

P.S. Hopefully by this weekend...

On Apr 3, 2012, at 7:23 AM, Lewis John Mcgibbney wrote:

> Hi,
> 
> On Tue, Apr 3, 2012 at 3:12 PM, Markus Jelsma  
> wrote:
> 
> 
> Seems fine. Only updating KEYS is no longer necessary.
> 
> Now sorted.
> 
> Thanks whenever you can get round to this Chris.
> 
> Best
> 
> Lewis





Re: NutchGora release, and Nutch 1.x trunk release

2012-04-03 Thread Mattmann, Chris A (388J)
Hi Markus,

On Apr 3, 2012, at 5:50 AM, Markus Jelsma wrote:

> Cool! 
> 
> Next time I'll ask infra to allow suppressing notifications.
> 
> Chris, will you RM one RC? And if possible, list the detailed steps/commands in 
> the process in case you don't have the time to RM 1.6 when the time comes. The 
> wiki is dated.

Happy to RM it. 

Check the wiki here:

http://wiki.apache.org/nutch/Release_HOWTO

Lewis and I updated this after the last release. It's more or less what's
required to release the project and what I run. It's also really similar to
the OODT release process:

https://cwiki.apache.org/confluence/display/OODT/Release+Process

Was there something specific that you weren't seeing there?

> 
> I'm looking forward to yet another big release with lots of fixes and 
> improvements!

Agreed, thanks everyone!

Cheers,
Chris




Re: NutchGora release, and Nutch 1.x trunk release

2012-03-08 Thread Mattmann, Chris A (388J)
Hey Guys,

OK, sounds good. Looks like we need to wait for the Tika 1.1 release (seems to
be going well so far), and then try and push Gora 0.2 (which I know Lewis is
pushing, and which I'm happy to RM once we're ready there). So, maybe I'll
shoot for next weekend or the weekend after to push Nutch 1.5 and 2.0 RCs.

Cheers,
Chris

On Mar 8, 2012, at 7:23 AM, Lewis John Mcgibbney wrote:

> Yeah I agree Chris & Markus.
> 
> On the Nutchgora note, I would like to see Gora 0.2 released beforehand, as 
> we have a blocking issue NUTCH-1205 with Ivy retrieving alien Gora 
> 0.2-SNAPSHOT dependencies from repository.apache.org. We should be able to 
> overcome this issue by releasing Gora 0.2 to maven central then just pulling 
> those dependencies with Ivy in Nutchgora rather than messing about with 
> chain/multiple/snapshot resolvers in the Ivy configuration.
> 
> My 2 cents
> 
> On Thu, Mar 8, 2012 at 3:03 PM, Markus Jelsma  
> wrote:
> +1
> 
> 1.5 has, again, many fixes and improvements, just as 1.4 had over 1.3. But i'd
> like to integrate Tika 1.1 after its pending release.
> 
> Cheers
> 
> On Thursday 08 March 2012 15:38:15 Mattmann, Chris A (388J) wrote:
> > Hey Guys,
> >
> > I've got some cycles this weekend -- anyone up for a 1.5 release off trunk
> > (stable), and a NutchGora branch release? I suggested this before [1]
> > regarding NutchGora. I'm inclined to say let's do the following:
> >
> > 1. NutchGora: apache-nutch-2.0 - release 2.x series based on this branch
> > 2. Nutch: apache-nutch-1.x - stable "trunk" branch
> >
> > Then, when the time comes, we can try and create a:
> >
> > 3. Nutch: apache-nutch-3.x - merge of 1.x and 2.x feature branches
> >
> > Would this make sense? Anyways we don't have to decide anything now that
> > we can't undo later, but are folks OK with me doing an RC for NutchGora and
> > for 1.x this weekend?
> >
> > Cheers,
> > Chris
> >
> > [1] http://s.apache.org/GD2
> >
> 
> --
> Markus Jelsma - CTO - Openindex
> 
> 
> 
> -- 
> Lewis 
> 





NutchGora release, and Nutch 1.x trunk release

2012-03-08 Thread Mattmann, Chris A (388J)
Hey Guys,

I've got some cycles this weekend -- anyone up for a 1.5 release off trunk
(stable), and a NutchGora branch release? I suggested this before [1]
regarding NutchGora.
I'm inclined to say let's do the following:

1. NutchGora: apache-nutch-2.0 - release 2.x series based on this branch
2. Nutch: apache-nutch-1.x - stable "trunk" branch

Then, when the time comes, we can try and create a:

3. Nutch: apache-nutch-3.x - merge of 1.x and 2.x feature branches

Would this make sense? Anyways we don't have to decide anything now that
we can't undo later, but are folks OK with me doing an RC for NutchGora and for
1.x this weekend?

Cheers,
Chris

[1] http://s.apache.org/GD2




Re: Apply to solve issue

2012-03-06 Thread Mattmann, Chris A (388J)
Hi Yang,

I'd be willing to mentor this project. I tagged it with GSOC, so it's now
eligible on the ASF ComDev list for a project. Please contact
d...@community.apache.org to get the info on how to apply. Here is some of it:

http://community.apache.org/gsoc.html

I'd be happy to mentor:

https://issues.apache.org/jira/browse/NUTCH-366

Cheers,
Chris

On Mar 6, 2012, at 5:25 AM, Lewis John Mcgibbney wrote:

> Hi Yang,
> 
> I think this would be a missed opportunity if we didn't take you up on this 
> offer.
> I can only assume that the development community are short on time, hence why 
> no-one has replied to this thread.
> 
> Is there any reason that you wish to attack this particular issue? Without 
> providing justification for the choice of project it may be harder to drum up 
> mentor support.
> 
> Does anyone have any better suggestions for possible issues?
> 
> Thanks
> 
> Lewis
> 
> 2012/3/4 Xiaolong Yang 
> Hello,
> I am a college student and new to Nutch.
> 
> I want to apply for GSoC as a student and work on an issue such as 
> NUTCH-366. Is anyone willing to become my mentor or give me any 
> suggestions?
> 
> I am also new to using mailing lists; if this is a disturbance, please forgive me!
> 
> 
> Best Wishes
> 
> 
> 
> 
> my mail: yangxiaolong2...@gmail.com
> 
> 
> 
> -- 
> Lewis 
> 





Fwd: Google Summer of Code 2012 upcoming

2012-03-04 Thread Mattmann, Chris A (388J)
Guys, FYI...in case anyone is thinking of GSoC, deadlines are approaching.
The process is described below...

Thanks!

Cheers,
Chris

Begin forwarded message:

> From: Ulrich Stärk 
> Date: March 4, 2012 9:01:07 AM PST
> To: "p...@apache.org" 
> Cc: "d...@community.apache.org" 
> Subject: Google Summer of Code 2012 upcoming
> Reply-To: "priv...@hadoop.apache.org" 
> 
> Hello PMCs,
> 
> Google Summer of Code is the ideal opportunity for you to attract new
> contributors to your projects.
> 
> If you want to participate with your project you NOW need to
> 
> - understand what it means to be a mentor [1]
> - propose your project ideas. Just label your issues with gsoc2012 in JIRA and
>  they will show up at [2]. See also [1].
> - subscribe to code-awa...@apache.org (restricted to potential mentors, meant
>  to be used as a private list - general discussions on the public
>  d...@community.apache.org list as much as possible please)
> 
> The ASF will apply as a participating organization with GSoC, your project
> doesn't need to do that. See [3] for more information. Note that the ASF isn't
> accepted yet, nevertheless you *really* should start recording your ideas now.
> 
> Last year we had 38 students completing GSoC successfully, some of which are
> now active contributors to the projects they worked on. Let's make this a
> success again this year!
> 
> On behalf of the GSoC 2012 admins,
> 
> Uli
> 
> [1] http://community.apache.org/guide-to-being-a-mentor.html
> [2] http://s.apache.org/gsoc2012tasks
> [3] http://community.apache.org/gsoc.html
> 





Fwd: [blog post] Accumulo, Nutch, and Gora

2012-02-28 Thread Mattmann, Chris A (388J)
FYI...awesome!

Begin forwarded message:

> From: Jason Trost 
> Date: February 28, 2012 5:41:23 PM PST
> To: "common-u...@hadoop.apache.org" 
> Subject: [blog post] Accumulo, Nutch, and Gora
> Reply-To: "common-u...@hadoop.apache.org" 
> 
> Blog post for anyone who's interested.  I cover a basic howto for
> getting Nutch to use Apache Gora to store web crawl data in Accumulo.
> 
> Let me know if you have any questions.
> 
> Accumulo, Nutch, and GORA
> http://www.covert.io/post/18414889381/accumulo-nutch-and-gora
> 
> --Jason





Re: [DISCUSS] Nutchgora 2.0 release

2012-02-20 Thread Mattmann, Chris A (388J)
+1 guys. Just let me know when you are ready and I can RM it.

Cheers,
Chris

On Feb 20, 2012, at 8:01 AM, Lewis John Mcgibbney wrote:

> Hi,
> 
> Not ignoring Chris' comments, but addressing the points below first, please 
> see comments.
> 
> On Mon, Feb 20, 2012 at 2:57 PM, Ferdy Galema  
> wrote:
> Aside from the licensing issue, the only thing I really see as a blocker or 
> as something we need to deal with first is Nutch-1205 (upgrade Gora libs). 
> What are we going to do with that one? 
> I'm going to have another crack with these Ivy resolvers, really quite hard 
> to debug. I can only assume the unresolved dependencies are picked up 
> somewhere upstream! As I said I'm going to try and crack this one maybe today 
> if I get the time.
>  
> 
> About the Nutch API (webapp), my colleague and I have some ideas about how to 
> improve it in such a way that it is really easy to use. It definitely won't 
> be ready in an upcoming release, especially when there will be a release very 
> soon. Please see the issue[1] for details. I'm not sure what to do with the 
> current webapp implementation, but my suggestion is to just leave it as 
> it is. (Perhaps mark it as a work-in-progress.)
> 
> This sounds really encouraging. Somewhere in my crazy pot of thoughts was to 
> progress with establishing this task as a GSoC project. In reflection, I 
> think it would be excellent if the work could be dev/user community driven as 
> it would cater exactly for what we need and want.
> 
> Please see here for the most up-to-date work I could get in this stuff. I 
> updated it slightly to reflect some recent findings. I'll report back when I 
> get more time on the blocker you mention above.
> 
> http://wiki.apache.org/nutch/NutchAdministrationUserInterface





Re: [DISCUSS] Nutchgora 2.0 release

2012-02-18 Thread Mattmann, Chris A (388J)
Hey Lewis,

I'd be +1 to roll a Nutchgora 2.0 release.

I could see dealing with this in two ways, neither of which I like better than
the other:

1. Release the nutchgora branch as "apache-nutch-2.0", so that nutchgora
becomes the 2.0 branch of the system (and we could create branch-2.0). As the
1.x trunk branch evolves and gets closer to 2.0, its last release would be 1.9;
then we do 3.0, which could be either:
  - a merge or combination of 1.x features and 2.x features
  - simply the next path for 1.x, and independent of 2.x

2. Call the artifact "apache-nutchgora-2.0", independent of the current trunk
artifact and its release cycle.

Either way is fine with me.

Cheers,
Chris

On Feb 17, 2012, at 7:23 AM, Lewis John Mcgibbney wrote:

> Hi Guys,
> 
> Here we are again :0)
> 
> What are the perceptions with aiming for a 2.0 release? We have one blocking 
> issue, the webapp, which I got no response from the community at large about. 
> I would like to see this addressed but this is another issue.
> 
> Speaking with the future in mind, we are hoping to get a Gora 0.2 release out 
> of the door, once a licensing issue is dealt with (the only blocker) and a 
> few other things. Therefore would it be realistic to aim for a Nutch 2.0 
> release shortly after that?
> 
> My justification for raising this thread again, is that we are seeing (some) 
> more users interested in this branch/code, I think it is a real shame that we 
> have not been able to get a release yet. I would really like to get more 
> people using the code and hopefully getting involved in identifying bugs, and 
> fixing them if possible.
> 
> The question has been open for ages, so I just wonder if anything has changed 
> now that Gora is doing better as of recent.
> 
> Thanks
> 
> Lewis
> 
> -- 
> Lewis 
> 





Fwd: [Announce] Google Summer of Code 2012

2012-02-05 Thread Mattmann, Chris A (388J)
FYI

Begin forwarded message:

> From: Ross Gardler 
> Date: February 5, 2012 1:45:18 PM PST
> To: "d...@community.apache.org" 
> Subject: RE: [Announce] Google Summer of Code 2012
> Reply-To: "d...@community.apache.org" 
> 
> For those new to GSoC you might want to review the roles defined at
> http://community.apache.org/mentoringprogramme.html and the GSoC specific
> info at http://community.apache.org/gsoc.html (yet to be updated for 2012)
> 
> Sent from my mobile device, please forgive errors and brevity.
> On Feb 5, 2012 8:31 PM, "Franklin, Matthew B."  wrote:





Fwd: [Announce] Google Summer of Code 2012

2012-02-05 Thread Mattmann, Chris A (388J)
Any Nutch Devs interested in a GSoC student?

Begin forwarded message:

> From: Luciano Resende 
> Date: February 4, 2012 10:40:03 AM PST
> To: "d...@community.apache.org" , code-awards 
> 
> Subject: Fwd: [Announce] Google Summer of Code 2012
> Reply-To: "d...@community.apache.org" 
> 
> -- Forwarded message --
> From: Carol Smith 
> Date: Sat, Feb 4, 2012 at 8:44 AM
> Subject: [Announce] Google Summer of Code 2012
> To: Google Summer of Code Discuss
> 
> 
> 
> Hi all,
> 
> We're pleased to announce that Google Summer of Code will be happening
> for its eighth year this year. Please check out the blog post [1]
> about the program and read the FAQs [2] and Timeline [3] on Melange
> for more information.
> 
> Please consider translating the presentations and/or flyers into your
> native language and submitting them directly to me to post on the
> wiki. Localization for our material is integral to reaching the widest
> possible audience around the world.
> 
> [1] - 
> http://google-opensource.blogspot.com/2012/02/google-summer-of-code-2012-is-on.html
> [2] - 
> http://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2012/faqs
> [3] - http://www.google-melange.com/gsoc/events/google/gsoc2012
> 
> Cheers,
> Carol
> 
> 
> -- 
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/





Re: % of different content types out there on the web

2012-01-31 Thread Mattmann, Chris A (388J)
Hi Markus,

Thanks for the FYI. Any idea of the specific %'s for those unwanted suffixes
compared to the size of the entire corpus?

Cheers,
Chris

On Jan 31, 2012, at 4:39 AM, Markus Jelsma wrote:

> We only crawl HTML and PDF files for a lot of ccTLDs, so we only have data
> on those two. However, we also explicitly filter out all/most unwanted
> suffixes. We do have a lot of suffixes that we encountered so far.
> 
> On Saturday 28 January 2012 03:01:26 Mattmann, Chris A (388J) wrote:
>> (sorry for the cross post)
>> 
>> Hey Guys,
>> 
>> I'm trying to find a good citation or estimate (if anyone has done one)
>> of the breakout (by % or some other metric) of non-HTML content types
>> out there on the web (based on a whole web crawl or a meaningful
>> representative dataset).
>> 
>> Anyone have any ideas about this?
>> 
>> Thanks!
>> 
>> Cheers,
>> Chris
>> 
> 
> -- 
> Markus Jelsma - CTO - Openindex





% of different content types out there on the web

2012-01-27 Thread Mattmann, Chris A (388J)
(sorry for the cross post)

Hey Guys,

I'm trying to find a good citation or estimate (if anyone has done one) of
the breakout (by % or some other metric) of non-HTML content types out there
on the web (based on a whole web crawl or a meaningful representative
dataset).

Anyone have any ideas about this?

Thanks!

Cheers,
Chris




Re: [DISCUSS] Issues with Fetcher

2012-01-21 Thread Mattmann, Chris A (388J)
Hi Ken,

On Jan 21, 2012, at 10:33 AM, Ken Krugler wrote:
> 
> My own personal favorite area would be to integrate with crawler-commons.

+1. Would you crawler-commons guys be interested in bringing that code to
Apache? How about bringing it over to Nutch?

Would that be something you'd be interested in?

Cheers,
Chris



