Re: [VOTE] Apache Nutch 1.1 Release Candidate #2

2010-04-26 Thread Mattmann, Chris A (388J)
Hey Andrzej,

Okey dokey, np! Let's get the patch in first :) I can cut as many RCs as needed.

Cheers,
Chris

On 4/26/10 11:30 AM, "Andrzej Bialecki" wrote:

On 2010-04-26 17:19, Mattmann, Chris A (388J) wrote:
> Hi Grant,
>
> Thanks. I think it actually makes sense to finish off 1.1, and since there is 
> overlap with the Nutch PMC and the Lucene PMC and since the thread started in 
> Lucene before the TLP, I think it would be great if, e.g., Andrzej and Sami 
> could check the release and that way we still have the continuity and can 
> safely push it out as the last Nutch rel under the Lucene umbrella...
>
> Then all releases post 1.1 can cleanly be done under the auspices of the new 
> PMC :)

I know that Dennis Kubes just discovered a bug in SegmentMerger (he may
report on it in a moment) - this bug has been there for a while, it's
likely the cause of the mysterious "out of disk space" errors, and it
manifests itself only with input files larger than HDFS block size
(64MB). Since 1.1 is likely the final release of Nutch 1.x I think it
would make sense to fix this bug before we release ...

--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: [VOTE] Apache Nutch 1.1 Release Candidate #2

2010-04-26 Thread Andrzej Bialecki
On 2010-04-26 17:19, Mattmann, Chris A (388J) wrote:
> Hi Grant,
> 
> Thanks. I think it actually makes sense to finish off 1.1, and since there is 
> overlap with the Nutch PMC and the Lucene PMC and since the thread started in 
> Lucene before the TLP, I think it would be great if, e.g., Andrzej and Sami 
> could check the release and that way we still have the continuity and can 
> safely push it out as the last Nutch rel under the Lucene umbrella...
> 
> Then all releases post 1.1 can cleanly be done under the auspices of the new 
> PMC :)

I know that Dennis Kubes just discovered a bug in SegmentMerger (he may
report on it in a moment) - this bug has been there for a while, it's
likely the cause of the mysterious "out of disk space" errors, and it
manifests itself only with input files larger than HDFS block size
(64MB). Since 1.1 is likely the final release of Nutch 1.x I think it
would make sense to fix this bug before we release ...

-- 
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: [VOTE] Apache Nutch 1.1 Release Candidate #2

2010-04-26 Thread Mattmann, Chris A (388J)
Hi Grant,

Thanks. I think it actually makes sense to finish off 1.1, and since there is 
overlap with the Nutch PMC and the Lucene PMC and since the thread started in 
Lucene before the TLP, I think it would be great if, e.g., Andrzej and Sami 
could check the release and that way we still have the continuity and can 
safely push it out as the last Nutch rel under the Lucene umbrella...

Then all releases post 1.1 can cleanly be done under the auspices of the new 
PMC :)

Cheers,
Chris


On 4/26/10 5:34 AM, "Grant Ingersoll" wrote:

Might I suggest, that since Nutch is now a TLP that you delay this release by a 
few weeks and have the vote done under the auspices of the Nutch PMC?

Cheers,
Grant

On Apr 26, 2010, at 1:55 AM, Mattmann, Chris A (388J) wrote:

> Hi Folks,
>
> I have posted an updated candidate for the Apache Nutch 1.1 release. The
> source code is at:
>
> http://people.apache.org/~mattmann/apache-nutch-1.1/rc2/
>
> The major difference between this release and rc #1 is the application of
> NUTCH-812 - Crawl.java incorrectly uses the Generator API resulting in NPE -
> as well as some commits by Sami Siren to fix missing ASL license headers.
>
> For more detailed information on release contents and latest changes, see the
> included CHANGES.txt file. The release was made using the Nutch release
> process, documented on the Wiki here:
>
> http://bit.ly/d5ugid
>
> A Nutch 1.1 tag is at:
>
> http://svn.apache.org/repos/asf/lucene/nutch/tags/1.1/
>
> 
> There was a request by Sami Siren that the tutorial be updated to reflect
> the fact that this release is a source-only release, as well as a request to
> integrate RAT into the build, however, in the interest of getting this 1.1
> out and getting going on the Nutch TLP, my proposal is:
>
> * update the docs independent of this release (the tutorial as it exists
> right now says 0.7 on it anyways and doesn't look like it's been updated in
> a while, so I think users can live with what's there and support on
> u...@nutch.apache.org or d...@nutch.apache.org until it's updated)
>
> * begin source only releases in general since we've long had the debate as
> to the size of the Nutch release. Most folks that use Nutch are likely
> familiar with running ant IMHO.
>
> * run RAT and integrate into the build
>
> 
>
> Please vote on releasing these packages as Apache Nutch 1.1. The vote is
> open for the next 72 hours.
>
> Since Nutch is now a TLP and has its own PMC, there is a question of who are
> the binding release VOTES in this particular thread. My gut reaction is that
> since I started this release while we were under the Lucene PMC, for
> continuity purposes, only votes from Lucene PMC are binding, but everyone
> (especially newly minted Nutch PMC members!) is welcome to check the
> release candidate and voice their approval or disapproval. The vote passes
> if at least three binding +1 votes are cast.
>
> [ ] +1 Release the packages as Apache Nutch 1.1.
>
> [ ] -1 Do not release the packages because...
>
> Thanks!
>
> Cheers,
> Chris
>
> P.S. Here is my +1.
>








Re: Running ANT; was -- Re: [VOTE] Apache Nutch 1.1 Release Candidate #2

2010-04-26 Thread Mattmann, Chris A (388J)
Hi David,

Thanks. In fact, running ant is probably simpler than running Nutch. The steps 
would be:


 *   What OS are you on? (Ant is available for all of them, to my knowledge.)
 *   If you need ant, grab a distro from ant.apache.org; otherwise, I'll assume
     that you've got ant installed and callable from the command line.
 *   Unpack the nutch src distribution, cd into that directory, type "ant job",
     and there you go.

HTH! You could try it out by taking the Nutch src code from SVN at: 
http://svn.apache.org/repos/asf/lucene/nutch/tags/1.1, and then trying the 
steps above.
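
As a rough sketch, the steps above might look like this in a Unix-like shell. It assumes svn and ant are already installed and on your PATH; the directory name nutch-1.1-src and the function name build_nutch are made up for illustration and are not part of the Nutch release.

```shell
#!/bin/sh
# Sketch of the steps above, not an official build script.
# Assumes a Unix-like shell with svn and ant available on the PATH.

# Tag URL quoted from the message above.
NUTCH_TAG_URL="http://svn.apache.org/repos/asf/lucene/nutch/tags/1.1"
# Hypothetical local directory name for the checked-out source.
SRC_DIR="nutch-1.1-src"

build_nutch() {
    # Step 1: confirm that ant is installed and callable.
    ant -version || {
        echo "ant not found; grab a distro from ant.apache.org" >&2
        return 1
    }

    # Step 2: fetch the source from SVN
    # (alternatively, unpack the src distribution instead).
    svn checkout "$NUTCH_TAG_URL" "$SRC_DIR" || return 1

    # Step 3: run the "job" target from inside the source directory.
    # The subshell leaves the caller's working directory unchanged.
    ( cd "$SRC_DIR" && ant job )
}

# To build, call: build_nutch
```

Nothing runs until build_nutch is called, so the file can be read or sourced safely before trying the build.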

Cheers,
Chris


On 4/26/10 7:24 AM, "David M. Cole" wrote:

At 10:55 PM -0700 4/25/10, Mattmann, Chris A (388J) wrote:
>Most folks that use Nutch are likely
>familiar with running ant IMHO.

I guess then I fall into the category of "not most folks." Have been
running Nutch for about 14 months and I haven't a clue how to run ant.

If there's a place to vote to suggest that compiled versions still be
distributed, I vote for that.

Thanks.

\dmc

--
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+
David M. Cole    d...@colegroup.com
Editor & Publisher, NewsInc. V: (650) 557-2993
Consultant: The Cole Group    F: (650) 475-8479
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+






Re: [VOTE] Apache Nutch 1.1 Release Candidate #2

2010-04-26 Thread Grant Ingersoll
Might I suggest, that since Nutch is now a TLP that you delay this release by a 
few weeks and have the vote done under the auspices of the Nutch PMC?

Cheers,
Grant

On Apr 26, 2010, at 1:55 AM, Mattmann, Chris A (388J) wrote:

> Hi Folks,
> 
> I have posted an updated candidate for the Apache Nutch 1.1 release. The
> source code is at:
> 
> http://people.apache.org/~mattmann/apache-nutch-1.1/rc2/
> 
> The major difference between this release and rc #1 is the application of
> NUTCH-812 - Crawl.java incorrectly uses the Generator API resulting in NPE -
> as well as some commits by Sami Siren to fix missing ASL license headers.
> 
> For more detailed information on release contents and latest changes, see the
> included CHANGES.txt file. The release was made using the Nutch release
> process, documented on the Wiki here:
> 
> http://bit.ly/d5ugid
> 
> A Nutch 1.1 tag is at:
> 
> http://svn.apache.org/repos/asf/lucene/nutch/tags/1.1/
> 
> 
> There was a request by Sami Siren that the tutorial be updated to reflect
> the fact that this release is a source-only release, as well as a request to
> integrate RAT into the build, however, in the interest of getting this 1.1
> out and getting going on the Nutch TLP, my proposal is:
> 
> * update the docs independent of this release (the tutorial as it exists
> right now says 0.7 on it anyways and doesn't look like it's been updated in
> a while, so I think users can live with what's there and support on
> u...@nutch.apache.org or d...@nutch.apache.org until it's updated)
> 
> * begin source only releases in general since we've long had the debate as
> to the size of the Nutch release. Most folks that use Nutch are likely
> familiar with running ant IMHO.
> 
> * run RAT and integrate into the build
> 
> 
> 
> Please vote on releasing these packages as Apache Nutch 1.1. The vote is
> open for the next 72 hours.
> 
> Since Nutch is now a TLP and has its own PMC, there is a question of who are
> the binding release VOTES in this particular thread. My gut reaction is that
> since I started this release while we were under the Lucene PMC, for
> continuity purposes, only votes from Lucene PMC are binding, but everyone
> (especially newly minted Nutch PMC members!) is welcome to check the
> release candidate and voice their approval or disapproval. The vote passes
> if at least three binding +1 votes are cast.
> 
> [ ] +1 Release the packages as Apache Nutch 1.1.
> 
> [ ] -1 Do not release the packages because...
> 
> Thanks!
> 
> Cheers,
> Chris
> 
> P.S. Here is my +1.
> 