Re: [VOTE] Move 2.0 out of trunk

2011-09-20 Andrzej Bialecki

On 18/09/2011 02:21, Julien Nioche wrote:

Hi,

Following the discussions [1] on the dev-list about the future of Nutch
2.0, I would like to call for a vote on moving Nutch 2.0 from the trunk
to a separate branch, promoting 1.4 to trunk and considering 2.0 as
unmaintained. The arguments for / against can be found in the thread I
mentioned.

The vote is open for the next 72 hours.

[ ] +1 : Shelve 2.0 and move 1.4 to trunk
[] 0 : No opinion
[] -1 : Bad idea.  Please give justification.


+1. At this time it's clear that 2.0 didn't pan out as we expected, and
we should restart from 1.x for a usable platform and continue the
redesign from that codebase.


--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: [VOTE] Move 2.0 out of trunk

2011-09-19 Julien Nioche
Here is my vote:

 +1 : Shelve 2.0 and move 1.4 to trunk

Julien

On 18 September 2011 10:21, Julien Nioche lists.digitalpeb...@gmail.com wrote:

 Hi,

 Following the discussions [1] on the dev-list about the future of Nutch
 2.0, I would like to call for a vote on moving Nutch 2.0 from the trunk to a
 separate branch, promoting 1.4 to trunk and considering 2.0 as unmaintained.
 The arguments for / against can be found in the thread I mentioned.

 The vote is open for the next 72 hours.

 [ ] +1 : Shelve 2.0 and move 1.4 to trunk
 [] 0 : No opinion
 [] -1 : Bad idea.  Please give justification.

 Thanks

 Julien

 [1]
 http://www.mail-archive.com/gora-dev@incubator.apache.org/msg00483.html
 http://mail-archives.apache.org/mod_mbox/nutch-dev/201109.mbox/%3cca+-fm0tj2kvuco0wwkxbj6hsamxx5819ujv7lco2vo2kd2z...@mail.gmail.com%3E

 --
 Open Source Solutions for Text Engineering

 http://digitalpebble.blogspot.com/
 http://www.digitalpebble.com






Re: [VOTE] Move 2.0 out of trunk

2011-09-19 Alexis
My vote is thumbs down: -1

I am only involved in Nutch 2.0, and it would be put on the back burner...

Please read these articles if you struggle with using Nutch 2.0, and give
feedback so that we can improve the doc/code/architecture.

Nutch 2.0 (trunk)
http://techvineyard.blogspot.com/2010/12/build-nutch-20.html

Gora
http://techvineyard.blogspot.com/2011/02/gora-orm-framework-for-hadoop-jobs.html

I'm glad to hear that there are at least 2 people in the community that do
business in their field and proudly use a Nutch-based crawler together with
Cassandra to store the data through Gora. That would not have been possible
with the Nutch 1.x version.

Maybe this has been widely discussed already. IMO, crawl segments are
hard to maintain and easily lost. If you want to do that, HDFS is what you
are looking for. Even Yahoo has given up and is now using updated crawl
information from Microsoft in order to implement search. They use HBase,
which is, by the way, Nutch 2.0 compatible.

Take a look:
http://developer.yahoo.com/events/hadoopsummit2011/agenda.html#22 (sorry I
don't think any video of the summit is available yet, not sure why)

Alexis


On Mon, Sep 19, 2011 at 1:05 AM, Julien Nioche 
lists.digitalpeb...@gmail.com wrote:

Here is my vote :

  +1 : Shelve 2.0 and move 1.4 to trunk

 Julien





Re: [VOTE] Move 2.0 out of trunk

2011-09-19 Julien Nioche
Hi Alexis,

A few comments below:

My vote is thumbs down: -1

 I am only involved in Nutch 2.0, and it would be put on the back burner...


It has never left it, so that's not much of a change :-) Nutch 2.0 (and Gora)
has had more than a year to gather momentum, and it hasn't.

More seriously, as Chris explained, people will still be able to work on 2.0
if they want to: the code is moved, not RE-moved. The other aspect of the
change is that we won't necessarily keep 1.x in sync with 2.0. It has been a
complete pain to maintain two branches at the same time, and most people
(judging by the votes) are fed up with it. We are making good progress on
1.x, and 2.0 should not hold us back.

Again if people have the time and inclination to work on 2.0 then they will
still be able to do so.

[...]



 I'm glad to hear that there are at least 2 people in the community that do
 business in their field and proudly use a Nutch-based crawler together with
 Cassandra to store the data through Gora. That would not have been possible
 with the Nutch 1.x version.


It's not clear what you mean by "not possible with Nutch 1". From a
functionality point of view, there is nothing in 2.0 that you can't do with
1.x, while the reverse is not true (e.g. multiple outputs for parse). In
addition, 2.0 has a large number of bugs and is not fit for use in production.

I am sure that there are more than 2 users of Nutch 2.0 out there, but
that's after more than a year of having Nutch 2.0 in trunk, and it is quite
small compared to the number of users of 1.x.



 Maybe this has been widely discussed already. IMO, crawl segments are
 hard to maintain and easily lost. If you want to do that, HDFS is what you
 are looking for. Even Yahoo has given up and is now using updated crawl
 information from Microsoft in order to implement search. They use HBase,
 which is, by the way, Nutch 2.0 compatible.


 Take a look:
 http://developer.yahoo.com/events/hadoopsummit2011/agenda.html#22 (sorry I
 don't think any video of the summit is available yet, not sure why)


The advantages of having a single crawl table are well known, and this is why
we wanted to do that in 2.0. Again, if people want to get involved and
improve it, they will be able to do so.

Thanks

Julien




Re: [VOTE] Move 2.0 out of trunk

2011-09-19 Radim Kolar
 I'm glad to hear that there are at least 2 people in the community that do
 business in their field and proudly use a Nutch-based crawler together with
 Cassandra to store the data through Gora. That would not have been possible
 with the Nutch 1.x version.

What about dropping Gora, since it is progressing too slowly, and making
Nutch 2.x based only on a Cassandra/Hadoop DB?


Re: [DISCUSS] What will happen to Nutch Gora aka Nutchbase (was Re: [VOTE] Move 2.0 out of trunk)

2011-09-19 Mattmann, Chris A (388J)
Note to all: please use the [DISCUSS] thread format to discuss the VOTE, and 
please don't reply all to the VOTE thread and sully up the VOTE tallies with 
discussion.

Radim,

Thanks for your email. What you propose has been suggested as an option.
The best way to help see it happen sooner rather than later is to get involved
and/or contribute towards discussion, design, code, etc., for the issues that
you are interested in. We welcome any contributions in this area. The nutchgora
branch will still be there, and if there's a desire to have a nutchcassandra or
nutchhbase pure branch, and you have some spare cycles to help see it come about,
we would welcome it.

Cheers,
Chris



++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: [DISCUSS] What will happen to Nutch Gora aka Nutchbase (was Re: [VOTE] Move 2.0 out of trunk)

2011-09-19 Mattmann, Chris A (388J)
Hi Radim,

On Sep 19, 2011, at 9:22 AM, Radim Kolar wrote:

 The nutchgora branch will still be there, and if there's a desire to 
 have a nutchcassandra or nutchhbase pure branch, and you have some spare 
 cycles to help see it come about, we would welcome it.
 
 
 It needs to be done in a more long-term, strategic way:
 
 1. Research what people expect from Nutch 2.
 2. Which Gora backends they use / want to use.
 3. Whether to drop Gora or not.

Sure. In fact, there have been several ongoing conversations related to this
for over a year now.

See these threads:

http://s.apache.org/HhP
http://s.apache.org/zJX
http://s.apache.org/4tC
http://s.apache.org/BkM
http://s.apache.org/ka
http://s.apache.org/Rbi
http://s.apache.org/XZe
http://s.apache.org/X8F
http://s.apache.org/bKr
http://s.apache.org/gu
http://s.apache.org/gN9
http://s.apache.org/OCZ
http://s.apache.org/QID
http://s.apache.org/xk
http://s.apache.org/gw
http://s.apache.org/p6w

Feel free to contribute to the discussion.

Cheers,
Chris




Re: [VOTE] Move 2.0 out of trunk

2011-09-18 Markus Jelsma
+1



Re: [VOTE] Move 2.0 out of trunk

2011-09-18 Mattmann, Chris A (388J)
On Sep 18, 2011, at 2:21 AM, Julien Nioche wrote:

 Hi, 
 
 Following the discussions [1] on the dev-list about the future of Nutch 2.0, 
 I would like to call for a vote on moving Nutch 2.0 from the trunk to a
 separate branch, promoting 1.4 to trunk and considering 2.0 as unmaintained. The
 arguments for / against can be found in the thread I mentioned.
 
 The vote is open for the next 72 hours. 
 
 [X ] +1 : Shelve 2.0 and move 1.4 to trunk
 [] 0 : No opinion
 [] -1 : Bad idea.  Please give justification.
 

Cheers,
Chris




Re: [VOTE] Move 2.0 out of trunk

2011-09-18 lewis john mcgibbney
Hi,

[X ] +1 : Shelve 2.0 and move 1.4 to trunk
[] 0 : No opinion
[] -1 : Bad idea.  Please give justification.

Thank you




-- 
Lewis


Re: [VOTE] Move 2.0 out of trunk

2011-09-18 Radim Kolar

-1

I don't want release 2.0 to be marked as unmaintained. The Cassandra backend
works really well for us and fixed the performance problems with the Hadoop
database. Instead of moving it out of trunk, recruit more people to come and
fix the open problems. Don't give up.


[DISCUSS] What will happen to Nutch Gora aka Nutchbase (was Re: [VOTE] Move 2.0 out of trunk)

2011-09-18 Mattmann, Chris A (388J)
Hi Radim,

Thanks for your feedback. Just to dispel the thought that this VOTE will
remove the Nutch-with-Gora version from SVN: it won't (not that it could ever
fully remove it anyway; since SVN is a version control system, Nutch-with-Gora
will always be around in some form or fashion).

Simply put, we are VOTE'ing on a proposal that will move the current Nutch
trunk at http://svn.apache.org/repos/asf/nutch/trunk to
http://svn.apache.org/repos/asf/nutch/branches/nutchgora
and then merge the current 1.4 development branch at
http://svn.apache.org/repos/asf/nutch/branches/branch-1.4
into trunk.

If folks want to leverage Nutch with Gora and/or contribute to it there, I
will consider those folks candidates for committership, as I would anyone
who's contributing to trunk, and I would hope the rest of the Nutch dev
community would too. Then, if you have the time and resources, and others do
too, you can selectively move the relevant parts of the system into trunk (and
help maintain them where it makes sense) as you and the rest of the community
(dev and users) see fit. Commit early, commit often. Discuss with the rest of
the community. Start small, grow big. All part of developing the Apache way.

However, the current set of active Nutch committers have found that using their
expertise to maintain the 1.x series of Nutch releases (pre-Gora) is a more
productive use of their time, since none of the active Nutch committers
(including myself) are Gora experts. We are trying to learn though; at least I
know I am. So, given that, we are proposing to make the active branch of Nutch
development (called trunk in SVN terms) the branch that all of us know how to
maintain, and the one regarding which we are getting the most questions and
activity from the user community.

Hope that helps to clarify.

Cheers,
Chris






Re: [VOTE] Move 2.0 out of trunk

2011-09-18 Dennis Kubes

+1
