Re: [VOTE] Move 2.0 out of trunk
On 18/09/2011 02:21, Julien Nioche wrote:
> Hi,
>
> Following the discussions [1] on the dev-list about the future of Nutch 2.0, I would like to call for a vote on moving Nutch 2.0 from the trunk to a separate branch, promote 1.4 to trunk and consider 2.0 as unmaintained. The arguments for / against can be found in the thread I mentioned.
>
> The vote is open for the next 72 hours.
>
> [ ] +1 : Shelve 2.0 and move 1.4 to trunk
> [ ] 0 : No opinion
> [ ] -1 : Bad idea. Please give justification.

+1 - at this time it's clear that 2.0 didn't pan out as we expected, and we should restart from 1.x for a usable platform, and continue the redesign from that codebase.

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Re: [VOTE] Move 2.0 out of trunk
Here is my vote:

+1 : Shelve 2.0 and move 1.4 to trunk

Julien

On 18 September 2011 10:21, Julien Nioche lists.digitalpeb...@gmail.com wrote:
> Hi,
>
> Following the discussions [1] on the dev-list about the future of Nutch 2.0, I would like to call for a vote on moving Nutch 2.0 from the trunk to a separate branch, promote 1.4 to trunk and consider 2.0 as unmaintained. The arguments for / against can be found in the thread I mentioned.
>
> The vote is open for the next 72 hours.
>
> [ ] +1 : Shelve 2.0 and move 1.4 to trunk
> [ ] 0 : No opinion
> [ ] -1 : Bad idea. Please give justification.
>
> Thanks
>
> Julien
>
> [1] http://www.mail-archive.com/gora-dev@incubator.apache.org/msg00483.html
> http://mail-archives.apache.org/mod_mbox/nutch-dev/201109.mbox/%3cca+-fm0tj2kvuco0wwkxbj6hsamxx5819ujv7lco2vo2kd2z...@mail.gmail.com%3E

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
Re: [VOTE] Move 2.0 out of trunk
My vote is thumbs down: -1

I am only involved in Nutch 2.0, and that would be put on the back burner...

Please read these articles if you struggle with using Nutch 2.0, and give feedback so that we can improve the doc/code/architecture.

Nutch 2.0 (trunk)
http://techvineyard.blogspot.com/2010/12/build-nutch-20.html

Gora
http://techvineyard.blogspot.com/2011/02/gora-orm-framework-for-hadoop-jobs.html

I'm glad to hear that there are at least 2 people in the community that do business in their field and proudly use a Nutch-based crawler together with Cassandra to store the data through Gora. That would not have been possible with the Nutch 1.x version.

Maybe this has been widely discussed already. IMOO, crawl segments are hard to maintain and easily lost. If you want to do that, HDFS is what you are looking for. Even Yahoo has given up and is now using Microsoft's updated crawl information in order to implement search. They use HBase which is, by the way, Nutch 2.0 compatible. Take a look:
http://developer.yahoo.com/events/hadoopsummit2011/agenda.html#22
(sorry, I don't think any video of the summit is available yet, not sure why)

Alexis

On Mon, Sep 19, 2011 at 1:05 AM, Julien Nioche lists.digitalpeb...@gmail.com wrote:
> Here is my vote:
>
> +1 : Shelve 2.0 and move 1.4 to trunk
>
> Julien
Re: [VOTE] Move 2.0 out of trunk
Hi Alexis,

A few comments below:

> My vote is thumbs down: -1
> I am only involved in Nutch 2.0, and that would be put on the back burner...

It has never left it, so that's not much of a change :-) Nutch 2.0 (and GORA) has had more than a year to gather momentum and it hasn't.

More seriously, as Chris explained, people will still be able to work on 2.0 if they want to; the code is moved, not RE-moved. The other aspect of the change is that we won't necessarily keep 1.x in sync with 2.0 - it has been a complete pain to have to maintain two branches at the same time and most people (judging by the votes) are fed up with it. We are making good progress on 1.x and 2.0 should not hold us back. Again, if people have the time and inclination to work on 2.0 then they will still be able to do so.

[...]

> I'm glad to hear that there are at least 2 people in the community that do business in their field and proudly use a Nutch-based crawler together with Cassandra to store the data through Gora. That would not have been possible with the Nutch 1.x version.

It is not clear what you mean by "not possible with Nutch 1". From a functionality point of view there is nothing in 2.0 that you can't do with 1.x; the reverse is not true (e.g. multiple outputs for parse). In addition, 2.0 has a large number of bugs and is not fit for use in production.

I am sure that there are more than 2 users of Nutch 2.0 out there, but that's after more than a year of having it in trunk, and it is quite a small number compared to the number of users of 1.x.

> Maybe this has been widely discussed already. IMOO, crawl segments are hard to maintain and easily lost. If you want to do that, HDFS is what you are looking for. Even Yahoo has given up and is now using Microsoft's updated crawl information in order to implement search. They use HBase which is, by the way, Nutch 2.0 compatible. Take a look:
> http://developer.yahoo.com/events/hadoopsummit2011/agenda.html#22
> (sorry, I don't think any video of the summit is available yet, not sure why)

The advantages of having a single crawl table are well known, and this is why we wanted to do that in 2.0. Again, if people want to get involved and improve it they will be able to do so.

Thanks

Julien
Re: [VOTE] Move 2.0 out of trunk
> I'm glad to hear that there are at least 2 people in the community that do business in their field and proudly use a Nutch-based crawler together with Cassandra to store the data through Gora. That would not have been possible with the Nutch 1.x version.

What about dropping Gora, because it is progressing too slowly, and making Nutch 2.x based only on Cassandra/Hadoop DB?
Re: [DISCUSS] What will happen to Nutch Gora aka Nutchbase (was Re: [VOTE] Move 2.0 out of trunk)
Note to all: please use the [DISCUSS] thread format to discuss the VOTE, and please don't reply all to the VOTE thread and sully up the VOTE tallies with discussion.

Radim,

Thanks for your email. What you propose has been suggested as an option. The best way to help see it happen sooner rather than later is to get involved and/or contribute towards discussion, design, code, etc., for the issues that you are interested in. We welcome any contributions in this area.

The nutchgora branch will still be there, and if there's a desire to have a nutchcassandra or nutchhbase pure branch, and you have some spare cycles to help see it come about, we would welcome it.

Cheers,
Chris

On Sep 19, 2011, at 7:30 AM, Radim Kolar wrote:
>> I'm glad to hear that there at least 2 people in the community that do business in their field and proudly use a Nutch-based crawler together with Cassandra to store the data through Gora. That would not have been possible with Nutch 1.x version.
>
> what about to drop Gora, because it is progressing too slowly and make Nutch 2.x only cassandra/hadoop db based ?

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++
Re: [DISCUSS] What will happen to Nutch Gora aka Nutchbase (was Re: [VOTE] Move 2.0 out of trunk)
Hi Radim,

On Sep 19, 2011, at 9:22 AM, Radim Kolar wrote:
>> The nutchgora branch will still be there, and if there's a desire to have a nutchcassandra or nutchhbase pure branch, and you have some spare cycles to help see it come about, we would welcome it.
>
> It needs to be done in a more long-term, strategic way:
> 1. Research what people expect from Nutch 2.
> 2. Find out which Gora backends they have used or want to use.
> 3. Decide whether to drop Gora or not.

Sure, in fact, there have been several ongoing conversations related to this already for over a year now. See these threads:

http://s.apache.org/HhP
http://s.apache.org/zJX
http://s.apache.org/4tC
http://s.apache.org/BkM
http://s.apache.org/ka
http://s.apache.org/Rbi
http://s.apache.org/XZe
http://s.apache.org/X8F
http://s.apache.org/bKr
http://s.apache.org/gu
http://s.apache.org/gN9
http://s.apache.org/OCZ
http://s.apache.org/QID
http://s.apache.org/xk
http://s.apache.org/gw
http://s.apache.org/p6w

Feel free to contribute to the discussion.

Cheers,
Chris
Re: [VOTE] Move 2.0 out of trunk
+1

> Hi,
>
> Following the discussions [1] on the dev-list about the future of Nutch 2.0, I would like to call for a vote on moving Nutch 2.0 from the trunk to a separate branch, promote 1.4 to trunk and consider 2.0 as unmaintained. The arguments for / against can be found in the thread I mentioned.
>
> The vote is open for the next 72 hours.
>
> [ ] +1 : Shelve 2.0 and move 1.4 to trunk
> [ ] 0 : No opinion
> [ ] -1 : Bad idea. Please give justification.
>
> Thanks
>
> Julien
>
> [1] http://www.mail-archive.com/gora-dev@incubator.apache.org/msg00483.html
> http://mail-archives.apache.org/mod_mbox/nutch-dev/201109.mbox/%3CCA+-fM0tJ2Kvuco0wwkxbj6hsamxx5819ujv7lco2vo2kd2z...@mail.gmail.com%3E
Re: [VOTE] Move 2.0 out of trunk
On Sep 18, 2011, at 2:21 AM, Julien Nioche wrote:
> Hi,
>
> Following the discussions [1] on the dev-list about the future of Nutch 2.0, I would like to call for a vote on moving Nutch 2.0 from the trunk to a separate branch, promote 1.4 to trunk and consider 2.0 as unmaintained. The arguments for / against can be found in the thread I mentioned.
>
> The vote is open for the next 72 hours.

[X] +1 : Shelve 2.0 and move 1.4 to trunk
[ ] 0 : No opinion
[ ] -1 : Bad idea. Please give justification.

Cheers,
Chris
Re: [VOTE] Move 2.0 out of trunk
Hi,

[X] +1 : Shelve 2.0 and move 1.4 to trunk
[ ] 0 : No opinion
[ ] -1 : Bad idea. Please give justification.

Thank you

On Sun, Sep 18, 2011 at 3:48 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote:
> On Sep 18, 2011, at 2:21 AM, Julien Nioche wrote:
>> Hi,
>>
>> Following the discussions [1] on the dev-list about the future of Nutch 2.0, I would like to call for a vote on moving Nutch 2.0 from the trunk to a separate branch, promote 1.4 to trunk and consider 2.0 as unmaintained. The arguments for / against can be found in the thread I mentioned.
>>
>> The vote is open for the next 72 hours.
>
> [X] +1 : Shelve 2.0 and move 1.4 to trunk
> [ ] 0 : No opinion
> [ ] -1 : Bad idea. Please give justification.
>
> Cheers,
> Chris

--
Lewis
Re: [VOTE] Move 2.0 out of trunk
-1

I don't want to mark release 2.0 as unmaintained. The Cassandra backend works really well for us and fixed the performance problems we had with the Hadoop database. Instead of moving it out of trunk, more people should be recruited to come and fix the open problems. Don't give up.
[DISCUSS] What will happen to Nutch Gora aka Nutchbase (was Re: [VOTE] Move 2.0 out of trunk)
Hi Radim,

Thanks for your feedback. Just to dispel the thought that this VOTE will remove the Nutch-with-Gora version from SVN: it won't remove it (not that it could ever fully remove it anyway; since SVN is a version control system, Nutch-with-Gora will always be around in some form or fashion).

Simply, we are VOTE'ing on a proposal that will move the current Nutch trunk at

http://svn.apache.org/repos/asf/nutch/trunk

to

http://svn.apache.org/repos/asf/nutch/branches/nutchgora

and then will merge the current 1.4-development branch at

http://svn.apache.org/repos/asf/nutch/branches/branch-1.4

into trunk.

If folks want to leverage Nutch with Gora, and/or contribute to it there, I will consider those folks candidates for committers as I would anyone that's contributing to trunk, and I would hope the rest of the Nutch dev community would also. Then, if you have the time and resources, and others do too, you can selectively move the relevant parts of the system into trunk (and help maintain them where it makes sense) as you and the rest of the community (dev and users) see fit. Commit early, commit often. Discussions with the rest of the community. Starting small, growing big. All parts of developing in the Apache way.

However, the current set of active Nutch committers have found that using their expertise to maintain the 1.x series of Nutch releases (pre-Gora) is a more productive use of their time, since none of those active Nutch committers are Gora experts (including myself). We are trying to learn though, at least I know I am.

So, given that, we are proposing to make the active Nutch branch of development (called trunk in SVN terms) the branch that all of us know how to maintain and that, furthermore, we are getting the most questions and activity about from the user community.

Hope that helps to clarify.

Cheers,
Chris

On Sep 18, 2011, at 4:08 PM, Radim Kolar wrote:
> -1
>
> I don't want to mark release 2.0 as unmaintained. Cassandra backend works really well for us and fixed performance problems with hadoop database. Instead of moving it out trunk, recruit more ppl should come and fix open problems. don't give up.
Re: [VOTE] Move 2.0 out of trunk
+1

On 09/18/2011 04:21 AM, Julien Nioche wrote:
> Hi,
>
> Following the discussions [1] on the dev-list about the future of Nutch 2.0, I would like to call for a vote on moving Nutch 2.0 from the trunk to a separate branch, promote 1.4 to trunk and consider 2.0 as unmaintained. The arguments for / against can be found in the thread I mentioned.
>
> The vote is open for the next 72 hours.
>
> [ ] +1 : Shelve 2.0 and move 1.4 to trunk
> [ ] 0 : No opinion
> [ ] -1 : Bad idea. Please give justification.
>
> Thanks
>
> Julien
>
> [1] http://www.mail-archive.com/gora-dev@incubator.apache.org/msg00483.html
> http://mail-archives.apache.org/mod_mbox/nutch-dev/201109.mbox/%3cca+-fm0tj2kvuco0wwkxbj6hsamxx5819ujv7lco2vo2kd2z...@mail.gmail.com%3E