Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]
Glad to see we're making progress here. Same with me: I am ready to move on
with the project and move out of this 'rut' we have been in with trunk.

Thanks

On Sat, Sep 17, 2011 at 6:56 PM, Mattmann, Chris A (388J) <chris.a.mattm...@jpl.nasa.gov> wrote:

> Hey Markus,
>
> No worries. I actually have no dog in this fight to be honest.
>
> I want Gora to be successful, and I want Nutch to be successful. I haven't
> contributed much to Nutch 2.0 trunk, but I have to the 1.x series branch.
> I wish I knew more about Gora's internals (and am trying to learn) so I
> could help more with it. I think it will make a lot of sense to use it at
> some point.
>
> At the same time, I'm all for making 1.x releases and naturally getting to
> 2.0 over time based on our current progress and understanding. I'm also
> super excited about the 1.x versions of Nutch, and when I think about it,
> the reality is that they've always been Nutch trunk even though we
> artificially tried to turn the nutchbase branch into it.
>
> So to wrap it up, I'm totally fine with 1.x moving into trunk and with
> executing the plan I proposed a while back:
>
> ---snip
> 1. branch the current trunk as
>    https://svn.apache.org/repos/asf/nutch/branches/nutchgora
> 2. grab latest stable branch (e.g.,
>    https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and
>    *replace* the Nutch trunk with it, and bump the version # to 1.7-dev
> 3. active development on stable becomes active development in trunk and
>    nutchgora still exists in case anyone ever resurrects it.
> ---snip
>
> Of course, it's not 1.6 (I was optimistic about getting there in 6 months
> ;) ), but it's really 1.4. And we don't need to bump to -dev since we're
> already in full dev with the 1.4 cycle.
>
> So, I'm ready for a VOTE. Feel free to call one (or have Julien do it), and
> I'll VOTE +1.
>
> Cheers,
> Chris
Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]
Hey Markus,

No worries. I actually have no dog in this fight to be honest.

I want Gora to be successful, and I want Nutch to be successful. I haven't
contributed much to Nutch 2.0 trunk, but I have to the 1.x series branch.
I wish I knew more about Gora's internals (and am trying to learn) so I
could help more with it. I think it will make a lot of sense to use it at
some point.

At the same time, I'm all for making 1.x releases and naturally getting to
2.0 over time based on our current progress and understanding. I'm also
super excited about the 1.x versions of Nutch, and when I think about it,
the reality is that they've always been Nutch trunk even though we
artificially tried to turn the nutchbase branch into it.

So to wrap it up, I'm totally fine with 1.x moving into trunk and with
executing the plan I proposed a while back:

---snip
1. branch the current trunk as
   https://svn.apache.org/repos/asf/nutch/branches/nutchgora
2. grab latest stable branch (e.g.,
   https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and
   *replace* the Nutch trunk with it, and bump the version # to 1.7-dev
3. active development on stable becomes active development in trunk and
   nutchgora still exists in case anyone ever resurrects it.
---snip

Of course, it's not 1.6 (I was optimistic about getting there in 6 months
;) ), but it's really 1.4. And we don't need to bump to -dev since we're
already in full dev with the 1.4 cycle.

So, I'm ready for a VOTE. Feel free to call one (or have Julien do it), and
I'll VOTE +1.

Cheers,
Chris

On Sep 17, 2011, at 10:18 AM, Markus Jelsma wrote:

> Hi Chris,
>
> I initially respawned this thread with the suggestion not to wait until
> January or so before the vote. Hence my apologies for being impatient and
> pessimistic about trunk :)
>
> Cheers,
Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]
Hi Chris,

I initially respawned this thread with the suggestion not to wait until
January or so before the vote. Hence my apologies for being impatient and
pessimistic about trunk :)

Cheers,

> Hey Julien,
>
> My option E was pretty much equivalent to B except I specified a time
> frame (next 6 months). Are we just saying that we'll accelerate the time
> frame to say, umm, next week or the week after? :)
>
> If so, fine by me. Since I moved nutchbase into the trunk at one point,
> I'd be happy, once we've VOTEd and decided, to be the one to execute
> moving it out.
>
> And yes, PMC votes will be binding and we'll do majority takes it, fine by
> me.
>
> Cheers,
> Chris
Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]
Hey Julien,

My option E was pretty much equivalent to B except I specified a time frame
(next 6 months). Are we just saying that we'll accelerate the time frame to
say, umm, next week or the week after? :)

If so, fine by me. Since I moved nutchbase into the trunk at one point, I'd
be happy, once we've VOTEd and decided, to be the one to execute moving it
out.

And yes, PMC votes will be binding and we'll do majority takes it, fine by
me.

Cheers,
Chris

On Sep 17, 2011, at 1:45 AM, Julien Nioche wrote:

> Let's keep it simple. Let's vote for option B (i.e. shelve 2.0). If most
> people are in favour then we don't need to look into other options at all.
> If not, we'll see what alternatives or arguments come up and vote on these
> later.
>
> I assume that only PMC votes will be binding and the majority takes it?
>
> Julien
Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]
Let's keep it simple. Let's vote for option B (i.e. shelve 2.0). If most
people are in favour then we don't need to look into other options at all.
If not, we'll see what alternatives or arguments come up and vote on these
later.

I assume that only PMC votes will be binding and the majority takes it?

Julien

On 16 September 2011 22:30, Mattmann, Chris A (388J) <chris.a.mattm...@jpl.nasa.gov> wrote:

> Why don't we just collect VOTEs for each of the options a-e, and then
> figure out based on that if there is a majority. If there's no majority, we
> can whittle it down to say the top 2-3, and then VOTE on those, looking
> for majority again.
>
> Cheers,
> Chris
Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]
Why don't we just collect VOTEs for each of the options a-e, and then
figure out based on that if there is a majority. If there's no majority, we
can whittle it down to say the top 2-3, and then VOTE on those, looking for
majority again.

Cheers,
Chris

On Sep 16, 2011, at 11:44 AM, Markus Jelsma wrote:

> Option B) Shelve trunk in a branch and promote 1.4 to trunk. We can always
> choose to hardwire HBASE (option D) later.
>
> Markus
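The two-round procedure suggested at the top of this message (collect votes on options a-e, look for a majority, otherwise shortlist the top 2-3 and vote again) can be sketched roughly as below. This is only an illustration under stated assumptions: one vote per person, "majority" taken to mean more than half of the votes cast, and the function name `tally_round` and the voter labels are hypothetical, not anything used by the project.

```python
from collections import Counter


def tally_round(ballots, shortlist_size=3):
    """Tally one voting round.

    ballots: dict mapping a voter label to the option letter they chose.
    Returns ("winner", option) if one option has more than half of the
    votes cast, otherwise ("runoff", [top options]) for a second round.
    Assumption: "majority" means strictly more than 50% of votes cast.
    """
    counts = Counter(ballots.values())
    total = sum(counts.values())
    if total == 0:
        return ("runoff", [])
    option, votes = counts.most_common(1)[0]
    if votes * 2 > total:
        return ("winner", option)
    # No majority: keep only the most popular options for the next round.
    return ("runoff", [opt for opt, _ in counts.most_common(shortlist_size)])


if __name__ == "__main__":
    # Illustrative ballots only; these are not real votes from the thread.
    sample = {"voter1": "b", "voter2": "b", "voter3": "e"}
    print(tally_round(sample))
```

Restricting the count to binding votes would simply mean filtering `ballots` before calling the function.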
Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]
Option B) Shelve trunk in a branch and promote 1.4 to trunk. We can always
choose to hardwire HBASE (option D) later.

Markus

> Am happy to call for a vote on the future of Nutch 2.0 if you want. Shall
> we reduce the various options described before to a single one?
>
> Julien
>
> On 15 September 2011 19:55, Markus Jelsma wrote:
> > > Hi Guys,
> > >
> > > I thought I'd chime in on this thread. My comments below:
> > >
> > > > I understand and share your frustration, however you need to bear
> > > > in mind that things are done only if people volunteer and have
> > > > time - usually taken from their holiday, weekends, evenings. Chris
> > > > (who is the de facto release master for Nutch and Gora) has not
> > > > had the time and nobody else has volunteered to do it.
> > >
> > > Yep I haven't had the time to push a Gora 0.1.1-incubating release
> > > that will address the Maven issues. However it is on my roadmap for
> > > open source stuff to get done in the next month, so that's a good
> > > thing. But yes, that portion of my open source work is all volunteer
> > > time, so sometimes other things take priority.
> > >
> > > >> As it happens, yesterday was the 1 year anniversary of the last
> > > >> successful Hudson/Jenkins build... If that actually worked, we
> > > >> could point people towards it as a useful recipe for how to get a
> > > >> build working off trunk. I haven't been following Nutch too
> > > >> closely, but it always strikes me as really odd, that there's a
> > > >> nightly build and it doesn't bother anybody that it fails all the
> > > >> time (and that there isn't a nightly build for the stable
> > > >> branches).
> > > >
> > > > The real issue behind all this is what we should do with Nutch
> > > > 2.0. What follows is only my opinion and I would love to hear what
> > > > others have to say on this subject.
> > > >
> > > > Since we (actually mostly Dogacan) wrote 2.0 and delegated the
> > > > storage to Gora, the latter hasn't really taken off since
> > > > incubation. There have been some modest contributions to it but it
> > > > does not seem to be used much and there is virtually nothing
> > > > happening on it in terms of development. More worryingly, the
> > > > people who initially contributed to it are not very active on the
> > > > project (such is life, new jobs, different projects, etc...)
> > > > anymore. As for Nutch 2.0, it hasn't made any progress in the last
> > > > 12 months : we still have the same bugs, the tests do not work,
> > > > the build has to be done manually etc...
> > >
> > > Yep.
> > >
> > > > At the same time, there has been a new lease of life into Nutch as
> > > > a whole : there is definitely more activity on the mailing lists,
> > > > new users, new active committers etc... and quite a few bugfixes
> > > > and improvements - most of them backported from what had been done
> > > > in the trunk and people seem fairly happy with what we can do with
> > > > 1.4
> > >
> > > Totally agreed. I'm actually not super surprised -- ever since 1.1,
> > > I kind of felt that maintaining a stable 1.X branch of Nutch (in
> > > parallel to the 2.0 efforts) was really going to pay off since there
> > > was renewed interest from users in leveraging (and furthermore
> > > accepting) the nuances of 1.X.
> > >
> > > > So the question is : what shall we do with 2.0? Here are a few
> > > > possibilities
> > > >
> > > > a) put some effort into it, fix the bugs and make it so that it
> > > >    can be used instead of 1.x
> > > > b) shelve it and leave it for enthusiasts to play with + make 1.x
> > > >    the trunk again
> > > > c) do nothing : keep 2.0 and 1.x in parallel (but having to
> > > >    maintain two branches is quite a pain)
> > > > d) abandon the idea of a neutral storage layer with Gora and
> > > >    hardwire it to e.g. HBase
> > > >
> > > > Option (a) has not happened in the last 12 months and I am not
> > > > very hopeful about it.
> > > >
> > > > What do you guys think?
> > >
> > > I'd suggest an option e). Evolve and keep releasing 1.X over the
> > > next 6 months, and keep 2.0 in the trunk. After 6 months, see how
> > > close 1.X is to actually being 2.0 (e.g., did we release a 1.4, a
> > > 1.5, a 1.6?) If we get to ~1.6 over the next 6 months and there is
> > > still no active development on 2.0, I'd propose we do this at that
> > > point in time:
> > >
> > > 1. branch the current trunk as
> > >    https://svn.apache.org/repos/asf/nutch/branches/nutchgora
> > > 2. grab latest stable branch (e.g.,
> > >    https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and
> > >    *replace* the Nutch trunk with it, and bump the version # to
> > >    1.7-dev
> > > 3. active development on stable becomes active development in trunk
> > >    and nutchgora still exists in case
Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]
Hi Julien,

I didn't want to jump ship on this one, but it seems that the binding community has already spoken its mind, and I for one second your suggestion. It's clear that trunk as it currently exists is not bleeding edge; there have been too many broken fronts to launch a concentrated development effort on, so it has simply not happened at all. We all seem to be using 1.4 well and I am extremely impressed and very happy with the way development is going. We are making a steady effort as a community to address issues, and the common community interests are usually being met with reasonable support from anyone who can help out. If anything, trunk is a bit of a headache, and although some of us want to see it working (me included), I don't think it is within the community's best interests.

I'm ready for a vote. And yes, I think the voting options should be reduced. Based on past threads, it seemed to be a bit too complex, and the subsequent outcome was that nothing was really done and trunk was still broken. Maybe once Gora has matured a bit Nutch trunk will re-emerge as an attractive model.

Thank you

On Fri, Sep 16, 2011 at 5:26 PM, Julien Nioche < lists.digitalpeb...@gmail.com> wrote: > Am happy to call for a vote on the future of Nutch 2.0 if you want. Shall > we reduce the various options described before to a single one? > > Julien > > On 15 September 2011 19:55, Markus Jelsma wrote: > >> >> > Hi Guys, >> > >> > I thought I'd chime in on this thread. My comments below: >> > > I understand and share your frustration, however you need to bear in >> mind >> > > that things are done only if people volunteer and have time - usually >> > > taken from their holiday, weekends, evenings. Chris (who is the de >> facto >> > > release master for Nutch and Gora) has not had the time and nobody >> else >> > > has volunteered to do it. >> > >> > Yep I haven't had the time to push a Gora 0.1.1-incubating release that >> > will address the Maven issues.
However it is on my roadmap for open >> source >> > stuff to get done in the next month, so that's a good thing. But yes, >> that >> > portion of my open source work is all volunteer time, so sometimes other >> > things take priority. >> > >> > >> As it happens, yesterday was the 1 year anniversary of the last >> > >> successful Hudson/Jenkins build... If that actually worked, we could >> > >> point people towards it as a useful recipe for how to get a build >> > >> working off trunk. I haven't been following Nutch too closely, but >> it >> > >> always strikes me as really odd, that there's a nightly build and it >> > >> doesn't bother anybody that it fails all the time (and that there >> > >> isn't a nightly build for the stable branches). >> > > >> > > The real issue behind all this is what we should do with Nutch 2.0. >> What >> > > follows is only my opinion and I would love to hear what others have >> to >> > > say on this subject. >> > > >> > > Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage >> to >> > > Gora, the latter hasn't really taken off since incubation. There have >> > > been some modest contributions to it but it does not seem to be used >> > > much and there is virtually nothing happening on it in terms of >> > > development. More worryingly, the people who initially contributed to >> it >> > > are not very active on the project (such is life, new jobs, different >> > > projects, etc...) anymore·. As for Nutch 2.0, it hasn't made any >> > > progress in the last 12 months : we still have the same bugs, the >> tests >> > > do not work, the build has to be done manually etc... >> > >> > Yep. >> > >> > > At the same time, there has been a new lease of life into Nutch as a >> > > whole : there is definitely more activity on the mailing lists, new >> > > users, new active committers etc... 
and quite a few bugfixes and >> > > improvements - most of them backported from what had been done in the >> > > trunk and people seem fairly happy with what we can do with 1.4 >> > >> > Totally agreed. I'm actually not super surprised -- ever since 1.1, I >> kind >> > of felt that maintaining a stable 1.X branch of Nutch (in parallel to >> the >> > 2.0 efforts) was really going to pay off since there was renewed >> interest >> > from users in leveraging (and furthermore accepting) the nuances of 1.X. >> > >> > > So the question is : what shall we do with 2.0? Here are a few >> > > possibilities >> > > >> > > >> > > a) put some effort into it, fix the bugs and make so that it can be >> used >> > > instead of 1.x >> > > b) shelve it and leave it for enthusiasts to play with + make 1.x the >> > > trunk again >> > > c) do nothing : keep 2.0 and 1.x in parallel (but having to maintain >> two >> > > branches is quite a pain) >> > > d) abandon the idea of a neutral storage layer with Gora and hardwire >> it >> > > to e.g. HBase >> > > >> > > Option (a) has not happened in the last 12 months and I am not very >> > > hopeful about it. >> > > >> > > Wh
Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]
Am happy to call for a vote on the future of Nutch 2.0 if you want. Shall we reduce the various options described before to a single one? Julien On 15 September 2011 19:55, Markus Jelsma wrote: > > > Hi Guys, > > > > I thought I'd chime in on this thread. My comments below: > > > I understand and share your frustration, however you need to bear in > mind > > > that things are done only if people volunteer and have time - usually > > > taken from their holiday, weekends, evenings. Chris (who is the de > facto > > > release master for Nutch and Gora) has not had the time and nobody else > > > has volunteered to do it. > > > > Yep I haven't had the time to push a Gora 0.1.1-incubating release that > > will address the Maven issues. However it is on my roadmap for open > source > > stuff to get done in the next month, so that's a good thing. But yes, > that > > portion of my open source work is all volunteer time, so sometimes other > > things take priority. > > > > >> As it happens, yesterday was the 1 year anniversary of the last > > >> successful Hudson/Jenkins build... If that actually worked, we could > > >> point people towards it as a useful recipe for how to get a build > > >> working off trunk. I haven't been following Nutch too closely, but it > > >> always strikes me as really odd, that there's a nightly build and it > > >> doesn't bother anybody that it fails all the time (and that there > > >> isn't a nightly build for the stable branches). > > > > > > The real issue behind all this is what we should do with Nutch 2.0. > What > > > follows is only my opinion and I would love to hear what others have to > > > say on this subject. > > > > > > Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage > to > > > Gora, the latter hasn't really taken off since incubation. There have > > > been some modest contributions to it but it does not seem to be used > > > much and there is virtually nothing happening on it in terms of > > > development. 
More worryingly, the people who initially contributed to > it > > > are not very active on the project (such is life, new jobs, different > > > projects, etc...) anymore·. As for Nutch 2.0, it hasn't made any > > > progress in the last 12 months : we still have the same bugs, the > tests > > > do not work, the build has to be done manually etc... > > > > Yep. > > > > > At the same time, there has been a new lease of life into Nutch as a > > > whole : there is definitely more activity on the mailing lists, new > > > users, new active committers etc... and quite a few bugfixes and > > > improvements - most of them backported from what had been done in the > > > trunk and people seem fairly happy with what we can do with 1.4 > > > > Totally agreed. I'm actually not super surprised -- ever since 1.1, I > kind > > of felt that maintaining a stable 1.X branch of Nutch (in parallel to the > > 2.0 efforts) was really going to pay off since there was renewed interest > > from users in leveraging (and furthermore accepting) the nuances of 1.X. > > > > > So the question is : what shall we do with 2.0? Here are a few > > > possibilities > > > > > > > > > a) put some effort into it, fix the bugs and make so that it can be > used > > > instead of 1.x > > > b) shelve it and leave it for enthusiasts to play with + make 1.x the > > > trunk again > > > c) do nothing : keep 2.0 and 1.x in parallel (but having to maintain > two > > > branches is quite a pain) > > > d) abandon the idea of a neutral storage layer with Gora and hardwire > it > > > to e.g. HBase > > > > > > Option (a) has not happened in the last 12 months and I am not very > > > hopeful about it. > > > > > > What do you guys think? > > > > I'd suggest an option e). Evolve and keep releasing 1.X over the next 6 > > months, and keep 2.0 in the trunk. After 6 months, see how close 1.X is > to > > actually being 2.0 (e.g., did we release a 1.4, a 1.5, a 1.6?) 
If we get > > to ~1.6 over the next 6 months and there is still no active development > on > > 2.0, I'd propose we do this at that point in time: > > > > 1. branch the current trunk as > > https://svn.apache.org/repos/asf/nutch/branches/nutchgora 2. grab latest > > stable branch (e.g., > > https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and > *replace* > > the Nutch trunk with it, and bump the version # to 1.7-dev 3. active > > development on stable becomes active development in trunk and nutchgora > > still exists in case anyone ever resurrects it. > > > > That way, we give another 6 months to see how it shakes out and > potentially > > allow for 1 or 2 or 3 more stable releases before switching those over to > > trunk. > > > > Thoughts? > > Yes. I don't believe we should wait until january before discussing this > topic > again. I, for example, cannot spend considerable extra time on the issues i > put in 1.4, also due to the fact that it's not entirely stable. > > There are many things i can write about this topic right now but don't
Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]
On Thu, Sep 15, 2011 at 9:55 PM, Markus Jelsma wrote:
> There are many things I can write about this topic right now but don't feel
> it's necessary. The choice is difficult and perhaps painful but when the
> voting round is opened by our project lead, I will vote for promoting 1.x back
> to trunk.

+1, Same here

--
Sami Siren
Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]
> Hi Guys, > > I thought I'd chime in on this thread. My comments below: > > I understand and share your frustration, however you need to bear in mind > > that things are done only if people volunteer and have time - usually > > taken from their holiday, weekends, evenings. Chris (who is the de facto > > release master for Nutch and Gora) has not had the time and nobody else > > has volunteered to do it. > > Yep I haven't had the time to push a Gora 0.1.1-incubating release that > will address the Maven issues. However it is on my roadmap for open source > stuff to get done in the next month, so that's a good thing. But yes, that > portion of my open source work is all volunteer time, so sometimes other > things take priority. > > >> As it happens, yesterday was the 1 year anniversary of the last > >> successful Hudson/Jenkins build... If that actually worked, we could > >> point people towards it as a useful recipe for how to get a build > >> working off trunk. I haven't been following Nutch too closely, but it > >> always strikes me as really odd, that there's a nightly build and it > >> doesn't bother anybody that it fails all the time (and that there > >> isn't a nightly build for the stable branches). > > > > The real issue behind all this is what we should do with Nutch 2.0. What > > follows is only my opinion and I would love to hear what others have to > > say on this subject. > > > > Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to > > Gora, the latter hasn't really taken off since incubation. There have > > been some modest contributions to it but it does not seem to be used > > much and there is virtually nothing happening on it in terms of > > development. More worryingly, the people who initially contributed to it > > are not very active on the project (such is life, new jobs, different > > projects, etc...) anymore·. 
As for Nutch 2.0, it hasn't made any > > progress in the last 12 months : we still have the same bugs, the tests > > do not work, the build has to be done manually etc... > > Yep. > > > At the same time, there has been a new lease of life into Nutch as a > > whole : there is definitely more activity on the mailing lists, new > > users, new active committers etc... and quite a few bugfixes and > > improvements - most of them backported from what had been done in the > > trunk and people seem fairly happy with what we can do with 1.4 > > Totally agreed. I'm actually not super surprised -- ever since 1.1, I kind > of felt that maintaining a stable 1.X branch of Nutch (in parallel to the > 2.0 efforts) was really going to pay off since there was renewed interest > from users in leveraging (and furthermore accepting) the nuances of 1.X. > > > So the question is : what shall we do with 2.0? Here are a few > > possibilities > > > > > > a) put some effort into it, fix the bugs and make so that it can be used > > instead of 1.x > > b) shelve it and leave it for enthusiasts to play with + make 1.x the > > trunk again > > c) do nothing : keep 2.0 and 1.x in parallel (but having to maintain two > > branches is quite a pain) > > d) abandon the idea of a neutral storage layer with Gora and hardwire it > > to e.g. HBase > > > > Option (a) has not happened in the last 12 months and I am not very > > hopeful about it. > > > > What do you guys think? > > I'd suggest an option e). Evolve and keep releasing 1.X over the next 6 > months, and keep 2.0 in the trunk. After 6 months, see how close 1.X is to > actually being 2.0 (e.g., did we release a 1.4, a 1.5, a 1.6?) If we get > to ~1.6 over the next 6 months and there is still no active development on > 2.0, I'd propose we do this at that point in time: > > 1. branch the current trunk as > https://svn.apache.org/repos/asf/nutch/branches/nutchgora 2. 
grab latest > stable branch (e.g., > https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and *replace* > the Nutch trunk with it, and bump the version # to 1.7-dev 3. active > development on stable becomes active development in trunk and nutchgora > still exists in case anyone ever resurrects it. > > That way, we give another 6 months to see how it shakes out and potentially > allow for 1 or 2 or 3 more stable releases before switching those over to > trunk. > > Thoughts? Yes. I don't believe we should wait until january before discussing this topic again. I, for example, cannot spend considerable extra time on the issues i put in 1.4, also due to the fact that it's not entirely stable. There are many things i can write about this topic right now but don't feel it's neccessary. The choice is difficult and perhaps painful but when the voting round is opened by our project lead, i will vote for promoting 1.x back to trunk. My apologies for my impatience and pessimism. > > BTW, I have a couple contributions from my CS572: Search Engines class from > a year ago that I'd love to port into the Nutch stable branch including > Hubs/Authorities ran
Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]
Hi Guys, I thought I'd chime in on this thread. My comments below: > I understand and share your frustration, however you need to bear in mind > that things are done only if people volunteer and have time - usually taken > from their holiday, weekends, evenings. Chris (who is the de facto release > master for Nutch and Gora) has not had the time and nobody else has > volunteered to do it. Yep I haven't had the time to push a Gora 0.1.1-incubating release that will address the Maven issues. However it is on my roadmap for open source stuff to get done in the next month, so that's a good thing. But yes, that portion of my open source work is all volunteer time, so sometimes other things take priority. > > >> As it happens, yesterday was the 1 year anniversary of the last >> successful Hudson/Jenkins build... If that actually worked, we could >> point people towards it as a useful recipe for how to get a build >> working off trunk. I haven't been following Nutch too closely, but it >> always strikes me as really odd, that there's a nightly build and it >> doesn't bother anybody that it fails all the time (and that there >> isn't a nightly build for the stable branches). >> > > The real issue behind all this is what we should do with Nutch 2.0. What > follows is only my opinion and I would love to hear what others have to say > on this subject. > > Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to > Gora, the latter hasn't really taken off since incubation. There have been > some modest contributions to it but it does not seem to be used much and > there is virtually nothing happening on it in terms of development. More > worryingly, the people who initially contributed to it are not very active > on the project (such is life, new jobs, different projects, etc...) > anymore·. As for Nutch 2.0, it hasn't made any progress in the last 12 > months : we still have the same bugs, the tests do not work, the build has > to be done manually etc... Yep. 
> > At the same time, there has been a new lease of life into Nutch as a whole : > there is definitely more activity on the mailing lists, new users, new > active committers etc... and quite a few bugfixes and improvements - most > of them backported from what had been done in the trunk and people seem > fairly happy with what we can do with 1.4 Totally agreed. I'm actually not super surprised -- ever since 1.1, I kind of felt that maintaining a stable 1.X branch of Nutch (in parallel to the 2.0 efforts) was really going to pay off since there was renewed interest from users in leveraging (and furthermore accepting) the nuances of 1.X. > > So the question is : what shall we do with 2.0? Here are a few possibilities > : > > a) put some effort into it, fix the bugs and make so that it can be used > instead of 1.x > b) shelve it and leave it for enthusiasts to play with + make 1.x the trunk > again > c) do nothing : keep 2.0 and 1.x in parallel (but having to maintain two > branches is quite a pain) > d) abandon the idea of a neutral storage layer with Gora and hardwire it to > e.g. HBase > > Option (a) has not happened in the last 12 months and I am not very hopeful > about it. > > What do you guys think? I'd suggest an option e). Evolve and keep releasing 1.X over the next 6 months, and keep 2.0 in the trunk. After 6 months, see how close 1.X is to actually being 2.0 (e.g., did we release a 1.4, a 1.5, a 1.6?) If we get to ~1.6 over the next 6 months and there is still no active development on 2.0, I'd propose we do this at that point in time: 1. branch the current trunk as https://svn.apache.org/repos/asf/nutch/branches/nutchgora 2. grab latest stable branch (e.g., https://svn.apache.org/repos/asf/nutch/branches/branch-1.6) and *replace* the Nutch trunk with it, and bump the version # to 1.7-dev 3. active development on stable becomes active development in trunk and nutchgora still exists in case anyone ever resurrects it. 
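The three-step plan above can be sketched as a dry run. This is only an illustration under stated assumptions, not the procedure the project actually used: the helper name `trunk_swap_commands` and the commit messages are hypothetical, the stable branch name is a parameter because the thread itself only gives `branch-1.6` as an example, and the version bump in step 2 is a working-copy edit that is not modelled here. The script only builds and prints `svn` command lines; it does not run them.

```python
import shlex

# Repository root taken from the URLs quoted in the plan.
REPO = "https://svn.apache.org/repos/asf/nutch"


def trunk_swap_commands(stable_branch, shelf_name="nutchgora"):
    """Return svn command lines for shelving trunk and promoting a stable branch.

    Hypothetical helper: nothing is executed, the caller decides what to do
    with the returned argument lists.
    """
    trunk = f"{REPO}/trunk"
    shelf = f"{REPO}/branches/{shelf_name}"
    stable = f"{REPO}/branches/{stable_branch}"
    return [
        # Step 1: preserve the current trunk as a branch.
        ["svn", "copy", trunk, shelf, "-m", f"Shelve trunk as {shelf_name}"],
        # Step 2: replace trunk with the stable branch (delete, then copy).
        ["svn", "delete", trunk, "-m", "Remove old trunk, preserved in branches"],
        ["svn", "copy", stable, trunk, "-m", f"Promote {stable_branch} to trunk"],
        # Step 3 needs no command: development simply continues in trunk.
    ]


if __name__ == "__main__":
    # branch-1.6 is the example branch named in the plan, not a confirmed path.
    for cmd in trunk_swap_commands("branch-1.6"):
        print(shlex.join(cmd))
```

Doing the shelving copy first means the old trunk history stays reachable under `branches/nutchgora` before anything is deleted, which matches the stated intent that nutchgora "still exists in case anyone ever resurrects it".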
That way, we give another 6 months to see how it shakes out and potentially allow for 1 or 2 or 3 more stable releases before switching those over to trunk. Thoughts? BTW, I have a couple contributions from my CS572: Search Engines class from a year ago that I'd love to port into the Nutch stable branch including Hubs/Authorities ranking and some other goodies. I'll try and work on those over the next few months, I'm just letting everyone know now so I don't forget again :-) Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]
Hi,

Without changing the flow of conversation and the points which have already been touched upon, I would like to add:

I am really split here between a couple of decisions. I like the abstraction that Gora provides, even though it is somewhat of a pain to configure, which also presents a barrier to adoption for devs. That being said, Gora is a fundamental component of Nutch 2.0, and once you get to grips with the config and the flexibility it offers, you are presented with an excellent setup for Nutch 2.0. I understand people's concerns and why they would wish to hardwire to HBase; however, I would like to point to a (rather lengthy) thread I found last night as I was thinking about my position in this whole affair [1]. In essence it reflects exactly what Julien has mentioned below, as well as adding a hellish lot more!

I am also with Markus on this one. However, there is no point in me being anything other than totally honest: some of the bugs in trunk 2.0 we are talking about are pretty substantial (I don't even know them all), especially when the API changes are taken into account, so I would be learning as I chipped in my part... This would inevitably lead to slower progress on Nutch 2.0 than we would all hope for, bearing in mind several devs' other commitments both in and out of the ASF. Is this something which can be tolerated, or are we to put suggestions in place which adhere to the release early, release often ethos and try to get something out of the door? If we could get an official release of Nutch 2.0, community testing could commence, and instead of improvement suggestions ending up as JIRA tickets we would be getting bugs filed specifically against 2.0 as independent issues. This would inevitably lead to a better trunk development environment for us all. One flip side of veering towards option A) is that we had a small amount of resistance when Nutch 1.3 was released...
would making Nutch 2.0 mainstream, the de facto choice for Nutch users, be a step too far for some of them?

I am a firm believer that we should do whatever is necessary to get trunk building under Hudson. It seems like a waste of resources that we have the potential for a stable build environment but are not taking advantage of it. Obviously I am unaware of exactly what is preventing this, hence my keenness to get it sorted out, but surely we must all agree that this would be beneficial, from a morale point of view as well. If we see that trunk is building successfully then there might be a better feeling about people developing not only on trunk 2.0 but also on Gora and the other components upon which trunk 2.0 depends.

Further to this, is there any consensus on getting a Jenkins build established for branch 1.X? It is quite clear that this is our working development strand, so would this not make sense? I have been looking through the wiki [2] and any committer can get it set up once the PMC chair makes some minor requests on people.apache.org.

Finally, with regard to the ant/ivy configuration, I am quite happy with the current setup. If someone puts forward a reasonable argument for changing to ant/maven or any other configuration then I will certainly be interested, provided it adds value to the project. I must agree that changing something which is not broken is far from the direction I had envisaged we were moving... quite the opposite in fact.

[1] http://www.mail-archive.com/dev@nutch.apache.org/msg00216.html
[2] http://wiki.apache.org/general/Hudson

On Wed, Aug 10, 2011 at 10:20 AM, Markus Jelsma wrote: > Julien, devs, users, > > I'd like to see bugs fixed in 2.0 but some of them are way out of my league > or > would cost me an absurd amount of time. I'd also really like to use Gora > but > Gora must be maintained.
Gora will play a fundamental role in 2.0 and if > something is broken there it is not trivial to fix it for us Nutch devs as > it > is yet another component to worry about. > > Tika goes well, it's worked on and there is good enough progress to rely on > from our perspective. If this is not going to be the case with Gora we > should > maybe decide to drop it and hardwire HBASE in it. > > Maintaining 1.x and 2.x is a pain indeed. I'd prefer option A) but i'm not > sure the currently active Nutch devs are going to fix it just like that. > > Cheers, > > > > > > a) put some effort into it, fix the bugs and make so that it can be used > > instead of 1.x > > b) shelve it and leave it for enthusiasts to play with + make 1.x the > trunk > > again > > c) do nothing : keep 2.0 and 1.x in parallel (but having to maintain two > > branches is quite a pain) > > d) abandon the idea of a neutral storage layer with Gora and hardwire it > to > > e.g. HBase > > > > Option (a) has not happened in the last 12 months and I am not very > hopeful > > about it. > > > > What do you guys think? > > > > Julien > > -- > Markus Jelsma - CTO - Openindex > http://www.linkedin.com/in/markus17 > 050-8536
Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]
Julien, devs, users, I'd like to see bugs fixed in 2.0 but some of them are way out of my league or would cost me an absurd amount of time. I'd also really like to use Gora but Gora must be maintained. Gora will play a fundamental role in 2.0 and if something is broken there it is not trivial to fix it for us Nutch devs as it is yet another component to worry about. Tika goes well, it's worked on and there is good enough progress to rely on from our perspective. If this is not going to be the case with Gora we should maybe decide to drop it and hardwire HBASE in it. Maintaining 1.x and 2.x is a pain indeed. I'd prefer option A) but i'm not sure the currently active Nutch devs are going to fix it just like that. Cheers, On Tuesday 09 August 2011 17:10:12 Julien Nioche wrote: > Hi Kirby, > > Grumble, Grumble. (adding dev@nutch, as that is more than likely > > > where this discussion really belongs)... > > am adding gora-...@incubator.apache.org as well > > > It'd be really nice if folks could just follow the commands in the > > nightly build, and get a build pushed out. I've pointed this out > > previously, and was told this would be fixed "shortly" (right after > > GORA-0.1 finally got released, but not published in public maven repo, > > which as far as I know, it still isn't published, but I stopped > > checking on it). > > I understand and share your frustration, however you need to bear in mind > that things are done only if people volunteer and have time - usually taken > from their holiday, weekends, evenings. Chris (who is the de facto release > master for Nutch and Gora) has not had the time and nobody else has > volunteered to do it. > > > As it happens, yesterday was the 1 year anniversary of the last > > successful Hudson/Jenkins build... If that actually worked, we could > > point people towards it as a useful recipe for how to get a build > > working off trunk. 
I haven't been following Nutch too closely, but it > > always strikes me as really odd, that there's a nightly build and it > > doesn't bother anybody that it fails all the time (and that there > > isn't a nightly build for the stable branches). > > The real issue behind all this is what we should do with Nutch 2.0. What > follows is only my opinion and I would love to hear what others have to say > on this subject. > > Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to > Gora, the latter hasn't really taken off since incubation. There have been > some modest contributions to it but it does not seem to be used much and > there is virtually nothing happening on it in terms of development. More > worryingly, the people who initially contributed to it are not very active > on the project (such is life, new jobs, different projects, etc...) > anymore·. As for Nutch 2.0, it hasn't made any progress in the last 12 > months : we still have the same bugs, the tests do not work, the build has > to be done manually etc... > > At the same time, there has been a new lease of life into Nutch as a whole > : there is definitely more activity on the mailing lists, new users, new > active committers etc... and quite a few bugfixes and improvements - most > of them backported from what had been done in the trunk and people seem > fairly happy with what we can do with 1.4 > > So the question is : what shall we do with 2.0? Here are a few > possibilities > > > a) put some effort into it, fix the bugs and make so that it can be used > instead of 1.x > b) shelve it and leave it for enthusiasts to play with + make 1.x the trunk > again > c) do nothing : keep 2.0 and 1.x in parallel (but having to maintain two > branches is quite a pain) > d) abandon the idea of a neutral storage layer with Gora and hardwire it to > e.g. HBase > > Option (a) has not happened in the last 12 months and I am not very hopeful > about it. > > What do you guys think? 
> > Julien -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]
Hi Tom,

> I have been using Nutch 1.x for the last 9 months or so and it works well
> for large scale crawls up to around a billion pages. However, the inherent
> lack of random access in HDFS really starts to become a burden on our Hadoop
> cluster when going through the whole generate/update/fetch cycle. Being able
> to circumvent HDFS and store data directly in Cassandra/HBase/SQL via GORA
> is an exciting development in Nutch 2, so I have an interest in making it
> succeed.

I assume that you are referring to the fact that after a while the generation and update steps end up taking most of the time compared to the fetching / parsing. One way around this is to generate multiple segments in a single generate and update them all with the crawldb in one go, see the options for the Generator.

> That said, I too, have been frustrated by the state of affairs on Nutch 2.
> I am willing to help.

Good to hear that.

> I see that Nutch is mainly an ant/ivy build process, but there is an
> attempt at using Maven? IMO, ant/ivy seems a bit dated and I am really much
> more comfortable working with Maven. Would there be an interest in
> completely moving to Maven as the build tool of choice?

[Oh no, one of these endless discussions again :-( ]

The consensus among the people actively involved in the project was that ANT+IVY was a better option than plain Maven, due notably to the fact that the ANT scripts were already written and the effort could be used in a more fruitful way doing something else. There are comments on the mailing lists from people who are used to Maven, but some of them seem to be happy with the pom file used to publish the artefacts, while others end up using IvyDE for Eclipse and the ANT scripts and realise that it works fine. I don't think that Ivy is dated at all and, again, would rather see people contributing useful code instead of spending time trying to fix things that are not broken.

I'd personally be completely against using Maven on its own but would consider ANT+MAVEN tasks for managing the modules + dependencies and the publication of artefacts. We currently have Ivy for the dependencies and modules and Maven for the publication; the Maven tasks could be used for both and would simplify things a little bit while preserving most of the ANT script.

As usual suggestions and contributions are welcome.

Julien
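For anyone wanting to try the multi-segment suggestion above, here is a minimal sketch of the two command lines involved, written as a small Python helper that only assembles the arguments. It assumes the `-maxNumSegments` option of the Nutch 1.x Generator and the form of `updatedb` that accepts several segment paths; please check the usage printed by `bin/nutch generate` and `bin/nutch updatedb` on your version before relying on it. The function names, paths and numbers are hypothetical, and the fetch/parse steps that run between the two commands are left out.

```python
from typing import List


def generate_cmd(crawldb: str, segments_dir: str,
                 top_n: int, max_segments: int) -> List[str]:
    """Build the argv for one Generator run that may emit several segments.

    Assumes the Nutch 1.x Generator accepts -topN and -maxNumSegments.
    """
    return [
        "bin/nutch", "generate", crawldb, segments_dir,
        "-topN", str(top_n),
        "-maxNumSegments", str(max_segments),
    ]


def updatedb_cmd(crawldb: str, segments: List[str]) -> List[str]:
    """Build the argv for a single crawldb update over all fetched segments."""
    if not segments:
        raise ValueError("need at least one fetched segment")
    return ["bin/nutch", "updatedb", crawldb, *segments]


if __name__ == "__main__":
    # Hypothetical paths and limits, for illustration only.
    print(" ".join(generate_cmd("crawl/crawldb", "crawl/segments", 50000, 4)))
    print(" ".join(updatedb_cmd("crawl/crawldb",
                                ["crawl/segments/seg-a", "crawl/segments/seg-b"])))
```

The point of the sketch is simply that the crawldb is read once for generation and written once for the update, however many segments are fetched in between.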
RE: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]
Hi All,

I have been using Nutch 1.x for the last 9 months or so and it works well for large scale crawls up to around a billion pages. However, the inherent lack of random access in HDFS really starts to become a burden on our Hadoop cluster when going through the whole generate/update/fetch cycle. Being able to circumvent HDFS and store data directly in Cassandra/HBase/SQL via GORA is an exciting development in Nutch 2, so I have an interest in making it succeed.

That said, I too, have been frustrated by the state of affairs on Nutch 2. I am willing to help.

I see that Nutch is mainly an ant/ivy build process, but there is an attempt at using Maven? IMO, ant/ivy seems a bit dated and I am really much more comfortable working with Maven. Would there be an interest in completely moving to Maven as the build tool of choice?

From: Kirby Bohling [mailto:kirby.bohl...@gmail.com]
Sent: Tuesday, August 09, 2011 8:31 AM
To: dev@nutch.apache.org
Cc: gora-...@incubator.apache.org
Subject: Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

Julien,

On Tue, Aug 9, 2011 at 10:10 AM, Julien Nioche <lists.digitalpeb...@gmail.com> wrote:

Hi Kirby,

Grumble, Grumble. (adding dev@nutch, as that is more than likely where this discussion really belongs)...

am adding gora-...@incubator.apache.org as well

It'd be really nice if folks could just follow the commands in the nightly build, and get a build pushed out. I've pointed this out previously, and was told this would be fixed "shortly" (right after GORA-0.1 finally got released, but not published in public maven repo, which as far as I know, it still isn't published, but I stopped checking on it).

I understand and share your frustration, however you need to bear in mind that things are done only if people volunteer and have time - usually taken from their holiday, weekends, evenings. Chris (who is the de facto release master for Nutch and Gora) has not had the time and nobody else has volunteered to do it.

I don't mean to be a complainer, I'd happily try and contribute fixes on this one, but most of this would likely have to be done on Hudson/Jenkins. I think you're addressing a larger issue than I really meant. My point was, somehow a developer does a build on their desktop, and however that is done should be duplicated on Hudson/Jenkins. If you need the trunk of gora, then is it possible to check it out, build it and install it to a local repo, and then build Nutch via Hudson/Jenkins? Whatever it takes to get a build should be what the CI server is doing.

The repeatable, but failing builds are what really confuse and frustrate me. The nightly/CI build should be automating what devs do on their desktop to ensure it'll work on a clean setup. Right now, it just tells you that for the last year, the totally obvious steps will lead to a failure.

I can figure out all of the configuration issues for Hudson/Jenkins to make it work, if somebody can push that into the Apache version. However, I think answering your questions first would be a good idea. My totally non-binding +1 for setting up a CI/Nightly build for the various stable branches too, the only one I found on Apache was for trunk.

As it happens, yesterday was the 1 year anniversary of the last successful Hudson/Jenkins build... If that actually worked, we could point people towards it as a useful recipe for how to get a build working off trunk. I haven't been following Nutch too closely, but it always strikes me as really odd, that there's a nightly build and it doesn't bother anybody that it fails all the time (and that there isn't a nightly build for the stable branches).

The real issue behind all this is what we should do with Nutch 2.0. What follows is only my opinion and I would love to hear what others have to say on this subject.

Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to Gora, the latter hasn't really taken off since incubation. There have been some modest contributions to it but it does not seem to be used much and there is virtually nothing happening on it in terms of development. More worryingly, the people who initially contributed to it are not very active on the project (such is life, new jobs, different projects, etc...) anymore. As for Nutch 2.0, it hasn't made any progress in the last 12 months: we still have the same bugs, the tests do not work, the build has to be done manually etc...

At the same time, there has been a new lease of life in Nutch as a whole: there is definitely more activity on the mailing lists, new users, new active committers etc... and quite a few bugfixes and improvements - most of them backported from what had been done in the trunk and people seem fairly happy with
Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]
Julien,

On Tue, Aug 9, 2011 at 10:10 AM, Julien Nioche <lists.digitalpeb...@gmail.com> wrote:

> Hi Kirby,
>
>> Grumble, Grumble. (adding dev@nutch, as that is more than likely
>> where this discussion really belongs)...
>
> am adding gora-...@incubator.apache.org as well
>
>> It'd be really nice if folks could just follow the commands in the
>> nightly build, and get a build pushed out. I've pointed this out
>> previously, and was told this would be fixed "shortly" (right after
>> GORA-0.1 finally got released, but not published in public maven repo,
>> which as far as I know, it still isn't published, but I stopped
>> checking on it).
>
> I understand and share your frustration, however you need to bear in mind
> that things are done only if people volunteer and have time - usually taken
> from their holiday, weekends, evenings. Chris (who is the de facto release
> master for Nutch and Gora) has not had the time and nobody else has
> volunteered to do it.

I don't mean to be a complainer, I'd happily try and contribute fixes on this one, but most of this would likely have to be done on Hudson/Jenkins. I think you're addressing a larger issue than I really meant. My point was, somehow a developer does a build on their desktop, and however that is done should be duplicated on Hudson/Jenkins. If you need the trunk of gora, then is it possible to check it out, build it and install it to a local repo, and then build Nutch via Hudson/Jenkins? Whatever it takes to get a build should be what the CI server is doing.

The repeatable, but failing builds are what really confuse and frustrate me. The nightly/CI build should be automating what devs do on their desktop to ensure it'll work on a clean setup. Right now, it just tells you that for the last year, the totally obvious steps will lead to a failure.

I can figure out all of the configuration issues for Hudson/Jenkins to make it work, if somebody can push that into the Apache version. However, I think answering your questions first would be a good idea. My totally non-binding +1 for setting up a CI/Nightly build for the various stable branches too, the only one I found on Apache was for trunk.

>> As it happens, yesterday was the 1 year anniversary of the last
>> successful Hudson/Jenkins build... If that actually worked, we could
>> point people towards it as a useful recipe for how to get a build
>> working off trunk. I haven't been following Nutch too closely, but it
>> always strikes me as really odd, that there's a nightly build and it
>> doesn't bother anybody that it fails all the time (and that there
>> isn't a nightly build for the stable branches).
>
> The real issue behind all this is what we should do with Nutch 2.0. What
> follows is only my opinion and I would love to hear what others have to say
> on this subject.
>
> Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to
> Gora, the latter hasn't really taken off since incubation. There have been
> some modest contributions to it but it does not seem to be used much and
> there is virtually nothing happening on it in terms of development. More
> worryingly, the people who initially contributed to it are not very active
> on the project (such is life, new jobs, different projects, etc...)
> anymore. As for Nutch 2.0, it hasn't made any progress in the last 12
> months: we still have the same bugs, the tests do not work, the build has
> to be done manually etc...
>
> At the same time, there has been a new lease of life in Nutch as a whole:
> there is definitely more activity on the mailing lists, new users, new
> active committers etc... and quite a few bugfixes and improvements - most
> of them backported from what had been done in the trunk and people seem
> fairly happy with what we can do with 1.4.
>
> So the question is: what shall we do with 2.0? Here are a few
> possibilities:
>
> a) put some effort into it, fix the bugs and make it so that it can be used
> instead of 1.x
> b) shelve it and leave it for enthusiasts to play with + make 1.x the trunk
> again
> c) do nothing: keep 2.0 and 1.x in parallel (but having to maintain two
> branches is quite a pain)
> d) abandon the idea of a neutral storage layer with Gora and hardwire it to
> e.g. HBase
>
> Option (a) has not happened in the last 12 months and I am not very hopeful
> about it.
>
> What do you guys think?

I know nothing about the 2.0 branch, and can't really contribute to that conversation (that job issue interferes with all my free time).

Kirby

> Julien
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]
Hi Kirby,

> Grumble, Grumble. (adding dev@nutch, as that is more than likely
> where this discussion really belongs)...

am adding gora-...@incubator.apache.org as well

> It'd be really nice if folks could just follow the commands in the
> nightly build, and get a build pushed out. I've pointed this out
> previously, and was told this would be fixed "shortly" (right after
> GORA-0.1 finally got released, but not published in public maven repo,
> which as far as I know, it still isn't published, but I stopped
> checking on it).

I understand and share your frustration, however you need to bear in mind that things are done only if people volunteer and have time - usually taken from their holiday, weekends, evenings. Chris (who is the de facto release master for Nutch and Gora) has not had the time and nobody else has volunteered to do it.

> As it happens, yesterday was the 1 year anniversary of the last
> successful Hudson/Jenkins build... If that actually worked, we could
> point people towards it as a useful recipe for how to get a build
> working off trunk. I haven't been following Nutch too closely, but it
> always strikes me as really odd, that there's a nightly build and it
> doesn't bother anybody that it fails all the time (and that there
> isn't a nightly build for the stable branches).

The real issue behind all this is what we should do with Nutch 2.0. What follows is only my opinion and I would love to hear what others have to say on this subject.

Since we (actually mostly Dogacan) wrote 2.0 and delegated the storage to Gora, the latter hasn't really taken off since incubation. There have been some modest contributions to it but it does not seem to be used much and there is virtually nothing happening on it in terms of development. More worryingly, the people who initially contributed to it are not very active on the project (such is life, new jobs, different projects, etc...) anymore. As for Nutch 2.0, it hasn't made any progress in the last 12 months: we still have the same bugs, the tests do not work, the build has to be done manually etc...

At the same time, there has been a new lease of life in Nutch as a whole: there is definitely more activity on the mailing lists, new users, new active committers etc... and quite a few bugfixes and improvements - most of them backported from what had been done in the trunk and people seem fairly happy with what we can do with 1.4.

So the question is: what shall we do with 2.0? Here are a few possibilities:

a) put some effort into it, fix the bugs and make it so that it can be used instead of 1.x
b) shelve it and leave it for enthusiasts to play with + make 1.x the trunk again
c) do nothing: keep 2.0 and 1.x in parallel (but having to maintain two branches is quite a pain)
d) abandon the idea of a neutral storage layer with Gora and hardwire it to e.g. HBase

Option (a) has not happened in the last 12 months and I am not very hopeful about it.

What do you guys think?

Julien

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com