On Wed, Apr 7, 2010 at 20:32, Andrzej Bialecki <a...@getopt.org> wrote:
> On 2010-04-07 18:54, Doğacan Güney wrote:
>> Hey everyone,
>>
>> On Tue, Apr 6, 2010 at 20:23, Andrzej Bialecki <a...@getopt.org> wrote:
>>> On 2010-04-06 15:43, Julien Nioche wrote:
>>>> Hi guys,
>>>>
>>>> I gather that we'll jump straight to 2.0 after 1.1 and that 2.0 will be
>>>> based on what is currently referred to as NutchBase. Shall we create a
>>>> branch for 2.0 in the Nutch SVN repository and have a label accordingly for
>>>> JIRA so that we can file issues / feature requests on 2.0? Do you think that
>>>> the current NutchBase could be used as a basis for the 2.0 branch?
>>>
>>> I'm not sure what is the status of the nutchbase - it's missed a lot of
>>> fixes and changes in trunk since it's been last touched ...
>>>
>>
>> I know... But I still intend to finish it, I just need to schedule
>> some time for it.
>>
>> My vote would be to go with nutchbase.
>
> Hmm .. this puzzles me, do you think we should port changes from 1.1 to
> nutchbase? I thought we should do it the other way around, i.e. merge
> nutchbase bits to trunk.
>

Hmm, I am a bit out of touch with the latest changes but I know that the
differences between trunk and nutchbase are unfortunately rather large
right now. If merging nutchbase back into trunk would be easier then sure,
let's do that.

>
>>>> * support for HBase : via ORM or not (see
>>>> NUTCH-808<https://issues.apache.org/jira/browse/NUTCH-808>
>>>> )
>>>
>>> This IMHO is promising, this could open the doors to small-to-medium
>>> installations that are currently too cumbersome to handle.
>>>
>>
>> Yeah, there is already a simple ORM within nutchbase that is
>> avro-based and should be generic enough to also support MySQL,
>> cassandra and berkeleydb. But any good ORM will be a very good
>> addition.
>
> Again, the advantage of DataNucleus is that we don't have to handcraft
> all the mid- to low-level mappings, just the mid-level ones (JOQL or
> whatever) - the cost of maintenance is lower, and the number of backends
> that are supported out of the box is larger. Of course, this is just
> IMHO - we won't know for sure until we try to use both your custom ORM
> and DataNucleus...

I am obviously a bit biased here but I have no strong feelings really.
DataNucleus is an excellent project. What I like about the avro-based
approach is the essentially free MapReduce support we get and the fact
that supporting another language is easy. So, we can expose partial hbase
data through a server and a Python client can easily read/write to it,
thanks to avro. That being said, I am all for DataNucleus or something
else.

>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
>
>

--
Doğacan Güney