Hi,

From what I know, the Avro deprecation only concerns RPC communication: the Put/Delete/etc. operations are serialized with Avro instead of the usual Writables on the wire. Regardless of which serialization the RPC sub-system uses, the data stored by those operations (Put/Get/Delete) is treated as a plain byte array. So if we store Avro objects as binary blobs in HBase we have no issue.
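Just to make the point concrete, here is a rough sketch of what I mean. The schema, table name and column names below are made up for the example; the Put itself only ever receives opaque bytes:

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class AvroBlobPut {

    // Made-up schema, just for the example.
    private static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Message\",\"fields\":["
        + "{\"name\":\"uid\",\"type\":\"long\"},"
        + "{\"name\":\"body\",\"type\":\"string\"}]}");

    public static void main(String[] args) throws IOException {
        // Serialize an Avro record to a plain byte array.
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("uid", 42L);
        record.put("body", "hello");

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(SCHEMA).write(record, encoder);
        encoder.flush();
        byte[] blob = out.toByteArray();

        // HBase stores the value as opaque bytes; table/column names are invented here.
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "messages");
        Put put = new Put(Bytes.toBytes(42L));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("avro"), blob);
        table.put(put);
        table.close();
    }
}

Whatever the RPC layer ends up using (Avro or Protobuf), the cell value above stays an uninterpreted byte[].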
Cheers,

2012/6/12 Mihai Soloi <mihai.so...@gmail.com>:
> On 12.06.2012 11:30, Eric Charles wrote:
>>
>> Hi Mihai,
>>
>> Glad to hear your exams are over (I hope they went fine) :)
>
> Hi Eric,
>
> Thanks, they went very well, I got high marks.
>
>> As Ioan said, Avro serialization in HBase will be deprecated in favor of
>> Protobuf (if I understand well...).
>
> I think Avro could be swapped for Protobuf rather easily, as they both do
> basically the same thing; the difference is that Avro uses JSON schemas and
> can be used from any other language, which is of no value to the project.
>
>> I also like Avro because it gives you serialization & storage format in
>> one box, but is this what we want? The key point here is more an effective
>> access to the persisted data.
>
> If the data is passed through Avro we will have it serialized, and
> deserialization is basically handled by Avro, but we will always have to
> interact with the schemas. With Protobuf we have the objects compiled into
> our own classes; from what I gather it is mostly useful for RPC, but Avro
> also has its protocol support, and with the avro-maven-plugin you can
> generate your own classes to interact with. I can't say I'm an expert in
> either, but I fancy Avro.
>
>> There have been a few attempts so far to marry HBase and Lucene (see [1],
>> [2], [3] and [4] for example, see also [5] for a more recent article).
>
> Thank you for the GitHub links, I will look thoroughly through the
> projects. I was already aware of HBasene and Solandra (formerly Lucandra);
> they take similar approaches.
>
>> The questions I am wondering:
>>
>> 1. Will you focus on a 'generic' solution (reusable outside James), or on
>> a very specific one tuned/optimized only for James mailbox needs?
>
> I was thinking of writing generic code so that it could maybe be used
> outside of James, but the data format would be specific to James mailbox
> needs, so in the end the answer is that it will be tuned for James.
>
>> 2. What strategy will you take (custom Directory or custom
>> IndexReader/Writer, usage of Coprocessor or not...)?
>
> I was thinking that a custom Directory was the way to go, but I soon
> realized that it is not as simple as it sounds, and overriding the higher
> level IndexReader and IndexWriter classes would be more appropriate (as in
> article [5]). By bypassing the Directory I would have to make use of HBase
> Coprocessors: as far as I can tell, a RegionObserver would be employed to
> gather the operations frequently performed on the data, together with
> Endpoints for serving the Lucene queries.
>
> [1] https://github.com/akkumar/hbasene
> [2] https://github.com/thkoch2001/lucehbase
> [3] https://github.com/jasonrutherglen/HBASE-SEARCH
> [4] https://github.com/jasonrutherglen/LUCENE-FOR-HBASE
> [5] http://www.infoq.com/articles/LuceneHbase

-- 
Ioan Eugen Stan / http://axemblr.com / Tools for Clouds

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org
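As a very rough illustration of the RegionObserver idea quoted above, a minimal sketch of a post-Put hook follows. MailboxIndexObserver and MailboxIndexer are made-up names, not existing James or HBase classes, and the hook signature assumes the 0.92-era coprocessor API:

import java.io.IOException;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

/**
 * Hypothetical observer: after every Put on the mailbox table, hand the new
 * cells to an indexing component so the Lucene index stays in sync.
 */
public class MailboxIndexObserver extends BaseRegionObserver {

    private final MailboxIndexer indexer = new MailboxIndexer();

    @Override
    public void postPut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                        Put put, WALEdit edit, boolean writeToWAL) throws IOException {
        byte[] row = put.getRow();
        for (Map.Entry<byte[], List<KeyValue>> family : put.getFamilyMap().entrySet()) {
            for (KeyValue kv : family.getValue()) {
                // Feed each newly written cell to the (hypothetical) Lucene indexer.
                indexer.index(row, kv.getFamily(), kv.getQualifier(), kv.getValue());
            }
        }
    }

    /** Placeholder for whatever actually builds and writes the Lucene documents. */
    private static class MailboxIndexer {
        void index(byte[] row, byte[] family, byte[] qualifier, byte[] value) {
            // e.g. build a Lucene Document and add it to an IndexWriter
        }
    }
}

An Endpoint coprocessor would then expose the query side; that part is not sketched here.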