Hi,

From what I know, the Avro deprecation only concerns RPC communication: the Put/Delete/etc. operations are serialized with Avro instead of the usual Writables on the wire. Regardless of which serialization the RPC sub-system uses, the data stored by those operations (Put/Get/Delete) is treated as a plain byte array. So if we store Avro objects as binary blobs in HBase we have no issue.
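Just to make the point concrete, here is a rough sketch of what I mean. The schema, table name and column names below are made up for the example; the Put itself only ever receives opaque bytes:

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class AvroBlobPut {

    // Made-up schema, just for the example.
    private static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Message\",\"fields\":["
        + "{\"name\":\"uid\",\"type\":\"long\"},"
        + "{\"name\":\"body\",\"type\":\"string\"}]}");

    public static void main(String[] args) throws IOException {
        // Serialize an Avro record to a plain byte array.
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("uid", 42L);
        record.put("body", "hello");

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(SCHEMA).write(record, encoder);
        encoder.flush();
        byte[] blob = out.toByteArray();

        // HBase stores the value as opaque bytes; table/column names are invented here.
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "messages");
        Put put = new Put(Bytes.toBytes(42L));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("avro"), blob);
        table.put(put);
        table.close();
    }
}

Whatever the RPC layer ends up using (Avro or Protobuf), the cell value above stays an uninterpreted byte[].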
Cheers,

2012/6/12 Mihai Soloi <mihai.so...@gmail.com>:
> On 12.06.2012 11:30, Eric Charles wrote:
>>
>> Hi Mihai,
>>
>> Glad to hear your exams are over (I hope they went fine) :)
>
> Hi Eric,
>
> Thanks, they went very well, I got high marks.
>
>> As Ioan said, Avro serialization in HBase will be deprecated in favor of
>> Protobuf (if I understand well...).
>
> I think Avro could be swapped for Protobuf rather easily, as they both do
> basically the same thing; the difference is that Avro uses JSON schemas and
> can be used from any other language, which is of no value to the project.
>
>> I also like Avro because it gives you serialization & storage format in
>> one box, but is this what we want? The key point here is more an effective
>> access to the persisted data.
>
> If the data is passed through Avro we will have it serialized, and
> deserialization is basically handled by Avro, but we will always have to
> interact with the schemas. With Protobuf we have the objects compiled into
> our own classes; from what I gather it is mostly useful for RPC, but Avro
> also has its protocol support, and with the avro-maven-plugin you can
> generate your own classes to interact with. I can't say I'm an expert in
> either, but I fancy Avro.
>
>> There have been a few attempts so far to marry HBase and Lucene (see [1],
>> [2], [3] and [4] for example, see also [5] for a more recent article).
>
> Thank you for the GitHub links, I will look thoroughly through the
> projects. I was already aware of HBasene and Solandra (formerly Lucandra);
> they take similar approaches.
>
>> The questions I am wondering:
>>
>> 1. Will you focus on a 'generic' solution (reusable outside James), or on
>> a very specific one tuned/optimized only for James mailbox needs?
>
> I was thinking of writing generic code so that it could maybe be used
> outside of James, but the data format would be specific to James mailbox
> needs, so in the end the answer is that it will be tuned for James.
>
>> 2. What strategy will you take (custom Directory or custom
>> IndexReader/Writer, usage of Coprocessor or not...)?
>
> I was thinking that a custom Directory was the way to go, but I soon
> realized that it is not as simple as it sounds, and overriding the higher
> level IndexReader and IndexWriter classes would be more appropriate (as in
> article [5]). By bypassing the Directory I would have to make use of HBase
> Coprocessors: as far as I can tell, a RegionObserver would be employed to
> gather the operations frequently performed on the data, together with
> Endpoints for serving the Lucene queries.
>
> [1] https://github.com/akkumar/hbasene
> [2] https://github.com/thkoch2001/lucehbase
> [3] https://github.com/jasonrutherglen/HBASE-SEARCH
> [4] https://github.com/jasonrutherglen/LUCENE-FOR-HBASE
> [5] http://www.infoq.com/articles/LuceneHbase

-- 
Ioan Eugen Stan / http://axemblr.com / Tools for Clouds

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org
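As a very rough illustration of the RegionObserver idea quoted above, a minimal sketch of a post-Put hook follows. MailboxIndexObserver and MailboxIndexer are made-up names, not existing James or HBase classes, and the hook signature assumes the 0.92-era coprocessor API:

import java.io.IOException;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

/**
 * Hypothetical observer: after every Put on the mailbox table, hand the new
 * cells to an indexing component so the Lucene index stays in sync.
 */
public class MailboxIndexObserver extends BaseRegionObserver {

    private final MailboxIndexer indexer = new MailboxIndexer();

    @Override
    public void postPut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                        Put put, WALEdit edit, boolean writeToWAL) throws IOException {
        byte[] row = put.getRow();
        for (Map.Entry<byte[], List<KeyValue>> family : put.getFamilyMap().entrySet()) {
            for (KeyValue kv : family.getValue()) {
                // Feed each newly written cell to the (hypothetical) Lucene indexer.
                indexer.index(row, kv.getFamily(), kv.getQualifier(), kv.getValue());
            }
        }
    }

    /** Placeholder for whatever actually builds and writes the Lucene documents. */
    private static class MailboxIndexer {
        void index(byte[] row, byte[] family, byte[] qualifier, byte[] value) {
            // e.g. build a Lucene Document and add it to an IndexWriter
        }
    }
}

An Endpoint coprocessor would then expose the query side; that part is not sketched here.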