2012/6/12 Eric Charles <[email protected]>:
> True.
> What do you intend to store in Avro format (these bytes being retrieved by
> any means on the RPC side)?
> Thx, Eric
>
Well, if we stay true to the article: the info about Terms and Fields [1]
(see the end). I'm hoping Mihai can get a version up and running by mid-term,
and then we can see what to improve after that (coprocessors, etc.). It should
be a general enough implementation to be used outside James. We won't get
close to Elasticsearch or plain Lucene performance, though. We'll see.

[1] http://www.infoq.com/articles/LuceneHbase

> On 06/12/2012 02:14 PM, Ioan Eugen Stan wrote:
>>
>> Hi,
>>
>> From what I know, the Avro deprecation is for RPC communication. The
>> Put/Delete/etc. operations are serialized with Avro instead of the
>> usual Writables. Regardless of what serialization the RPC sub-system
>> uses, the data stored by the operations (Put/Get/Delete) is viewed as
>> a byte array. If we store Avro objects as binary blobs in HBase then we
>> have no issues.
>>
>> Cheers,
>>
>> 2012/6/12 Mihai Soloi <[email protected]>:
>>>
>>> On 12.06.2012 11:30, Eric Charles wrote:
>>>>
>>>> Hi Mihai,
>>>>
>>>> Glad to hear your exams are over (I hope they went fine) :)
>>>
>>> Hi Eric,
>>>
>>> Thanks, they went very well, I got high marks.
>>>
>>>> As Ioan said, Avro serialization in HBase will be deprecated in favor of
>>>> Protobuf (if I understand well...).
>>>
>>> I think Avro could be replaced rather easily with Protobuf as they're both
>>> doing basically the same thing, only that Avro uses JSON schemas and can be
>>> used with any other language, which is of no value to the project.
>>>
>>>> I also like Avro because it gives you serialization & storage format in
>>>> one box, but is this what we want? The key point here is more an
>>>> effective access to the persisted data.
>>>
>>> If the data is passed through Avro we'll have it serialized, and
>>> deserialization is basically handled by Avro, but we'll always have to
>>> interact with the schemas.
>>> In Protobuf we have the objects compiled into our classes; from what I
>>> gather it's mostly useful for RPC. But Avro also has the protocol in
>>> which, by using the avro-maven-plugin, you can generate your own classes
>>> with which to interact. I can't say I'm an expert in either, but I fancy
>>> Avro.
>>>
>>>> There have been a few attempts so far to marry HBase and Lucene (see
>>>> [1], [2], [3] and [4] for example, see also [5] for a more recent article).
>>>
>>> Thank you for the GitHub links, I will look thoroughly through the
>>> projects. I was already aware of HBasene and Solandra (formerly Lucandra);
>>> they have similar approaches.
>>>
>>>> The questions I am wondering about:
>>>>
>>>> 1. Will you focus on a 'generic' solution (reusable outside James), or on
>>>> a very specific one tuned/optimized only for James mailbox needs?
>>>
>>> I was thinking of writing generic code so that maybe it could be used
>>> outside of James, but the data format would be specific to James mailbox
>>> needs, so the answer in the end is that it will be tuned for James.
>>>
>>>> 2. What strategy will you take (custom Directory or custom
>>>> IndexReader/Writer, usage of Coprocessor or not...)?
>>>
>>> I was thinking that a custom Directory was the way to go, but I soon
>>> realized that it's not as simple as it sounds and overriding the higher
>>> level classes of IndexReader and IndexWriter would be more appropriate
>>> (as in article [5]). So by bypassing the Directory I would have to make use
>>> of HBase coprocessors. As far as I can tell, a RegionObserver would be
>>> employed to gather frequently performed on data for the Lucene queries and
>>> Endpoints.
>>>
>>> [1] https://github.com/akkumar/hbasene
>>> [2] https://github.com/thkoch2001/lucehbase
>>> [3] https://github.com/jasonrutherglen/HBASE-SEARCH
>>> [4] https://github.com/jasonrutherglen/LUCENE-FOR-HBASE
>>> [5] http://www.infoq.com/articles/LuceneHbase
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>
> --
> eric | http://about.echarles.net | @echarles

--
Ioan Eugen Stan / http://axemblr.com / Tools for Clouds
