The segments are completely random: they can range from no overlap at all to exact duplicates.
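
Since HBase apparently won't deduplicate this for us, the logic would have to sit in the application. Below is a rough sketch of what that might look like, not a tested design. It assumes each segment arrives with known start/end offsets (minutes here, as in the 0-1, 0.5-2, 2-4, 3-5 example) and that only the not-yet-stored part of a segment gets written. The names `uncovered_parts`, `add_segment` and `ingest` are made up for illustration. No HBase API is used, and mapping offsets to bytes and row keys is left out.

```python
from typing import List, Tuple

# (start, end) offsets of a segment; minutes are an assumed unit.
Interval = Tuple[float, float]


def uncovered_parts(stored: List[Interval], new: Interval) -> List[Interval]:
    """Return the sub-ranges of `new` not already covered by `stored`.

    `stored` must be sorted and non-overlapping.
    """
    start, end = new
    gaps = []
    cursor = start  # everything before cursor is already accounted for
    for s, e in stored:
        if e <= cursor:
            continue  # stored range lies entirely before the cursor
        if s >= end:
            break  # stored range lies entirely after the new segment
        if s > cursor:
            gaps.append((cursor, s))  # hole before this stored range
        cursor = max(cursor, e)
        if cursor >= end:
            break
    if cursor < end:
        gaps.append((cursor, end))  # tail not covered by anything stored
    return gaps


def add_segment(stored: List[Interval], new: Interval) -> List[Interval]:
    """Merge `new` into `stored`, keeping it sorted and non-overlapping."""
    merged: List[Interval] = []
    for s, e in sorted(stored + [new]):
        if merged and s <= merged[-1][1]:
            # Overlapping or touching: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def ingest(segments: List[Interval]) -> Tuple[List[Interval], List[Interval]]:
    """Process segments in arrival order.

    Returns (covered ranges, ranges that actually needed to be written).
    """
    stored: List[Interval] = []
    written: List[Interval] = []
    for seg in segments:
        written.extend(uncovered_parts(stored, seg))
        stored = add_segment(stored, seg)
    return stored, written


if __name__ == "__main__":
    stored, written = ingest([(0, 1), (0.5, 2), (2, 4), (3, 5)])
    print(stored)
    print(written)
```

With this approach an exact duplicate yields no uncovered parts, so nothing would be written for it. A partially overlapping segment would only contribute its new portion.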

Anand

On 27 August 2013 19:49, Ted Yu <[email protected]> wrote:

> bq. Will hbase do some sort of deduplication?
>
> I don't think so.
>
> What is the granularity of segment overlap? In the above example, it seems
> to be 0.5
>
> Cheers
>
>
> On Tue, Aug 27, 2013 at 7:12 AM, Anand Nalya <[email protected]> wrote:
>
> > Hi,
> >
> > I have a use case in which I need to store segments of mp3 files in hbase.
> > A song may come to the application in different overlapping segments. For
> > example, a 5 min song can have the following segments 0-1,0.5-2,2-4,3-5. As
> > seen, some of the data is duplicate (3-4 is present in the last 2
> > segments).
> >
> > What would be the ideal way of removing this duplicate storage? Will snappy
> > compression help here or do I need to write some logic over HBase? Also,
> > what if I store a single segment multiple times. Will hbase do some sort of
> > deduplication?
> >
> > Regards,
> > Anand
