Re: Hadoop-HBase table hierarchical column scan

Kiru Pakkirisamy Fri, 09 Aug 2013 22:14:29 -0700

Lars,
We could be having anywhere between 1000-40000 columns in there.
I do have setup the bloomfilter on this column family as 'rowcol'. There are no 
writes at all in this app. 
Maybe, the bloomfilter algorithm is giving too many false positives (my columns 
are strings like T_123, T_34567).
Or the bloomfilter lookup on 50-60 prefixes is costlier than us creating a map 
of all the columns and looking up against another map.
(maybe this is what is happening and I should not be using this filtering)
For now,  I am going to make tables with composite keys. 
Will profile later and debug this.
 
Regards,
- kiru

Kiru Pakkirisamy | webcloudtech.wordpress.com

________________________________
 From: lars hofhansl <[email protected]>
To: "[email protected]" <[email protected]> 
Sent: Friday, August 9, 2013 9:38 PM
Subject: Re: Hadoop-HBase table hierarchical column scan

It all depends on how many other columns you have, whether the skip-scanning 
the filter does is beneficial or not.
It should not worsen the performance, though. If it does we should do some 
profiling and find out why.

-- Lars

________________________________
From: Kiru Pakkirisamy <[email protected]>
To: "[email protected]" <[email protected]>; lars hofhansl 
<[email protected]>; Kiru Pakkirisamy <[email protected]> 
Sent: Friday, August 9, 2013 8:40 PM
Subject: Re: Hadoop-HBase table hierarchical column scan

I can confirm even after trying 0.94.10 that MultipleColumnPrefixFilter only 
worsens the performance.

Regards,
- kiru

Kiru Pakkirisamy | webcloudtech.wordpress.com

________________________________
From: Kiru Pakkirisamy <[email protected]>
To: "[email protected]" <[email protected]>; lars hofhansl 
<[email protected]> 
Sent: Friday, August 9, 2013 1:02 PM
Subject: Re: Hadoop-HBase table hierarchical column scan

The Prefix filters did not work for me. Actually, performance went down. But I 
am going to try with fix for HBASE-6870 (suggested by Ted) deployed to our 
Performance cluster.

Regards,
- kiru

Kiru Pakkirisamy | webcloudtech.wordpress.com

________________________________
From: lars hofhansl <[email protected]>
To: "[email protected]" <[email protected]> 
Sent: Friday, August 9, 2013 12:55 PM
Subject: Re: Hadoop-HBase table hierarchical column scan

Take a look at ColumnRangeFilter, (probably better in your case) 
ColumnPrefixFilter, or MultipleColumnPrefixFilter.
Especially the latter two let you efficiently filter on prefixes of columns.

Note that if typically scan a subset of the columns, placing these prefixes 
into the row key will be more efficient, as the scanner can then avoid a full 
scan.

-- Lars
________________________________
From: Narlin M <[email protected]>
To: [email protected] 
Sent: Friday, August 9, 2013 12:44 PM
Subject: Hadoop-HBase table hierarchical column scan

I am fairly new to the hadoop-hbase environment having started working on
it very recently, so I hope I am wording the question correctly.

I am trying to read data from a hadoop-hbase table which has only one
column family named 'DFLT'. This family contains hierarchical column
qualifiers "/source:int64/name:string". I want to read the name column for
a particular source value, say 10. How can I achieve this using the Scan
class?

I tried setting up the scan object as follows:

...

byte[] family = Bytes.toBytes("DFLT");
byte[] qualifier = Bytes.toBytes("source:name");

Scan scan = new Scan();
scan.addColumn(family, qualifier);

FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);

SingleColumnValueFilter filter = new SingleColumnValueFilter(family,
Bytes.toBytes("source"), CompareFilter.CompareOp.EQUAL,Bytes.toBytes(10));

list.addFilter(filter);

scan.setFilter(list);

...

But I do not get any data back with this setup. I am guessing that I am not
setting up the hierarchical qualifiers correctly. Any and all pointers will
be appreciated.

Thanks, Narlin M.

Re: Hadoop-HBase table hierarchical column scan

Reply via email to