It looks like the HBase shell (beginning with 0.96) parses column
names as FAMILY:QUALIFIER[:FORMATTER] due to work from HBASE-6592.
As a result, the shell basically doesn't support specifying any
columns (for gets/puts/scans/etc.) whose qualifier itself contains a
colon. I filed HBASE-13788.
For your case, I suspect the data was properly imported, but when you
tried to scan for "x:twitter:username" the shell instead scanned for
"x:twitter" (treating "username" as a FORMATTER) and found nothing.
Dave
P.S. Here's some related help text from the shell.
Besides the default 'toStringBinary' format, 'get' also supports custom
formatting by column. A user can define a FORMATTER by adding it to the
column name in the get specification. The FORMATTER can be stipulated:

1. either as an org.apache.hadoop.hbase.util.Bytes method name (e.g.,
toInt, toString)
2. or as a custom class followed by method name, e.g.
'c(MyFormatterClass).format'.

Example formatting cf:qualifier1 and cf:qualifier2 both as Integers:

  hbase> get 't1', 'r1', {COLUMN => ['cf:qualifier1:toInt',
    'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt']}

Note that you can specify a FORMATTER by column only (cf:qualifier). You
cannot specify a FORMATTER for all columns of a column family.
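
For what it's worth, a custom FORMATTER referenced as 'c(MyFormatterClass).format'
is just a class the shell can load whose named method is applied to the raw cell
bytes, the same shape as the Bytes.toInt/toString methods. A hedged sketch of what
such a class could look like (the package and class names are illustrative, not
anything shipped with HBase, and it assumes the shell passes the cell value as a
byte[] and displays whatever the method returns):

package com.example;

import org.apache.hadoop.hbase.util.Bytes;

// Illustrative custom FORMATTER class. Assumes the shell hands the named
// method the raw cell value as a byte[] and prints its return value,
// mirroring the org.apache.hadoop.hbase.util.Bytes method signatures.
public class MyFormatterClass {
  public static String format(byte[] value) {
    // Example rendering: the value's length followed by its string form.
    return value == null
        ? "<null>"
        : value.length + " bytes: " + Bytes.toString(value);
  }
}

With that class on the shell's classpath, the column spec would look something
like 'cf:qualifier1:c(com.example.MyFormatterClass).format'.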
On Wed, May 27, 2015 at 10:23 AM, <[email protected]> wrote:
> On Wed, May 27, 2015, at 11:35 AM, Dave Latham wrote:
>> Sounds like quite a puzzle.
>>
>> You mentioned that you can read data written through manual Puts from
>> the shell - but not data from the Import. There must be something
>> different about the data itself once it's in the table. Can you
>> compare a row that was imported to a row that was manually written -
>> or show them to us?
>
> Hmph, I may have spoken too soon. I know I tested this at one point and
> it worked, but now I'm getting different results:
>
> On the new cluster, I created a duplicate test table:
> hbase(main):043:0> create 'content3', {NAME => 'x', BLOOMFILTER =>
> 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION =>
> 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
> IN_MEMORY => 'false', BLOCKCACHE => 'true'}
>
> Then I pull some data from the imported table:
> hbase(main):045:0> scan 'content', {LIMIT=>1,
> STARTROW=>'A:9223370612089311807:twtr:57013379'}
> ROW                                             COLUMN+CELL
> ....
> A:9223370612089311807:twtr:570133798827921408   column=x:twitter:username, timestamp=1424775595345, value=BERITA & INFORMASI!
>
> Then put it:
> hbase(main):046:0> put
> 'content3','A:9223370612089311807:twtr:570133798827921408',
> 'x:twitter:username', 'BERITA & INFORMASI!'
>
> But then when I query it, I see that I've lost the column qualifier
> ":username":
> hbase(main):046:0> scan 'content3'
> ROW                                             COLUMN+CELL
> A:9223370612089311807:twtr:570133798827921408   column=x:twitter, timestamp=1432745301788, value=BERITA & INFORMASI!
>
> Even though I'm losing part of the qualifier, I can at least filter on
> columns in this sample table.
>
> So now I'm even more baffled :(
>
> Z
>