[ https://issues.apache.org/jira/browse/HBASE-25350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ClownfishYang updated HBASE-25350:
----------------------------------
    Description:
HBase version: 1.1.2

We use HBase as a real-time database and query it through Hive external tables, but queries against one table do not return the expected rows.

{code:java}
// sql, returns ids (12045075, 12045076, ...)
SELECT id FROM t1 LIMIT 10
// no result
SELECT id FROM t1 WHERE id = '12045075' LIMIT 10
// create table
CREATE EXTERNAL TABLE `t1`(
  `__key` string COMMENT '',
  `id` string COMMENT 'primary key ID')
COMMENT ''
ROW FORMAT SERDE
  'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
  'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping'=':key,f:id',
  'serialization.format'='1')
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
  'hbase.table.name'='t1',
  'numFiles'='0',
  'numRows'='0',
  'rawDataSize'='0',
  'totalSize'='0',
  'transient_lastDdlTime'='1606804842'){code}
While investigating, I tried adding spaces to the value and using LIKE, but could not determine the cause. I began to suspect that the problem was in HBase.
{code:java}
// hbase table desc
describe 't1'
Table t1 is ENABLED
t1, {TABLE_ATTRIBUTES => {METADATA => {'COMPACTION_ENABLED' => 'true'}}
COLUMN FAMILIES DESCRIPTION
{NAME => 'f', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => '604800 SECONDS (7 DAYS)', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
// scan row
scan 't1', {COLUMNS => 'f:id', LIMIT => 10}
// result
ROW        COLUMN+CELL
 12044083  column=f:id, timestamp=1606293182000, value=12044083
 12044084  column=f:id, timestamp=1606293183000, value=12044084
 12044085  column=f:id, timestamp=1606293185000, value=12044085
 12044086  column=f:id, timestamp=1606293190000, value=12044086
 12044087  column=f:id, timestamp=1606293192000, value=12044087
 12044088  column=f:id, timestamp=1606293197000, value=12044088
 12044089  column=f:id, timestamp=1606293198000, value=12044089
 12044090  column=f:id, timestamp=1606293204000, value=12044090
 12044091  column=f:id, timestamp=1606293207000, value=12044091
 12044092  column=f:id, timestamp=1606293208000, value=12044092
// get row, no result
get 't1', "12044083" , {COLUMNS => 'f:id'}{code}
Only queries on the row key and the id column have this problem; queries on other columns work normally. I now suspect that the data contains invisible escape characters or something similar, but how can I verify this? Worse, I have also made the call through the Java API, and the returned data shows no invisible escape characters in the row key or the id.

was: the same description without the {{describe 't1'}} output.

> The scan command gives a row, but get does not have this row.
> -------------------------------------------------------------
>
>                 Key: HBASE-25350
>                 URL: https://issues.apache.org/jira/browse/HBASE-25350
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.11
>            Reporter: ClownfishYang
>            Priority: Major
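
On the question in the description of how to tell whether a row key or id carries invisible characters: one possible approach is to print the raw bytes with every non-printable byte escaped, instead of decoding them to a string first. The sketch below is only an illustration under stated assumptions. The class `RowKeyInspector` and the method `escape` are hypothetical names, and the trailing NUL byte in the sample key is made up for the demonstration, not taken from the reported table. It is similar in spirit to `Bytes.toStringBinary` in `org.apache.hadoop.hbase.util`, but does not depend on HBase.

```java
import java.nio.charset.StandardCharsets;

public class RowKeyInspector {
    // Render a byte array so that anything outside printable ASCII
    // (and the backslash itself) appears as \xNN instead of being hidden.
    public static String escape(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (v >= 0x20 && v <= 0x7E && v != '\\') {
                sb.append((char) v);
            } else {
                sb.append(String.format("\\x%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The key as typed in the failing get.
        byte[] typed = "12044083".getBytes(StandardCharsets.UTF_8);
        // Hypothetical stored key: same digits plus an invisible trailing NUL.
        byte[] stored = {'1', '2', '0', '4', '4', '0', '8', '3', 0x00};

        System.out.println(escape(typed) + " (" + typed.length + " bytes)");
        System.out.println(escape(stored) + " (" + stored.length + " bytes)");
    }
}
```

In a real check, the byte arrays would come from the row key and cell value returned by a scan, and comparing both the escaped form and the byte length against the typed key would show whether they actually differ.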