[jira] [Commented] (HIVE-3179) HBase Handler doesn't handle NULLs properly
[ https://issues.apache.org/jira/browse/HIVE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631832#comment-13631832 ]

Hudson commented on HIVE-3179:
------------------------------

Integrated in Hive-trunk-hadoop2 #160 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/160/])
HIVE-3179 HBase Handler doesn't handle NULLs properly (Lars Francke via Navis) (Revision 1467874)

Result = FAILURE
navis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467874
Files :
* /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java
* /hive/trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java

HBase Handler doesn't handle NULLs properly
-------------------------------------------

Key: HIVE-3179
URL: https://issues.apache.org/jira/browse/HIVE-3179
Project: Hive
Issue Type: Bug
Components: HBase Handler
Affects Versions: 0.9.0, 0.10.0
Reporter: Lars Francke
Priority: Critical
Fix For: 0.12.0
Attachments: HIVE-3179.1.patch

We found a quite severe issue in the HBase Handler which actually means that Hive potentially returns incorrect data if a column has NULL values in HBase (which means the cell doesn't even exist).

In HBase Shell:
{noformat}
create 'hive_hbase_test', 'test'
put 'hive_hbase_test', '1', 'test:c1', 'c1-1'
put 'hive_hbase_test', '1', 'test:c2', 'c2-1'
put 'hive_hbase_test', '1', 'test:c3', 'c3-1'
put 'hive_hbase_test', '2', 'test:c1', 'c1-2'
{noformat}

In Hive:
{noformat}
DROP TABLE IF EXISTS hive_hbase_test;
CREATE EXTERNAL TABLE hive_hbase_test (
  id int,
  c1 string,
  c2 string,
  c3 string
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key#s,test:c1#s,test:c2#s,test:c3#s")
TBLPROPERTIES("hbase.table.name" = "hive_hbase_test");

hive> select * from hive_hbase_test;
OK
1    c1-1    c2-1    c3-1
2    c1-2    NULL    NULL

hive> select c1 from hive_hbase_test;
c1-1
c1-2

hive> select c1, c2 from hive_hbase_test;
c1-1    c2-1
c1-2    NULL
{noformat}

So far everything is correct but now:
{noformat}
hive> select c1, c2, c2 from hive_hbase_test;
c1-1    c2-1    c2-1
c1-2    NULL    c2-1
{noformat}

Selecting c2 twice works the first time but the second time we actually get the value from the previous row.
{noformat}
hive> select c1, c3, c2, c2, c3, c3, c1 from hive_hbase_test;
c1-1    c3-1    c2-1    c2-1    c3-1    c3-1    c1-1
c1-2    NULL    NULL    c2-1    c3-1    c3-1    c1-2
{noformat}

We've narrowed this down to an early initialization of {{fieldsInited\[fieldID] = true}} in {{LazyHBaseRow#uncheckedGetField}} and we'll try to provide a patch which surely needs review.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
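The failure mode described in the issue (a second read of a missing cell returning the previous row's value) can be reproduced with a small model. The following is only a sketch under assumptions: {{LazyRowSketch}}, its fields, and {{readMissingCellTwice}} are hypothetical names made up for illustration, and the class is not the real {{LazyHBaseRow}}. It assumes that cached field objects are reused across rows and that only the {{fieldsInited}} flags are reset per row.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a reusable lazy row. It is NOT the real
// LazyHBaseRow; class, field, and method names here are illustrative only.
class LazyRowSketch {
    private final String[] columns;
    private final String[] cached;        // survives across rows, like reused lazy field objects
    private final boolean[] fieldsInited;
    private final boolean markInitedEarly;
    private Map<String, String> cells = new HashMap<>();

    LazyRowSketch(String[] columns, boolean markInitedEarly) {
        this.columns = columns;
        this.cached = new String[columns.length];
        this.fieldsInited = new boolean[columns.length];
        this.markInitedEarly = markInitedEarly;
    }

    // Point the row at a new set of cells; only the flags are reset.
    void init(Map<String, String> newCells) {
        cells = newCells;
        Arrays.fill(fieldsInited, false);
    }

    String getField(int id) {
        if (!fieldsInited[id]) {
            if (markInitedEarly) {
                fieldsInited[id] = true;   // assumed buggy ordering: set before the lookup
            }
            String value = cells.get(columns[id]);
            if (value == null) {
                return null;               // missing cell; cached[id] is left untouched
            }
            cached[id] = value;
            fieldsInited[id] = true;       // alternative ordering: set only after a value is loaded
        }
        return cached[id];
    }

    // Reads c2 twice from a second row that has no c2 cell.
    static String[] readMissingCellTwice(boolean markInitedEarly) {
        LazyRowSketch row = new LazyRowSketch(new String[] {"c1", "c2"}, markInitedEarly);
        Map<String, String> row1 = new HashMap<>();
        row1.put("c1", "c1-1");
        row1.put("c2", "c2-1");
        row.init(row1);
        row.getField(1);                   // caches "c2-1"

        Map<String, String> row2 = new HashMap<>();
        row2.put("c1", "c1-2");            // no c2 cell in this row
        row.init(row2);
        return new String[] {row.getField(1), row.getField(1)};
    }

    public static void main(String[] args) {
        String[] early = readMissingCellTwice(true);
        String[] late = readMissingCellTwice(false);
        System.out.println("flag set early: " + early[0] + ", " + early[1]);
        System.out.println("flag set late:  " + late[0] + ", " + late[1]);
    }
}
```

With the flag set early, the first read of the missing cell yields null and the second yields the stale "c2-1", which matches the {{select c1, c2, c2}} output above. With the flag set only after a value is loaded, both reads yield null, at the cost of repeating the lookup for missing cells.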
[jira] [Commented] (HIVE-3179) HBase Handler doesn't handle NULLs properly
[ https://issues.apache.org/jira/browse/HIVE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632134#comment-13632134 ]

Hudson commented on HIVE-3179:
------------------------------

Integrated in Hive-trunk-h0.21 #2065 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2065/])
HIVE-3179 HBase Handler doesn't handle NULLs properly (Lars Francke via Navis) (Revision 1467874)

Result = FAILURE
navis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467874
Files :
* /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java
* /hive/trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java
[jira] [Commented] (HIVE-3179) HBase Handler doesn't handle NULLs properly
[ https://issues.apache.org/jira/browse/HIVE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575232#comment-13575232 ]

Brock Noland commented on HIVE-3179:
------------------------------------

Mark,

How did the tests turn out?
[jira] [Commented] (HIVE-3179) HBase Handler doesn't handle NULLs properly
[ https://issues.apache.org/jira/browse/HIVE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575246#comment-13575246 ]

Mark Grover commented on HIVE-3179:
-----------------------------------

They timed out on my pseudo-distributed laptop but that most likely is an environment issue local to me. But looks like Lars mentioned that he had run the tests, so that should be ok. I will try to fix the environment, but don't wait on me.
[jira] [Commented] (HIVE-3179) HBase Handler doesn't handle NULLs properly
[ https://issues.apache.org/jira/browse/HIVE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574553#comment-13574553 ]

Mark Grover commented on HIVE-3179:
-----------------------------------

Running TestHBaseCliDriver tests...
[jira] [Commented] (HIVE-3179) HBase Handler doesn't handle NULLs properly
[ https://issues.apache.org/jira/browse/HIVE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573987#comment-13573987 ]

Brock Noland commented on HIVE-3179:
------------------------------------

I have verified this is an issue with trunk, the patch applies, and the patch addresses the issue.

{noformat}
hive> select c1, c3, c2, c2, c3, c3, c1 from hive_hbase_test;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201302071609_0002, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201302071609_0002
Kill Command = /opt/local/hadoop-1.1.1/libexec/../bin/hadoop job -kill job_201302071609_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-02-07 16:10:31,826 Stage-1 map = 0%, reduce = 0%
2013-02-07 16:10:34,846 Stage-1 map = 100%, reduce = 0%
2013-02-07 16:10:36,861 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201302071609_0002
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 260 HDFS Write: 60 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
c1-1    c3-1    c2-1    c2-1    c3-1    c3-1    c1-1
c1-2    NULL    NULL    NULL    NULL    NULL    c1-2
Time taken: 10.702 seconds, Fetched: 2 row(s)
hive>
{noformat}
[jira] [Commented] (HIVE-3179) HBase Handler doesn't handle NULLs properly
[ https://issues.apache.org/jira/browse/HIVE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573991#comment-13573991 ]

Shreepadma Venugopalan commented on HIVE-3179:
----------------------------------------------

+1.
[jira] [Commented] (HIVE-3179) HBase Handler doesn't handle NULLs properly
[ https://issues.apache.org/jira/browse/HIVE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572282#comment-13572282 ]

Lars Francke commented on HIVE-3179:
------------------------------------

As far as I can tell this is still an issue. Would anyone mind doing a review on this one?
[jira] [Commented] (HIVE-3179) HBase Handler doesn't handle NULLs properly
[ https://issues.apache.org/jira/browse/HIVE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400269#comment-13400269 ]

Carl Steinbach commented on HIVE-3179:
--------------------------------------

@Lars: Please post a review request on reviews.apache.org. Thanks.
[jira] [Commented] (HIVE-3179) HBase Handler doesn't handle NULLs properly
[ https://issues.apache.org/jira/browse/HIVE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400289#comment-13400289 ]

Lars Francke commented on HIVE-3179:
------------------------------------

@Carl: Sure: https://reviews.apache.org/r/5542/ thanks for the reminder.
[jira] [Commented] (HIVE-3179) HBase Handler doesn't handle NULLs properly
[ https://issues.apache.org/jira/browse/HIVE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399322#comment-13399322 ]

Lars Francke commented on HIVE-3179:
------------------------------------

We could add a second boolean array to go with {{fieldsInited}} that's called {{fieldsNull}} that caches those fields. Not sure if that's needed though.

Thanks to my colleague Oliver Meyn who actually looked at the code and found the fix, I only packaged it up and added the unit test.
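The {{fieldsNull}} idea mentioned in this comment could look roughly like the sketch below. It is an assumption-laden illustration, not the attached patch and not Hive code: {{NullCachingRowSketch}}, the {{lookups}} counter, and {{demo}} are hypothetical names, and a plain map stands in for the HBase result. The point is only that a second boolean array lets a missing cell be remembered, so repeated reads return null without a second lookup.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of caching "this field is NULL" next to "this field is
// initialized". Names are illustrative and not taken from the Hive source.
class NullCachingRowSketch {
    private final String[] columns;
    private final String[] cached;        // reused across rows
    private final boolean[] fieldsInited;
    private final boolean[] fieldsNull;   // remembers missing cells for the current row
    private Map<String, String> cells = new HashMap<>();
    int lookups = 0;                      // counts how often the backing map is consulted

    NullCachingRowSketch(String[] columns) {
        this.columns = columns;
        this.cached = new String[columns.length];
        this.fieldsInited = new boolean[columns.length];
        this.fieldsNull = new boolean[columns.length];
    }

    // Point the row at new cells and reset both flag arrays.
    void init(Map<String, String> newCells) {
        cells = newCells;
        Arrays.fill(fieldsInited, false);
        Arrays.fill(fieldsNull, false);
    }

    String getField(int id) {
        if (!fieldsInited[id]) {
            lookups++;
            String value = cells.get(columns[id]);
            fieldsNull[id] = (value == null);
            if (value != null) {
                cached[id] = value;
            }
            fieldsInited[id] = true;      // safe here because fieldsNull records the missing cell
        }
        return fieldsNull[id] ? null : cached[id];
    }

    // Returns {first read, second read, lookup count} for a row without c2.
    static Object[] demo() {
        NullCachingRowSketch row = new NullCachingRowSketch(new String[] {"c1", "c2"});
        Map<String, String> row1 = new HashMap<>();
        row1.put("c1", "c1-1");
        row1.put("c2", "c2-1");
        row.init(row1);
        row.getField(1);                  // lookup 1, caches "c2-1"

        Map<String, String> row2 = new HashMap<>();
        row2.put("c1", "c1-2");           // no c2 cell in this row
        row.init(row2);
        String first = row.getField(1);   // lookup 2, records the field as NULL
        String second = row.getField(1);  // served from fieldsNull, no lookup
        return new Object[] {first, second, row.lookups};
    }

    public static void main(String[] args) {
        Object[] r = demo();
        System.out.println(r[0] + ", " + r[1] + ", lookups=" + r[2]);
    }
}
```

Whether the extra array is worth it depends on how expensive the repeated lookup for a missing cell is, which is the open question the comment raises.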