[jira] [Updated] (HIVE-11097) HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases
[ https://issues.apache.org/jira/browse/HIVE-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wan Chang updated HIVE-11097:
-----------------------------
    Attachment: HIVE-11097.5.patch

> HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-11097
>                 URL: https://issues.apache.org/jira/browse/HIVE-11097
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>    Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0
>         Environment: Hive 0.13.1, Hive 2.0.0, hadoop 2.4.1
>            Reporter: Wan Chang
>            Assignee: Wan Chang
>            Priority: Critical
>         Attachments: HIVE-11097.1.patch, HIVE-11097.2.patch, HIVE-11097.3.patch, HIVE-11097.4.patch, HIVE-11097.5.patch
>
>
> Say we have the following SQL:
> {code}
> create table if not exists test_orc_src (a int, b int, c int) stored as orc;
> create table if not exists test_orc_src2 (a int, b int, d int) stored as orc;
> insert overwrite table test_orc_src select 1,2,3 from src limit 1;
> insert overwrite table test_orc_src2 select 1,2,4 from src limit 1;
> set hive.auto.convert.join = false;
> set hive.execution.engine=mr;
> select
>   tb.c
> from test.test_orc_src tb
> join (select * from test.test_orc_src2) tm
> on tb.a = tm.a
> where tb.b = 2
> {code}
> The correct result is 3, but the query returned no rows.
> I found this in HiveInputFormat.pushProjectionsAndFilters:
> {code}
> match = splitPath.startsWith(key) || splitPathWithNoSchema.startsWith(key);
> {code}
> It uses startsWith to associate aliases with the split path. Because the location of test_orc_src is a string prefix of the location of test_orc_src2, a split of test_orc_src2 (alias tm) matches two aliases in this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
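To make the prefix problem concrete, here is a minimal Java sketch. It compares the raw startsWith test quoted in the description with a check that only accepts a key if it equals the split path or is one of its parent directories. The warehouse paths, the class name AliasPathMatch, and its helper methods are hypothetical; this is an illustration, not the code in the HIVE-11097 patches, and it ignores URI schemes (the splitPathWithNoSchema case) and symlink inputs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not the code from the HIVE-11097 patches.
// Paths are plain strings; URI schemes and symlink inputs are ignored.
public class AliasPathMatch {

    // The behaviour described in the issue: any string prefix counts as a match,
    // so ".../test_orc_src2/000000_0" also matches the key ".../test_orc_src".
    static boolean prefixMatch(String splitPath, String key) {
        return splitPath.startsWith(key);
    }

    // Boundary-aware alternative: the key must equal the split path or be one of
    // its ancestor directories (the character right after the key must be '/').
    static boolean isUnderPath(String splitPath, String key) {
        String dir = key.endsWith("/") ? key.substring(0, key.length() - 1) : key;
        return splitPath.equals(dir) || splitPath.startsWith(dir + "/");
    }

    // Collects the aliases of every path key that matches the split.
    static List<String> matchingAliases(String splitPath,
                                        Map<String, List<String>> pathToAliases,
                                        boolean boundaryAware) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : pathToAliases.entrySet()) {
            boolean match = boundaryAware
                    ? isUnderPath(splitPath, e.getKey())
                    : prefixMatch(splitPath, e.getKey());
            if (match) {
                result.addAll(e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical table locations for the two tables in the repro query.
        Map<String, List<String>> pathToAliases = new LinkedHashMap<>();
        pathToAliases.put("/user/hive/warehouse/test.db/test_orc_src",
                Collections.singletonList("tb"));
        pathToAliases.put("/user/hive/warehouse/test.db/test_orc_src2",
                Collections.singletonList("tm"));

        String split = "/user/hive/warehouse/test.db/test_orc_src2/000000_0";
        System.out.println("startsWith:     " + matchingAliases(split, pathToAliases, false));
        System.out.println("boundary-aware: " + matchingAliases(split, pathToAliases, true));
    }
}
```

Running main prints [tb, tm] for the prefix test and [tm] for the boundary-aware one. Appending the separator before comparing is what stops test_orc_src from matching test_orc_src2; walking up the split's parent directories and looking each one up exactly would be another way to express the same check.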
[jira] [Commented] (HIVE-11097) HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases
[ https://issues.apache.org/jira/browse/HIVE-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110204#comment-15110204 ]

Wan Chang commented on HIVE-11097:
----------------------------------

[~prasanth_j] The symlink_text_input_format.q test failure is related to the patch. I have fixed it and added some comments describing the scenario.
[jira] [Updated] (HIVE-11097) HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases
[ https://issues.apache.org/jira/browse/HIVE-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wan Chang updated HIVE-11097:
-----------------------------
    Attachment: HIVE-11097.4.patch

Update patch
[jira] [Commented] (HIVE-11097) HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases
[ https://issues.apache.org/jira/browse/HIVE-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106468#comment-15106468 ]

Wan Chang commented on HIVE-11097:
----------------------------------

Hi [~prasanth_j], I am using Hive 0.13.1, where the bug occurs with some complex SQL queries. I could not reproduce the case on the master branch, but I don't know whether it has been fixed there yet.
[jira] [Updated] (HIVE-11097) HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases
[ https://issues.apache.org/jira/browse/HIVE-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wan Chang updated HIVE-11097:
-----------------------------
    Attachment: HIVE-11097.3.patch
[jira] [Updated] (HIVE-11097) HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases
[ https://issues.apache.org/jira/browse/HIVE-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wan Chang updated HIVE-11097:
-----------------------------
    Attachment: HIVE-11097.2.patch

Update patch
[jira] [Commented] (HIVE-11097) HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases
[ https://issues.apache.org/jira/browse/HIVE-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15104953#comment-15104953 ]

Wan Chang commented on HIVE-11097:
----------------------------------

[~prasanth_j] Thanks for the information. I will update the patch soon.
[jira] [Commented] (HIVE-11097) HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases
[ https://issues.apache.org/jira/browse/HIVE-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613563#comment-14613563 ]

Wan Chang commented on HIVE-11097:
----------------------------------

Hi [~ashutoshc], could you please help review this patch?
[jira] [Commented] (HIVE-11097) HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases
[ https://issues.apache.org/jira/browse/HIVE-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600719#comment-14600719 ]

Wan Chang commented on HIVE-11097:
----------------------------------

Hi [~jvs], could you please help review this?
[jira] [Updated] (HIVE-1903) Can't join HBase tables if one's name is the beginning of the other
[ https://issues.apache.org/jira/browse/HIVE-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wan Chang updated HIVE-1903:
----------------------------
    Attachment:     (was: HIVE-11097.1.patch)

> Can't join HBase tables if one's name is the beginning of the other
> -------------------------------------------------------------------
>
>                 Key: HIVE-1903
>                 URL: https://issues.apache.org/jira/browse/HIVE-1903
>             Project: Hive
>          Issue Type: Bug
>          Components: HBase Handler
>            Reporter: Jean-Daniel Cryans
>            Assignee: John Sichi
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1903.1.patch
>
>
> I tried joining two tables, let's call them "table" and "table_a", but I'm seeing an array of errors such as this:
> {noformat}
> java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
>         at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>         at java.util.ArrayList.get(ArrayList.java:322)
>         at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getRecordReader(HiveHBaseTableInputFormat.java:118)
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:231)
> {noformat}
> The reason is that HiveInputFormat.pushProjectionsAndFilters matches the aliases with startsWith, so in my case the mappers for "table_a" were getting the columns from "table" as well as their own (and since "table_a" has fewer columns, they tried to read one element past the end of the array).
> I don't know if just changing it to "equals" will fix it; my guess is that it won't, since it may break RCFiles.
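The failure mode described in HIVE-1903 can be reproduced in miniature with the Java sketch below, under stated assumptions. The class MergedProjectionDemo, its helpers, the per-alias column IDs, and the three-entry column mapping for "table_a" are all hypothetical; the plain names "table" and "table_a" stand in for the real path keys. The sketch only shows how unioning the read-column IDs of every prefix-matching alias can hand a reader a column index its own table does not have; it is not the HBase handler's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical reproduction of the HIVE-1903 failure mode; not Hive code.
public class MergedProjectionDemo {

    // Unions the read-column IDs of every alias whose key is a raw string
    // prefix of the split, mirroring the startsWith test from the issue.
    static Set<Integer> readColumnIds(String split, Map<String, List<Integer>> neededColumns) {
        Set<Integer> ids = new LinkedHashSet<>();
        for (Map.Entry<String, List<Integer>> e : neededColumns.entrySet()) {
            if (split.startsWith(e.getKey())) {
                ids.addAll(e.getValue());
            }
        }
        return ids;
    }

    // Stands in for a record reader that looks each requested column up in
    // the table's own column mapping; an out-of-range ID throws.
    static List<String> project(List<String> columnMapping, Set<Integer> ids) {
        List<String> out = new ArrayList<>();
        for (int id : ids) {
            out.add(columnMapping.get(id));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> needed = new LinkedHashMap<>();
        needed.put("table", Arrays.asList(0, 3));   // hypothetical: 4th column of "table" is read
        needed.put("table_a", Arrays.asList(0, 1)); // hypothetical columns read from "table_a"

        // Hypothetical mapping for "table_a": only three columns.
        List<String> tableAMapping = new ArrayList<>(Arrays.asList(":key", "cf:a", "cf:b"));

        Set<Integer> ids = readColumnIds("table_a", needed);
        System.out.println("read column ids for the table_a split: " + ids);
        try {
            project(tableAMapping, ids);
        } catch (IndexOutOfBoundsException ex) {
            System.out.println("caught IndexOutOfBoundsException: " + ex.getMessage());
        }
    }
}
```

Because "table_a".startsWith("table") is true, the split for "table_a" inherits column 3 from "table", and looking it up in a three-entry list fails, which matches the shape of the reported "Index: 3, Size: 3" error (the exact message text depends on the JDK version).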
[jira] [Updated] (HIVE-1903) Can't join HBase tables if one's name is the beginning of the other
[ https://issues.apache.org/jira/browse/HIVE-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wan Chang updated HIVE-1903:
----------------------------
    Attachment: HIVE-11097.1.patch
[jira] [Updated] (HIVE-11097) HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases
[ https://issues.apache.org/jira/browse/HIVE-11097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wan Chang updated HIVE-11097:
-----------------------------
    Attachment: HIVE-11097.1.patch

Attach patch file