[jira] [Commented] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type
[ https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070540#comment-15070540 ] Aaron Tokhy commented on HIVE-12502:
Sure, will do.
> to_date UDF cannot accept NULLs of VOID type
>
> Key: HIVE-12502
> URL: https://issues.apache.org/jira/browse/HIVE-12502
> Project: Hive
> Issue Type: Bug
> Components: UDF
> Affects Versions: 1.0.0
> Reporter: Aaron Tokhy
> Assignee: Aaron Tokhy
> Priority: Trivial
> Attachments: HIVE-12502-branch-1.patch, HIVE-12502.1.patch, HIVE-12502.patch
>
> The to_date function behaves differently depending on the data type of the NULL passed in:
> hive> select to_date(null);
> FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'TOK_NULL': TO_DATE() only takes STRING/TIMESTAMP/DATEWRITABLE types, got VOID
> hive> select to_date(cast(null as timestamp));
> OK
> NULL
> Time taken: 0.031 seconds, Fetched: 1 row(s)
> This appears to be a regression introduced in HIVE-5731. The previous version of to_date did not check the argument type:
> https://github.com/apache/hive/commit/09b6553214d6db5ec7049b88bbe8ff640a7fef72#diff-204f5588c0767cf372a5ca7e3fb964afL56
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
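The report comes down to an argument-type check: an untyped NULL literal has type VOID, which is not in the accepted list, while a NULL cast to TIMESTAMP is. The Java sketch below models that check under simplifying assumptions; `ArgType`, `ToDateSketch`, `checkArg` and `toDate` are hypothetical names, not Hive's GenericUDF API, and the relaxed check is one possible fix rather than necessarily what the attached patches do.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified stand-in for Hive's primitive argument categories.
enum ArgType { STRING, TIMESTAMP, DATE, VOID, INT }

final class ToDateSketch {
    // Accepted types as listed in the HIVE-12502 error message (VOID rejected).
    static final Set<ArgType> ACCEPTED_BEFORE =
        EnumSet.of(ArgType.STRING, ArgType.TIMESTAMP, ArgType.DATE);

    // Relaxed check: an untyped NULL literal (VOID) is accepted as well.
    static final Set<ArgType> ACCEPTED_AFTER =
        EnumSet.of(ArgType.STRING, ArgType.TIMESTAMP, ArgType.DATE, ArgType.VOID);

    // Compile-time style argument check; throws for unsupported types.
    static void checkArg(ArgType type, Set<ArgType> accepted) {
        if (!accepted.contains(type)) {
            throw new IllegalArgumentException(
                "TO_DATE() only takes STRING/TIMESTAMP/DATEWRITABLE types, got " + type);
        }
    }

    // Evaluation sketch: NULL in, NULL out. Non-null values are reduced to their
    // 'yyyy-MM-dd' prefix, a simplification of real string/timestamp parsing.
    static String toDate(Object value, ArgType type) {
        checkArg(type, ACCEPTED_AFTER);
        if (value == null) {
            return null;
        }
        String s = value.toString();
        return s.length() >= 10 ? s.substring(0, 10) : null;
    }

    private ToDateSketch() {}
}
```

In this model, accepting VOID makes `to_date(null)` behave like `to_date(cast(null as timestamp))`: both return NULL instead of failing at compile time, while genuinely unsupported types such as INT are still rejected.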
[jira] [Updated] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type
[ https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy updated HIVE-12502: --- Attachment: HIVE-12502.1.patch > to_date UDF cannot accept NULLs of VOID type > > > Key: HIVE-12502 > URL: https://issues.apache.org/jira/browse/HIVE-12502 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 1.0.0 >Reporter: Aaron Tokhy >Assignee: Jason Dere >Priority: Trivial > Attachments: HIVE-12502.1.patch
[jira] [Commented] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type
[ https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052191#comment-15052191 ] Aaron Tokhy commented on HIVE-12502: I've submitted a patch with an associated unit test. > to_date UDF cannot accept NULLs of VOID type > > > Key: HIVE-12502 > URL: https://issues.apache.org/jira/browse/HIVE-12502 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 1.0.0 >Reporter: Aaron Tokhy >Assignee: Jason Dere >Priority: Trivial > Attachments: HIVE-12502.1.patch
[jira] [Updated] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type
[ https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy updated HIVE-12502: --- Attachment: HIVE-12502-branch-1.patch HIVE-12502.patch > to_date UDF cannot accept NULLs of VOID type > > > Key: HIVE-12502 > URL: https://issues.apache.org/jira/browse/HIVE-12502 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 1.0.0 >Reporter: Aaron Tokhy >Assignee: Jason Dere >Priority: Trivial > Attachments: HIVE-12502-branch-1.patch, HIVE-12502.patch
[jira] [Updated] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type
[ https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy updated HIVE-12502: --- Attachment: (was: HIVE-12502.1.patch) > to_date UDF cannot accept NULLs of VOID type > > > Key: HIVE-12502 > URL: https://issues.apache.org/jira/browse/HIVE-12502 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 1.0.0 >Reporter: Aaron Tokhy >Assignee: Jason Dere >Priority: Trivial
[jira] [Commented] (HIVE-12474) ORDER BY should handle column refs in parantheses
[ https://issues.apache.org/jira/browse/HIVE-12474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023310#comment-15023310 ] Aaron Tokhy commented on HIVE-12474:
Should 'cluster by'/'sort by'/'distribute by'/'partition by' allow the use of parentheses if 'order by' does not?
> ORDER BY should handle column refs in parantheses
> -
>
> Key: HIVE-12474
> URL: https://issues.apache.org/jira/browse/HIVE-12474
> Project: Hive
> Issue Type: Bug
> Components: Parser
> Affects Versions: 1.0.0, 1.2.1
> Reporter: Aaron Tokhy
> Assignee: Xuefu Zhang
> Priority: Minor
>
> CREATE TABLE test(a INT, b INT, c INT)
> COMMENT 'This is a test table';
> hive> select lead(c) over (order by (a,b)) from test limit 10;
> FAILED: ParseException line 1:31 missing ) at ',' near ')'
> line 1:34 missing EOF at ')' near ')'
> hive> select lead(c) over (order by a,b) from test limit 10;
> - Works as expected.
> It appears that 'cluster by'/'sort by'/'distribute by'/'partition by' allow this:
> https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L129
> For example, this syntax is still valid:
> select lead(c) over (sort by (a,b)) from test limit 10;
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
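The behaviour requested in this issue can be illustrated with a toy recursive-descent parser for the column list after ORDER BY that accepts both `a,b` and `(a,b)`. This is a hypothetical Java sketch: Hive's actual grammar lives in the ANTLR file IdentifiersParser.g linked above, and `ColumnListParser` is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

// Toy parser (hypothetical):
//   columnList := '(' idents ')' | idents
//   idents     := IDENT (',' IDENT)*
final class ColumnListParser {
    private final String src;
    private int pos;

    private ColumnListParser(String src) { this.src = src; }

    static List<String> parse(String text) {
        ColumnListParser p = new ColumnListParser(text);
        List<String> cols;
        p.skipSpaces();
        if (p.peek() == '(') {
            p.pos++;               // consume '('
            cols = p.idents();
            p.expect(')');         // a parenthesized list must be closed
        } else {
            cols = p.idents();
        }
        p.skipSpaces();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("missing EOF at position " + p.pos);
        }
        return cols;
    }

    private List<String> idents() {
        List<String> out = new ArrayList<>();
        out.add(ident());
        skipSpaces();
        while (peek() == ',') {
            pos++;                 // consume ','
            out.add(ident());
            skipSpaces();
        }
        return out;
    }

    private String ident() {
        skipSpaces();
        int start = pos;
        while (pos < src.length()
                && (Character.isLetterOrDigit(src.charAt(pos)) || src.charAt(pos) == '_')) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("expected identifier at position " + pos);
        }
        return src.substring(start, pos);
    }

    private void expect(char c) {
        skipSpaces();
        if (peek() != c) {
            throw new IllegalArgumentException("missing " + c + " at position " + pos);
        }
        pos++;
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
            pos++;
        }
    }
}
```

The sketch takes the permissive option (parentheses optional, as SORT BY already allows); the comment above asks whether the other clauses should instead be made as strict as ORDER BY.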
[jira] [Updated] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type
[ https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy updated HIVE-12502: --- External issue ID: (was: HIVE-5731) > to_date UDF cannot accept NULLs of VOID type > > > Key: HIVE-12502 > URL: https://issues.apache.org/jira/browse/HIVE-12502 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 1.0.0 >Reporter: Aaron Tokhy >Assignee: Jason Dere >Priority: Trivial
[jira] [Commented] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type
[ https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023543#comment-15023543 ] Aaron Tokhy commented on HIVE-12502: Regression > to_date UDF cannot accept NULLs of VOID type > > > Key: HIVE-12502 > URL: https://issues.apache.org/jira/browse/HIVE-12502 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 1.0.0 >Reporter: Aaron Tokhy >Assignee: Jason Dere >Priority: Trivial
[jira] [Updated] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type
[ https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy updated HIVE-12502: --- Affects Version/s: (was: 0.13.1) 1.0.0 > to_date UDF cannot accept NULLs of VOID type > > > Key: HIVE-12502 > URL: https://issues.apache.org/jira/browse/HIVE-12502 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 1.0.0 >Reporter: Aaron Tokhy >Assignee: Jason Dere >Priority: Trivial
[jira] [Updated] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type
[ https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy updated HIVE-12502: --- Affects Version/s: (was: 1.0.0) 0.13.1 > to_date UDF cannot accept NULLs of VOID type > > > Key: HIVE-12502 > URL: https://issues.apache.org/jira/browse/HIVE-12502 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 0.13.1 >Reporter: Aaron Tokhy >Assignee: Jason Dere >Priority: Trivial
[jira] [Updated] (HIVE-12474) ORDER BY should handle column refs in parantheses
[ https://issues.apache.org/jira/browse/HIVE-12474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy updated HIVE-12474: --- Description: CREATE TABLE test(a INT, b INT, c INT) COMMENT 'This is a test table'; hive> select lead(c) over (order by (a,b)) from test limit 10; FAILED: ParseException line 1:31 missing ) at ',' near ')' line 1:34 missing EOF at ')' near ')' hive> select lead(c) over (order by a,b) from test limit 10; - Works as expected. It appears that 'cluster by'/'sort by'/'distribute by'/'partition by' allows this: https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L129 For example, this syntax is still valid: select lead(c) over (sort by (a,b)) from test limit 10; was: CREATE TABLE test(a INT, b INT, c INT) COMMENT 'This is a test table'; hive> select lead(c) over (order by (a,b)) from test limit 10; FAILED: ParseException line 1:31 missing ) at ',' near ')' line 1:34 missing EOF at ')' near ')' hive> select lead(c) over (order by a,b) from test limit 10; - Works as expected. 
It appears that 'cluster by'/'sort by'/'distribute by'/'partition by' allows this: https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L129 For example, this syntax is still valid: select lead(c) over (sort by (a,b)) from test limit 10; This is related to changes that were made as a part of HIVE-6617 > ORDER BY should handle column refs in parantheses > - > > Key: HIVE-12474 > URL: https://issues.apache.org/jira/browse/HIVE-12474 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 1.0.0, 1.2.1 >Reporter: Aaron Tokhy >Assignee: Pengcheng Xiong >Priority: Minor
[jira] [Updated] (HIVE-12474) ORDER BY should handle column refs in parantheses
[ https://issues.apache.org/jira/browse/HIVE-12474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy updated HIVE-12474: --- Description: CREATE TABLE test(a INT, b INT, c INT) COMMENT 'This is a test table'; hive> select lead(c) over (order by (a,b)) from test limit 10; FAILED: ParseException line 1:31 missing ) at ',' near ')' line 1:34 missing EOF at ')' near ')' hive> select lead(c) over (order by a,b) from test limit 10; - Works as expected. It appears that 'cluster by'/'sort by'/'distribute by'/'partition by' allows this: https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L129 For example, this syntax is still valid: select lead(c) over (sort by (a,b)) from test limit 10; This is related to changes that were made as a part of HIVE-6617 was: CREATE TABLE test(a INT, b INT, c INT) COMMENT 'This is a test table'; hive> select lead(c) over (order by (a,b)) from test limit 10; FAILED: ParseException line 1:31 missing ) at ',' near ')' line 1:34 missing EOF at ')' near ')' hive> select lead(c) over (order by a,b) from test limit 10; - Works as expected. 
It appears that 'cluster by'/'sort by'/'distribute by'/'partition by' allows this: https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L129 For example, this syntax is still valid: select lead(c) over (sort by (a,b)) from test limit 10; > ORDER BY should handle column refs in parantheses > - > > Key: HIVE-12474 > URL: https://issues.apache.org/jira/browse/HIVE-12474 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 1.0.0, 1.2.1 >Reporter: Aaron Tokhy >Assignee: Pengcheng Xiong >Priority: Minor
[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707706#comment-14707706 ] Aaron Tokhy commented on HIVE-10631:
Code review posted: https://reviews.apache.org/r/37484/
create_table_core method has invalid update for Fast Stats
--
Key: HIVE-10631
URL: https://issues.apache.org/jira/browse/HIVE-10631
Project: Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Assignee: Aaron Tokhy
Priority: Minor
Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch
The HiveMetaStore.create_table_core method calls MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather is on. For a partitioned table, however, this updateUnpartitionedTableStatsFast call scans the warehouse directory and then does not appear to use the result. Fast Stats was implemented by HIVE-3959.
https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363
From the create_table_core method:
{code}
if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) &&
    !MetaStoreUtils.isView(tbl)) {
  if (tbl.getPartitionKeysSize() == 0) { // Unpartitioned table
    MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir);
  } else { // Partitioned table with no partitions.
    MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
  }
}
{code}
In particular, line 1363:
{code}
// Partitioned table with no partitions.
MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
{code}
This call ends up invoking Warehouse.getFileStatusesForUnpartitionedTable, and MetaStoreUtils.updateUnpartitionedTableStatsFast then does nothing with the result because the newDir flag is always true.
The impact of this bug is minor when the warehouse location (hive.metastore.warehouse.dir) is on HDFS, but it can be significant when the warehouse location is on S3, especially for large existing partitions. The impact is heightened by HIVE-6727 when the warehouse location is on S3: the call can recursively scan the wrong S3 directory and then do nothing with the listing.
I will add more detail on these cases in comments.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
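The wasted work can be modelled in a few lines of Java. In this hypothetical sketch, `listFiles` stands in for Warehouse.getFileStatusesForUnpartitionedTable, `scans` counts directory listings, and the two `createTable*` methods mirror only the branch structure of create_table_core quoted above. The guard in `createTableAfter` (skip fast stats for partitioned tables) is one possible fix and is not necessarily what the attached HIVE-10631 patches do.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical model of the fast-stats path described in HIVE-10631.
final class FastStatsSketch {
    int scans = 0;                  // directory listings performed (expensive on S3)
    int filesSeen = 0;              // files whose sizes would feed the fast stats
    boolean statsUpdated = false;   // whether any stats were actually written

    // Stand-in for Warehouse.getFileStatusesForUnpartitionedTable.
    List<String> listFiles(String tableDir) {
        scans++;
        return Collections.emptyList();
    }

    // Stand-in for MetaStoreUtils.updateUnpartitionedTableStatsFast: per the
    // report, the listing happens first and is then discarded when newDir is true.
    void updateUnpartitionedTableStatsFast(String tableDir, boolean newDir) {
        List<String> files = listFiles(tableDir);
        if (newDir) {
            return;                 // listing thrown away
        }
        filesSeen = files.size();
        statsUpdated = true;
    }

    // Branch structure of create_table_core as quoted in the report (simplified).
    void createTableBefore(String tableDir, int partitionKeys, boolean madeDir) {
        if (partitionKeys == 0) {
            updateUnpartitionedTableStatsFast(tableDir, madeDir);
        } else {
            // Partitioned table: newDir is hard-coded to true, so this only scans.
            updateUnpartitionedTableStatsFast(tableDir, true);
        }
    }

    // One possible guard: partitioned tables skip the fast-stats call entirely.
    void createTableAfter(String tableDir, int partitionKeys, boolean madeDir) {
        if (partitionKeys == 0) {
            updateUnpartitionedTableStatsFast(tableDir, madeDir);
        }
    }
}
```

With a partitioned table, `createTableBefore` performs one listing that is immediately discarded, while `createTableAfter` performs none; on an S3 warehouse each avoided listing may be a recursive scan.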
[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697589#comment-14697589 ] Aaron Tokhy commented on HIVE-10631: Sorry, fixed it just now. create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0 Reporter: Dongwook Kwon Assignee: Aaron Tokhy Priority: Minor Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch
[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697371#comment-14697371 ] Aaron Tokhy commented on HIVE-10631: Sure, here it is: https://reviews.apache.org/r/37484/ create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0 Reporter: Dongwook Kwon Assignee: Aaron Tokhy Priority: Minor Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch
[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695641#comment-14695641 ] Aaron Tokhy commented on HIVE-10631: Uploading a patch that cleanly applies to TRUNK as well as branch-1.0 create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0 Reporter: Dongwook Kwon Assignee: Aaron Tokhy Priority: Minor Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch
[jira] [Updated] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy updated HIVE-10631: --- Attachment: HIVE-10631.patch create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0 Reporter: Dongwook Kwon Assignee: Aaron Tokhy Priority: Minor Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch
[jira] [Updated] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy updated HIVE-10631: --- Attachment: HIVE-10631-branch-1.0.patch create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0 Reporter: Dongwook Kwon Assignee: Aaron Tokhy Priority: Minor Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch
[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693903#comment-14693903 ] Aaron Tokhy commented on HIVE-10631: Running precommit tests.
create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0 Reporter: Dongwook Kwon Assignee: Aaron Tokhy Priority: Minor Attachments: HIVE-10631.patch
HiveMetaStore.create_table_core calls MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather is on. For a partitioned table, however, this call still scans the warehouse directory and does not appear to use the result. Fast Stats was implemented by HIVE-3959.
https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363
From the create_table_core method:
{code}
if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) &&
    !MetaStoreUtils.isView(tbl)) {
  if (tbl.getPartitionKeysSize() == 0) { // Unpartitioned table
    MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir);
  } else { // Partitioned table with no partitions.
    MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
  }
}
{code}
In particular, line 1363 (partitioned table with no partitions):
{code}
MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
{code}
This call ends up invoking Warehouse.getFileStatusesForUnpartitionedTable, and MetaStoreUtils.updateUnpartitionedTableStatsFast then does nothing with the result because the newDir flag is always true.
The impact of this bug is minor when the warehouse location (hive.metastore.warehouse.dir) is on HDFS, but it could be large when it is on S3, especially for tables with large existing partitions. The impact is also heightened by HIVE-6727 when the warehouse location is on S3: the call could recursively scan the wrong S3 directory and then do nothing with the result. I will add more detail about specific cases in the comments.
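The wasted scan described in this issue can be illustrated with a simplified Java model. Everything in it is hypothetical: `FileLister`, `CountingLister`, and `updateStatsFast` are stand-ins, not Hive's Warehouse or MetaStoreUtils APIs. The only property modeled is the one the description relies on: the directory listing is evaluated before the newDir flag is checked, so a caller that always passes `true` pays for the listing and then discards it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FastStatsBugSketch {
    // Hypothetical stand-in for a warehouse directory listing (not Hive's Warehouse API).
    interface FileLister {
        List<Long> listFileSizes(String location);
    }

    // Records how many listings were issued, so the wasted scan is visible.
    static final class CountingLister implements FileLister {
        int calls = 0;
        private final List<Long> sizes;

        CountingLister(List<Long> sizes) {
            this.sizes = sizes;
        }

        @Override
        public List<Long> listFileSizes(String location) {
            calls++;
            return sizes;
        }
    }

    // Simplified model of the fast-stats update: the listing happens first,
    // and only afterwards is newDir consulted.
    static Map<String, String> updateStatsFast(String location, FileLister lister, boolean newDir) {
        List<Long> sizes = lister.listFileSizes(location);
        Map<String, String> params = new HashMap<>();
        if (!newDir) {
            long total = 0;
            for (long size : sizes) {
                total += size;
            }
            params.put("numFiles", Integer.toString(sizes.size()));
            params.put("totalSize", Long.toString(total));
        }
        return params;
    }

    public static void main(String[] args) {
        // A partitioned table whose location already holds data (hypothetical S3 path).
        CountingLister lister = new CountingLister(List.of(100L, 250L));
        // The partitioned-table branch always passes newDir = true.
        Map<String, String> params = updateStatsFast("s3://bucket/warehouse/t", lister, true);
        System.out.println("listings=" + lister.calls + " stats=" + params); // listings=1 stats={}
    }
}
```

In this model one listing is issued and no stats are recorded; on S3 that single listing is the expensive part.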
[jira] [Updated] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy updated HIVE-10631: --- Attachment: (was: HIVE-10631.patch.1)
create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0 Reporter: Dongwook Kwon Assignee: Aaron Tokhy Priority: Minor Attachments: HIVE-10631.patch
[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692730#comment-14692730 ] Aaron Tokhy commented on HIVE-10631: Reuploaded as HIVE-10631.patch
create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0 Reporter: Dongwook Kwon Assignee: Aaron Tokhy Priority: Minor Attachments: HIVE-10631.patch
[jira] [Updated] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy updated HIVE-10631: --- Attachment: HIVE-10631.patch
create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0 Reporter: Dongwook Kwon Assignee: Aaron Tokhy Priority: Minor Attachments: HIVE-10631.patch
[jira] [Assigned] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy reassigned HIVE-10631: -- Assignee: Aaron Tokhy
create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0 Reporter: Dongwook Kwon Assignee: Aaron Tokhy Priority: Minor Attachments: HIVE-10631.patch.1
[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681038#comment-14681038 ] Aaron Tokhy commented on HIVE-10631: After reading more about hive.stats.reliable, it does not appear appropriate to use it in this case. Instead, it would be better to defer stats calculation for partitioned tables until partitions are added to the table (MSCK/ALTER TABLE), rather than performing it on table creation (CREATE [EXTERNAL] TABLE).
create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.0, 1.0.0 Reporter: Dongwook Kwon Priority: Minor
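The deferral proposed in the comment above (no stats scan at CREATE [EXTERNAL] TABLE for partitioned tables; scan only when partitions are added) can be sketched as follows. This is a minimal model under stated assumptions: `TableInfo`, `createTableCore`, `addPartition`, and `computeFastStats` are hypothetical names, not the HiveMetaStore code, and "scanning" is modeled by recording the listed location.

```java
import java.util.ArrayList;
import java.util.List;

class DeferredStatsSketch {
    // Hypothetical, simplified table description (not Hive's Table class).
    static final class TableInfo {
        final String location;
        final List<String> partitionKeys;

        TableInfo(String location, List<String> partitionKeys) {
            this.location = location;
            this.partitionKeys = partitionKeys;
        }

        boolean isPartitioned() {
            return !partitionKeys.isEmpty();
        }
    }

    // Records every location that was listed, standing in for file-system scans.
    static final List<String> listedLocations = new ArrayList<>();

    static void computeFastStats(String location) {
        listedLocations.add(location);
    }

    // Proposed CREATE [EXTERNAL] TABLE path: only unpartitioned tables are scanned.
    static void createTableCore(TableInfo table, boolean autoGather) {
        if (autoGather && !table.isPartitioned()) {
            computeFastStats(table.location);
        }
        // Partitioned tables: nothing is listed here; stats are deferred.
    }

    // Proposed partition-add path (ALTER TABLE ADD PARTITION or MSCK):
    // only the new partition's directory is scanned.
    static void addPartition(TableInfo table, String partitionSpec, boolean autoGather) {
        if (autoGather) {
            computeFastStats(table.location + "/" + partitionSpec);
        }
    }

    public static void main(String[] args) {
        TableInfo sales = new TableInfo("s3://bucket/warehouse/sales", List.of("dt"));
        createTableCore(sales, true);               // no listing at creation
        addPartition(sales, "dt=2015-08-01", true); // lists only the new partition
        System.out.println(listedLocations);        // [s3://bucket/warehouse/sales/dt=2015-08-01]
    }
}
```

The design point is that the cost moves to the moment a specific partition directory is known, so creation never walks an existing (possibly large) S3 prefix.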
[jira] [Updated] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy updated HIVE-10631: --- Attachment: HIVE-10631.patch
create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0 Reporter: Dongwook Kwon Priority: Minor Attachments: HIVE-10631.patch
[jira] [Updated] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy updated HIVE-10631: --- Attachment: (was: HIVE-10631.patch)
create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0 Reporter: Dongwook Kwon Priority: Minor Attachments: HIVE-10631.patch.1
[jira] [Updated] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy updated HIVE-10631: --- Attachment: HIVE-10631.patch.1
create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 1.0.0 Reporter: Dongwook Kwon Priority: Minor Attachments: HIVE-10631.patch.1
[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats
[ https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647156#comment-14647156 ] Aaron Tokhy commented on HIVE-10631: hive.stats.reliable requires that both numFiles and totalSize be set correctly, regardless of the condition. So if 'create table' or 'create external table' uses a location already populated with partitions, it will traverse those partitions regardless. As of this writing, hive.stats.reliable appears to default to false. Perhaps stats calculation on creation of partitioned tables can be skipped only when hive.stats.reliable is false, since stats will be populated by MSCK REPAIR TABLE or when partitions are added with ALTER TABLE ADD PARTITION.
create_table_core method has invalid update for Fast Stats -- Key: HIVE-10631 URL: https://issues.apache.org/jira/browse/HIVE-10631 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.0, 1.0.0 Reporter: Dongwook Kwon Priority: Minor
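The proposal in the comment above (skip the creation-time scan of partitioned tables only when hive.stats.reliable is false) reduces to a single predicate. A minimal sketch, assuming a hypothetical method `shouldScanOnCreate` with plain boolean inputs; only the configuration names hive.stats.autogather and hive.stats.reliable come from the thread, and nothing here is Hive's actual API.

```java
class ReliableGateSketch {
    // Decides whether CREATE [EXTERNAL] TABLE should list the table location
    // to compute numFiles/totalSize under the reliable-gated proposal.
    static boolean shouldScanOnCreate(boolean autoGather, boolean statsReliable,
                                      boolean isView, boolean isPartitioned) {
        if (!autoGather || isView) {
            return false; // same preconditions as the existing create_table_core check
        }
        if (!isPartitioned) {
            return true; // unpartitioned tables keep scanning at creation
        }
        // Partitioned tables: scan only when reliable stats are demanded;
        // otherwise rely on MSCK REPAIR TABLE / ALTER TABLE ADD PARTITION.
        return statsReliable;
    }

    public static void main(String[] args) {
        System.out.println(shouldScanOnCreate(true, false, false, true));  // false
        System.out.println(shouldScanOnCreate(true, true, false, true));   // true
        System.out.println(shouldScanOnCreate(true, false, false, false)); // true
    }
}
```

Note that with hive.stats.reliable set to true, this gate still traverses pre-populated partitions at creation, which is the cost the later comment on this issue moves away from.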