[jira] [Commented] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type

2015-12-23 Thread Aaron Tokhy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070540#comment-15070540
 ] 

Aaron Tokhy commented on HIVE-12502:


Sure, will do.

> to_date UDF cannot accept NULLs of VOID type
> 
>
> Key: HIVE-12502
> URL: https://issues.apache.org/jira/browse/HIVE-12502
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.0.0
>Reporter: Aaron Tokhy
>Assignee: Aaron Tokhy
>Priority: Trivial
> Attachments: HIVE-12502-branch-1.patch, HIVE-12502.1.patch, 
> HIVE-12502.patch
>
>
> The to_date method behaves differently based off the 'data type' of null 
> passed in.
> hive> select to_date(null);   
> FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'TOK_NULL': 
> TO_DATE() only takes STRING/TIMESTAMP/DATEWRITABLE types, got VOID
> hive> select to_date(cast(null as timestamp));
> OK
> NULL
> Time taken: 0.031 seconds, Fetched: 1 row(s)
> This appears to be a regression introduced in HIVE-5731.  The previous 
> version of to_date would not check the type:
> https://github.com/apache/hive/commit/09b6553214d6db5ec7049b88bbe8ff640a7fef72#diff-204f5588c0767cf372a5ca7e3fb964afL56
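
One way to picture the behavior the report asks for is a type check that treats VOID (the type of an untyped NULL literal) as an accepted input and returns NULL for it. The sketch below is a self-contained model under stated assumptions, not Hive's UDF code: the class `ToDateTypeCheck`, the enum `ArgType` and both methods are hypothetical stand-ins, and real date parsing is reduced to keeping the leading `yyyy-MM-dd` part of the input string.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the argument check; these are not Hive's classes.
final class ToDateTypeCheck {
    // Stand-in for the primitive type categories named in the error message.
    enum ArgType { STRING, TIMESTAMP, DATE, VOID, INT }

    // Types to_date can convert, plus VOID so that an untyped NULL is tolerated.
    private static final Set<ArgType> ACCEPTED =
        EnumSet.of(ArgType.STRING, ArgType.TIMESTAMP, ArgType.DATE, ArgType.VOID);

    // Mirrors the compile-time check: reject unsupported types up front.
    static void validate(ArgType type) {
        if (!ACCEPTED.contains(type)) {
            throw new IllegalArgumentException(
                "TO_DATE() only takes STRING/TIMESTAMP/DATEWRITABLE types, got " + type);
        }
    }

    // Evaluation: a NULL input, typed or untyped, yields a NULL result.
    static String toDate(ArgType type, String value) {
        validate(type);
        if (type == ArgType.VOID || value == null) {
            return null;
        }
        // Simplification: keep the leading yyyy-MM-dd part, no real parsing.
        return value.length() >= 10 ? value.substring(0, 10) : null;
    }
}
```

With this model, `toDate(ArgType.VOID, null)` and `toDate(ArgType.TIMESTAMP, null)` both return null, which is the symmetry between `to_date(null)` and `to_date(cast(null as timestamp))` that the description expects.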



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type

2015-12-10 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-12502:
---
Attachment: HIVE-12502.1.patch

> to_date UDF cannot accept NULLs of VOID type
> 
>
> Key: HIVE-12502
> URL: https://issues.apache.org/jira/browse/HIVE-12502
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.0.0
>Reporter: Aaron Tokhy
>Assignee: Jason Dere
>Priority: Trivial
> Attachments: HIVE-12502.1.patch
>
>
> The to_date method behaves differently based off the 'data type' of null 
> passed in.
> hive> select to_date(null);   
> FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'TOK_NULL': 
> TO_DATE() only takes STRING/TIMESTAMP/DATEWRITABLE types, got VOID
> hive> select to_date(cast(null as timestamp));
> OK
> NULL
> Time taken: 0.031 seconds, Fetched: 1 row(s)
> This appears to be a regression introduced in HIVE-5731.  The previous 
> version of to_date would not check the type:
> https://github.com/apache/hive/commit/09b6553214d6db5ec7049b88bbe8ff640a7fef72#diff-204f5588c0767cf372a5ca7e3fb964afL56





[jira] [Commented] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type

2015-12-10 Thread Aaron Tokhy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052191#comment-15052191
 ] 

Aaron Tokhy commented on HIVE-12502:


I've submitted a patch with an associated unit test.

> to_date UDF cannot accept NULLs of VOID type
> 
>
> Key: HIVE-12502
> URL: https://issues.apache.org/jira/browse/HIVE-12502
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.0.0
>Reporter: Aaron Tokhy
>Assignee: Jason Dere
>Priority: Trivial
> Attachments: HIVE-12502.1.patch
>
>
> The to_date method behaves differently based off the 'data type' of null 
> passed in.
> hive> select to_date(null);   
> FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'TOK_NULL': 
> TO_DATE() only takes STRING/TIMESTAMP/DATEWRITABLE types, got VOID
> hive> select to_date(cast(null as timestamp));
> OK
> NULL
> Time taken: 0.031 seconds, Fetched: 1 row(s)
> This appears to be a regression introduced in HIVE-5731.  The previous 
> version of to_date would not check the type:
> https://github.com/apache/hive/commit/09b6553214d6db5ec7049b88bbe8ff640a7fef72#diff-204f5588c0767cf372a5ca7e3fb964afL56





[jira] [Updated] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type

2015-12-10 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-12502:
---
Attachment: HIVE-12502-branch-1.patch
HIVE-12502.patch

> to_date UDF cannot accept NULLs of VOID type
> 
>
> Key: HIVE-12502
> URL: https://issues.apache.org/jira/browse/HIVE-12502
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.0.0
>Reporter: Aaron Tokhy
>Assignee: Jason Dere
>Priority: Trivial
> Attachments: HIVE-12502-branch-1.patch, HIVE-12502.patch
>
>
> The to_date method behaves differently based off the 'data type' of null 
> passed in.
> hive> select to_date(null);   
> FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'TOK_NULL': 
> TO_DATE() only takes STRING/TIMESTAMP/DATEWRITABLE types, got VOID
> hive> select to_date(cast(null as timestamp));
> OK
> NULL
> Time taken: 0.031 seconds, Fetched: 1 row(s)
> This appears to be a regression introduced in HIVE-5731.  The previous 
> version of to_date would not check the type:
> https://github.com/apache/hive/commit/09b6553214d6db5ec7049b88bbe8ff640a7fef72#diff-204f5588c0767cf372a5ca7e3fb964afL56





[jira] [Updated] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type

2015-12-10 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-12502:
---
Attachment: (was: HIVE-12502.1.patch)

> to_date UDF cannot accept NULLs of VOID type
> 
>
> Key: HIVE-12502
> URL: https://issues.apache.org/jira/browse/HIVE-12502
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.0.0
>Reporter: Aaron Tokhy
>Assignee: Jason Dere
>Priority: Trivial
>
> The to_date method behaves differently based off the 'data type' of null 
> passed in.
> hive> select to_date(null);   
> FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'TOK_NULL': 
> TO_DATE() only takes STRING/TIMESTAMP/DATEWRITABLE types, got VOID
> hive> select to_date(cast(null as timestamp));
> OK
> NULL
> Time taken: 0.031 seconds, Fetched: 1 row(s)
> This appears to be a regression introduced in HIVE-5731.  The previous 
> version of to_date would not check the type:
> https://github.com/apache/hive/commit/09b6553214d6db5ec7049b88bbe8ff640a7fef72#diff-204f5588c0767cf372a5ca7e3fb964afL56





[jira] [Commented] (HIVE-12474) ORDER BY should handle column refs in parantheses

2015-11-23 Thread Aaron Tokhy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023310#comment-15023310
 ] 

Aaron Tokhy commented on HIVE-12474:


Should 'cluster by'/'sort by'/'distribute by'/'partition by' allow the use 
of parentheses if 'order by' does not?

> ORDER BY should handle column refs in parantheses
> -
>
> Key: HIVE-12474
> URL: https://issues.apache.org/jira/browse/HIVE-12474
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.0.0, 1.2.1
>Reporter: Aaron Tokhy
>Assignee: Xuefu Zhang
>Priority: Minor
>
> CREATE TABLE test(a INT, b INT, c INT)
> COMMENT 'This is a test table';
> hive>
> select lead(c) over (order by (a,b)) from test limit 10;
> FAILED: ParseException line 1:31 missing ) at ',' near ')'
> line 1:34 missing EOF at ')' near ')'
> hive>
> select lead(c) over (order by a,b) from test limit 10;
> - Works as expected.
> It appears that 'cluster by'/'sort by'/'distribute by'/'partition by' allows 
> this:
> https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L129
> For example, this syntax is still valid:
> select lead(c) over (sort by (a,b)) from test limit 10;





[jira] [Updated] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type

2015-11-23 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-12502:
---
External issue ID:   (was: HIVE-5731)

> to_date UDF cannot accept NULLs of VOID type
> 
>
> Key: HIVE-12502
> URL: https://issues.apache.org/jira/browse/HIVE-12502
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.0.0
>Reporter: Aaron Tokhy
>Assignee: Jason Dere
>Priority: Trivial
>
> The to_date method behaves differently based off the 'data type' of null 
> passed in.
> hive> select to_date(null);   
> FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'TOK_NULL': 
> TO_DATE() only takes STRING/TIMESTAMP/DATEWRITABLE types, got VOID
> hive> select to_date(cast(null as timestamp));
> OK
> NULL
> Time taken: 0.031 seconds, Fetched: 1 row(s)
> This appears to be a regression introduced in HIVE-5731.  The previous 
> version of to_date would not check the type:
> https://github.com/apache/hive/commit/09b6553214d6db5ec7049b88bbe8ff640a7fef72#diff-204f5588c0767cf372a5ca7e3fb964afL56





[jira] [Commented] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type

2015-11-23 Thread Aaron Tokhy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023543#comment-15023543
 ] 

Aaron Tokhy commented on HIVE-12502:


Regression

> to_date UDF cannot accept NULLs of VOID type
> 
>
> Key: HIVE-12502
> URL: https://issues.apache.org/jira/browse/HIVE-12502
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.0.0
>Reporter: Aaron Tokhy
>Assignee: Jason Dere
>Priority: Trivial
>
> The to_date method behaves differently based off the 'data type' of null 
> passed in.
> hive> select to_date(null);   
> FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'TOK_NULL': 
> TO_DATE() only takes STRING/TIMESTAMP/DATEWRITABLE types, got VOID
> hive> select to_date(cast(null as timestamp));
> OK
> NULL
> Time taken: 0.031 seconds, Fetched: 1 row(s)
> This appears to be a regression introduced in HIVE-5731.  The previous 
> version of to_date would not check the type:
> https://github.com/apache/hive/commit/09b6553214d6db5ec7049b88bbe8ff640a7fef72#diff-204f5588c0767cf372a5ca7e3fb964afL56





[jira] [Updated] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type

2015-11-23 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-12502:
---
Affects Version/s: (was: 0.13.1)
   1.0.0

> to_date UDF cannot accept NULLs of VOID type
> 
>
> Key: HIVE-12502
> URL: https://issues.apache.org/jira/browse/HIVE-12502
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.0.0
>Reporter: Aaron Tokhy
>Assignee: Jason Dere
>Priority: Trivial
>
> The to_date method behaves differently based off the 'data type' of null 
> passed in.
> hive> select to_date(null);   
> FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'TOK_NULL': 
> TO_DATE() only takes STRING/TIMESTAMP/DATEWRITABLE types, got VOID
> hive> select to_date(cast(null as timestamp));
> OK
> NULL
> Time taken: 0.031 seconds, Fetched: 1 row(s)
> This appears to be a regression introduced in HIVE-5731.  The previous 
> version of to_date would not check the type:
> https://github.com/apache/hive/commit/09b6553214d6db5ec7049b88bbe8ff640a7fef72#diff-204f5588c0767cf372a5ca7e3fb964afL56





[jira] [Updated] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type

2015-11-23 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-12502:
---
Affects Version/s: (was: 1.0.0)
   0.13.1

> to_date UDF cannot accept NULLs of VOID type
> 
>
> Key: HIVE-12502
> URL: https://issues.apache.org/jira/browse/HIVE-12502
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 0.13.1
>Reporter: Aaron Tokhy
>Assignee: Jason Dere
>Priority: Trivial
>
> The to_date method behaves differently based off the 'data type' of null 
> passed in.
> hive> select to_date(null);   
> FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'TOK_NULL': 
> TO_DATE() only takes STRING/TIMESTAMP/DATEWRITABLE types, got VOID
> hive> select to_date(cast(null as timestamp));
> OK
> NULL
> Time taken: 0.031 seconds, Fetched: 1 row(s)
> This appears to be a regression introduced in HIVE-5731.  The previous 
> version of to_date would not check the type:
> https://github.com/apache/hive/commit/09b6553214d6db5ec7049b88bbe8ff640a7fef72#diff-204f5588c0767cf372a5ca7e3fb964afL56





[jira] [Updated] (HIVE-12474) ORDER BY should handle column refs in parantheses

2015-11-19 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-12474:
---
Description: 
CREATE TABLE test(a INT, b INT, c INT)
COMMENT 'This is a test table';

hive>
select lead(c) over (order by (a,b)) from test limit 10;
FAILED: ParseException line 1:31 missing ) at ',' near ')'
line 1:34 missing EOF at ')' near ')'

hive>
select lead(c) over (order by a,b) from test limit 10;

- Works as expected.

It appears that 'cluster by'/'sort by'/'distribute by'/'partition by' allows 
this:
https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L129

For example, this syntax is still valid:
select lead(c) over (sort by (a,b)) from test limit 10;


  was:
CREATE TABLE test(a INT, b INT, c INT)
COMMENT 'This is a test table';

hive>
select lead(c) over (order by (a,b)) from test limit 10;
FAILED: ParseException line 1:31 missing ) at ',' near ')'
line 1:34 missing EOF at ')' near ')'

hive>
select lead(c) over (order by a,b) from test limit 10;

- Works as expected.

It appears that 'cluster by'/'sort by'/'distribute by'/'partition by' allows 
this:
https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L129

For example, this syntax is still valid:
select lead(c) over (sort by (a,b)) from test limit 10;

This is related to changes that were made as a part of HIVE-6617


> ORDER BY should handle column refs in parantheses
> -
>
> Key: HIVE-12474
> URL: https://issues.apache.org/jira/browse/HIVE-12474
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.0.0, 1.2.1
>Reporter: Aaron Tokhy
>Assignee: Pengcheng Xiong
>Priority: Minor
>
> CREATE TABLE test(a INT, b INT, c INT)
> COMMENT 'This is a test table';
> hive>
> select lead(c) over (order by (a,b)) from test limit 10;
> FAILED: ParseException line 1:31 missing ) at ',' near ')'
> line 1:34 missing EOF at ')' near ')'
> hive>
> select lead(c) over (order by a,b) from test limit 10;
> - Works as expected.
> It appears that 'cluster by'/'sort by'/'distribute by'/'partition by' allows 
> this:
> https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L129
> For example, this syntax is still valid:
> select lead(c) over (sort by (a,b)) from test limit 10;





[jira] [Updated] (HIVE-12474) ORDER BY should handle column refs in parantheses

2015-11-19 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-12474:
---
Description: 
CREATE TABLE test(a INT, b INT, c INT)
COMMENT 'This is a test table';

hive>
select lead(c) over (order by (a,b)) from test limit 10;
FAILED: ParseException line 1:31 missing ) at ',' near ')'
line 1:34 missing EOF at ')' near ')'

hive>
select lead(c) over (order by a,b) from test limit 10;

- Works as expected.

It appears that 'cluster by'/'sort by'/'distribute by'/'partition by' allows 
this:
https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L129

For example, this syntax is still valid:
select lead(c) over (sort by (a,b)) from test limit 10;

This is related to changes that were made as a part of HIVE-6617

  was:
CREATE TABLE test(a INT, b INT, c INT)
COMMENT 'This is a test table';

hive>
select lead(c) over (order by (a,b)) from test limit 10;
FAILED: ParseException line 1:31 missing ) at ',' near ')'
line 1:34 missing EOF at ')' near ')'

hive>
select lead(c) over (order by a,b) from test limit 10;

- Works as expected.

It appears that 'cluster by'/'sort by'/'distribute by'/'partition by' allows 
this:
https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L129

For example, this syntax is still valid:
select lead(c) over (sort by (a,b)) from test limit 10;


> ORDER BY should handle column refs in parantheses
> -
>
> Key: HIVE-12474
> URL: https://issues.apache.org/jira/browse/HIVE-12474
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.0.0, 1.2.1
>Reporter: Aaron Tokhy
>Assignee: Pengcheng Xiong
>Priority: Minor
>
> CREATE TABLE test(a INT, b INT, c INT)
> COMMENT 'This is a test table';
> hive>
> select lead(c) over (order by (a,b)) from test limit 10;
> FAILED: ParseException line 1:31 missing ) at ',' near ')'
> line 1:34 missing EOF at ')' near ')'
> hive>
> select lead(c) over (order by a,b) from test limit 10;
> - Works as expected.
> It appears that 'cluster by'/'sort by'/'distribute by'/'partition by' allows 
> this:
> https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L129
> For example, this syntax is still valid:
> select lead(c) over (sort by (a,b)) from test limit 10;
> This is related to changes that were made as a part of HIVE-6617





[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-21 Thread Aaron Tokhy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707706#comment-14707706
 ] 

Aaron Tokhy commented on HIVE-10631:


Code review posted:

https://reviews.apache.org/r/37484/

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Assignee: Aaron Tokhy
Priority: Minor
 Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch


 The HiveMetaStore.create_table_core method calls 
 MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather 
 is on. For a partitioned table, however, this call scans the warehouse 
 directory and does not appear to use the result.
 Fast Stats was implemented by HIVE-3959
 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363
 From the create_table_core method:
 {code}
 if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) &&
     !MetaStoreUtils.isView(tbl)) {
   if (tbl.getPartitionKeysSize() == 0)  { // Unpartitioned table
     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir);
   } else { // Partitioned table with no partitions.
     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
   }
 }
 {code}
 Particularly Line 1363: // Partitioned table with no partitions.
 {code}
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
 {code}
 This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and 
 then does nothing in MetaStoreUtils.updateUnpartitionedTableStatsFast, because 
 the newDir flag is always true.
 The impact of this bug is minor with an HDFS warehouse location 
 (hive.metastore.warehouse.dir), but it could be large with an S3 warehouse 
 location, especially for large existing partitions.
 The impact is also heightened by HIVE-6727 when the warehouse location is S3: 
 it could scan the wrong S3 directory recursively and do nothing with the 
 result. I will add more detail on these cases in the comments.
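
The control flow in the description above can be modeled in a few lines. This is a simplified sketch, not the metastore code: `FastStatsModel`, `listFiles` and the `listings` counter are hypothetical, and the guarded variant is only one possible fix, so the attached patches may differ from it. The model shows why the partitioned branch pays for a directory listing whose result is discarded when `newDir` is true.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the create_table_core branch; these are not Hive's classes.
final class FastStatsModel {
    // Counts simulated warehouse directory listings (the costly call on S3).
    final AtomicInteger listings = new AtomicInteger();

    // Stand-in for Warehouse.getFileStatusesForUnpartitionedTable.
    private int listFiles() {
        listings.incrementAndGet();
        return 0;
    }

    // Models the reported behavior: the directory is listed first, then the
    // result is ignored when newDir is true.
    boolean updateStatsFast(boolean newDir) {
        int files = listFiles();
        if (newDir) {
            return false; // listing result discarded
        }
        return files >= 0; // stats would be populated here
    }

    // As reported: both branches call the update, the second with newDir=true.
    void createTableAsReported(int partitionKeys, boolean madeDir) {
        if (partitionKeys == 0) {
            updateStatsFast(madeDir);
        } else {
            updateStatsFast(true);
        }
    }

    // One possible guard (an assumption): skip the call for partitioned tables.
    void createTableGuarded(int partitionKeys, boolean madeDir) {
        if (partitionKeys == 0) {
            updateStatsFast(madeDir);
        }
    }
}
```

In this model, creating a partitioned table through `createTableAsReported` performs one listing that has no effect, while `createTableGuarded` performs none. On HDFS that extra listing is cheap, but on an S3 location with many objects it is the cost the description warns about.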





[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-14 Thread Aaron Tokhy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697589#comment-14697589
 ] 

Aaron Tokhy commented on HIVE-10631:


Sorry, fixed it just now.

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Assignee: Aaron Tokhy
Priority: Minor
 Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch


 The HiveMetaStore.create_table_core method calls 
 MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather 
 is on. For a partitioned table, however, this call scans the warehouse 
 directory and does not appear to use the result.
 Fast Stats was implemented by HIVE-3959
 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363
 From the create_table_core method:
 {code}
 if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) &&
     !MetaStoreUtils.isView(tbl)) {
   if (tbl.getPartitionKeysSize() == 0)  { // Unpartitioned table
     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir);
   } else { // Partitioned table with no partitions.
     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
   }
 }
 {code}
 Particularly Line 1363: // Partitioned table with no partitions.
 {code}
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
 {code}
 This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and 
 then does nothing in MetaStoreUtils.updateUnpartitionedTableStatsFast, because 
 the newDir flag is always true.
 The impact of this bug is minor with an HDFS warehouse location 
 (hive.metastore.warehouse.dir), but it could be large with an S3 warehouse 
 location, especially for large existing partitions.
 The impact is also heightened by HIVE-6727 when the warehouse location is S3: 
 it could scan the wrong S3 directory recursively and do nothing with the 
 result. I will add more detail on these cases in the comments.





[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-14 Thread Aaron Tokhy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697371#comment-14697371
 ] 

Aaron Tokhy commented on HIVE-10631:


Sure, here it is:

https://reviews.apache.org/r/37484/

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Assignee: Aaron Tokhy
Priority: Minor
 Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch


 The HiveMetaStore.create_table_core method calls 
 MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather 
 is on. For a partitioned table, however, this call scans the warehouse 
 directory and does not appear to use the result.
 Fast Stats was implemented by HIVE-3959
 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363
 From the create_table_core method:
 {code}
 if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) &&
     !MetaStoreUtils.isView(tbl)) {
   if (tbl.getPartitionKeysSize() == 0)  { // Unpartitioned table
     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir);
   } else { // Partitioned table with no partitions.
     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
   }
 }
 {code}
 Particularly Line 1363: // Partitioned table with no partitions.
 {code}
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
 {code}
 This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and 
 then does nothing in MetaStoreUtils.updateUnpartitionedTableStatsFast, because 
 the newDir flag is always true.
 The impact of this bug is minor with an HDFS warehouse location 
 (hive.metastore.warehouse.dir), but it could be large with an S3 warehouse 
 location, especially for large existing partitions.
 The impact is also heightened by HIVE-6727 when the warehouse location is S3: 
 it could scan the wrong S3 directory recursively and do nothing with the 
 result. I will add more detail on these cases in the comments.





[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-13 Thread Aaron Tokhy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695641#comment-14695641
 ] 

Aaron Tokhy commented on HIVE-10631:


Uploading a patch that cleanly applies to TRUNK as well as branch-1.0

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Assignee: Aaron Tokhy
Priority: Minor
 Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch


 The HiveMetaStore.create_table_core method calls 
 MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather 
 is on. For a partitioned table, however, this call scans the warehouse 
 directory and does not appear to use the result.
 Fast Stats was implemented by HIVE-3959
 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363
 From the create_table_core method:
 {code}
 if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) &&
     !MetaStoreUtils.isView(tbl)) {
   if (tbl.getPartitionKeysSize() == 0)  { // Unpartitioned table
     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir);
   } else { // Partitioned table with no partitions.
     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
   }
 }
 {code}
 Particularly Line 1363: // Partitioned table with no partitions.
 {code}
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
 {code}
 This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and 
 then does nothing in MetaStoreUtils.updateUnpartitionedTableStatsFast, because 
 the newDir flag is always true.
 The impact of this bug is minor with an HDFS warehouse location 
 (hive.metastore.warehouse.dir), but it could be large with an S3 warehouse 
 location, especially for large existing partitions.
 The impact is also heightened by HIVE-6727 when the warehouse location is S3: 
 it could scan the wrong S3 directory recursively and do nothing with the 
 result. I will add more detail on these cases in the comments.





[jira] [Updated] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-13 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-10631:
---
Attachment: HIVE-10631.patch

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Assignee: Aaron Tokhy
Priority: Minor
 Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch


 The HiveMetaStore.create_table_core method calls 
 MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather 
 is on. For a partitioned table, however, this call scans the warehouse 
 directory and does not appear to use the result.
 Fast Stats was implemented by HIVE-3959
 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363
 From the create_table_core method:
 {code}
 if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) &&
     !MetaStoreUtils.isView(tbl)) {
   if (tbl.getPartitionKeysSize() == 0)  { // Unpartitioned table
     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir);
   } else { // Partitioned table with no partitions.
     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
   }
 }
 {code}
 Particularly Line 1363: // Partitioned table with no partitions.
 {code}
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
 {code}
 This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and 
 then does nothing in MetaStoreUtils.updateUnpartitionedTableStatsFast, because 
 the newDir flag is always true.
 The impact of this bug is minor with an HDFS warehouse location 
 (hive.metastore.warehouse.dir), but it could be large with an S3 warehouse 
 location, especially for large existing partitions.
 The impact is also heightened by HIVE-6727 when the warehouse location is S3: 
 it could scan the wrong S3 directory recursively and do nothing with the 
 result. I will add more detail on these cases in the comments.





[jira] [Updated] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-12 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-10631:
---
Attachment: HIVE-10631-branch-1.0.patch

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Assignee: Aaron Tokhy
Priority: Minor
 Attachments: HIVE-10631-branch-1.0.patch, HIVE-10631.patch


 The HiveMetaStore.create_table_core method calls 
 MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather 
 is on. For a partitioned table, however, this call scans the warehouse 
 directory and does not appear to use the result. 
 Fast Stats was implemented by HIVE-3959.
 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363
 From the create_table_core method:
 {code}
 if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER) &&
     !MetaStoreUtils.isView(tbl)) {
   if (tbl.getPartitionKeysSize() == 0)  { // Unpartitioned table
     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir);
   } else { // Partitioned table with no partitions.
     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
   }
 }
 {code}
 In particular, line 1363 (the "Partitioned table with no partitions" branch):
 {code}
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
 {code}
 This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and 
 then does nothing in MetaStoreUtils.updateUnpartitionedTableStatsFast, because 
 the newDir flag is always true.
 The impact of this bug is minor with an HDFS warehouse location 
 (hive.metastore.warehouse.dir), but it could be large with an S3 warehouse 
 location, especially when there are many existing partitions.
 The impact is also heightened by HIVE-6727 when the warehouse location is S3: 
 it could scan the wrong S3 directory recursively and do nothing with the 
 result. I will add more detail on these cases in the comments.
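
The wasted scan described in this report can be modelled in isolation. The following is a minimal sketch and not Hive code: the class FastStatsSketch and its methods eager, lazy, listFiles and fill are hypothetical names invented for illustration, and the directory listing is simulated by a counter. It contrasts a caller that lists files before the newDir flag is checked with one that passes a supplier, so the listing only runs when its result is used.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-alone model; none of these names are Hive APIs.
final class FastStatsSketch {
    // Counts how many simulated directory scans have been performed.
    static int listings = 0;

    // Stands in for an expensive recursive listing; returns file sizes in bytes.
    static List<Long> listFiles() {
        listings++;
        return Arrays.asList(10L, 20L, 30L);
    }

    // Pattern from the report: the caller has already paid for the listing,
    // but it is ignored whenever newDir is true.
    static Map<String, String> eager(List<Long> files, boolean newDir) {
        Map<String, String> stats = new HashMap<>();
        if (!newDir) {
            fill(stats, files);
        }
        return stats;
    }

    // Assumed alternative: the listing is requested only if it will be used.
    static Map<String, String> lazy(Supplier<List<Long>> files, boolean newDir) {
        Map<String, String> stats = new HashMap<>();
        if (!newDir) {
            fill(stats, files.get());
        }
        return stats;
    }

    // Records the file count and the summed size under illustrative keys.
    static void fill(Map<String, String> stats, List<Long> files) {
        long total = 0;
        for (long size : files) {
            total += size;
        }
        stats.put("numFiles", String.valueOf(files.size()));
        stats.put("totalSize", String.valueOf(total));
    }

    public static void main(String[] args) {
        eager(listFiles(), true);
        System.out.println("scans after eager call: " + listings);
        lazy(FastStatsSketch::listFiles, true);
        System.out.println("scans after lazy call: " + listings);
    }
}
```

In this model the eager call performs one scan whose result is discarded, while the lazy call performs none. On a store where listing is slow, such as the S3 case mentioned above, that difference is the cost the report is concerned with.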





[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-12 Thread Aaron Tokhy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693903#comment-14693903
 ] 

Aaron Tokhy commented on HIVE-10631:


Running precommit tests.

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Assignee: Aaron Tokhy
Priority: Minor
 Attachments: HIVE-10631.patch




[jira] [Updated] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-11 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-10631:
---
Attachment: (was: HIVE-10631.patch.1)

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Assignee: Aaron Tokhy
Priority: Minor
 Attachments: HIVE-10631.patch




[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-11 Thread Aaron Tokhy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692730#comment-14692730
 ] 

Aaron Tokhy commented on HIVE-10631:


Reuploaded as HIVE-10631.patch

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Assignee: Aaron Tokhy
Priority: Minor
 Attachments: HIVE-10631.patch




[jira] [Updated] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-11 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-10631:
---
Attachment: HIVE-10631.patch

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Assignee: Aaron Tokhy
Priority: Minor
 Attachments: HIVE-10631.patch




[jira] [Assigned] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-11 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy reassigned HIVE-10631:
--

Assignee: Aaron Tokhy

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Assignee: Aaron Tokhy
Priority: Minor
 Attachments: HIVE-10631.patch.1




[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-10 Thread Aaron Tokhy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681038#comment-14681038
 ] 

Aaron Tokhy commented on HIVE-10631:


After reading more about hive.stats.reliable, it does not appear appropriate to 
use it in this case. Instead, it would be better to defer stats calculation 
for partitioned tables until partitions are added to the table 
(MSCK/ALTER TABLE), rather than doing it at table creation 
(CREATE [EXTERNAL] TABLE).
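
The deferral proposed in this comment can be written as a single predicate. This is a minimal sketch under stated assumptions, not the attached patch: the class CreateTableStatsPolicy and the method gatherOnCreate are hypothetical names, and the three inputs stand in for the hive.stats.autogather setting, the view check, and the partition key count used in create_table_core.

```java
// Hypothetical policy model; not taken from the Hive code base or the patch.
final class CreateTableStatsPolicy {
    // Returns true only when stats should be gathered at table creation.
    // Partitioned tables are skipped here on the assumption that their stats
    // are filled in later, when partitions are added.
    static boolean gatherOnCreate(boolean autoGather, boolean isView, int partitionKeys) {
        return autoGather && !isView && partitionKeys == 0;
    }

    public static void main(String[] args) {
        System.out.println("unpartitioned table: " + gatherOnCreate(true, false, 0));
        System.out.println("partitioned table:   " + gatherOnCreate(true, false, 2));
        System.out.println("view:                " + gatherOnCreate(true, true, 0));
    }
}
```

Under this policy the partitioned-table branch never triggers a warehouse scan at creation time, which is the behaviour the comment argues for.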

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.13.0, 1.0.0
Reporter: Dongwook Kwon
Priority: Minor



[jira] [Updated] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-10 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-10631:
---
Attachment: HIVE-10631.patch

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Priority: Minor
 Attachments: HIVE-10631.patch




[jira] [Updated] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-10 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-10631:
---
Attachment: (was: HIVE-10631.patch)

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Priority: Minor
 Attachments: HIVE-10631.patch.1




[jira] [Updated] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-10 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-10631:
---
Attachment: HIVE-10631.patch.1

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Priority: Minor
 Attachments: HIVE-10631.patch.1




[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-07-29 Thread Aaron Tokhy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647156#comment-14647156
 ] 

Aaron Tokhy commented on HIVE-10631:


hive.stats.reliable requires that both numFiles and totalSize be set properly, 
regardless of the condition.  So if 'create table' or 'create external table' 
uses a location already populated with partitions, it will traverse those 
partitions regardless.

As of this writing, hive.stats.reliable appears to default to false.  Perhaps 
stats calculation on creation of partitioned tables can be skipped only when 
hive.stats.reliable is false, since stats will be populated by MSCK REPAIR 
TABLE or when partitions are added with ALTER TABLE ADD PARTITION.
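
The conditional variant suggested here, skipping the scan for partitioned tables only when hive.stats.reliable is false, might look like the following. It is a sketch of the idea only: ReliableStatsPolicy and scanOnCreate are hypothetical names, and the boolean parameters are assumed to carry the values of hive.stats.autogather and hive.stats.reliable.

```java
// Hypothetical policy model for the suggestion above; not Hive code.
final class ReliableStatsPolicy {
    // Decides whether table creation should scan the table location.
    static boolean scanOnCreate(boolean autoGather, boolean statsReliable,
                                boolean isView, int partitionKeys) {
        if (!autoGather || isView) {
            return false; // stats gathering is off, or there is no data to scan
        }
        if (partitionKeys == 0) {
            return true; // unpartitioned table: gather stats at creation
        }
        // Partitioned table: scan only if reliable stats are demanded;
        // otherwise leave it to the later partition-adding commands.
        return statsReliable;
    }

    public static void main(String[] args) {
        System.out.println("partitioned, reliable=false: "
                + scanOnCreate(true, false, false, 3));
        System.out.println("partitioned, reliable=true:  "
                + scanOnCreate(true, true, false, 3));
    }
}
```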


 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.13.0, 1.0.0
Reporter: Dongwook Kwon
Priority: Minor
