[jira] [Commented] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)
[ https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009286#comment-14009286 ]

Zhuoluo (Clark) Yang commented on HIVE-4561:

Thanks, [~navis]. It seems more situations need to be considered.

Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000, if all the column values larger than 0.0 (or if all column values smaller than 0.0)

Key: HIVE-4561
URL: https://issues.apache.org/jira/browse/HIVE-4561
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.12.0, 0.13.0
Reporter: caofangkun
Assignee: Navis
Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch, HIVE-4561.4.patch.txt, HIVE-4561.5.patch.txt, HIVE-4561.6.patch.txt, HIVE-4561.7.patch.txt

If all column values are larger than 0.0, DOUBLE_LOW_VALUE will always be 0.0; if all column values are less than 0.0, DOUBLE_HIGH_VALUE will always be 0.0.

hive (default)> create table src_test (price double);
hive (default)> load data local inpath './test.txt' into table src_test;
hive (default)> select * from src_test;
OK
1.0
2.0
3.0
Time taken: 0.313 seconds, Fetched: 3 row(s)
hive (default)> analyze table src_test compute statistics for columns price;

mysql> select * from TAB_COL_STATS \G;
CS_ID: 16
DB_NAME: default
TABLE_NAME: src_test
COLUMN_NAME: price
COLUMN_TYPE: double
TBL_ID: 2586
LONG_LOW_VALUE: 0
LONG_HIGH_VALUE: 0
DOUBLE_LOW_VALUE: 0. # Wrong Result ! Expected is 1.
DOUBLE_HIGH_VALUE: 3.
BIG_DECIMAL_LOW_VALUE: NULL
BIG_DECIMAL_HIGH_VALUE: NULL
NUM_NULLS: 0
NUM_DISTINCTS: 1
AVG_COL_LEN: 0.
MAX_COL_LEN: 0
NUM_TRUES: 0
NUM_FALSES: 0
LAST_ANALYZED: 1368596151
2 rows in set (0.00 sec)

--
This message was sent by Atlassian JIRA (v6.2#6252)
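The reported symptom is what one would expect if a running minimum and maximum start at zero rather than at the first observed value. The following is a minimal sketch of that failure mode under that assumption. It is not Hive's aggregation code, and the names `ZeroSeededStats`, `update`, and `of` are hypothetical.

```java
/** Hypothetical illustration: a min/max accumulator that starts at 0.0. */
final class ZeroSeededStats {
    double min = 0.0; // assumed seed; this is what makes the result wrong
    double max = 0.0; // assumed seed

    /** Folds one column value into the running min and max. */
    void update(double v) {
        min = Math.min(min, v);
        max = Math.max(max, v);
    }

    /** Builds the stats for a whole column of values. */
    static ZeroSeededStats of(double... values) {
        ZeroSeededStats s = new ZeroSeededStats();
        for (double v : values) {
            s.update(v);
        }
        return s;
    }

    public static void main(String[] args) {
        // All values positive: the zero seed is never replaced as the minimum.
        ZeroSeededStats positive = of(1.0, 2.0, 3.0);
        System.out.println("min=" + positive.min + " max=" + positive.max);

        // All values negative: the zero seed is never replaced as the maximum.
        ZeroSeededStats negative = of(-3.0, -2.0, -1.0);
        System.out.println("min=" + negative.min + " max=" + negative.max);
    }
}
```

With the values 1.0, 2.0, 3.0 from the report, the sketch yields a minimum of 0.0 and a maximum of 3.0, matching the DOUBLE_LOW_VALUE and DOUBLE_HIGH_VALUE columns shown in the MySQL output.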
[jira] [Commented] (HIVE-3421) Column Level Top K Values Statistics
[ https://issues.apache.org/jira/browse/HIVE-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762679#comment-13762679 ]

Zhuoluo (Clark) Yang commented on HIVE-3421:

Would any committer review this? IMHO, it looks cool.

Column Level Top K Values Statistics

Key: HIVE-3421
URL: https://issues.apache.org/jira/browse/HIVE-3421
Project: Hive
Issue Type: New Feature
Reporter: Feng Lu
Assignee: Feng Lu
Attachments: HIVE-3421.patch.1.txt, HIVE-3421.patch.2.txt, HIVE-3421.patch.3.txt, HIVE-3421.patch.4.txt, HIVE-3421.patch.5.txt, HIVE-3421.patch.6.txt, HIVE-3421.patch.7.txt, HIVE-3421.patch.8.txt, HIVE-3421.patch.9.txt, HIVE-3421.patch.txt

Compute (estimate) top k values statistics for each column, and put the most skewed column into skewed info if the user hasn't specified skew. This feature depends on ListBucketing (create table skewed on): https://cwiki.apache.org/Hive/listbucketing.html. All column top-k values can be added to skewed info if, in the future, skewed info supports multiple independent columns.

The TopK algorithm is based on this paper: http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
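The linked report appears to be the one describing the Space-Saving algorithm for frequent and top-k elements in streams. Assuming so, the idea can be sketched as follows. This is an illustration only, not the code in the HIVE-3421 patches; the class name `SpaceSavingSketch` and its methods are hypothetical, and it uses a plain map with a linear scan for the smallest counter instead of any specialised summary structure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of Space-Saving top-k counting with a fixed number of counters. */
final class SpaceSavingSketch {
    private final int capacity;
    private final Map<String, Long> counts = new HashMap<>();

    SpaceSavingSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Processes one value from the column. */
    void offer(String value) {
        Long current = counts.get(value);
        if (current != null) {
            counts.put(value, current + 1);      // already monitored: increment
        } else if (counts.size() < capacity) {
            counts.put(value, 1L);               // free counter available
        } else {
            // Replace the entry with the smallest count; the newcomer
            // inherits that count plus one, so counts may be overestimates.
            String victim = null;
            long smallest = Long.MAX_VALUE;
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                if (e.getValue() < smallest) {
                    smallest = e.getValue();
                    victim = e.getKey();
                }
            }
            counts.remove(victim);
            counts.put(value, smallest + 1);
        }
    }

    /** Estimated count for a value, or 0 if it is not currently monitored. */
    long estimate(String value) {
        return counts.getOrDefault(value, 0L);
    }

    /** Returns up to k monitored values, highest estimated count first. */
    List<String> topK(int k) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

The memory used is bounded by `capacity` regardless of the number of distinct values, which is why an approach like this suits per-column statistics; the trade-off is that counts for values admitted after an eviction can be too high.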
[jira] [Updated] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)
[ https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhuoluo (Clark) Yang updated HIVE-4561:

Attachment: (was: HIVE-4561.4.patch)

Key: HIVE-4561
URL: https://issues.apache.org/jira/browse/HIVE-4561
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch
[jira] [Commented] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)
[ https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676754#comment-13676754 ]

Zhuoluo (Clark) Yang commented on HIVE-4561:

[~shreepadma] It seems that supporting null values would require many modifications, e.g. to ColumnStatsTask, ObjectStore, and the thrift files. Since statistics are currently looked up by code, I think it makes sense to keep Long.Min/Long.Max. We can apply HIVE-4561.3.patch instead of HIVE-4561.4.patch.

Key: HIVE-4561
URL: https://issues.apache.org/jira/browse/HIVE-4561
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch
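The "keep Long.Min/Long.Max" option can be read as seeding the running minimum with `Long.MAX_VALUE` and the running maximum with `Long.MIN_VALUE`, so that any real value replaces the seed. A minimal sketch under that reading follows; the class `SentinelLongStats` is hypothetical and is not taken from any of the attached patches.

```java
/** Hypothetical sketch: seed min/max with sentinels so the first real value replaces them. */
final class SentinelLongStats {
    long min = Long.MAX_VALUE; // sentinel: larger than or equal to any input
    long max = Long.MIN_VALUE; // sentinel: smaller than or equal to any input
    long count = 0;

    /** Folds one column value into the running statistics. */
    void update(long v) {
        min = Math.min(min, v);
        max = Math.max(max, v);
        count++;
    }

    /** True when no value was seen, i.e. min and max still hold the sentinels. */
    boolean isEmpty() {
        return count == 0;
    }
}
```

No schema change is needed with this approach because min and max always hold a value. The cost is that an empty table stores the sentinels themselves, so any reader of the statistics has to know that such values mean "no data".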
[jira] [Updated] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)
[ https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhuoluo (Clark) Yang updated HIVE-4561:

Attachment: HIVE-4561.4.patch

Updated the patch to make HIGH/LOW values of empty tables return null.

Key: HIVE-4561
URL: https://issues.apache.org/jira/browse/HIVE-4561
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch, HIVE-4561.4.patch
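Returning null for an empty table amounts to treating "no value seen yet" as a distinct state instead of encoding it as a number. A minimal sketch of that idea, using boxed `Double` fields, is shown here. The class `NullableDoubleStats` is hypothetical and does not reproduce HIVE-4561.4.patch.

```java
/** Hypothetical sketch: min/max stay null until the first value arrives. */
final class NullableDoubleStats {
    private Double min; // null means no value has been seen
    private Double max; // null means no value has been seen

    /** Folds one column value into the running statistics. */
    void update(double v) {
        min = (min == null) ? v : Math.min(min, v);
        max = (max == null) ? v : Math.max(max, v);
    }

    /** Smallest value seen, or null for an empty input. */
    Double getMin() {
        return min;
    }

    /** Largest value seen, or null for an empty input. */
    Double getMax() {
        return max;
    }
}
```

Every consumer of these values then has to handle null explicitly, which is the kind of downstream change a nullable design implies.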
[jira] [Updated] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)
[ https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhuoluo (Clark) Yang updated HIVE-4561:

Status: Patch Available (was: Open)

Key: HIVE-4561
URL: https://issues.apache.org/jira/browse/HIVE-4561
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch, HIVE-4561.4.patch
[jira] [Commented] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)
[ https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676571#comment-13676571 ]

Zhuoluo (Clark) Yang commented on HIVE-4561:

[~ashutoshc] I think it happens when we try to persist a null max/min. The simplest way is to leave it empty in ColumnStatsTask. I will make a new patch and run a full test.

Key: HIVE-4561
URL: https://issues.apache.org/jira/browse/HIVE-4561
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch, HIVE-4561.4.patch
[jira] [Commented] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)
[ https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676611#comment-13676611 ]

Zhuoluo (Clark) Yang commented on HIVE-4561:

[~shreepadma] I think I was wrong. Originally, I wanted to return early like this:

{code}
@@ -189,6 +187,11 @@
         statsObj.setStatsData(statsData);
       }
     } else {
+      // Any null object, such as min/max value of an empty table,
+      // need not be unpacked.
+      if (o == null) {
+        return;
+      }
       // invoke the right unpack method depending on data type of the column
       if (statsObj.getStatsData().isSetBooleanStats()) {
         unpackBooleanStats(oi, o, fieldName, statsObj);
{code}

However, I've found that LongColumnStatsData.highValue is required by thrift. ObjectStore would also need to be modified to check LongColumnStatsData.isSetHighValue(). Any suggestions? Thanks!

Key: HIVE-4561
URL: https://issues.apache.org/jira/browse/HIVE-4561
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch, HIVE-4561.4.patch
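The concern about a required `highValue` and an `isSetHighValue()` check can be illustrated with a hand-written stand-in. The sketch below only mimics the shape of a struct with an "is set" flag; `LongStatsHolder` and `highValueOrNull` are hypothetical names, and this is not the generated `LongColumnStatsData` class or the ObjectStore code.

```java
/** Hand-written stand-in for a stats struct with an "is set" flag; NOT the real thrift class. */
final class LongStatsHolder {
    private long highValue;        // primitive: cannot itself represent "missing"
    private boolean highValueSet;  // tracks whether a value was ever assigned

    void setHighValue(long v) {
        highValue = v;
        highValueSet = true;
    }

    boolean isSetHighValue() {
        return highValueSet;
    }

    long getHighValue() {
        return highValue;
    }

    /**
     * What a persistence layer would need to do if the field were allowed
     * to be unset: consult the flag before reading the primitive.
     */
    static Long highValueOrNull(LongStatsHolder s) {
        if (s.isSetHighValue()) {
            return s.getHighValue();
        }
        return null;
    }
}
```

The point of the flag is that an unset field and a field explicitly set to 0 are different states, which a bare primitive cannot express. Making that distinction visible end to end is why the change would touch the struct definition and every reader of it.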
[jira] [Work started] (HIVE-2615) CTAS with literal NULL creates VOID type
[ https://issues.apache.org/jira/browse/HIVE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-2615 started by Zhuoluo (Clark) Yang.

CTAS with literal NULL creates VOID type

Key: HIVE-2615
URL: https://issues.apache.org/jira/browse/HIVE-2615
Project: Hive
Issue Type: Bug
Reporter: David Phillips
Assignee: Zhuoluo (Clark) Yang

Create the table with a column that always contains NULL:
{quote}
hive> create table bad as select 1 x, null z from dual;
{quote}
Because there's no type, Hive gives it the VOID type:
{quote}
hive> describe bad;
OK
x int
z void
{quote}
This seems weird, because AFAIK, there is no normal way to create a column of type VOID. The problem is that the table can't be queried:
{quote}
hive> select * from bad;
OK
Failed with exception java.io.IOException:java.lang.RuntimeException: Internal error: no LazyObject for VOID
{quote}
Worse, even if you don't select that field, the query fails at runtime:
{quote}
hive> select x from bad;
...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
{quote}
[jira] [Updated] (HIVE-2615) CTAS with literal NULL creates VOID type
[ https://issues.apache.org/jira/browse/HIVE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhuoluo (Clark) Yang updated HIVE-2615:

Component/s: Query Processor

Key: HIVE-2615
URL: https://issues.apache.org/jira/browse/HIVE-2615
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.6.0
Reporter: David Phillips
Assignee: Zhuoluo (Clark) Yang
[jira] [Updated] (HIVE-2615) CTAS with literal NULL creates VOID type
[ https://issues.apache.org/jira/browse/HIVE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhuoluo (Clark) Yang updated HIVE-2615:

Affects Version/s: 0.6.0

Key: HIVE-2615
URL: https://issues.apache.org/jira/browse/HIVE-2615
Project: Hive
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: David Phillips
Assignee: Zhuoluo (Clark) Yang
[jira] [Updated] (HIVE-2615) CTAS with literal NULL creates VOID type
[ https://issues.apache.org/jira/browse/HIVE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhuoluo (Clark) Yang updated HIVE-2615:

Attachment: HIVE-2615.1.patch

Attached a patch. The check runs after the result schema is generated: if the statement is a CTAS and the schema contains a void column, it raises an exception and asks the user to cast the type.

Key: HIVE-2615
URL: https://issues.apache.org/jira/browse/HIVE-2615
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.6.0
Reporter: David Phillips
Assignee: Zhuoluo (Clark) Yang
Attachments: HIVE-2615.1.patch
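The check described in this update can be sketched as a validation pass over the result schema. The version below is an illustration under stated assumptions, not the code in HIVE-2615.1.patch: the schema is modelled as a plain column-name-to-type map, and the class name `CtasVoidCheck`, the exception type, and the message text are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of rejecting void columns in a CTAS result schema. */
final class CtasVoidCheck {

    /** Throws if any column in the (name -> type) schema has type "void". */
    static void validate(Map<String, String> resultSchema) {
        for (Map.Entry<String, String> col : resultSchema.entrySet()) {
            if ("void".equalsIgnoreCase(col.getValue())) {
                throw new IllegalArgumentException(
                        "CTAS column '" + col.getKey() + "' has type void; "
                        + "cast it explicitly, e.g. CAST(NULL AS STRING) AS " + col.getKey());
            }
        }
    }

    public static void main(String[] args) {
        // Schema of the table from the issue description: x int, z void.
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("x", "int");
        schema.put("z", "void");
        try {
            validate(schema);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Failing at analysis time with a message that names the column lets the user fix the statement with an explicit cast, instead of creating a table that cannot be queried later.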
[jira] [Commented] (HIVE-2615) CTAS with literal NULL creates VOID type
[ https://issues.apache.org/jira/browse/HIVE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13674136#comment-13674136 ]

Zhuoluo (Clark) Yang commented on HIVE-2615:

https://reviews.apache.org/r/11622/

Key: HIVE-2615
URL: https://issues.apache.org/jira/browse/HIVE-2615
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.6.0
Reporter: David Phillips
Assignee: Zhuoluo (Clark) Yang
Attachments: HIVE-2615.1.patch
[jira] [Updated] (HIVE-2615) CTAS with literal NULL creates VOID type
[ https://issues.apache.org/jira/browse/HIVE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhuoluo (Clark) Yang updated HIVE-2615:

Fix Version/s: 0.12.0
Status: Patch Available (was: In Progress)

Would any committer review this issue?

Key: HIVE-2615
URL: https://issues.apache.org/jira/browse/HIVE-2615
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.6.0
Reporter: David Phillips
Assignee: Zhuoluo (Clark) Yang
Fix For: 0.12.0
Attachments: HIVE-2615.1.patch
[jira] [Updated] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)
[ https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhuoluo (Clark) Yang updated HIVE-4561:

Attachment: HIVE-4561.3.patch

Fix compute_stats_empty_table.q test results.

Key: HIVE-4561
URL: https://issues.apache.org/jira/browse/HIVE-4561
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch
[jira] [Updated] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)
[ https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhuoluo (Clark) Yang updated HIVE-4561:

Status: Open (was: Patch Available)

[~ashutoshc] The values look quite strange; I will try to make a new patch.

Key: HIVE-4561
URL: https://issues.apache.org/jira/browse/HIVE-4561
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch
[jira] [Updated] (HIVE-2615) CTAS with literal NULL creates VOID type
[ https://issues.apache.org/jira/browse/HIVE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhuoluo (Clark) Yang updated HIVE-2615:

Assignee: Zhuoluo (Clark) Yang

Key: HIVE-2615
URL: https://issues.apache.org/jira/browse/HIVE-2615
Project: Hive
Issue Type: Bug
Reporter: David Phillips
Assignee: Zhuoluo (Clark) Yang
[jira] [Updated] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)
[ https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhuoluo (Clark) Yang updated HIVE-4561: --- Attachment: HIVE-4561.2.patch Uploaded a new patch. When all the long values are positive, we now get the right min; when all the values are negative, we now get the right max. The unit test compute_stats_long.q reads its values from data/files/int.txt, whose values are all above zero. The original test output reported a min value of 0, but the correct min value is 4. This patch fixes the bug. Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0. ,if all the column values larger than 0.0 (or if all column values smaller than 0.0) Key: HIVE-4561 URL: https://issues.apache.org/jira/browse/HIVE-4561 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.12.0 Reporter: caofangkun Assignee: Zhuoluo (Clark) Yang Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch if all column values larger than 0.0 DOUBLE_LOW_VALUE always will be 0.0 or if all column values less than 0.0, DOUBLE_HIGH_VALUE will always be hive (default) create table src_test (price double); hive (default) load data local inpath './test.txt' into table src_test; hive (default) select * from src_test; OK 1.0 2.0 3.0 Time taken: 0.313 seconds, Fetched: 3 row(s) hive (default) analyze table src_test compute statistics for columns price; mysql select * from TAB_COL_STATS \G; CS_ID: 16 DB_NAME: default TABLE_NAME: src_test COLUMN_NAME: price COLUMN_TYPE: double TBL_ID: 2586 LONG_LOW_VALUE: 0 LONG_HIGH_VALUE: 0 DOUBLE_LOW_VALUE: 0. # Wrong Result ! Expected is 1. DOUBLE_HIGH_VALUE: 3. BIG_DECIMAL_LOW_VALUE: NULL BIG_DECIMAL_HIGH_VALUE: NULL NUM_NULLS: 0 NUM_DISTINCTS: 1 AVG_COL_LEN: 0. MAX_COL_LEN: 0 NUM_TRUES: 0 NUM_FALSES: 0 LAST_ANALYZED: 1368596151 2 rows in set (0.00 sec)
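The symptom reported in HIVE-4561 is what one would expect if a running minimum or maximum starts at 0 instead of at the first observed value. The following is a minimal Java sketch of that idea only; it is not the Hive patch, and the names `RangeTracker`, `buggyMin`, `fixedMin` and `fixedMax` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the symptom; not the Hive stats code itself.
class RangeTracker {

    // Buggy variant: the running minimum starts at 0.0, so it can never
    // rise above 0.0 when every value in the column is positive.
    static double buggyMin(List<Double> values) {
        double min = 0.0;
        for (double v : values) {
            min = Math.min(min, v);
        }
        return min;
    }

    // Fixed variant: seed the running minimum with the first value seen.
    // Assumes a non-empty list.
    static double fixedMin(List<Double> values) {
        double min = values.get(0);
        for (double v : values) {
            min = Math.min(min, v);
        }
        return min;
    }

    // Same idea for the maximum, which matters when all values are negative.
    // Assumes a non-empty list.
    static double fixedMax(List<Double> values) {
        double max = values.get(0);
        for (double v : values) {
            max = Math.max(max, v);
        }
        return max;
    }

    public static void main(String[] args) {
        List<Double> prices = Arrays.asList(1.0, 2.0, 3.0);
        System.out.println(buggyMin(prices)); // prints 0.0
        System.out.println(fixedMin(prices)); // prints 1.0
        System.out.println(fixedMax(Arrays.asList(-3.0, -2.0, -1.0))); // prints -1.0
    }
}
```

With the three prices from the report (1.0, 2.0, 3.0), the zero-seeded variant returns 0.0, matching the wrong DOUBLE_LOW_VALUE shown in the issue, while the first-value-seeded variant returns 1.0.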
[jira] [Commented] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)
[ https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13672777#comment-13672777 ] Zhuoluo (Clark) Yang commented on HIVE-4561: Hi, [~ashutoshc], Thanks for your comments. I've uploaded a new patch to Review Board. Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0. ,if all the column values larger than 0.0 (or if all column values smaller than 0.0) Key: HIVE-4561 URL: https://issues.apache.org/jira/browse/HIVE-4561 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.12.0 Reporter: caofangkun Assignee: Zhuoluo (Clark) Yang Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch if all column values larger than 0.0 DOUBLE_LOW_VALUE always will be 0.0 or if all column values less than 0.0, DOUBLE_HIGH_VALUE will always be hive (default) create table src_test (price double); hive (default) load data local inpath './test.txt' into table src_test; hive (default) select * from src_test; OK 1.0 2.0 3.0 Time taken: 0.313 seconds, Fetched: 3 row(s) hive (default) analyze table src_test compute statistics for columns price; mysql select * from TAB_COL_STATS \G; CS_ID: 16 DB_NAME: default TABLE_NAME: src_test COLUMN_NAME: price COLUMN_TYPE: double TBL_ID: 2586 LONG_LOW_VALUE: 0 LONG_HIGH_VALUE: 0 DOUBLE_LOW_VALUE: 0. # Wrong Result ! Expected is 1. DOUBLE_HIGH_VALUE: 3. BIG_DECIMAL_LOW_VALUE: NULL BIG_DECIMAL_HIGH_VALUE: NULL NUM_NULLS: 0 NUM_DISTINCTS: 1 AVG_COL_LEN: 0. MAX_COL_LEN: 0 NUM_TRUES: 0 NUM_FALSES: 0 LAST_ANALYZED: 1368596151 2 rows in set (0.00 sec)
[jira] [Commented] (HIVE-2616) Passing user identity from metastore client to server in non-secure mode
[ https://issues.apache.org/jira/browse/HIVE-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660463#comment-13660463 ] Zhuoluo (Clark) Yang commented on HIVE-2616: Hi! I am curious about this patch: what happens if hive.metastore.sasl.enabled is NOT enabled and hive.metastore.execute.setugi is set? From reading the code, I think the UGI is passed to the HMS but has no effect: the HMS still creates and deletes HDFS directories using the server-side UGI. Is there a way to have the HMS manipulate HDFS with the client-side UGI without hive.metastore.sasl.enabled? Passing user identity from metastore client to server in non-secure mode Key: HIVE-2616 URL: https://issues.apache.org/jira/browse/HIVE-2616 Project: Hive Issue Type: New Feature Components: Metastore Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.8.1, 0.9.0 Attachments: hive-2616_1.patch, hive-2616_3.patch, hive-2616_4.patch, hive-2616_5.patch, hive-2616.patch Currently in unsecure mode client don't pass on user identity. As a result hdfs and other operations done by server gets executed by user running metastore process instead of being done in context of client. This results in problem as reported here: http://mail-archives.apache.org/mod_mbox/hive-user/20.mbox/%3CCAK0mCrRC3aPqtRHDe2J25Rm0JX6TS1KXxd7KPjqJjoqBjg=a...@mail.gmail.com%3E
[jira] [Commented] (HIVE-2616) Passing user identity from metastore client to server in non-secure mode
[ https://issues.apache.org/jira/browse/HIVE-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660466#comment-13660466 ] Zhuoluo (Clark) Yang commented on HIVE-2616: Is there a way to let user create their table/part dir based on their own UGI? Passing user identity from metastore client to server in non-secure mode Key: HIVE-2616 URL: https://issues.apache.org/jira/browse/HIVE-2616 Project: Hive Issue Type: New Feature Components: Metastore Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.8.1, 0.9.0 Attachments: hive-2616_1.patch, hive-2616_3.patch, hive-2616_4.patch, hive-2616_5.patch, hive-2616.patch Currently in unsecure mode client don't pass on user identity. As a result hdfs and other operations done by server gets executed by user running metastore process instead of being done in context of client. This results in problem as reported here: http://mail-archives.apache.org/mod_mbox/hive-user/20.mbox/%3CCAK0mCrRC3aPqtRHDe2J25Rm0JX6TS1KXxd7KPjqJjoqBjg=a...@mail.gmail.com%3E -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2616) Passing user identity from metastore client to server in non-secure mode
[ https://issues.apache.org/jira/browse/HIVE-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660474#comment-13660474 ] Zhuoluo (Clark) Yang commented on HIVE-2616: I think I've got the point. Is TUGIBasedProcessor.process() doing this? try { shim.doAs(clientUgi, pvea); return true; } catch (RuntimeException rte) { Passing user identity from metastore client to server in non-secure mode Key: HIVE-2616 URL: https://issues.apache.org/jira/browse/HIVE-2616 Project: Hive Issue Type: New Feature Components: Metastore Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.8.1, 0.9.0 Attachments: hive-2616_1.patch, hive-2616_3.patch, hive-2616_4.patch, hive-2616_5.patch, hive-2616.patch Currently in unsecure mode client don't pass on user identity. As a result hdfs and other operations done by server gets executed by user running metastore process instead of being done in context of client. This results in problem as reported here: http://mail-archives.apache.org/mod_mbox/hive-user/20.mbox/%3CCAK0mCrRC3aPqtRHDe2J25Rm0JX6TS1KXxd7KPjqJjoqBjg=a...@mail.gmail.com%3E -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
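The doAs call quoted in this comment follows a general pattern: the server switches to the caller's identity for the duration of one request and then switches back. A minimal, self-contained Java sketch of that pattern is given below. It deliberately uses no Hadoop or Hive classes; `DoAsSketch`, `doAs`, `currentUser` and `createTableDir` are hypothetical stand-ins, and a thread-local string stands in for a real UGI.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the "run as the client" pattern; it does not use
// Hadoop's UserGroupInformation or Hive's shim classes.
class DoAsSketch {

    // Identity under which the current thread is acting; defaults to the
    // account that runs the (imaginary) metastore server.
    private static final ThreadLocal<String> CURRENT_USER =
            ThreadLocal.withInitial(() -> "metastore");

    static String currentUser() {
        return CURRENT_USER.get();
    }

    // Runs the action as the given user, then restores the previous identity
    // even if the action throws.
    static <T> T doAs(String user, Supplier<T> action) {
        String previous = CURRENT_USER.get();
        CURRENT_USER.set(user);
        try {
            return action.get();
        } finally {
            CURRENT_USER.set(previous);
        }
    }

    // Pretends to create a table directory and reports who would own it.
    static String createTableDir(String table) {
        return table + " owned by " + currentUser();
    }

    public static void main(String[] args) {
        String asServer = createTableDir("t1");
        String asClient = doAs("alice", () -> createTableDir("t2"));
        System.out.println(asServer); // prints "t1 owned by metastore"
        System.out.println(asClient); // prints "t2 owned by alice"
        System.out.println(currentUser()); // prints "metastore"
    }
}
```

In this sketch the directory created inside doAs is attributed to the client, and the server identity is back in place afterwards, which is the behaviour the question in the comment is asking about.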
[jira] [Updated] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)
[ https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhuoluo (Clark) Yang updated HIVE-4561: --- Priority: Major (was: Minor) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0. ,if all the column values larger than 0.0 (or if all column values smaller than 0.0) Key: HIVE-4561 URL: https://issues.apache.org/jira/browse/HIVE-4561 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.12.0 Reporter: caofangkun if all column values larger than 0.0 DOUBLE_LOW_VALUE always will be 0.0 or if all column values less than 0.0, DOUBLE_HIGH_VALUE will always be hive (default) create table src_test (price double); hive (default) load data local inpath './test.txt' into table src_test; hive (default) select * from src_test; OK 1.0 2.0 3.0 Time taken: 0.313 seconds, Fetched: 3 row(s) hive (default) analyze table src_test compute statistics for columns price; mysql select * from TAB_COL_STATS \G; CS_ID: 16 DB_NAME: default TABLE_NAME: src_test COLUMN_NAME: price COLUMN_TYPE: double TBL_ID: 2586 LONG_LOW_VALUE: 0 LONG_HIGH_VALUE: 0 DOUBLE_LOW_VALUE: 0. # Wrong Result ! Expected is 1. DOUBLE_HIGH_VALUE: 3. BIG_DECIMAL_LOW_VALUE: NULL BIG_DECIMAL_HIGH_VALUE: NULL NUM_NULLS: 0 NUM_DISTINCTS: 1 AVG_COL_LEN: 0. MAX_COL_LEN: 0 NUM_TRUES: 0 NUM_FALSES: 0 LAST_ANALYZED: 1368596151 2 rows in set (0.00 sec) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)
[ https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhuoluo (Clark) Yang updated HIVE-4561: --- Status: Patch Available (was: Open) A quick fix. Would anybody assign the issue to me? Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0. ,if all the column values larger than 0.0 (or if all column values smaller than 0.0) Key: HIVE-4561 URL: https://issues.apache.org/jira/browse/HIVE-4561 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.12.0 Reporter: caofangkun if all column values larger than 0.0 DOUBLE_LOW_VALUE always will be 0.0 or if all column values less than 0.0, DOUBLE_HIGH_VALUE will always be hive (default) create table src_test (price double); hive (default) load data local inpath './test.txt' into table src_test; hive (default) select * from src_test; OK 1.0 2.0 3.0 Time taken: 0.313 seconds, Fetched: 3 row(s) hive (default) analyze table src_test compute statistics for columns price; mysql select * from TAB_COL_STATS \G; CS_ID: 16 DB_NAME: default TABLE_NAME: src_test COLUMN_NAME: price COLUMN_TYPE: double TBL_ID: 2586 LONG_LOW_VALUE: 0 LONG_HIGH_VALUE: 0 DOUBLE_LOW_VALUE: 0. # Wrong Result ! Expected is 1. DOUBLE_HIGH_VALUE: 3. BIG_DECIMAL_LOW_VALUE: NULL BIG_DECIMAL_HIGH_VALUE: NULL NUM_NULLS: 0 NUM_DISTINCTS: 1 AVG_COL_LEN: 0. MAX_COL_LEN: 0 NUM_TRUES: 0 NUM_FALSES: 0 LAST_ANALYZED: 1368596151 2 rows in set (0.00 sec)
[jira] [Updated] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)
[ https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhuoluo (Clark) Yang updated HIVE-4561: --- Attachment: HIVE-4561.1.patch A quick fix. Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0. ,if all the column values larger than 0.0 (or if all column values smaller than 0.0) Key: HIVE-4561 URL: https://issues.apache.org/jira/browse/HIVE-4561 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.12.0 Reporter: caofangkun Attachments: HIVE-4561.1.patch if all column values larger than 0.0 DOUBLE_LOW_VALUE always will be 0.0 or if all column values less than 0.0, DOUBLE_HIGH_VALUE will always be hive (default) create table src_test (price double); hive (default) load data local inpath './test.txt' into table src_test; hive (default) select * from src_test; OK 1.0 2.0 3.0 Time taken: 0.313 seconds, Fetched: 3 row(s) hive (default) analyze table src_test compute statistics for columns price; mysql select * from TAB_COL_STATS \G; CS_ID: 16 DB_NAME: default TABLE_NAME: src_test COLUMN_NAME: price COLUMN_TYPE: double TBL_ID: 2586 LONG_LOW_VALUE: 0 LONG_HIGH_VALUE: 0 DOUBLE_LOW_VALUE: 0. # Wrong Result ! Expected is 1. DOUBLE_HIGH_VALUE: 3. BIG_DECIMAL_LOW_VALUE: NULL BIG_DECIMAL_HIGH_VALUE: NULL NUM_NULLS: 0 NUM_DISTINCTS: 1 AVG_COL_LEN: 0. MAX_COL_LEN: 0 NUM_TRUES: 0 NUM_FALSES: 0 LAST_ANALYZED: 1368596151 2 rows in set (0.00 sec) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)
[ https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13658133#comment-13658133 ] Zhuoluo (Clark) Yang commented on HIVE-4561: https://reviews.apache.org/r/11172/ Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0. ,if all the column values larger than 0.0 (or if all column values smaller than 0.0) Key: HIVE-4561 URL: https://issues.apache.org/jira/browse/HIVE-4561 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.12.0 Reporter: caofangkun Attachments: HIVE-4561.1.patch if all column values larger than 0.0 DOUBLE_LOW_VALUE always will be 0.0 or if all column values less than 0.0, DOUBLE_HIGH_VALUE will always be hive (default) create table src_test (price double); hive (default) load data local inpath './test.txt' into table src_test; hive (default) select * from src_test; OK 1.0 2.0 3.0 Time taken: 0.313 seconds, Fetched: 3 row(s) hive (default) analyze table src_test compute statistics for columns price; mysql select * from TAB_COL_STATS \G; CS_ID: 16 DB_NAME: default TABLE_NAME: src_test COLUMN_NAME: price COLUMN_TYPE: double TBL_ID: 2586 LONG_LOW_VALUE: 0 LONG_HIGH_VALUE: 0 DOUBLE_LOW_VALUE: 0. # Wrong Result ! Expected is 1. DOUBLE_HIGH_VALUE: 3. BIG_DECIMAL_LOW_VALUE: NULL BIG_DECIMAL_HIGH_VALUE: NULL NUM_NULLS: 0 NUM_DISTINCTS: 1 AVG_COL_LEN: 0. MAX_COL_LEN: 0 NUM_TRUES: 0 NUM_FALSES: 0 LAST_ANALYZED: 1368596151 2 rows in set (0.00 sec) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4562) Some jars of Hive are required to be deployed on every slave of hadoop cluster, we'd better separate these jars from common client-side-jars
[ https://issues.apache.org/jira/browse/HIVE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13658154#comment-13658154 ] Zhuoluo (Clark) Yang commented on HIVE-4562: I think the trunk version has no such problem. Trunk unzips these jars and repacks them into hive-exec.jar. [~caofangkun] IMHO, you can use grep -nC5 unzip ql/build.xml to look into the logic. And you can put mysql-jdbc-connector.jar in the HIVE_AUX_JARS_PATH. For these reasons, shall we mark this issue Won't Fix? Some jars of Hive are required to be deployed on every slave of hadoop cluster, we'd better separate these jars from common client-side-jars --- Key: HIVE-4562 URL: https://issues.apache.org/jira/browse/HIVE-4562 Project: Hive Issue Type: Bug Components: Clients Reporter: caofangkun Priority: Minor Some Hive jars are required not only by the client but also by the server side (every Hadoop slave). Although we could use the 'add jar' command to add all the jars to the distributed cache, the common approach is to add these jars to $HADOOP_HOME/lib/ on every slave of the Hadoop cluster, which requires restarting all the tasktrackers. For example: when using Hive stats with MySQL as the temporary stats DB, every slave of the Hadoop cluster should contain mysql-connector-java-.jar in $HADOOP_HOME/lib/. And for column stats, $HADOOP_HOME/lib/ on all slaves should contain: jackson-core-asl-1.8.8.jar jackson-jaxrs-1.8.8.jar jackson-mapper-asl-1.8.8.jar jackson-xc-1.8.8.jar. These jars should be separated from the other common client-side jars.
[jira] [Commented] (HIVE-4562) HIVE-3393 brought in Jackson library,and these four jars should be packed into hive-exec.jar
[ https://issues.apache.org/jira/browse/HIVE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13658169#comment-13658169 ] Zhuoluo (Clark) Yang commented on HIVE-4562: Yes, I think a repack is necessary. HIVE-3393 brought in Jackson library,and these four jars should be packed into hive-exec.jar Key: HIVE-4562 URL: https://issues.apache.org/jira/browse/HIVE-4562 Project: Hive Issue Type: Bug Components: Clients Reporter: caofangkun Priority: Minor Some Hive jars are required not only by the client but also by the server side (every Hadoop slave). Although we could use the 'add jar' command to add all the jars to the distributed cache, the common approach is to add these jars to $HADOOP_HOME/lib/ on every slave of the Hadoop cluster, which requires restarting all the tasktrackers. For example: when using Hive stats with MySQL as the temporary stats DB, every slave of the Hadoop cluster should contain mysql-connector-java-.jar in $HADOOP_HOME/lib/. And for column stats, $HADOOP_HOME/lib/ on all slaves should contain: jackson-core-asl-1.8.8.jar jackson-jaxrs-1.8.8.jar jackson-mapper-asl-1.8.8.jar jackson-xc-1.8.8.jar. These jars should be separated from the other common client-side jars.
[jira] [Updated] (HIVE-2616) Passing user identity from metastore client to server in non-secure mode
[ https://issues.apache.org/jira/browse/HIVE-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhuoluo (Clark) Yang updated HIVE-2616: --- Issue Type: New Feature (was: Bug) Passing user identity from metastore client to server in non-secure mode Key: HIVE-2616 URL: https://issues.apache.org/jira/browse/HIVE-2616 Project: Hive Issue Type: New Feature Components: Metastore Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.8.1, 0.9.0 Attachments: hive-2616_1.patch, hive-2616_3.patch, hive-2616_4.patch, hive-2616_5.patch, hive-2616.patch Currently in unsecure mode client don't pass on user identity. As a result hdfs and other operations done by server gets executed by user running metastore process instead of being done in context of client. This results in problem as reported here: http://mail-archives.apache.org/mod_mbox/hive-user/20.mbox/%3CCAK0mCrRC3aPqtRHDe2J25Rm0JX6TS1KXxd7KPjqJjoqBjg=a...@mail.gmail.com%3E -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4501) HS2 memory leak - FileSystem objects in FileSystem.CACHE
[ https://issues.apache.org/jira/browse/HIVE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649529#comment-13649529 ] Zhuoluo (Clark) Yang commented on HIVE-4501: I think HS1 has similar problems... HS2 memory leak - FileSystem objects in FileSystem.CACHE Key: HIVE-4501 URL: https://issues.apache.org/jira/browse/HIVE-4501 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0 Reporter: Thejas M Nair org.apache.hadoop.fs.FileSystem objects are getting accumulated in FileSystem.CACHE, with HS2 in unsecure mode. As a workaround, it is possible to set fs.hdfs.impl.disable.cache and fs.file.impl.disable.cache to true, which disables the cache. Users should not have to bother with this extra configuration.
[jira] [Commented] (HIVE-2019) Implement NOW() UDF
[ https://issues.apache.org/jira/browse/HIVE-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634112#comment-13634112 ] Zhuoluo (Clark) Yang commented on HIVE-2019: If we use now() in filters, the result will be nondeterministic: a map task that is scheduled earlier sees an earlier now(), and one that is scheduled later sees a later now(). In our production environment, many Hive tasks are scheduled around midnight, so the now() values seen by the tasks of one job may fall on different days depending on scheduling order. I think it is necessary to add a kind of UDF called a UDCF (User Defined Client Function). If we take the client-side now() and turn it into a constant at compile time, this problem goes away. Implement NOW() UDF --- Key: HIVE-2019 URL: https://issues.apache.org/jira/browse/HIVE-2019 Project: Hive Issue Type: New Feature Components: UDF Reporter: Carl Steinbach Assignee: Priyadarshini Attachments: HIVE-2019.patch Reference: http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_now
[jira] [Commented] (HIVE-2019) Implement NOW() UDF
[ https://issues.apache.org/jira/browse/HIVE-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634121#comment-13634121 ] Zhuoluo (Clark) Yang commented on HIVE-2019: [~priyadarshini] I think the patch is a little too simple and should take the distributed case into account. I think a better way is to fold NOW() into a constant at compile time. Implement NOW() UDF --- Key: HIVE-2019 URL: https://issues.apache.org/jira/browse/HIVE-2019 Project: Hive Issue Type: New Feature Components: UDF Reporter: Carl Steinbach Assignee: Priyadarshini Attachments: HIVE-2019.patch Reference: http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_now
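The difference between evaluating NOW() inside each task and folding it into a constant at compile time can be sketched in a few lines of Java. This is only an illustration of the argument made in these comments, not Hive code: `NowFolding`, `evaluatePerTask`, `evaluateFolded` and `tickingClock` are hypothetical names, and a fake clock that advances on every read stands in for tasks being scheduled at different moments.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

// Hypothetical sketch; the names below are not Hive classes.
class NowFolding {

    // Per-task evaluation: each task reads the clock when it happens to run,
    // so tasks scheduled at different times see different values.
    static long[] evaluatePerTask(LongSupplier clock, int tasks) {
        long[] seen = new long[tasks];
        for (int i = 0; i < tasks; i++) {
            seen[i] = clock.getAsLong();
        }
        return seen;
    }

    // Compile-time folding: read the clock once on the client while the
    // query is compiled, then hand that constant to every task.
    static long[] evaluateFolded(LongSupplier clock, int tasks) {
        long constant = clock.getAsLong();
        long[] seen = new long[tasks];
        Arrays.fill(seen, constant);
        return seen;
    }

    // A fake clock that advances by one tick on every read.
    static LongSupplier tickingClock(long start) {
        long[] now = {start};
        return () -> now[0]++;
    }

    public static void main(String[] args) {
        // prints [100, 101, 102]
        System.out.println(Arrays.toString(evaluatePerTask(tickingClock(100), 3)));
        // prints [100, 100, 100]
        System.out.println(Arrays.toString(evaluateFolded(tickingClock(100), 3)));
    }
}
```

With folding, every task compares against the same value, so a filter on NOW() gives the same answer regardless of scheduling order.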
[jira] [Commented] (HIVE-2019) Implement NOW() UDF
[ https://issues.apache.org/jira/browse/HIVE-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634176#comment-13634176 ] Zhuoluo (Clark) Yang commented on HIVE-2019: Actually, NOW() is not a non-deterministic UDF in the way rand() is, which returns a different answer every time you call it. Is HIVE-746 a related JIRA issue? Implement NOW() UDF --- Key: HIVE-2019 URL: https://issues.apache.org/jira/browse/HIVE-2019 Project: Hive Issue Type: New Feature Components: UDF Reporter: Carl Steinbach Assignee: Priyadarshini Attachments: HIVE-2019.patch Reference: http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_now
[jira] [Updated] (HIVE-3958) support partial scan for analyze command
[ https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhuoluo (Clark) Yang updated HIVE-3958: --- Description: analyze commands allows us to collect statistics on existing tables/partitions. It works great but might be slow since it scans all files. There are 2 ways to speed it up: 1. collect stats without file scan. It may not collect all stats but good and fast enough for use case. HIVE-3917 addresses it 2. collect stats via partial file scan. It doesn't scan all content of files but part of it to get file metadata. some examples are https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) and HFile of Hbase This jira is targeted to address the #2 was: analyze commands allows us to collect statistics on existing tables/partitions. It works great but might be slow since it scans all files. There are 2 ways to speed it up: 1. collect stats without file scan. It may not collect all stats but good and fast enough for use case. Hive-3917 addresses it 2. collect stats via partial file scan. It doesn't scan all content of files but part of it to get file metadata. some examples are https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) and HFile of Hbase This jira is targeted to address the #2 support partial scan for analyze command Key: HIVE-3958 URL: https://issues.apache.org/jira/browse/HIVE-3958 Project: Hive Issue Type: Improvement Reporter: Gang Tim Liu Assignee: Gang Tim Liu analyze commands allows us to collect statistics on existing tables/partitions. It works great but might be slow since it scans all files. There are 2 ways to speed it up: 1. collect stats without file scan. It may not collect all stats but good and fast enough for use case. HIVE-3917 addresses it 2. collect stats via partial file scan. It doesn't scan all content of files but part of it to get file metadata. 
some examples are https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) and HFile of Hbase This jira is targeted to address the #2
[jira] [Commented] (HIVE-2615) CTAS with literal NULL creates VOID type
[ https://issues.apache.org/jira/browse/HIVE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13565006#comment-13565006 ] Zhuoluo (Clark) Yang commented on HIVE-2615: From what [~david.phillips] says, I think option 3 is the better choice. Is anybody working on this issue? CTAS with literal NULL creates VOID type Key: HIVE-2615 URL: https://issues.apache.org/jira/browse/HIVE-2615 Project: Hive Issue Type: Bug Reporter: David Phillips Create the table with a column that always contains NULL: {quote} hive create table bad as select 1 x, null z from dual; {quote} Because there's no type, Hive gives it the VOID type: {quote} hive describe bad; OK x int z void {quote} This seems weird, because AFAIK, there is no normal way to create a column of type VOID. The problem is that the table can't be queried: {quote} hive select * from bad; OK Failed with exception java.io.IOException:java.lang.RuntimeException: Internal error: no LazyObject for VOID {quote} Worse, even if you don't select that field, the query fails at runtime: {quote} hive select x from bad; ... FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask {quote}
[jira] [Updated] (HIVE-1151) Add 'show version' command to Hive CLI
[ https://issues.apache.org/jira/browse/HIVE-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhuoluo (Clark) Yang updated HIVE-1151: --- Attachment: HIVE-1151.3.patch Correct some comments Add 'show version' command to Hive CLI -- Key: HIVE-1151 URL: https://issues.apache.org/jira/browse/HIVE-1151 Project: Hive Issue Type: New Feature Components: CLI, Clients Affects Versions: 0.6.0 Reporter: Carl Steinbach Assignee: Carl Steinbach Attachments: HIVE-1151.1.patch, HIVE-1151.2.patch, HIVE-1151.3.patch At a minimum this command should return the version information obtained from the hive-cli jar. Ideally this command will also return version information obtained from each of the hive jar files present in the CLASSPATH, which will allow us to quickly detect cases where people are using incompatible jars. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-1151) Add 'show version' command to Hive CLI
[ https://issues.apache.org/jira/browse/HIVE-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhuoluo (Clark) Yang updated HIVE-1151: --- Attachment: HIVE-1151.1.patch Let me attach a patch. I added a simple DDL statement called show version. The version info is generated by scripts at compile time. Add 'show version' command to Hive CLI -- Key: HIVE-1151 URL: https://issues.apache.org/jira/browse/HIVE-1151 Project: Hive Issue Type: New Feature Components: CLI, Clients Affects Versions: 0.6.0 Reporter: Carl Steinbach Assignee: Carl Steinbach Attachments: HIVE-1151.1.patch At a minimum this command should return the version information obtained from the hive-cli jar. Ideally this command will also return version information obtained from each of the hive jar files present in the CLASSPATH, which will allow us to quickly detect cases where people are using incompatible jars.
[jira] [Updated] (HIVE-1151) Add 'show version' command to Hive CLI
[ https://issues.apache.org/jira/browse/HIVE-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhuoluo (Clark) Yang updated HIVE-1151: --- Attachment: HIVE-1151.2.patch Attached an updated patch. 1. Removed the code that swallowed the stack trace, so the exception stack trace can be stringified by DDLTask.execute(). 2. Sorry for my ignorance of git and for shamelessly cloning the code; modified saveVersion.sh to get the git hostname. 3. Added a hive --version command. Add 'show version' command to Hive CLI -- Key: HIVE-1151 URL: https://issues.apache.org/jira/browse/HIVE-1151 Project: Hive Issue Type: New Feature Components: CLI, Clients Affects Versions: 0.6.0 Reporter: Carl Steinbach Assignee: Carl Steinbach Attachments: HIVE-1151.1.patch, HIVE-1151.2.patch At a minimum this command should return the version information obtained from the hive-cli jar. Ideally this command will also return version information obtained from each of the hive jar files present in the CLASSPATH, which will allow us to quickly detect cases where people are using incompatible jars.
[jira] [Updated] (HIVE-1649) Ability to update counters and status from TRANSFORM scripts
[ https://issues.apache.org/jira/browse/HIVE-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhuoluo (Clark) Yang updated HIVE-1649: --- Affects Version/s: 0.6.0 Ability to update counters and status from TRANSFORM scripts Key: HIVE-1649 URL: https://issues.apache.org/jira/browse/HIVE-1649 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: Carl Steinbach Attachments: HIVE-1649.1.patch Hadoop Streaming supports the ability to update counters and status by writing specially coded messages to the script's stderr stream. A streaming process can use the stderr to emit counter information. {{reporter:counter:group,counter,amount}} should be sent to stderr to update the counter. A streaming process can use the stderr to emit status information. To set a status, {{reporter:status:message}} should be sent to stderr. Hive should support these same features with its TRANSFORM mechanism. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
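The two stderr message formats quoted in the HIVE-1649 description can be produced from a script with very little code. The sketch below assumes the script is written in Java; `StreamingReporter` and its methods are hypothetical helper names, and the group, counter and status strings are made-up examples. Only the `reporter:counter:group,counter,amount` and `reporter:status:message` formats come from the issue text.

```java
import java.io.PrintStream;

// Hypothetical helper for a streaming/TRANSFORM script written in Java.
class StreamingReporter {

    // Formats a counter update in the form quoted in the issue:
    // reporter:counter:group,counter,amount
    static String counterLine(String group, String counter, long amount) {
        return "reporter:counter:" + group + "," + counter + "," + amount;
    }

    // Formats a status update in the form quoted in the issue:
    // reporter:status:message
    static String statusLine(String message) {
        return "reporter:status:" + message;
    }

    // Writes one reporter line to the given stream and flushes it so the
    // framework sees it promptly.
    static void report(PrintStream err, String line) {
        err.println(line);
        err.flush();
    }

    public static void main(String[] args) {
        // Example values only; a real script would pick its own names.
        report(System.err, counterLine("MyGroup", "BadRecords", 1));
        report(System.err, statusLine("processing input"));
    }
}
```

Regular output rows would still go to stdout; only the reporter lines are written to stderr.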
[jira] [Commented] (HIVE-3417) multi inserts when the from statement is a subquery,this is a bug
[ https://issues.apache.org/jira/browse/HIVE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13444756#comment-13444756 ] Zhuoluo (Clark) Yang commented on HIVE-3417: I think this bug was introduced by HIVE-1538, because the optimizer prunes the wrong filters. I think we can modify the optimizer to handle this case correctly. multi inserts when the from statement is a subquery,this is a bug - Key: HIVE-3417 URL: https://issues.apache.org/jira/browse/HIVE-3417 Project: Hive Issue Type: Bug Components: Query Processor, SQL Affects Versions: 0.8.1 Environment: Linux 3.0.0-14-generic #23-Ubuntu SMP Mon Nov 21 20:34:47 UTC 2011 i686 i686 i386 GNU/Linux java version 1.6.0_25 hadoop-0.20.2-cdh3u0 hive-0.8.1 Reporter: caofangkun vi mulit-insert.sql create table src (key string, value string); load data local inpath './in1.txt' overwrite into table src; drop table if exists test1; drop table if exists test2; create table test1 (key string, value string) partitioned by (dt string); create table test2 (key string, value string) partitioned by (dt string); select * from src; from (select * from src where key is not null ) --there is a bug here insert overwrite table test1 PARTITION (dt='1') select key ,value where key='48' insert overwrite table test2 PARTITION (dt='2') select key, value where key='100'; select * from test1; select * from test2; test1 and test2 should each contain a single row, but they do not. Workaround: with set hive.ppd.remove.duplicatefilters=false; the bug does not occur.
[jira] [Updated] (HIVE-2419) CREATE TABLE AS SELECT should create warehouse directory
[ https://issues.apache.org/jira/browse/HIVE-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhuoluo (Clark) Yang updated HIVE-2419:
---------------------------------------
Attachment: HIVE-2419.1.patch

This is an annoying problem, especially for new Hive users. We have fixed the problem in our internal version of Hive by simply creating the hive.metastore.warehouse.dir directory during the semantic analysis phase.

CREATE TABLE AS SELECT should create warehouse directory
--------------------------------------------------------
Key: HIVE-2419
URL: https://issues.apache.org/jira/browse/HIVE-2419
Project: Hive
Issue Type: Bug
Reporter: David Phillips
Attachments: HIVE-2419.1.patch

If you run a CTAS statement on a fresh Hive install without a warehouse directory (as is the case with Amazon EMR), it runs the query but errors out at the end:

{quote}
hive> create table foo as select * from t_message limit 1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
...
Ended Job = job_201108301753_0001
Moving data to: hdfs://ip-10-202-22-194.ec2.internal:9000/mnt/hive_07_1/warehouse/foo
Failed with exception Unable to rename: hdfs://ip-10-202-22-194.ec2.internal:9000/mnt/var/lib/hive_07_1/tmp/scratch/hive_2011-08-30_18-04-36_809_6130923980133666976/-ext-10001 to: hdfs://ip-10-202-22-194.ec2.internal:9000/mnt/hive_07_1/warehouse/foo
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
{quote}

This is different behavior from a simple CREATE TABLE, which creates the warehouse directory.
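The failure mode and the fix described in the comment can be sketched on a local filesystem, used here only as a stand-in for HDFS. The function name `move_to_warehouse` and its parameters are hypothetical; this is not the code from the attached patch. The point it shows is that a rename into a directory that does not exist fails, and that creating the directory first makes the same rename succeed.

```python
import os
import tempfile


def move_to_warehouse(scratch_path, warehouse_dir, table_name, create_warehouse=True):
    """Move query output from the scratch area into the warehouse.

    Local-filesystem stand-in for the HDFS rename done by the move step.
    """
    if create_warehouse:
        # The fix described in the comment: make sure the warehouse exists first.
        os.makedirs(warehouse_dir, exist_ok=True)
    target = os.path.join(warehouse_dir, table_name)
    os.rename(scratch_path, target)
    return target


def demo():
    """Return (rename failed without warehouse, rename succeeded with it)."""
    with tempfile.TemporaryDirectory() as root:
        scratch = os.path.join(root, "scratch", "-ext-10001")
        os.makedirs(scratch)
        warehouse = os.path.join(root, "warehouse")
        try:
            move_to_warehouse(scratch, warehouse, "foo", create_warehouse=False)
            failed_without_dir = False
        except OSError:
            # Parent directory of the target is missing, so the rename fails.
            failed_without_dir = True
        target = move_to_warehouse(scratch, warehouse, "foo")
        return failed_without_dir, os.path.isdir(target)
```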
[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
[ https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062727#comment-13062727 ]

Zhuoluo (Clark) Yang commented on HIVE-896:
-------------------------------------------
I think it is necessary to have a kind of function called UDWF (User-Defined Windowing Function).

Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
---------------------------------------------------------------
Key: HIVE-896
URL: https://issues.apache.org/jira/browse/HIVE-896
Project: Hive
Issue Type: New Feature
Reporter: Amr Awadallah
Priority: Minor

Windowing functions are very useful for click stream processing and similar time-series/sliding-window analytics.

More details at:
http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709
http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059
http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032

-- amr
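To make the requested functions concrete, here is a small sketch of LAG, LEAD, FIRST and LAST over a single, already ordered partition held in a list. The function names and the choice to evaluate FIRST and LAST over the whole partition are assumptions for illustration; the issue does not specify Hive's syntax or frame semantics.

```python
def lag(values, offset=1, default=None):
    """Value `offset` rows before the current row, or `default` if none.

    Assumes a non-negative offset.
    """
    return [
        values[i - offset] if i - offset >= 0 else default
        for i in range(len(values))
    ]


def lead(values, offset=1, default=None):
    """Value `offset` rows after the current row, or `default` if none.

    Assumes a non-negative offset.
    """
    n = len(values)
    return [values[i + offset] if i + offset < n else default for i in range(n)]


def first_value(values):
    """First value of the partition, repeated for every row."""
    return [values[0]] * len(values) if values else []


def last_value(values):
    """Last value of the partition, repeated for every row."""
    return [values[-1]] * len(values) if values else []
```

For a click stream ordered by time, `lag` gives each event's predecessor, which is the usual building block for computing the gap between consecutive events.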
[jira] [Created] (HIVE-2227) Remove ProgressCounter enum in Operator
Remove ProgressCounter enum in Operator
---------------------------------------
Key: HIVE-2227
URL: https://issues.apache.org/jira/browse/HIVE-2227
Project: Hive
Issue Type: Improvement
Components: Query Processor
Affects Versions: 0.8.0
Reporter: Zhuoluo (Clark) Yang
Priority: Minor
Fix For: 0.8.0

After HIVE-1701, there is no reason to keep the heavy counterNameToEnum hashmap. We can use strings directly, since the enum was only a hack for Hadoop 0.17. The string names will also be human-readable in jobdetails.jsp, instead of C1, C2, ... C1000.
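The difference between the two approaches can be sketched as follows. This is an illustration in Python rather than Hive's Java code; the class name `StringCounters`, the helper `enum_style_names`, and the counter names used with them are hypothetical. It contrasts a name-to-placeholder mapping of the C1, C2, ... kind with counters keyed directly by a readable string.

```python
from collections import Counter


def enum_style_names(counter_names):
    """Old style: map each readable name to an opaque slot C1, C2, ..."""
    return {name: "C{}".format(i + 1) for i, name in enumerate(counter_names)}


class StringCounters:
    """Proposed style: counters keyed directly by a readable string."""

    def __init__(self):
        self._counts = Counter()

    def increment(self, name, amount=1):
        # No lookup table is needed; the name itself is the key.
        self._counts[name] += amount

    def snapshot(self):
        # Plain dict copy, suitable for display or reporting.
        return dict(self._counts)
```

With string keys, whatever displays the counters can show the name as given, and the lookup table disappears entirely.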
[jira] [Updated] (HIVE-2227) Remove ProgressCounter enum in Operator
[ https://issues.apache.org/jira/browse/HIVE-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhuoluo (Clark) Yang updated HIVE-2227:
---------------------------------------
Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-2227) Remove ProgressCounter enum in Operator
[ https://issues.apache.org/jira/browse/HIVE-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhuoluo (Clark) Yang updated HIVE-2227:
---------------------------------------
Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-2227) Remove ProgressCounter enum in Operator
[ https://issues.apache.org/jira/browse/HIVE-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhuoluo (Clark) Yang updated HIVE-2227:
---------------------------------------
Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-2227) Remove ProgressCounter enum in Operator
[ https://issues.apache.org/jira/browse/HIVE-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhuoluo (Clark) Yang updated HIVE-2227:
---------------------------------------
Attachment: HIVE-2227-1.patch

Here is a patch.
[jira] [Updated] (HIVE-2227) Remove ProgressCounter enum in Operator
[ https://issues.apache.org/jira/browse/HIVE-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhuoluo (Clark) Yang updated HIVE-2227:
---------------------------------------
Status: Open (was: Patch Available)

Not reviewed.
[jira] [Commented] (HIVE-2227) Remove ProgressCounter enum in Operator
[ https://issues.apache.org/jira/browse/HIVE-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050875#comment-13050875 ]

Zhuoluo (Clark) Yang commented on HIVE-2227:
--------------------------------------------
Review board: https://reviews.apache.org/r/931/