[jira] [Commented] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)

2014-05-26 Thread Zhuoluo (Clark) Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009286#comment-14009286
 ] 

Zhuoluo (Clark) Yang commented on HIVE-4561:


Thanks, [~navis]. It seems more situations need to be considered.

 Column stats :  LOW_VALUE (or HIGH_VALUE) will always be 0. ,if all the 
 column values larger than 0.0 (or if all column values smaller than 0.0)
 

 Key: HIVE-4561
 URL: https://issues.apache.org/jira/browse/HIVE-4561
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0, 0.13.0
Reporter: caofangkun
Assignee: Navis
 Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch, 
 HIVE-4561.4.patch.txt, HIVE-4561.5.patch.txt, HIVE-4561.6.patch.txt, 
 HIVE-4561.7.patch.txt


 If all column values are larger than 0.0, DOUBLE_LOW_VALUE will always be 0.0;
 if all column values are less than 0.0, DOUBLE_HIGH_VALUE will always be 0.0.
 hive (default)> create table src_test (price double);
 hive (default)> load data local inpath './test.txt' into table src_test;
 hive (default)> select * from src_test;
 OK
 1.0
 2.0
 3.0
 Time taken: 0.313 seconds, Fetched: 3 row(s)
 hive (default)> analyze table src_test compute statistics for columns price;
 mysql> select * from TAB_COL_STATS \G;
                  CS_ID: 16
                DB_NAME: default
             TABLE_NAME: src_test
            COLUMN_NAME: price
            COLUMN_TYPE: double
                 TBL_ID: 2586
         LONG_LOW_VALUE: 0
        LONG_HIGH_VALUE: 0
       DOUBLE_LOW_VALUE: 0.   # Wrong Result ! Expected is 1.
      DOUBLE_HIGH_VALUE: 3.
  BIG_DECIMAL_LOW_VALUE: NULL
 BIG_DECIMAL_HIGH_VALUE: NULL
              NUM_NULLS: 0
          NUM_DISTINCTS: 1
            AVG_COL_LEN: 0.
            MAX_COL_LEN: 0
              NUM_TRUES: 0
             NUM_FALSES: 0
          LAST_ANALYZED: 1368596151
 2 rows in set (0.00 sec)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-3421) Column Level Top K Values Statistics

2013-09-09 Thread Zhuoluo (Clark) Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762679#comment-13762679
 ] 

Zhuoluo (Clark) Yang commented on HIVE-3421:


Would any committer review this? IMHO, it looks cool.

 Column Level Top K Values Statistics
 

 Key: HIVE-3421
 URL: https://issues.apache.org/jira/browse/HIVE-3421
 Project: Hive
  Issue Type: New Feature
Reporter: Feng Lu
Assignee: Feng Lu
 Attachments: HIVE-3421.patch.1.txt, HIVE-3421.patch.2.txt, 
 HIVE-3421.patch.3.txt, HIVE-3421.patch.4.txt, HIVE-3421.patch.5.txt, 
 HIVE-3421.patch.6.txt, HIVE-3421.patch.7.txt, HIVE-3421.patch.8.txt, 
 HIVE-3421.patch.9.txt, HIVE-3421.patch.txt


 Compute (estimate) top k values statistics for each column, and put the most 
 skewed column into skewed info, if user hasn't specified skew.
 This feature depends on ListBucketing (create table skewed on) 
 https://cwiki.apache.org/Hive/listbucketing.html.
 All column topk can be added to skewed info, if in the future skewed info 
 supports multiple independent columns.
 The TopK algorithm is based on this paper:
 http://www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf
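 For readers unfamiliar with counter-based top-k estimation, the following is a
 minimal sketch of one well-known scheme of that kind (Space-Saving style). It is
 an illustration only: the class and method names are hypothetical, and it is not
 taken from any HIVE-3421 patch, nor is it claimed to match the patch's method.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a bounded-memory top-k estimator; not Hive code. */
final class TopKSketch {
    private final int capacity;                       // max number of tracked values
    private final Map<String, Long> counts = new HashMap<>();

    TopKSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Feeds one column value into the estimator. */
    void offer(String value) {
        Long current = counts.get(value);
        if (current != null) {
            counts.put(value, current + 1);           // already tracked: just count it
        } else if (counts.size() < capacity) {
            counts.put(value, 1L);                    // free slot: start tracking
        } else {
            // No free slot: evict the least frequent tracked value and let the
            // newcomer inherit its count plus one (an over-estimate by design).
            String victim = null;
            long min = Long.MAX_VALUE;
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                if (e.getValue() < min) {
                    min = e.getValue();
                    victim = e.getKey();
                }
            }
            counts.remove(victim);
            counts.put(value, min + 1);
        }
    }

    /** Returns up to k tracked values, highest estimated count first. */
    List<Map.Entry<String, Long>> top(int k) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        return new ArrayList<>(entries.subList(0, Math.min(k, entries.size())));
    }
}
```

 Memory stays bounded by the capacity regardless of column cardinality, at the
 cost of over-estimated counts for values that entered by eviction.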

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)

2013-06-06 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-4561:
---

Attachment: (was: HIVE-4561.4.patch)

 Column stats :  LOW_VALUE (or HIGH_VALUE) will always be 0. ,if all the 
 column values larger than 0.0 (or if all column values smaller than 0.0)
 

 Key: HIVE-4561
 URL: https://issues.apache.org/jira/browse/HIVE-4561
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
 Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch




[jira] [Commented] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)

2013-06-06 Thread Zhuoluo (Clark) Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676754#comment-13676754
 ] 

Zhuoluo (Clark) Yang commented on HIVE-4561:


[~shreepadma] It seems that supporting null values would require a lot of 
modification, for example in ColumnStatsTask, ObjectStore, or the thrift files.
Currently, statistics are looked up by code, so I think it makes sense to keep 
Long.Min/Long.Max.
We can apply HIVE-4561.3.patch instead of HIVE-4561.4.patch.

 Column stats :  LOW_VALUE (or HIGH_VALUE) will always be 0. ,if all the 
 column values larger than 0.0 (or if all column values smaller than 0.0)
 

 Key: HIVE-4561
 URL: https://issues.apache.org/jira/browse/HIVE-4561
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
 Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch




[jira] [Updated] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)

2013-06-05 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-4561:
---

Attachment: HIVE-4561.4.patch

Updated the patch to make the HIGH/LOW values of empty tables return null.

 Column stats :  LOW_VALUE (or HIGH_VALUE) will always be 0. ,if all the 
 column values larger than 0.0 (or if all column values smaller than 0.0)
 

 Key: HIVE-4561
 URL: https://issues.apache.org/jira/browse/HIVE-4561
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
 Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch, 
 HIVE-4561.4.patch




[jira] [Updated] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)

2013-06-05 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-4561:
---

Status: Patch Available  (was: Open)

 Column stats :  LOW_VALUE (or HIGH_VALUE) will always be 0. ,if all the 
 column values larger than 0.0 (or if all column values smaller than 0.0)
 

 Key: HIVE-4561
 URL: https://issues.apache.org/jira/browse/HIVE-4561
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
 Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch, 
 HIVE-4561.4.patch




[jira] [Commented] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)

2013-06-05 Thread Zhuoluo (Clark) Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676571#comment-13676571
 ] 

Zhuoluo (Clark) Yang commented on HIVE-4561:


[~ashutoshc] I think it happens when we try to persist a null max/min. I think 
the simplest way is to leave it empty in the ColumnStatsTask. I will try to 
make a new patch and run a full test.

 Column stats :  LOW_VALUE (or HIGH_VALUE) will always be 0. ,if all the 
 column values larger than 0.0 (or if all column values smaller than 0.0)
 

 Key: HIVE-4561
 URL: https://issues.apache.org/jira/browse/HIVE-4561
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
 Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch, 
 HIVE-4561.4.patch




[jira] [Commented] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)

2013-06-05 Thread Zhuoluo (Clark) Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676611#comment-13676611
 ] 

Zhuoluo (Clark) Yang commented on HIVE-4561:


[~shreepadma] I think I was wrong. Originally, I wanted to return early like this:
{code}
@@ -189,6 +187,11 @@
         statsObj.setStatsData(statsData);
       }
     } else {
+      // Any null object, such as min/max value of an empty table,
+      // need not be unpacked.
+      if (o == null) {
+        return;
+      }
       // invoke the right unpack method depending on data type of the column
       if (statsObj.getStatsData().isSetBooleanStats()) {
         unpackBooleanStats(oi, o, fieldName, statsObj);
{code}
However, I've found that LongColumnStatsData.highValue is a required field in 
thrift. ObjectStore would also need to be modified to check 
LongColumnStatsData.isSetHighValue(). Any suggestions? Thanks!
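
To make the trade-off concrete, here is a minimal sketch of what an "unset means 
null" design could look like. Everything in it is hypothetical: LongStats is a 
stand-in written for this illustration, not Hive's generated LongColumnStatsData, 
and the two helper methods are assumptions rather than code from any patch.

```java
/** Hypothetical illustration of an optional upper bound; not Hive code. */
final class OptionalBoundsSketch {

    /** Stand-in for a generated stats struct with an optional highValue field. */
    static final class LongStats {
        private long highValue;
        private boolean highValueSet;

        void setHighValue(long value) {
            highValue = value;
            highValueSet = true;
        }

        boolean isSetHighValue() {
            return highValueSet;
        }

        long getHighValue() {
            return highValue;
        }
    }

    /** Copies the aggregated maximum into the struct; a null maximum leaves it unset. */
    static void unpackHighValue(Long aggregatedMax, LongStats stats) {
        if (aggregatedMax == null) {
            return; // e.g. the max of an empty table: nothing to unpack
        }
        stats.setHighValue(aggregatedMax);
    }

    /** Value a persistence layer would store: null when the bound was never set. */
    static Long highValueToPersist(LongStats stats) {
        return stats.isSetHighValue() ? Long.valueOf(stats.getHighValue()) : null;
    }
}
```

The sketch only works if the field is optional end to end, which is exactly the 
constraint raised above: a required field cannot be left unset.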

 Column stats :  LOW_VALUE (or HIGH_VALUE) will always be 0. ,if all the 
 column values larger than 0.0 (or if all column values smaller than 0.0)
 

 Key: HIVE-4561
 URL: https://issues.apache.org/jira/browse/HIVE-4561
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
 Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch, 
 HIVE-4561.4.patch




[jira] [Work started] (HIVE-2615) CTAS with literal NULL creates VOID type

2013-06-04 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-2615 started by Zhuoluo (Clark) Yang.

 CTAS with literal NULL creates VOID type
 

 Key: HIVE-2615
 URL: https://issues.apache.org/jira/browse/HIVE-2615
 Project: Hive
  Issue Type: Bug
Reporter: David Phillips
Assignee: Zhuoluo (Clark) Yang

 Create the table with a column that always contains NULL:
 {quote}
 hive> create table bad as select 1 x, null z from dual;
 {quote}
 Because there's no type, Hive gives it the VOID type:
 {quote}
 hive> describe bad;
 OK
 x int
 z void
 {quote}
 This seems weird, because AFAIK, there is no normal way to create a column of 
 type VOID.  The problem is that the table can't be queried:
 {quote}
 hive> select * from bad;
 OK
 Failed with exception java.io.IOException:java.lang.RuntimeException: 
 Internal error: no LazyObject for VOID
 {quote}
 Worse, even if you don't select that field, the query fails at runtime:
 {quote}
 hive> select x from bad;
 ...
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.MapRedTask
 {quote}



[jira] [Updated] (HIVE-2615) CTAS with literal NULL creates VOID type

2013-06-04 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-2615:
---

Component/s: Query Processor

 CTAS with literal NULL creates VOID type
 

 Key: HIVE-2615
 URL: https://issues.apache.org/jira/browse/HIVE-2615
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: David Phillips
Assignee: Zhuoluo (Clark) Yang



[jira] [Updated] (HIVE-2615) CTAS with literal NULL creates VOID type

2013-06-04 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-2615:
---

Affects Version/s: 0.6.0

 CTAS with literal NULL creates VOID type
 

 Key: HIVE-2615
 URL: https://issues.apache.org/jira/browse/HIVE-2615
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: David Phillips
Assignee: Zhuoluo (Clark) Yang



[jira] [Updated] (HIVE-2615) CTAS with literal NULL creates VOID type

2013-06-04 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-2615:
---

Attachment: HIVE-2615.1.patch

Attached a patch.
The check runs after the result schema is generated.
If the statement is a CTAS and the schema contains a void column, it raises an 
exception and asks the user to cast the type.
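
As a rough illustration of that kind of check, the sketch below rejects a result 
schema that contains a void column. It is not the code in HIVE-2615.1.patch: the 
class name, the method name, the map-based schema, and the error text are all 
hypothetical stand-ins chosen for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of a CTAS schema check; not Hive code. */
final class CtasVoidCheckSketch {

    /**
     * Throws if any column of the CTAS result schema has type void.
     * The schema is modelled as column name -> type name for simplicity.
     */
    static void rejectVoidColumns(Map<String, String> resultSchema) {
        for (Map.Entry<String, String> column : resultSchema.entrySet()) {
            if ("void".equalsIgnoreCase(column.getValue())) {
                throw new IllegalArgumentException(
                    "CTAS column '" + column.getKey()
                        + "' has type void; cast it to a concrete type first");
            }
        }
    }

    public static void main(String[] args) {
        // Mirrors the schema from the issue description: x int, z void.
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("x", "int");
        schema.put("z", "void");
        try {
            rejectVoidColumns(schema);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing at this point means the user sees the problem when the statement is 
compiled rather than when the resulting table is later queried.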

 CTAS with literal NULL creates VOID type
 

 Key: HIVE-2615
 URL: https://issues.apache.org/jira/browse/HIVE-2615
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: David Phillips
Assignee: Zhuoluo (Clark) Yang
 Attachments: HIVE-2615.1.patch




[jira] [Commented] (HIVE-2615) CTAS with literal NULL creates VOID type

2013-06-04 Thread Zhuoluo (Clark) Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13674136#comment-13674136
 ] 

Zhuoluo (Clark) Yang commented on HIVE-2615:


https://reviews.apache.org/r/11622/

 CTAS with literal NULL creates VOID type
 

 Key: HIVE-2615
 URL: https://issues.apache.org/jira/browse/HIVE-2615
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: David Phillips
Assignee: Zhuoluo (Clark) Yang
 Attachments: HIVE-2615.1.patch




[jira] [Updated] (HIVE-2615) CTAS with literal NULL creates VOID type

2013-06-04 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-2615:
---

Fix Version/s: 0.12.0
   Status: Patch Available  (was: In Progress)

Would any committer review this issue?

 CTAS with literal NULL creates VOID type
 

 Key: HIVE-2615
 URL: https://issues.apache.org/jira/browse/HIVE-2615
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: David Phillips
Assignee: Zhuoluo (Clark) Yang
 Fix For: 0.12.0

 Attachments: HIVE-2615.1.patch




[jira] [Updated] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)

2013-06-04 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-4561:
---

Attachment: HIVE-4561.3.patch

Fixed the compute_stats_empty_table.q test results.

 Column stats :  LOW_VALUE (or HIGH_VALUE) will always be 0. ,if all the 
 column values larger than 0.0 (or if all column values smaller than 0.0)
 

 Key: HIVE-4561
 URL: https://issues.apache.org/jira/browse/HIVE-4561
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
 Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch




[jira] [Updated] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)

2013-06-04 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-4561:
---

Status: Open  (was: Patch Available)

[~ashutoshc] The values sound quite strange; I will try to make a new patch.

 Column stats :  LOW_VALUE (or HIGH_VALUE) will always be 0. ,if all the 
 column values larger than 0.0 (or if all column values smaller than 0.0)
 

 Key: HIVE-4561
 URL: https://issues.apache.org/jira/browse/HIVE-4561
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
 Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch




[jira] [Updated] (HIVE-2615) CTAS with literal NULL creates VOID type

2013-06-03 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-2615:
---

Assignee: Zhuoluo (Clark) Yang

 CTAS with literal NULL creates VOID type
 

 Key: HIVE-2615
 URL: https://issues.apache.org/jira/browse/HIVE-2615
 Project: Hive
  Issue Type: Bug
Reporter: David Phillips
Assignee: Zhuoluo (Clark) Yang



[jira] [Updated] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)

2013-06-02 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-4561:
---

Attachment: HIVE-4561.2.patch

Updated with a new patch.
When all the long values are positive, we now get the right min; when all the 
values are negative, we now get the right max.
The unit test compute_stats_long.q reads values from data/files/int.txt, whose 
values are all above zero. The original unit test computed a min value of 0, 
but the correct min value is 4. This patch fixes the bug.
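
The symptom described above (min stuck at 0 for all-positive data, max stuck at 0 for all-negative data) is what one would expect if the running min and max both start at 0. The following is a minimal Java sketch of the sentinel-initialised alternative. It is an illustration under that assumption, not the code in the attached patch; the class RangeStats and the sample values are hypothetical.

```java
// Hypothetical illustration; RangeStats is not a Hive class.
final class RangeStats {
    // Sentinels guarantee that the first value seen replaces both fields.
    // Initialising both to 0 instead would leave min at 0 for all-positive
    // input and max at 0 for all-negative input.
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private long count = 0;

    void add(long value) {
        min = Math.min(min, value);
        max = Math.max(max, value);
        count++;
    }

    long min() {
        requireNonEmpty();
        return min;
    }

    long max() {
        requireNonEmpty();
        return max;
    }

    // With no rows observed the sentinels are meaningless, so refuse to report them.
    private void requireNonEmpty() {
        if (count == 0) {
            throw new IllegalStateException("no values observed");
        }
    }

    static RangeStats of(long... values) {
        RangeStats stats = new RangeStats();
        for (long v : values) {
            stats.add(v);
        }
        return stats;
    }

    public static void main(String[] args) {
        RangeStats positive = RangeStats.of(4, 7, 9);    // made-up sample values
        RangeStats negative = RangeStats.of(-8, -5, -2); // made-up sample values
        System.out.println("positive: min=" + positive.min() + " max=" + positive.max());
        System.out.println("negative: min=" + negative.min() + " max=" + negative.max());
    }
}
```

The same idea would carry over to double columns with Double.POSITIVE_INFINITY and Double.NEGATIVE_INFINITY as the starting values, again as an assumption about how such an aggregator might be written.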

 Column stats :  LOW_VALUE (or HIGH_VALUE) will always be 0. ,if all the 
 column values larger than 0.0 (or if all column values smaller than 0.0)
 

 Key: HIVE-4561
 URL: https://issues.apache.org/jira/browse/HIVE-4561
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
 Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch


 if all column values larger than 0.0  DOUBLE_LOW_VALUE always will be 0.0 
 or  if all column values less than 0.0,  DOUBLE_HIGH_VALUE will always be 
 hive (default) create table src_test (price double);
 hive (default) load data local inpath './test.txt' into table src_test;
 hive (default) select * from src_test;
 OK
 1.0
 2.0
 3.0
 Time taken: 0.313 seconds, Fetched: 3 row(s)
 hive (default) analyze table src_test compute statistics for columns price;
 mysql select * from TAB_COL_STATS \G;
  CS_ID: 16
DB_NAME: default
 TABLE_NAME: src_test
COLUMN_NAME: price
COLUMN_TYPE: double
 TBL_ID: 2586
 LONG_LOW_VALUE: 0
LONG_HIGH_VALUE: 0
   DOUBLE_LOW_VALUE: 0.   # Wrong Result ! Expected is 1.
  DOUBLE_HIGH_VALUE: 3.
  BIG_DECIMAL_LOW_VALUE: NULL
 BIG_DECIMAL_HIGH_VALUE: NULL
  NUM_NULLS: 0
  NUM_DISTINCTS: 1
AVG_COL_LEN: 0.
MAX_COL_LEN: 0
  NUM_TRUES: 0
 NUM_FALSES: 0
  LAST_ANALYZED: 1368596151
 2 rows in set (0.00 sec)



[jira] [Commented] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)

2013-06-02 Thread Zhuoluo (Clark) Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13672777#comment-13672777
 ] 

Zhuoluo (Clark) Yang commented on HIVE-4561:


Hi, [~ashutoshc], thanks for your comments. I've uploaded a new patch to the 
review board.

 Column stats :  LOW_VALUE (or HIGH_VALUE) will always be 0. ,if all the 
 column values larger than 0.0 (or if all column values smaller than 0.0)
 

 Key: HIVE-4561
 URL: https://issues.apache.org/jira/browse/HIVE-4561
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: Zhuoluo (Clark) Yang
 Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch


 if all column values larger than 0.0  DOUBLE_LOW_VALUE always will be 0.0 
 or  if all column values less than 0.0,  DOUBLE_HIGH_VALUE will always be 
 hive (default) create table src_test (price double);
 hive (default) load data local inpath './test.txt' into table src_test;
 hive (default) select * from src_test;
 OK
 1.0
 2.0
 3.0
 Time taken: 0.313 seconds, Fetched: 3 row(s)
 hive (default) analyze table src_test compute statistics for columns price;
 mysql select * from TAB_COL_STATS \G;
  CS_ID: 16
DB_NAME: default
 TABLE_NAME: src_test
COLUMN_NAME: price
COLUMN_TYPE: double
 TBL_ID: 2586
 LONG_LOW_VALUE: 0
LONG_HIGH_VALUE: 0
   DOUBLE_LOW_VALUE: 0.   # Wrong Result ! Expected is 1.
  DOUBLE_HIGH_VALUE: 3.
  BIG_DECIMAL_LOW_VALUE: NULL
 BIG_DECIMAL_HIGH_VALUE: NULL
  NUM_NULLS: 0
  NUM_DISTINCTS: 1
AVG_COL_LEN: 0.
MAX_COL_LEN: 0
  NUM_TRUES: 0
 NUM_FALSES: 0
  LAST_ANALYZED: 1368596151
 2 rows in set (0.00 sec)



[jira] [Commented] (HIVE-2616) Passing user identity from metastore client to server in non-secure mode

2013-05-17 Thread Zhuoluo (Clark) Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660463#comment-13660463
 ] 

Zhuoluo (Clark) Yang commented on HIVE-2616:


Hi!
I am curious about this patch: what happens if 
hive.metastore.sasl.enabled is NOT enabled and 
hive.metastore.execute.setugi is set?
From reading the code, I think the UGI is passed to the HMS but has no 
effect. The HMS will create/delete HDFS directories using the server-side UGI.
Is there a way to have the HMS manipulate HDFS with the client-side UGI without 
hive.metastore.sasl.enabled?

 Passing user identity from metastore client to server in non-secure mode
 

 Key: HIVE-2616
 URL: https://issues.apache.org/jira/browse/HIVE-2616
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.8.1, 0.9.0

 Attachments: hive-2616_1.patch, hive-2616_3.patch, hive-2616_4.patch, 
 hive-2616_5.patch, hive-2616.patch


 Currently, in unsecure mode, the client doesn't pass on the user identity. As a 
 result, HDFS and other operations done by the server are executed as the user 
 running the metastore process instead of in the context of the client. This 
 results in the problem reported here: 
 http://mail-archives.apache.org/mod_mbox/hive-user/20.mbox/%3CCAK0mCrRC3aPqtRHDe2J25Rm0JX6TS1KXxd7KPjqJjoqBjg=a...@mail.gmail.com%3E



[jira] [Commented] (HIVE-2616) Passing user identity from metastore client to server in non-secure mode

2013-05-17 Thread Zhuoluo (Clark) Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660466#comment-13660466
 ] 

Zhuoluo (Clark) Yang commented on HIVE-2616:


Is there a way to let users create their table/partition directories under their own UGI?

 Passing user identity from metastore client to server in non-secure mode
 

 Key: HIVE-2616
 URL: https://issues.apache.org/jira/browse/HIVE-2616
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.8.1, 0.9.0

 Attachments: hive-2616_1.patch, hive-2616_3.patch, hive-2616_4.patch, 
 hive-2616_5.patch, hive-2616.patch


 Currently, in unsecure mode, the client doesn't pass on the user identity. As a 
 result, HDFS and other operations done by the server are executed as the user 
 running the metastore process instead of in the context of the client. This 
 results in the problem reported here: 
 http://mail-archives.apache.org/mod_mbox/hive-user/20.mbox/%3CCAK0mCrRC3aPqtRHDe2J25Rm0JX6TS1KXxd7KPjqJjoqBjg=a...@mail.gmail.com%3E



[jira] [Commented] (HIVE-2616) Passing user identity from metastore client to server in non-secure mode

2013-05-17 Thread Zhuoluo (Clark) Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660474#comment-13660474
 ] 

Zhuoluo (Clark) Yang commented on HIVE-2616:


I think I've got the point. 
Is TUGIBasedProcessor.process() doing this?

  try {
    shim.doAs(clientUgi, pvea);
    return true;
  } catch (RuntimeException rte) {

 Passing user identity from metastore client to server in non-secure mode
 

 Key: HIVE-2616
 URL: https://issues.apache.org/jira/browse/HIVE-2616
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.8.1, 0.9.0

 Attachments: hive-2616_1.patch, hive-2616_3.patch, hive-2616_4.patch, 
 hive-2616_5.patch, hive-2616.patch


 Currently, in unsecure mode, the client doesn't pass on the user identity. As a 
 result, HDFS and other operations done by the server are executed as the user 
 running the metastore process instead of in the context of the client. This 
 results in the problem reported here: 
 http://mail-archives.apache.org/mod_mbox/hive-user/20.mbox/%3CCAK0mCrRC3aPqtRHDe2J25Rm0JX6TS1KXxd7KPjqJjoqBjg=a...@mail.gmail.com%3E



[jira] [Updated] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)

2013-05-15 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-4561:
---

Priority: Major  (was: Minor)

 Column stats :  LOW_VALUE (or HIGH_VALUE) will always be 0. ,if all the 
 column values larger than 0.0 (or if all column values smaller than 0.0)
 

 Key: HIVE-4561
 URL: https://issues.apache.org/jira/browse/HIVE-4561
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun

 if all column values larger than 0.0  DOUBLE_LOW_VALUE always will be 0.0 
 or  if all column values less than 0.0,  DOUBLE_HIGH_VALUE will always be 
 hive (default) create table src_test (price double);
 hive (default) load data local inpath './test.txt' into table src_test;
 hive (default) select * from src_test;
 OK
 1.0
 2.0
 3.0
 Time taken: 0.313 seconds, Fetched: 3 row(s)
 hive (default) analyze table src_test compute statistics for columns price;
 mysql select * from TAB_COL_STATS \G;
  CS_ID: 16
DB_NAME: default
 TABLE_NAME: src_test
COLUMN_NAME: price
COLUMN_TYPE: double
 TBL_ID: 2586
 LONG_LOW_VALUE: 0
LONG_HIGH_VALUE: 0
   DOUBLE_LOW_VALUE: 0.   # Wrong Result ! Expected is 1.
  DOUBLE_HIGH_VALUE: 3.
  BIG_DECIMAL_LOW_VALUE: NULL
 BIG_DECIMAL_HIGH_VALUE: NULL
  NUM_NULLS: 0
  NUM_DISTINCTS: 1
AVG_COL_LEN: 0.
MAX_COL_LEN: 0
  NUM_TRUES: 0
 NUM_FALSES: 0
  LAST_ANALYZED: 1368596151
 2 rows in set (0.00 sec)



[jira] [Updated] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)

2013-05-15 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-4561:
---

Status: Patch Available  (was: Open)

A quick fix. Could anybody assign the issue to me?

 Column stats :  LOW_VALUE (or HIGH_VALUE) will always be 0. ,if all the 
 column values larger than 0.0 (or if all column values smaller than 0.0)
 

 Key: HIVE-4561
 URL: https://issues.apache.org/jira/browse/HIVE-4561
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun

 if all column values larger than 0.0  DOUBLE_LOW_VALUE always will be 0.0 
 or  if all column values less than 0.0,  DOUBLE_HIGH_VALUE will always be 
 hive (default) create table src_test (price double);
 hive (default) load data local inpath './test.txt' into table src_test;
 hive (default) select * from src_test;
 OK
 1.0
 2.0
 3.0
 Time taken: 0.313 seconds, Fetched: 3 row(s)
 hive (default) analyze table src_test compute statistics for columns price;
 mysql select * from TAB_COL_STATS \G;
  CS_ID: 16
DB_NAME: default
 TABLE_NAME: src_test
COLUMN_NAME: price
COLUMN_TYPE: double
 TBL_ID: 2586
 LONG_LOW_VALUE: 0
LONG_HIGH_VALUE: 0
   DOUBLE_LOW_VALUE: 0.   # Wrong Result ! Expected is 1.
  DOUBLE_HIGH_VALUE: 3.
  BIG_DECIMAL_LOW_VALUE: NULL
 BIG_DECIMAL_HIGH_VALUE: NULL
  NUM_NULLS: 0
  NUM_DISTINCTS: 1
AVG_COL_LEN: 0.
MAX_COL_LEN: 0
  NUM_TRUES: 0
 NUM_FALSES: 0
  LAST_ANALYZED: 1368596151
 2 rows in set (0.00 sec)



[jira] [Updated] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)

2013-05-15 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-4561:
---

Attachment: HIVE-4561.1.patch

A quick fix.

 Column stats :  LOW_VALUE (or HIGH_VALUE) will always be 0. ,if all the 
 column values larger than 0.0 (or if all column values smaller than 0.0)
 

 Key: HIVE-4561
 URL: https://issues.apache.org/jira/browse/HIVE-4561
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
 Attachments: HIVE-4561.1.patch


 if all column values larger than 0.0  DOUBLE_LOW_VALUE always will be 0.0 
 or  if all column values less than 0.0,  DOUBLE_HIGH_VALUE will always be 
 hive (default) create table src_test (price double);
 hive (default) load data local inpath './test.txt' into table src_test;
 hive (default) select * from src_test;
 OK
 1.0
 2.0
 3.0
 Time taken: 0.313 seconds, Fetched: 3 row(s)
 hive (default) analyze table src_test compute statistics for columns price;
 mysql select * from TAB_COL_STATS \G;
  CS_ID: 16
DB_NAME: default
 TABLE_NAME: src_test
COLUMN_NAME: price
COLUMN_TYPE: double
 TBL_ID: 2586
 LONG_LOW_VALUE: 0
LONG_HIGH_VALUE: 0
   DOUBLE_LOW_VALUE: 0.   # Wrong Result ! Expected is 1.
  DOUBLE_HIGH_VALUE: 3.
  BIG_DECIMAL_LOW_VALUE: NULL
 BIG_DECIMAL_HIGH_VALUE: NULL
  NUM_NULLS: 0
  NUM_DISTINCTS: 1
AVG_COL_LEN: 0.
MAX_COL_LEN: 0
  NUM_TRUES: 0
 NUM_FALSES: 0
  LAST_ANALYZED: 1368596151
 2 rows in set (0.00 sec)



[jira] [Commented] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)

2013-05-15 Thread Zhuoluo (Clark) Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13658133#comment-13658133
 ] 

Zhuoluo (Clark) Yang commented on HIVE-4561:


https://reviews.apache.org/r/11172/

 Column stats :  LOW_VALUE (or HIGH_VALUE) will always be 0. ,if all the 
 column values larger than 0.0 (or if all column values smaller than 0.0)
 

 Key: HIVE-4561
 URL: https://issues.apache.org/jira/browse/HIVE-4561
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
 Attachments: HIVE-4561.1.patch


 if all column values larger than 0.0  DOUBLE_LOW_VALUE always will be 0.0 
 or  if all column values less than 0.0,  DOUBLE_HIGH_VALUE will always be 
 hive (default) create table src_test (price double);
 hive (default) load data local inpath './test.txt' into table src_test;
 hive (default) select * from src_test;
 OK
 1.0
 2.0
 3.0
 Time taken: 0.313 seconds, Fetched: 3 row(s)
 hive (default) analyze table src_test compute statistics for columns price;
 mysql select * from TAB_COL_STATS \G;
  CS_ID: 16
DB_NAME: default
 TABLE_NAME: src_test
COLUMN_NAME: price
COLUMN_TYPE: double
 TBL_ID: 2586
 LONG_LOW_VALUE: 0
LONG_HIGH_VALUE: 0
   DOUBLE_LOW_VALUE: 0.   # Wrong Result ! Expected is 1.
  DOUBLE_HIGH_VALUE: 3.
  BIG_DECIMAL_LOW_VALUE: NULL
 BIG_DECIMAL_HIGH_VALUE: NULL
  NUM_NULLS: 0
  NUM_DISTINCTS: 1
AVG_COL_LEN: 0.
MAX_COL_LEN: 0
  NUM_TRUES: 0
 NUM_FALSES: 0
  LAST_ANALYZED: 1368596151
 2 rows in set (0.00 sec)



[jira] [Commented] (HIVE-4562) Some jars of Hive are required to be deployed on every salve of hadoop cluster,we'd better separate these jars from common client-side-jars

2013-05-15 Thread Zhuoluo (Clark) Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13658154#comment-13658154
 ] 

Zhuoluo (Clark) Yang commented on HIVE-4562:


I think the trunk version does not have this problem.
Trunk unzips these jars and repacks them into hive-exec.jar.
[~caofangkun] IMHO, you can use grep -nC5 unzip ql/build.xml to look into the 
logic.
You can also put mysql-jdbc-connector.jar in the HIVE_AUX_JARS_PATH.
For these reasons, shall we mark this issue Won't Fix?

 Some jars of Hive are required to be deployed on every salve of hadoop 
 cluster,we'd better separate these jars from common client-side-jars
 ---

 Key: HIVE-4562
 URL: https://issues.apache.org/jira/browse/HIVE-4562
 Project: Hive
  Issue Type: Bug
  Components: Clients
Reporter: caofangkun
Priority: Minor

 Some Hive jars are required not only by the client but also by the server 
 (every Hadoop slave).
 We could use the 'add jar' command to add all the jars to the distributed cache, 
 but the common approach is to add these jars to $HADOOP_HOME/lib/ on every slave 
 of the Hadoop cluster,
 which requires restarting all the tasktrackers.
 For example:
 When using Hive stats with MySQL as the temporary stats DB, every slave of the 
 Hadoop cluster should contain 
 mysql-connector-java-.jar in $HADOOP_HOME/lib/ 
 For column stats, 
 $HADOOP_HOME/lib/ on all slaves should contain:
 jackson-core-asl-1.8.8.jar
 jackson-jaxrs-1.8.8.jar
 jackson-mapper-asl-1.8.8.jar
 jackson-xc-1.8.8.jar
 These jars should be separated from other common client-side jars.



[jira] [Commented] (HIVE-4562) HIVE-3393 brought in Jackson library,and these four jars should be packed into hive-exec.jar

2013-05-15 Thread Zhuoluo (Clark) Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13658169#comment-13658169
 ] 

Zhuoluo (Clark) Yang commented on HIVE-4562:


Yes, I think a repack is necessary.

 HIVE-3393 brought in Jackson library,and these four jars should be packed 
 into hive-exec.jar
 

 Key: HIVE-4562
 URL: https://issues.apache.org/jira/browse/HIVE-4562
 Project: Hive
  Issue Type: Bug
  Components: Clients
Reporter: caofangkun
Priority: Minor

 Some Hive jars are required not only by the client but also by the server 
 (every Hadoop slave).
 We could use the 'add jar' command to add all the jars to the distributed cache, 
 but the common approach is to add these jars to $HADOOP_HOME/lib/ on every slave 
 of the Hadoop cluster,
 which requires restarting all the tasktrackers.
 For example:
 When using Hive stats with MySQL as the temporary stats DB, every slave of the 
 Hadoop cluster should contain 
 mysql-connector-java-.jar in $HADOOP_HOME/lib/ 
 For column stats, 
 $HADOOP_HOME/lib/ on all slaves should contain:
 jackson-core-asl-1.8.8.jar
 jackson-jaxrs-1.8.8.jar
 jackson-mapper-asl-1.8.8.jar
 jackson-xc-1.8.8.jar
 These jars should be separated from other common client-side jars.



[jira] [Updated] (HIVE-2616) Passing user identity from metastore client to server in non-secure mode

2013-05-06 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-2616:
---

Issue Type: New Feature  (was: Bug)

 Passing user identity from metastore client to server in non-secure mode
 

 Key: HIVE-2616
 URL: https://issues.apache.org/jira/browse/HIVE-2616
 Project: Hive
  Issue Type: New Feature
  Components: Metastore
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.8.1, 0.9.0

 Attachments: hive-2616_1.patch, hive-2616_3.patch, hive-2616_4.patch, 
 hive-2616_5.patch, hive-2616.patch


 Currently, in unsecure mode, the client doesn't pass on the user identity. As a 
 result, HDFS and other operations done by the server are executed as the user 
 running the metastore process instead of in the context of the client. This 
 results in the problem reported here: 
 http://mail-archives.apache.org/mod_mbox/hive-user/20.mbox/%3CCAK0mCrRC3aPqtRHDe2J25Rm0JX6TS1KXxd7KPjqJjoqBjg=a...@mail.gmail.com%3E



[jira] [Commented] (HIVE-4501) HS2 memory leak - FileSystem objects in FileSystem.CACHE

2013-05-05 Thread Zhuoluo (Clark) Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649529#comment-13649529
 ] 

Zhuoluo (Clark) Yang commented on HIVE-4501:


I think HS1 has similar problems...

 HS2 memory leak - FileSystem objects in FileSystem.CACHE
 

 Key: HIVE-4501
 URL: https://issues.apache.org/jira/browse/HIVE-4501
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Thejas M Nair

 org.apache.hadoop.fs.FileSystem objects are getting accumulated in 
 FileSystem.CACHE, with HS2 in unsecure mode.
 As a workaround, it is possible to set fs.hdfs.impl.disable.cache and 
 fs.file.impl.disable.cache to true.
 Users should not have to bother with this extra configuration. 



[jira] [Commented] (HIVE-2019) Implement NOW() UDF

2013-04-17 Thread Zhuoluo (Clark) Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634112#comment-13634112
 ] 

Zhuoluo (Clark) Yang commented on HIVE-2019:


If we use now() in filters, the result will be nondeterministic.
A map task that is scheduled earlier sees an earlier now(); one that is 
scheduled later sees a later now().
In our production environment, many Hive tasks are scheduled around midnight, 
so the now() values of the tasks may fall on different days depending on the 
scheduling order.
I think it is necessary to add a kind of UDF called UDCF (User Defined Client 
Function). If we evaluate now() on the client side and make it a constant at 
compile time, this problem goes away.
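
The difference between per-task evaluation and a client-side constant can be sketched in plain Java. This is only an illustration of the idea in this comment, not Hive code: NowFoldingSketch, steppingClock, evaluatePerTask and evaluateFolded are hypothetical names, and the fake clock that advances one hour per reading merely stands in for tasks that start at different times.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical illustration; none of these names exist in Hive.
final class NowFoldingSketch {
    // A fake clock that advances by `step` on every reading, standing in for
    // tasks that start at different wall-clock times.
    static Supplier<Instant> steppingClock(Instant start, Duration step) {
        AtomicLong readings = new AtomicLong();
        return () -> start.plus(step.multipliedBy(readings.getAndIncrement()));
    }

    // Per-task evaluation: each task reads the clock when it happens to run,
    // so tasks of the same query can disagree about "now".
    static List<Instant> evaluatePerTask(int tasks, Supplier<Instant> clock) {
        List<Instant> seen = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            seen.add(clock.get());
        }
        return seen;
    }

    // Folded evaluation: the clock is read once up front and the same
    // constant is handed to every task.
    static List<Instant> evaluateFolded(int tasks, Supplier<Instant> clock) {
        Instant constant = clock.get();
        List<Instant> seen = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            seen.add(constant);
        }
        return seen;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2013-04-17T23:30:00Z"); // arbitrary start time
        System.out.println(evaluatePerTask(3, steppingClock(start, Duration.ofHours(1))));
        System.out.println(evaluateFolded(3, steppingClock(start, Duration.ofHours(1))));
    }
}
```

With the start time just before midnight, the per-task readings straddle two calendar days, while the folded variant gives every task the same instant.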

 Implement NOW() UDF
 ---

 Key: HIVE-2019
 URL: https://issues.apache.org/jira/browse/HIVE-2019
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Carl Steinbach
Assignee: Priyadarshini
 Attachments: HIVE-2019.patch


 Reference: 
 http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_now



[jira] [Commented] (HIVE-2019) Implement NOW() UDF

2013-04-17 Thread Zhuoluo (Clark) Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634121#comment-13634121
 ] 

Zhuoluo (Clark) Yang commented on HIVE-2019:


[~priyadarshini] I think the patch is a little too simple; it should consider 
the distributed case. I think a better way is to fold NOW() into a 
constant at compile time.

 Implement NOW() UDF
 ---

 Key: HIVE-2019
 URL: https://issues.apache.org/jira/browse/HIVE-2019
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Carl Steinbach
Assignee: Priyadarshini
 Attachments: HIVE-2019.patch


 Reference: 
 http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_now



[jira] [Commented] (HIVE-2019) Implement NOW() UDF

2013-04-17 Thread Zhuoluo (Clark) Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634176#comment-13634176
 ] 

Zhuoluo (Clark) Yang commented on HIVE-2019:


Actually, NOW() is not a non-deterministic UDF like rand(), for every time you 
call it, it returns different answers. Is HIVE-746 a related JIRA issue?

 Implement NOW() UDF
 ---

 Key: HIVE-2019
 URL: https://issues.apache.org/jira/browse/HIVE-2019
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Carl Steinbach
Assignee: Priyadarshini
 Attachments: HIVE-2019.patch


 Reference: 
 http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_now

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3958) support partial scan for analyze command

2013-01-30 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-3958:
---

Description: 
analyze commands allows us to collect statistics on existing tables/partitions. 
It works great but might be slow since it scans all files.

There are 2 ways to speed it up:
1. collect stats without file scan. It may not collect all stats but good and 
fast enough for use case. HIVE-3917 addresses it
2. collect stats via partial file scan. It doesn't scan all content of files 
but part of it to get file metadata. some examples are 
https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) and 
HFile of Hbase

This jira is targeted to address the #2

  was:
analyze commands allows us to collect statistics on existing tables/partitions. 
It works great but might be slow since it scans all files.

There are 2 ways to speed it up:
1. collect stats without file scan. It may not collect all stats but good and 
fast enough for use case. Hive-3917 addresses it
2. collect stats via partial file scan. It doesn't scan all content of files 
but part of it to get file metadata. some examples are 
https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) and 
HFile of Hbase

This jira is targeted to address the #2


 support partial scan for analyze command
 

 Key: HIVE-3958
 URL: https://issues.apache.org/jira/browse/HIVE-3958
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu

 analyze commands allows us to collect statistics on existing 
 tables/partitions. It works great but might be slow since it scans all files.
 There are 2 ways to speed it up:
 1. collect stats without file scan. It may not collect all stats but good and 
 fast enough for use case. HIVE-3917 addresses it
 2. collect stats via partial file scan. It doesn't scan all content of files 
 but part of it to get file metadata. some examples are 
 https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) 
 and HFile of Hbase
 This jira is targeted to address the #2



[jira] [Commented] (HIVE-2615) CTAS with literal NULL creates VOID type

2013-01-28 Thread Zhuoluo (Clark) Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13565006#comment-13565006
 ] 

Zhuoluo (Clark) Yang commented on HIVE-2615:


I think option 3 is the better choice, based on what [~david.phillips] says.
Is anybody working on this issue?

 CTAS with literal NULL creates VOID type
 

 Key: HIVE-2615
 URL: https://issues.apache.org/jira/browse/HIVE-2615
 Project: Hive
  Issue Type: Bug
Reporter: David Phillips

 Create the table with a column that always contains NULL:
 {quote}
 hive create table bad as select 1 x, null z from dual; 
 {quote}
 Because there's no type, Hive gives it the VOID type:
 {quote}
 hive describe bad;
 OK
 x int 
 z void
 {quote}
 This seems weird, because AFAIK, there is no normal way to create a column of 
 type VOID.  The problem is that the table can't be queried:
 {quote}
 hive select * from bad;
 OK
 Failed with exception java.io.IOException:java.lang.RuntimeException: 
 Internal error: no LazyObject for VOID
 {quote}
 Worse, even if you don't select that field, the query fails at runtime:
 {quote}
 hive select x from bad;
 ...
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.MapRedTask
 {quote}



[jira] [Updated] (HIVE-1151) Add 'show version' command to Hive CLI

2013-01-17 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-1151:
---

Attachment: HIVE-1151.3.patch

Correct some comments

 Add 'show version' command to Hive CLI
 --

 Key: HIVE-1151
 URL: https://issues.apache.org/jira/browse/HIVE-1151
 Project: Hive
  Issue Type: New Feature
  Components: CLI, Clients
Affects Versions: 0.6.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Attachments: HIVE-1151.1.patch, HIVE-1151.2.patch, HIVE-1151.3.patch


 At a minimum this command should return the version information obtained
 from the hive-cli jar. Ideally this command will also return version 
 information
 obtained from each of the hive jar files present in the CLASSPATH, which
 will allow us to quickly detect cases where people are using incompatible
 jars.
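
The second half of the description above (reporting a version for each Hive jar on the CLASSPATH) could be approached roughly as follows. This is a minimal sketch, not what the attached patches do: it assumes Hive jars are named hive-*.jar and that each jar's manifest carries an Implementation-Version attribute, neither of which is confirmed here. JarVersionSketch and hiveJarVersions are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;
import java.util.regex.Pattern;

// Hypothetical illustration; not part of Hive.
final class JarVersionSketch {
    // Maps each hive-*.jar file name on the given classpath to the
    // Implementation-Version in its manifest, or "unknown" when absent.
    static Map<String, String> hiveJarVersions(String classpath) {
        Map<String, String> versions = new LinkedHashMap<>();
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            File file = new File(entry);
            String name = file.getName();
            // Assumption: Hive jars can be recognised by a "hive-" name prefix.
            if (!file.isFile() || !name.startsWith("hive-") || !name.endsWith(".jar")) {
                continue;
            }
            try (JarFile jar = new JarFile(file)) {
                Manifest manifest = jar.getManifest(); // null when the jar has no manifest
                String version = manifest == null ? null
                        : manifest.getMainAttributes().getValue(Attributes.Name.IMPLEMENTATION_VERSION);
                versions.put(name, version == null ? "unknown" : version);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return versions;
    }

    public static void main(String[] args) {
        hiveJarVersions(System.getProperty("java.class.path"))
                .forEach((jar, version) -> System.out.println(jar + " " + version));
    }
}
```

Printing one line per jar makes mismatched versions easy to spot, which is the incompatibility check the description asks for.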



[jira] [Updated] (HIVE-1151) Add 'show version' command to Hive CLI

2013-01-15 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-1151:
---

Attachment: HIVE-1151.1.patch

Let me attach a patch.
I added a simple DDL statement called show version.
The version info is generated by scripts at compile time.

 Add 'show version' command to Hive CLI
 --

 Key: HIVE-1151
 URL: https://issues.apache.org/jira/browse/HIVE-1151
 Project: Hive
  Issue Type: New Feature
  Components: CLI, Clients
Affects Versions: 0.6.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Attachments: HIVE-1151.1.patch


 At a minimum this command should return the version information obtained
 from the hive-cli jar. Ideally this command will also return version 
 information
 obtained from each of the hive jar files present in the CLASSPATH, which
 will allow us to quickly detect cases where people are using incompatible
 jars.



[jira] [Updated] (HIVE-1151) Add 'show version' command to Hive CLI

2013-01-15 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-1151:
---

Attachment: HIVE-1151.2.patch

Attached an updated patch.
1. Removed the code that swallowed the stack trace, so the Exception stack trace can 
be stringified by DDLTask.execute().
2. Apologies for my ignorance of git and for shamelessly copying existing code: modified 
saveVersion.sh to get the git hostname.
3. Added a hive --version command.

 Add 'show version' command to Hive CLI
 --

 Key: HIVE-1151
 URL: https://issues.apache.org/jira/browse/HIVE-1151
 Project: Hive
  Issue Type: New Feature
  Components: CLI, Clients
Affects Versions: 0.6.0
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Attachments: HIVE-1151.1.patch, HIVE-1151.2.patch


 At a minimum this command should return the version information obtained
 from the hive-cli jar. Ideally this command will also return version 
 information
 obtained from each of the hive jar files present in the CLASSPATH, which
 will allow us to quickly detect cases where people are using incompatible
 jars.



[jira] [Updated] (HIVE-1649) Ability to update counters and status from TRANSFORM scripts

2013-01-10 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-1649:
---

Affects Version/s: 0.6.0

 Ability to update counters and status from TRANSFORM scripts
 

 Key: HIVE-1649
 URL: https://issues.apache.org/jira/browse/HIVE-1649
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Carl Steinbach
 Attachments: HIVE-1649.1.patch


 Hadoop Streaming supports the ability to update counters and status by 
 writing specially coded messages to the script's stderr stream.
 A streaming process can use the stderr to emit counter information. 
 {{reporter:counter:group,counter,amount}} should be sent to stderr to 
 update the counter.
 A streaming process can use the stderr to emit status information. To set a 
 status, {{reporter:status:message}} should be sent to stderr.
 Hive should support these same features with its TRANSFORM mechanism.
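As an illustration of the stderr protocol quoted above, a streaming or TRANSFORM script might look roughly like this. The message formats come from the description; the group and counter names (`MyScript`, `rows_seen`), the upper-casing step, and the reporting interval are made up for the example. Whether Hive acts on these lines is exactly what this issue requests, so this sketch only shows the script side.

```python
import sys


def counter_line(group, counter, amount):
    """Format a counter update: reporter:counter:group,counter,amount."""
    return f"reporter:counter:{group},{counter},{amount}"


def status_line(message):
    """Format a status update: reporter:status:message."""
    return f"reporter:status:{message}"


def main(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    """Copy rows from stdin to stdout (upper-cased), reporting via stderr."""
    for n, line in enumerate(stdin, start=1):
        # Row data goes to stdout, one output row per input row.
        stdout.write(line.rstrip("\n").upper() + "\n")
        # Counter and status messages go to stderr, one per line.
        print(counter_line("MyScript", "rows_seen", 1), file=stderr)
        if n % 1000 == 0:
            print(status_line(f"processed {n} rows"), file=stderr)
```

To use this as a standalone script, call `main()` at the end of the file; the streams are parameters only so the function can be exercised without a real job.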



[jira] [Commented] (HIVE-3417) mulit inserts when the from statement is a subquery,this is a bug

2012-08-30 Thread Zhuoluo (Clark) Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13444756#comment-13444756
 ] 

Zhuoluo (Clark) Yang commented on HIVE-3417:


I think this bug was introduced by HIVE-1538, because the optimizer prunes the 
wrong filters. I think we can modify the optimizer so that it behaves 
correctly.

 mulit inserts when the from statement is a subquery,this is a bug
 -

 Key: HIVE-3417
 URL: https://issues.apache.org/jira/browse/HIVE-3417
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, SQL
Affects Versions: 0.8.1
 Environment: Linux 3.0.0-14-generic #23-Ubuntu SMP Mon Nov 21 
 20:34:47 UTC 2011 i686 i686 i386 GNU/Linux
 java version 1.6.0_25
 hadoop-0.20.2-cdh3u0
 hive-0.8.1
Reporter: caofangkun

 vi mulit-insert.sql
 create table src (key string, value string);
 load data local inpath './in1.txt' overwrite into table src;
 drop table if exists test1;
 drop table if exists test2;
 create table test1 (key string, value string) partitioned by (dt string);
 create table test2 (key string, value string) partitioned by (dt string);
 select * from src;
 from (select * from src
   where key is not null
   ) --there is a bug here 
 insert overwrite table test1 PARTITION (dt='1') select key ,value where 
 key='48'
 insert overwrite table test2 PARTITION (dt='2') select key, value where 
 key='100';
 select * from test1;
 select * from test2;
 test1 and test2 should each contain a single line, but they do not.
 Workaround:
 with set hive.ppd.remove.duplicatefilters=false;
 the bug does not occur.



[jira] [Updated] (HIVE-2419) CREATE TABLE AS SELECT should create warehouse directory

2012-08-20 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-2419:
---

Attachment: HIVE-2419.1.patch

This is an annoying problem, especially for new Hive users. We have fixed the 
problem in our internal version of Hive by simply creating the 
hive.metastore.warehouse.dir directory during the semantic analysis phase.

 CREATE TABLE AS SELECT should create warehouse directory
 

 Key: HIVE-2419
 URL: https://issues.apache.org/jira/browse/HIVE-2419
 Project: Hive
  Issue Type: Bug
Reporter: David Phillips
 Attachments: HIVE-2419.1.patch


 If you run a CTAS statement on a fresh Hive install without a warehouse 
 directory (as is the case with Amazon EMR), it runs the query but errors out 
 at the end:
 {quote}
 hive> create table foo as select * from t_message limit 1;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 ...
 Ended Job = job_201108301753_0001
 Moving data to: 
 hdfs://ip-10-202-22-194.ec2.internal:9000/mnt/hive_07_1/warehouse/foo
 Failed with exception Unable to rename: 
 hdfs://ip-10-202-22-194.ec2.internal:9000/mnt/var/lib/hive_07_1/tmp/scratch/hive_2011-08-30_18-04-36_809_6130923980133666976/-ext-10001
  to: hdfs://ip-10-202-22-194.ec2.internal:9000/mnt/hive_07_1/warehouse/foo
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.MoveTask
 {quote}
 This is different behavior from a simple CREATE TABLE, which creates the 
 warehouse directory.
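The failure mode and the kind of fix discussed here can be sketched with a local-filesystem analogy. This is not Hive's MoveTask or the attached patch (which creates the directory during semantic analysis rather than at move time); `move_into_warehouse` is a hypothetical helper, and the local `os.rename` stands in for the HDFS rename.

```python
import os


def move_into_warehouse(src, warehouse_dir, table_name):
    """Move query output into the warehouse, creating the warehouse first.

    A rename into a directory that does not exist fails, which is the
    local analogue of the "Unable to rename" error in the report.
    Creating the parent directory up front avoids that.
    """
    os.makedirs(warehouse_dir, exist_ok=True)  # no-op if it already exists
    dest = os.path.join(warehouse_dir, table_name)
    os.rename(src, dest)
    return dest
```

Without the `os.makedirs` call, the rename raises an `OSError` on a fresh install where the warehouse directory was never created, and only after the query itself has already run.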





[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.

2011-07-10 Thread Zhuoluo (Clark) Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062727#comment-13062727
 ] 

Zhuoluo (Clark) Yang commented on HIVE-896:
---

I think it is necessary to have a kind of function called UDWF (User-Defined 
Windowing Function).

 Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
 ---

 Key: HIVE-896
 URL: https://issues.apache.org/jira/browse/HIVE-896
 Project: Hive
  Issue Type: New Feature
Reporter: Amr Awadallah
Priority: Minor

 Windowing functions are very useful for click stream processing and similar 
 time-series/sliding-window analytics.
 More details at:
 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709
 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059
 http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032
 -- amr





[jira] [Created] (HIVE-2227) Remove ProgressCounter enum in Operator

2011-06-16 Thread Zhuoluo (Clark) Yang (JIRA)
Remove ProgressCounter enum in Operator
---

 Key: HIVE-2227
 URL: https://issues.apache.org/jira/browse/HIVE-2227
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.8.0
Reporter: Zhuoluo (Clark) Yang
Priority: Minor
 Fix For: 0.8.0


After HIVE-1701, there is no use in keeping the heavy counterNameToEnum hashmap. We 
can use strings directly, since the enum was only a hack for Hadoop 0.17. The 
strings will be human-readable in jobdetails.jsp, instead of C1, C2, ... 
C1000.
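The difference between the two schemes can be illustrated with a small sketch. This is not Hive's Operator code; the class names `EnumBackedCounters` and `StringCounters` and the counter names used below are hypothetical, and only `ProgressCounter`, `counterNameToEnum`, and the C1 ... C1000 naming come from the description above.

```python
from collections import Counter
from enum import Enum

# Fixed pool of opaque slots C1 ... C1000, standing in for the enum hack.
ProgressCounter = Enum("ProgressCounter", [f"C{i}" for i in range(1, 1001)])


class EnumBackedCounters:
    """Old scheme: each readable name is mapped to an opaque enum slot."""

    def __init__(self):
        self._free_slots = iter(ProgressCounter)
        self.counter_name_to_enum = {}  # the extra map this issue removes
        self.values = Counter()         # what the job page would display

    def incr(self, name, amount=1):
        slot = self.counter_name_to_enum.get(name)
        if slot is None:
            slot = next(self._free_slots)  # take the next unused slot
            self.counter_name_to_enum[name] = slot
        self.values[slot.name] += amount


class StringCounters:
    """Proposed scheme: the readable name is the counter key itself."""

    def __init__(self):
        self.values = Counter()

    def incr(self, name, amount=1):
        self.values[name] += amount
```

With the enum-backed version, incrementing a counter named `RECORDS_IN` shows up under `C1`; with the string version it shows up under `RECORDS_IN`, and the name-to-enum map is no longer needed.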





[jira] [Updated] (HIVE-2227) Remove ProgressCounter enum in Operator

2011-06-16 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-2227:
---

Status: Patch Available  (was: Open)

 Remove ProgressCounter enum in Operator
 ---

 Key: HIVE-2227
 URL: https://issues.apache.org/jira/browse/HIVE-2227
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.8.0
Reporter: Zhuoluo (Clark) Yang
Priority: Minor
 Fix For: 0.8.0


 After HIVE-1701, there is no use in keeping the heavy counterNameToEnum hashmap. 
 We can use strings directly, since the enum was only a hack for Hadoop 0.17. The 
 strings will be human-readable in jobdetails.jsp, instead of C1, C2, ... 
 C1000.





[jira] [Updated] (HIVE-2227) Remove ProgressCounter enum in Operator

2011-06-16 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-2227:
---

Status: Open  (was: Patch Available)

 Remove ProgressCounter enum in Operator
 ---

 Key: HIVE-2227
 URL: https://issues.apache.org/jira/browse/HIVE-2227
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.8.0
Reporter: Zhuoluo (Clark) Yang
Priority: Minor
 Fix For: 0.8.0


 After HIVE-1701, there is no use in keeping the heavy counterNameToEnum hashmap. 
 We can use strings directly, since the enum was only a hack for Hadoop 0.17. The 
 strings will be human-readable in jobdetails.jsp, instead of C1, C2, ... 
 C1000.





[jira] [Updated] (HIVE-2227) Remove ProgressCounter enum in Operator

2011-06-16 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-2227:
---

Status: Patch Available  (was: Open)

 Remove ProgressCounter enum in Operator
 ---

 Key: HIVE-2227
 URL: https://issues.apache.org/jira/browse/HIVE-2227
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.8.0
Reporter: Zhuoluo (Clark) Yang
Priority: Minor
 Fix For: 0.8.0


 After HIVE-1701, there is no use in keeping the heavy counterNameToEnum hashmap. 
 We can use strings directly, since the enum was only a hack for Hadoop 0.17. The 
 strings will be human-readable in jobdetails.jsp, instead of C1, C2, ... 
 C1000.





[jira] [Updated] (HIVE-2227) Remove ProgressCounter enum in Operator

2011-06-16 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-2227:
---

Attachment: HIVE-2227-1.patch

Here is a patch.

 Remove ProgressCounter enum in Operator
 ---

 Key: HIVE-2227
 URL: https://issues.apache.org/jira/browse/HIVE-2227
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.8.0
Reporter: Zhuoluo (Clark) Yang
Priority: Minor
 Fix For: 0.8.0

 Attachments: HIVE-2227-1.patch


 After HIVE-1701, there is no use in keeping the heavy counterNameToEnum hashmap. 
 We can use strings directly, since the enum was only a hack for Hadoop 0.17. The 
 strings will be human-readable in jobdetails.jsp, instead of C1, C2, ... 
 C1000.





[jira] [Updated] (HIVE-2227) Remove ProgressCounter enum in Operator

2011-06-16 Thread Zhuoluo (Clark) Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhuoluo (Clark) Yang updated HIVE-2227:
---

Status: Open  (was: Patch Available)

Not reviewed.

 Remove ProgressCounter enum in Operator
 ---

 Key: HIVE-2227
 URL: https://issues.apache.org/jira/browse/HIVE-2227
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.8.0
Reporter: Zhuoluo (Clark) Yang
Priority: Minor
 Fix For: 0.8.0

 Attachments: HIVE-2227-1.patch


 After HIVE-1701, there is no use in keeping the heavy counterNameToEnum hashmap. 
 We can use strings directly, since the enum was only a hack for Hadoop 0.17. The 
 strings will be human-readable in jobdetails.jsp, instead of C1, C2, ... 
 C1000.





[jira] [Commented] (HIVE-2227) Remove ProgressCounter enum in Operator

2011-06-16 Thread Zhuoluo (Clark) Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050875#comment-13050875
 ] 

Zhuoluo (Clark) Yang commented on HIVE-2227:


Review board
https://reviews.apache.org/r/931/

 Remove ProgressCounter enum in Operator
 ---

 Key: HIVE-2227
 URL: https://issues.apache.org/jira/browse/HIVE-2227
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.8.0
Reporter: Zhuoluo (Clark) Yang
Priority: Minor
 Fix For: 0.8.0

 Attachments: HIVE-2227-1.patch


 After HIVE-1701, there is no use in keeping the heavy counterNameToEnum hashmap. 
 We can use strings directly, since the enum was only a hack for Hadoop 0.17. The 
 strings will be human-readable in jobdetails.jsp, instead of C1, C2, ... 
 C1000.
