from:"Gang Tim Liu"

[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-09-29 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781438#comment-13781438
 ] 

Gang Tim Liu commented on HIVE-3959:


Yes, assign it to Dilip.

 Update Partition Statistics in Metastore Layer
 --

 Key: HIVE-3959
 URL: https://issues.apache.org/jira/browse/HIVE-3959
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Dilip Joseph
Priority: Minor
 Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, 
 HIVE-3959.patch.12.txt, HIVE-3959.patch.2


 When partitions are created using queries (insert overwrite and insert 
 into), the StatsTask updates all stats. However, when partitions are 
 added directly through metadata-only operations (either the CLI or direct 
 calls to the Thrift Metastore), no stats are populated even if 
 hive.stats.reliable is set to true. This leaves us unable to decide whether 
 stats are truly reliable.
 We propose that the fast stats (numFiles and totalSize), which don't require 
 a scan of the data, should always be populated and be completely reliable. For 
 now we are still excluding rowCount and rawDataSize because computing them 
 would make these operations very expensive. Currently they are quick 
 metadata-only ops.
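
As a rough illustration of why numFiles and totalSize count as "fast" stats, the sketch below derives both from a directory listing alone, without opening any file. It is only a sketch under assumptions: a local directory and Python's os.scandir stand in for a partition location, and the names fast_stats and demo are hypothetical, not part of Hive.

```python
import os
import tempfile


def fast_stats(partition_dir):
    """Return (numFiles, totalSize) using only directory metadata.

    Hypothetical helper: a local directory stands in for a partition
    location. No file contents are read, so the cost depends on the
    number of files rather than on the amount of data.
    """
    num_files = 0
    total_size = 0
    with os.scandir(partition_dir) as entries:
        for entry in entries:
            if entry.is_file():
                num_files += 1
                total_size += entry.stat().st_size
    return num_files, total_size


def demo():
    """Build a throwaway "partition" with two small files and measure it."""
    with tempfile.TemporaryDirectory() as part:
        for name, payload in (("000000_0", b"abc"), ("000001_0", b"defgh")):
            with open(os.path.join(part, name), "wb") as f:
                f.write(payload)
        return fast_stats(part)
```

A row count or raw data size, by contrast, could not be obtained this way, since each file would have to be read.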



--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Assigned] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-09-29 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu reassigned HIVE-3959:
--

Assignee: Dilip Joseph  (was: Gang Tim Liu)

 Update Partition Statistics in Metastore Layer
 --

 Key: HIVE-3959
 URL: https://issues.apache.org/jira/browse/HIVE-3959
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Dilip Joseph
Priority: Minor
 Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, 
 HIVE-3959.patch.12.txt, HIVE-3959.patch.2



[jira] [Assigned] (HIVE-3745) Hive does improper = based string comparisons for strings with trailing whitespaces

2013-09-13 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu reassigned HIVE-3745:
--

Assignee: Kevin Wilfong  (was: Gang Tim Liu)

 Hive does improper = based string comparisons for strings with trailing 
 whitespaces
 -

 Key: HIVE-3745
 URL: https://issues.apache.org/jira/browse/HIVE-3745
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.9.0
Reporter: Harsh J
Assignee: Kevin Wilfong

 Unlike other systems such as DB2 and MySQL, which disregard trailing 
 whitespace when comparing two strings with the {{=}} relational operator, 
 Hive does not.
 For example, note the following line from the MySQL manual: 
 http://dev.mysql.com/doc/refman/5.1/en/char.html
 {quote}
 All MySQL collations are of type PADSPACE. This means that all CHAR and 
 VARCHAR values in MySQL are compared without regard to any trailing spaces. 
 {quote}
 Hive is still whitespace-sensitive and treats the trailing spaces of a string 
 as significant when comparing. Ideally {{LIKE}} should treat them as 
 significant, but {{=}} should not.
 Is there a specific reason behind this difference of implementation in Hive's 
 SQL?
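
The difference between the two behaviours can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the code models the comparison semantics described above rather than any engine's actual implementation.

```python
def pad_space_equal(left, right):
    """Compare two strings while ignoring trailing spaces (PADSPACE-style)."""
    return left.rstrip(" ") == right.rstrip(" ")


def strict_equal(left, right):
    """Compare two strings exactly, as the issue reports for Hive's = operator."""
    return left == right
```

Under the PADSPACE rule, 'abc  ' and 'abc' compare equal, while leading spaces still matter; under the strict rule they differ.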

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-3949) Some test failures in hadoop 23

2013-06-10 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680052#comment-13680052
 ] 

Gang Tim Liu commented on HIVE-3949:


Sure, please feel free to work on it. Thanks.

 Some test failures in hadoop 23
 ---

 Key: HIVE-3949
 URL: https://issues.apache.org/jira/browse/HIVE-3949
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu

 This is a follow-up to HIVE-3873.
 We have fixed some test failures in HIVE-3873 and a few other JIRA issues.
 We will use this JIRA to track the remaining failures: 
 https://builds.apache.org/job/Hive-trunk-hadoop2/


[jira] [Assigned] (HIVE-3949) Some test failures in hadoop 23

2013-06-10 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu reassigned HIVE-3949:
--

Assignee: Brock Noland  (was: Gang Tim Liu)

 Some test failures in hadoop 23
 ---

 Key: HIVE-3949
 URL: https://issues.apache.org/jira/browse/HIVE-3949
 Project: Hive
  Issue Type: Bug
Reporter: Gang Tim Liu
Assignee: Brock Noland


[jira] [Commented] (HIVE-4474) Column access not tracked properly for partitioned tables

2013-05-03 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648537#comment-13648537
 ] 

Gang Tim Liu commented on HIVE-4474:


Committed. Thanks, Samuel Yuan.

 Column access not tracked properly for partitioned tables
 -

 Key: HIVE-4474
 URL: https://issues.apache.org/jira/browse/HIVE-4474
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Samuel Yuan
Assignee: Samuel Yuan
 Attachments: HIVE-4474.1.patch.txt


 The columns recorded as being accessed are incorrect for partitioned tables. 
 The index of an accessed column is a position in the list of non-partition 
 columns, but the list of all columns is currently being used to do the lookup.
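
The kind of mismatch described above can be shown with a small sketch. Everything here is hypothetical: the table, the column names, and in particular the ordering of the combined list (partition columns first) are assumptions made purely for illustration, since the issue does not state how the two lists are laid out.

```python
def accessed_column_names(indexes, lookup_list):
    """Map accessed-column positions to names using the given list."""
    return [lookup_list[i] for i in indexes]


# Hypothetical table: two data columns, partitioned by ds and hr.
NON_PARTITION_COLS = ["key", "value"]
PARTITION_COLS = ["ds", "hr"]
# Assumed ordering for illustration only; not taken from the issue.
ALL_COLS = PARTITION_COLS + NON_PARTITION_COLS

# Positions refer to NON_PARTITION_COLS: the query read "key" and "value".
ACCESSED = [0, 1]

# Looking the positions up in the wrong list yields the wrong names.
wrong = accessed_column_names(ACCESSED, ALL_COLS)
right = accessed_column_names(ACCESSED, NON_PARTITION_COLS)
```

Whenever the two lists disagree at a given position, a lookup in the combined list reports a column the query never touched.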


[jira] [Updated] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-05-03 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3959:
---

Attachment: (was: HIVE-3959.patch.9.txt)

 Update Partition Statistics in Metastore Layer
 --

 Key: HIVE-3959
 URL: https://issues.apache.org/jira/browse/HIVE-3959
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, 
 HIVE-3959.patch.12.txt, HIVE-3959.patch.2



[jira] [Updated] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-05-03 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3959:
---

Attachment: HIVE-3959.patch.12.txt

 Update Partition Statistics in Metastore Layer
 --

 Key: HIVE-3959
 URL: https://issues.apache.org/jira/browse/HIVE-3959
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, 
 HIVE-3959.patch.12.txt, HIVE-3959.patch.2



[jira] [Work started] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-05-03 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3959 started by Gang Tim Liu.

 Update Partition Statistics in Metastore Layer
 --

 Key: HIVE-3959
 URL: https://issues.apache.org/jira/browse/HIVE-3959
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, 
 HIVE-3959.patch.12.txt, HIVE-3959.patch.2



[jira] [Updated] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-05-03 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3959:
---

Status: Patch Available  (was: In Progress)

 Update Partition Statistics in Metastore Layer
 --

 Key: HIVE-3959
 URL: https://issues.apache.org/jira/browse/HIVE-3959
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, 
 HIVE-3959.patch.12.txt, HIVE-3959.patch.2



[jira] [Commented] (HIVE-4474) Column access not tracked properly for partitioned tables

2013-05-02 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647763#comment-13647763
 ] 

Gang Tim Liu commented on HIVE-4474:


Running tests.

 Column access not tracked properly for partitioned tables
 -

 Key: HIVE-4474
 URL: https://issues.apache.org/jira/browse/HIVE-4474
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Samuel Yuan
Assignee: Samuel Yuan
 Attachments: HIVE-4474.1.patch.txt



[jira] [Updated] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-05-02 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3959:
---

Attachment: HIVE-3959.patch.11.txt

 Update Partition Statistics in Metastore Layer
 --

 Key: HIVE-3959
 URL: https://issues.apache.org/jira/browse/HIVE-3959
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, 
 HIVE-3959.patch.2, HIVE-3959.patch.9.txt



[jira] [Updated] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-05-01 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3959:
---

Attachment: (was: HIVE-3959.patch.2.nohcat)

 Update Partition Statistics in Metastore Layer
 --

 Key: HIVE-3959
 URL: https://issues.apache.org/jira/browse/HIVE-3959
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3959.patch.1, HIVE-3959.patch.2, 
 HIVE-3959.patch.9.txt



[jira] [Updated] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-05-01 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3959:
---

Attachment: HIVE-3959.patch.9.txt

 Update Partition Statistics in Metastore Layer
 --

 Key: HIVE-3959
 URL: https://issues.apache.org/jira/browse/HIVE-3959
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3959.patch.1, HIVE-3959.patch.2, 
 HIVE-3959.patch.9.txt



[jira] [Commented] (HIVE-4474) Column access not tracked properly for partitioned tables

2013-05-01 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647206#comment-13647206
 ] 

Gang Tim Liu commented on HIVE-4474:


+1

 Column access not tracked properly for partitioned tables
 -

 Key: HIVE-4474
 URL: https://issues.apache.org/jira/browse/HIVE-4474
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Samuel Yuan
Assignee: Samuel Yuan
 Attachments: HIVE-4474.1.patch.txt



[jira] [Created] (HIVE-4456) Datanucleus throws NPE after passing a config from test file (.q) to hive metastore

2013-04-30 Thread Gang Tim Liu (JIRA)

Gang Tim Liu created HIVE-4456:
--

 Summary: Datanucleus throws NPE after passing a config from test 
file (.q) to hive metastore
 Key: HIVE-4456
 URL: https://issues.apache.org/jira/browse/HIVE-4456
 Project: Hive
  Issue Type: Bug
  Components: Configuration, Metastore
Reporter: Gang Tim Liu
Priority: Critical


Create a test file (.q) with the following:
set hive.metastore.ds.retry.interval=2000;
create table analyze_srcpart like srcpart;

Run ant test -Dtestcase=TestCliDriver -Dqfile=file

An NPE is thrown. See the attached files.

Is there anything special about hive.metastore.ds.retry.interval?

It is a config listed under HiveConf.metaVars, so HiveConf.get(HiveConf c) 
will recreate a new conf when it detects a difference.
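
The "recreate on difference" behaviour mentioned above can be pictured roughly as follows. This is a sketch under assumptions, not Hive's code: plain dictionaries stand in for configuration objects, and META_VARS and needs_refresh are hypothetical names. Only the setting named in this report is listed.

```python
# Hypothetical list of watched settings; only the one from this report.
META_VARS = ("hive.metastore.ds.retry.interval",)


def needs_refresh(cached_conf, new_conf, watched=META_VARS):
    """Return True if any watched setting differs between the two confs."""
    return any(cached_conf.get(k) != new_conf.get(k) for k in watched)
```

With this model, a set command in the .q file that changes a watched setting makes needs_refresh return True, which is the point at which a new object would be built; changes to unwatched settings have no such effect.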


[jira] [Updated] (HIVE-4456) Datanucleus throws NPE after passing a config from test file (.q) to hive metastore

2013-04-30 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4456:
---

Attachment: err.txt

 Datanucleus throws NPE after passing a config from test file (.q) to hive 
 metastore
 ---

 Key: HIVE-4456
 URL: https://issues.apache.org/jira/browse/HIVE-4456
 Project: Hive
  Issue Type: Bug
  Components: Configuration, Metastore
Reporter: Gang Tim Liu
Priority: Critical
 Attachments: err.txt



[jira] [Commented] (HIVE-4389) thrift files are re-generated by compiling

2013-04-29 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644631#comment-13644631
 ] 

Gang Tim Liu commented on HIVE-4389:


+1

 thrift files are re-generated by compiling
 --

 Key: HIVE-4389
 URL: https://issues.apache.org/jira/browse/HIVE-4389
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4389.1.patch


 I am not sure what is going on, but there seem to be a bunch of thrift 
 changes if I perform ant thriftif.


How to pass config from qfile to Hive Metastore

2013-04-26 Thread Gang Tim Liu

Hi all,

I want to set a configuration in a qfile and pass it to the Hive Metastore,
for example to the logic in HiveAlterHandler.java. To do that, the
configuration should be in HiveConf.metaVars.

But a simple test got an NPE. Does anyone have experience passing config
from a qfile to the Hive metastore?

Attached is status.q. It sets hive.metastore.ds.retry.interval=2000,
which is part of HiveConf.metaVars. error.txt is also attached.

If we remove the config line from status.q, it works.

Thanks

Tim

2013-04-26 14:34:41,603 ERROR exec.Task (SessionState.java:printError(388)) - FAILED: Error in metadata: Unable to fetch table srcpart
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table srcpart
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:957)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:891)
    at org.apache.hadoop.hive.ql.exec.DDLTask.createTableLike(DDLTask.java:3803)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:279)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:145)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1355)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1139)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:945)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
    at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:790)
    at org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:124)
    at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats60(TestCliDriver.java:108)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at junit.framework.TestCase.runTest(TestCase.java:154)
    at junit.framework.TestCase.runBare(TestCase.java:127)
    at junit.framework.TestResult$1.protect(TestResult.java:106)
    at junit.framework.TestResult.runProtected(TestResult.java:124)
    at junit.framework.TestResult.run(TestResult.java:109)
    at junit.framework.TestCase.run(TestCase.java:118)
    at junit.framework.TestSuite.runTest(TestSuite.java:208)
    at junit.framework.TestSuite.run(TestSuite.java:203)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785)
Caused by: java.lang.NullPointerException
    at org.datanucleus.sco.simple.Set.<init>(Set.java:68)
    at org.datanucleus.sco.backed.Set.<init>(Set.java:94)
    at org.datanucleus.sco.backed.Map.entrySet(Map.java:418)
    at org.apache.hadoop.hive.metastore.api.SerDeInfo.<init>(SerDeInfo.java:157)
    at org.apache.hadoop.hive.metastore.api.StorageDescriptor.<init>(StorageDescriptor.java:256)
    at org.apache.hadoop.hive.metastore.api.Table.<init>(Table.java:260)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopy(HiveMetaStoreClient.java:1177)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:854)
    at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
    at $Proxy7.getTable(Unknown Source)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:949)
    ... 30 more

2013-04-26 14:34:41,603 DEBUG exec.DDLTask (DDLTask.java:execute(459)) - org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table srcpart
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:957)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:891)
    at org.apache.hadoop.hive.ql.exec.DDLTask.createTableLike(DDLTask.java:3803)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:279)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:145)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at

[jira] [Assigned] (HIVE-3682) when output hive table to file,users should could have a separator of their own choice

2013-04-24 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu reassigned HIVE-3682:
--

Assignee: (was: Gang Tim Liu)

 when output hive table to file,users should could have a separator of their 
 own choice
 --

 Key: HIVE-3682
 URL: https://issues.apache.org/jira/browse/HIVE-3682
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Affects Versions: 0.8.1
 Environment: Linux 3.0.0-14-generic #23-Ubuntu SMP Mon Nov 21 
 20:34:47 UTC 2011 i686 i686 i386 GNU/Linux
 java version 1.6.0_25
 hadoop-0.20.2-cdh3u0
 hive-0.8.1
Reporter: caofangkun
 Attachments: HIVE-3682-1.patch, HIVE-3682.D10275.1.patch, 
 HIVE-3682.with.serde.patch


 By default, when a Hive table is output to a file, the columns of the table 
 are separated by the ^A character (that is, \001).
 But users should be able to set a separator of their own choice.
 Usage Example:
 create table for_test (key string, value string);
 load data local inpath './in1.txt' into table for_test;
 select * from for_test;
 UT-01: default separator is \001, line separator is \n
 insert overwrite local directory './test-01' 
 select * from src ;
 create table array_table (a array<string>, b array<string>)
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY '\t'
 COLLECTION ITEMS TERMINATED BY ',';
 load data local inpath '../hive/examples/files/arraytest.txt' overwrite into 
 table table2;
 CREATE TABLE map_table (foo STRING , bar MAP<STRING, STRING>)
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY '\t'
 COLLECTION ITEMS TERMINATED BY ','
 MAP KEYS TERMINATED BY ':'
 STORED AS TEXTFILE;
 UT-02: defined field separator as ':'
 insert overwrite local directory './test-02' 
 row format delimited 
 FIELDS TERMINATED BY ':' 
 select * from src ;
 UT-03: the line separator is NOT ALLOWED to be defined as another separator 
 insert overwrite local directory './test-03' 
 row format delimited 
 FIELDS TERMINATED BY ':' 
 select * from src ;
 UT-04: define map separators 
 insert overwrite local directory './test-04' 
 row format delimited 
 FIELDS TERMINATED BY '\t'
 COLLECTION ITEMS TERMINATED BY ','
 MAP KEYS TERMINATED BY ':'
 select * from src;
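
To make the effect of the three delimiters concrete, the sketch below writes one row as delimited text. It is a minimal model under assumptions, not Hive's serializer: format_row is a hypothetical helper, Python lists and dicts stand in for array and map columns, and the defaults mirror only the \001 field separator mentioned above plus the ',' and ':' choices used in UT-04.

```python
def format_row(fields, field_sep="\x01", item_sep=",", key_sep=":"):
    """Serialize one row as delimited text.

    Lists are joined with item_sep; dict entries are written as
    key<key_sep>value and joined with item_sep; everything else is
    converted with str(). Hypothetical helper, not Hive's serializer.
    """
    def render(value):
        if isinstance(value, dict):
            return item_sep.join(f"{k}{key_sep}{v}" for k, v in value.items())
        if isinstance(value, (list, tuple)):
            return item_sep.join(str(v) for v in value)
        return str(value)

    return field_sep.join(render(f) for f in fields)
```

Calling format_row(["238", "val_238"]) corresponds to the UT-01 case, passing field_sep=":" corresponds to UT-02, and a row containing a dict with field_sep="\t" corresponds to UT-04.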


[jira] [Assigned] (HIVE-3682) when output hive table to file,users should could have a separator of their own choice

2013-04-24 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu reassigned HIVE-3682:
--

Assignee: Sushanth Sowmyan

 when output hive table to file,users should could have a separator of their 
 own choice
 --

 Key: HIVE-3682
 URL: https://issues.apache.org/jira/browse/HIVE-3682
 Project: Hive
  Issue Type: New Feature
  Components: CLI
Affects Versions: 0.8.1
 Environment: Linux 3.0.0-14-generic #23-Ubuntu SMP Mon Nov 21 
 20:34:47 UTC 2011 i686 i686 i386 GNU/Linux
 java version 1.6.0_25
 hadoop-0.20.2-cdh3u0
 hive-0.8.1
Reporter: caofangkun
Assignee: Sushanth Sowmyan
 Attachments: HIVE-3682-1.patch, HIVE-3682.D10275.1.patch, 
 HIVE-3682.with.serde.patch


 By default, when a Hive table is written to a file, its columns are 
 separated by the ^A character (that is, \001).
 Users should, however, be able to set a separator of their own choice.
 Usage Example:
 create table for_test (key string, value string);
 load data local inpath './in1.txt' into table for_test;
 select * from for_test;
 UT-01: the default field separator is \001; the line separator is \n
 insert overwrite local directory './test-01' 
 select * from src ;
 create table array_table (a array<string>, b array<string>)
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY '\t'
 COLLECTION ITEMS TERMINATED BY ',';
 load data local inpath '../hive/examples/files/arraytest.txt' overwrite into 
 table table2;
 CREATE TABLE map_table (foo STRING , bar MAP<STRING, STRING>)
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY '\t'
 COLLECTION ITEMS TERMINATED BY ','
 MAP KEYS TERMINATED BY ':'
 STORED AS TEXTFILE;
 UT-02：defined field separator as ':'
 insert overwrite local directory './test-02' 
 row format delimited 
 FIELDS TERMINATED BY ':' 
 select * from src ;
 UT-03: the line separator is NOT ALLOWED to be defined as another separator 
 insert overwrite local directory './test-03' 
 row format delimited 
 FIELDS TERMINATED BY ':' 
 select * from src ;
 UT-04: define map separators 
 insert overwrite local directory './test-04' 
 row format delimited 
 FIELDS TERMINATED BY '\t'
 COLLECTION ITEMS TERMINATED BY ','
 MAP KEYS TERMINATED BY ':'
 select * from src;
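
 To make the UT-04 separators concrete, here is a minimal sketch of how a
 consumer might read one exported row back. It assumes the row has the
 two-column layout of map_table (a string, then a map) and uses the UT-04
 separators; the function name parse_map_row and the sample row are
 hypothetical, since the ticket does not show any exported output.

```python
def parse_map_row(line, field_sep="\t", item_sep=",", key_sep=":"):
    """Split one exported text row into (foo, bar), where bar is a dict.

    Assumes the separators from UT-04: fields by tab, collection items by
    comma, map keys by colon.
    """
    foo, _, raw_map = line.rstrip("\n").partition(field_sep)
    bar = {}
    if raw_map:
        for item in raw_map.split(item_sep):
            key, _, value = item.partition(key_sep)
            bar[key] = value
    return foo, bar


# Hypothetical sample row, not taken from the ticket.
sample = "k1\ta:1,b:2\n"
print(parse_map_row(sample))
```

 A reader written this way only works if the chosen separators never occur
 inside the data itself, which is one reason to let users pick them.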


[jira] [Commented] (HIVE-4310) optimize count(distinct) with hive.map.groupby.sorted

2013-04-19 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13637097#comment-13637097
 ] 

Gang Tim Liu commented on HIVE-4310:


+1

 optimize count(distinct) with hive.map.groupby.sorted
 -

 Key: HIVE-4310
 URL: https://issues.apache.org/jira/browse/HIVE-4310
 Project: Hive
  Issue Type: Improvement
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4310.1.patch, hive.4310.1.patch-nohcat, 
 hive.4310.2.patch-nohcat, hive.4310.3.patch-nohcat, hive.4310.4.patch





Re: hi

2013-04-18 Thread Gang Tim Liu

Super like it.

On 4/18/13 5:31 AM, Namit Jain nj...@fb.com wrote:

Hi,

Since we are developing at a very fast pace, it would be really useful to
think about maintainability and testing of the large codebase.
Historically, we have not focussed on a few things, and they might soon
bite us. I wanted to propose the following for all checkins:


  1.  Javadoc for all public/private functions, except for
setters/getters. For any complex function, clear examples (input/output)
would really help.
  2.  Convention for variable/function names: do we have any?
  3.  If possible, the test name (.q file) where the function is being
invoked, or the query which would potentially test that scenario, if it
is a query processor change.
  4.  Especially for query optimizations, it might be a good idea to have
a simple working query at the top, and the expected changes, e.g. the
operator tree for that query at each step, or a detailed explanation
at the top.
  5.  Comments in each test (.q file) that should include the jira
number, what it is trying to test, and the assumptions about each query.
  6.  Reduce the output for each test: whenever a query is outputting more
than 10 results, it should have a reason. Otherwise, each query result
should be bounded by 10 rows.

In general, focussing on a lot of comments in the code will go a long way
for everyone to follow along.

Thanks,
-namit

[jira] [Created] (HIVE-4377) Add more comment to https://reviews.facebook.net/D1209 (HIVE-2340)

2013-04-18 Thread Gang Tim Liu (JIRA)

Gang Tim Liu created HIVE-4377:
--

 Summary: Add more comment to https://reviews.facebook.net/D1209 
(HIVE-2340)
 Key: HIVE-4377
 URL: https://issues.apache.org/jira/browse/HIVE-4377
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Gang Tim Liu
Assignee: Navis


Thanks a lot for addressing the optimization in HIVE-2340. Awesome!

Since we are developing at a very fast pace, it would be really useful to
think about maintainability and testing of the large codebase. The points that 
apply to D1209:

  1.  Javadoc for all public/private functions, except for
setters/getters. For any complex function, clear examples (input/output)
would really help.
  2.  Especially for query optimizations, it might be a good idea to have
a simple working query at the top, and the expected changes, e.g. the
operator tree for that query at each step, or a detailed explanation
at the top.
  3.  If possible, the test name (.q file) where the function is being
invoked, or the query which would potentially test that scenario, if it
is a query processor change.
  4.  Comments in each test (.q file) that should include the jira
number, what it is trying to test, and the assumptions about each query.
  5.  Reduce the output for each test: whenever a query is outputting more
than 10 results, it should have a reason. Otherwise, each query result
should be bounded by 10 rows.

thanks a lot


[jira] [Commented] (HIVE-446) Implement TRUNCATE

2013-04-12 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630258#comment-13630258
 ] 

Gang Tim Liu commented on HIVE-446:
---

An external table is used when the data is not fully managed by Hive. If there 
turns out to be a need to remove the data behind an external table, one can 
ask why it was defined as an external table in the first place.

That said, the proposed syntax and semantics may not be consistent with the 
external table use case.

thanks

 Implement TRUNCATE
 --

 Key: HIVE-446
 URL: https://issues.apache.org/jira/browse/HIVE-446
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Prasad Chakka
Assignee: Navis
 Fix For: 0.11.0

 Attachments: HIVE-446.D7371.1.patch, HIVE-446.D7371.2.patch, 
 HIVE-446.D7371.3.patch, HIVE-446.D7371.4.patch


 truncate the data but leave the table and metadata intact.


[jira] [Commented] (HIVE-4322) SkewedInfo in Metastore Thrift API cannot be deserialized in Python

2013-04-12 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630546#comment-13630546
 ] 

Gang Tim Liu commented on HIVE-4322:


+1 after test passes

 SkewedInfo in Metastore Thrift API cannot be deserialized in Python
 ---

 Key: HIVE-4322
 URL: https://issues.apache.org/jira/browse/HIVE-4322
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Thrift API
Affects Versions: 0.11.0
Reporter: Samuel Yuan
Assignee: Samuel Yuan
Priority: Minor
 Attachments: HIVE-4322.HIVE-4322.HIVE-4322.HIVE-4322.D10203.1.patch


 The Thrift-generated Python code that deserializes Thrift objects fails 
 whenever a complex type is used as a map key, because by default mutable 
 Python objects such as lists do not have a hash function. See 
 https://issues.apache.org/jira/browse/THRIFT-162 for related discussion.
 The SkewedInfo struct contains a map which uses a list as a key, breaking the 
 Python Thrift interface. It is not possible to specify the mapping from 
 Thrift types to Python types, or otherwise we could map Thrift lists to 
 Python tuples. Instead, the proposed workaround wraps the list inside a new 
 struct. This alone does not accomplish anything, but allows Python clients to 
 define a hash function for the struct class, e.g.:
 def f(object):
     return hash(tuple(object.skewedValueList))
 SkewedValueList.__hash__ = f
 In practice a more efficient hash might be defined that does not involve 
 copying the list. The advantage of wrapping the list inside a struct is that 
 the client does not have to define the hash on the list itself, which would 
 change the behaviour of lists everywhere else in the code.
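
 The workaround described above can be sketched end to end as follows. This
 is an illustration only: the SkewedValueList class here is a hand-written
 stand-in for the generated struct (the real generated class is not shown in
 this thread), and hash_skewed_value_list, location_map and the sample path
 are hypothetical names and data.

```python
class SkewedValueList:
    """Hand-written stand-in for the wrapper struct; not the generated code."""

    def __init__(self, skewedValueList):
        self.skewedValueList = skewedValueList

    def __eq__(self, other):
        # Equality by content, so two wrappers around equal lists match.
        return (isinstance(other, SkewedValueList)
                and self.skewedValueList == other.skewedValueList)


def hash_skewed_value_list(obj):
    # Copy the list into a tuple, which is hashable, and hash that.
    return hash(tuple(obj.skewedValueList))


# Client-side patch: only the wrapper becomes hashable, plain lists do not.
SkewedValueList.__hash__ = hash_skewed_value_list

# The wrapper can now be used as a map key (sample data is made up).
location_map = {SkewedValueList(["a", "b"]): "/warehouse/t/a/b"}
print(location_map[SkewedValueList(["a", "b"])])

# A bare list still cannot be a key.
try:
    {["a", "b"]: "x"}
except TypeError as err:
    print("list key rejected:", err)
```

 The hash depends on the list contents, so a key must not be mutated after
 it has been inserted into a map.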


[jira] [Created] (HIVE-4351) Thrift code generation fails due to hcatalog

2013-04-12 Thread Gang Tim Liu (JIRA)

Gang Tim Liu created HIVE-4351:
--

 Summary: Thrift code generation fails due to hcatalog
 Key: HIVE-4351
 URL: https://issues.apache.org/jira/browse/HIVE-4351
 Project: Hive
  Issue Type: Bug
  Components: Thrift API
Affects Versions: 0.11.0
Reporter: Gang Tim Liu
Assignee: Ashutosh Chauhan


Thrift code generation fails because the hcatalog project does not have the target thriftif:

ant thriftif -Dthrift.home=/usr/local
.
BUILD FAILED

Target thriftif does not exist in the project hcatalog. 





[jira] [Commented] (HIVE-4322) SkewedInfo in Metastore Thrift API cannot be deserialized in Python

2013-04-12 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630689#comment-13630689
 ] 

Gang Tim Liu commented on HIVE-4322:


Committed. Thanks, Samuel Yuan.

 SkewedInfo in Metastore Thrift API cannot be deserialized in Python
 ---

 Key: HIVE-4322
 URL: https://issues.apache.org/jira/browse/HIVE-4322
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Thrift API
Affects Versions: 0.11.0
Reporter: Samuel Yuan
Assignee: Samuel Yuan
Priority: Minor
 Attachments: HIVE-4322.HIVE-4322.HIVE-4322.HIVE-4322.D10203.1.patch



[jira] [Updated] (HIVE-4322) SkewedInfo in Metastore Thrift API cannot be deserialized in Python

2013-04-12 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4322:
---

   Resolution: Fixed
Fix Version/s: 0.11.0
   Status: Resolved  (was: Patch Available)

 SkewedInfo in Metastore Thrift API cannot be deserialized in Python
 ---

 Key: HIVE-4322
 URL: https://issues.apache.org/jira/browse/HIVE-4322
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Thrift API
Affects Versions: 0.11.0
Reporter: Samuel Yuan
Assignee: Samuel Yuan
Priority: Minor
 Fix For: 0.11.0

 Attachments: HIVE-4322.HIVE-4322.HIVE-4322.HIVE-4322.D10203.1.patch



[jira] [Commented] (HIVE-4351) Thrift code generation fails due to hcatalog

2013-04-12 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630691#comment-13630691
 ] 

Gang Tim Liu commented on HIVE-4351:


Thank you very much, [~ashutoshc].

 Thrift code generation fails due to hcatalog
 

 Key: HIVE-4351
 URL: https://issues.apache.org/jira/browse/HIVE-4351
 Project: Hive
  Issue Type: Bug
  Components: Thrift API
Affects Versions: 0.11.0
Reporter: Gang Tim Liu
Assignee: Ashutosh Chauhan
 Fix For: 0.12.0



[jira] [Commented] (HIVE-4241) optimize hive.enforce.sorting and hive.enforce bucketing join

2013-04-11 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629237#comment-13629237
 ] 

Gang Tim Liu commented on HIVE-4241:


+1

 optimize hive.enforce.sorting and hive.enforce bucketing join
 -

 Key: HIVE-4241
 URL: https://issues.apache.org/jira/browse/HIVE-4241
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4241.1.patch, hive.4241.1.patch-nohcat, 
 hive.4241.2.patch-nohcat


 Consider the following scenario:
 T1: sorted and bucketed by key into 2 buckets
 T2: sorted and bucketed by key into 2 buckets
 T3: sorted and bucketed by key into 2 buckets
 set hive.enforce.sorting=true;
 set hive.enforce.bucketing=true;
 insert overwrite table T3
 select .. from T1 join T2 on T1.key = T2.key;
 Since T1, T2 and T3 are sorted/bucketed by the join key, and the above join is
 being performed as a sort-merge join, T3 should be bucketed/sorted without
 the need for an extra reducer.
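
 The reasoning can be illustrated with a small sketch. It is not Hive code:
 sort_merge_join is a hypothetical helper that joins one bucket of T1 with
 the matching bucket of T2, both modelled as lists of (key, value) pairs
 already sorted by key. The sample rows are made up.

```python
def sort_merge_join(left, right):
    """Inner-join two lists of (key, value) pairs that are sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


# One bucket from each table; every key here falls in the same bucket.
t1_bucket = [(1, "a"), (3, "b"), (3, "c"), (5, "d")]
t2_bucket = [(1, "x"), (3, "y"), (7, "z")]

joined = sort_merge_join(t1_bucket, t2_bucket)
keys = [row[0] for row in joined]
print(joined)
print(keys == sorted(keys))
```

 Because the merge emits rows in key order and only ever sees keys from one
 bucket, its output already has the sort order and bucket membership that
 the target bucket needs.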


[jira] [Commented] (HIVE-4337) Update list bucketing test results

2013-04-10 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13628419#comment-13628419
 ] 

Gang Tim Liu commented on HIVE-4337:


+1

 Update list bucketing test results
 --

 Key: HIVE-4337
 URL: https://issues.apache.org/jira/browse/HIVE-4337
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.11.0
Reporter: Samuel Yuan
Assignee: Samuel Yuan
Priority: Trivial
 Attachments: HIVE-4337.HIVE-4337.HIVE-4337.D10131.1.patch


 A recent change resulted in different output for the list bucketing tests, 
 which run for Hadoop23. The output files were not updated to reflect this.


[jira] [Updated] (HIVE-4337) Update list bucketing test results

2013-04-10 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4337:
---

Status: Patch Available  (was: Open)

 Update list bucketing test results
 --

 Key: HIVE-4337
 URL: https://issues.apache.org/jira/browse/HIVE-4337
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.11.0
Reporter: Samuel Yuan
Assignee: Samuel Yuan
Priority: Trivial
 Attachments: HIVE-4337.HIVE-4337.HIVE-4337.D10131.1.patch



[jira] [Updated] (HIVE-4337) Update list bucketing test results

2013-04-10 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4337:
---

   Resolution: Fixed
Fix Version/s: 0.11.0
   Status: Resolved  (was: Patch Available)

 Update list bucketing test results
 --

 Key: HIVE-4337
 URL: https://issues.apache.org/jira/browse/HIVE-4337
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.11.0
Reporter: Samuel Yuan
Assignee: Samuel Yuan
Priority: Trivial
 Fix For: 0.11.0

 Attachments: HIVE-4337.HIVE-4337.HIVE-4337.D10131.1.patch



[jira] [Commented] (HIVE-4322) SkewedInfo in Metastore Thrift API cannot be deserialized in Python

2013-04-09 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627275#comment-13627275
 ] 

Gang Tim Liu commented on HIVE-4322:


[~sxyuan] Good write-up. Thank you for working on it.

 SkewedInfo in Metastore Thrift API cannot be deserialized in Python
 ---

 Key: HIVE-4322
 URL: https://issues.apache.org/jira/browse/HIVE-4322
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Thrift API
Affects Versions: 0.11.0
Reporter: Samuel Yuan
Assignee: Samuel Yuan
Priority: Minor


[jira] [Commented] (HIVE-4298) add tests for distincts for hive.map.groutp.sorted

2013-04-05 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624261#comment-13624261
 ] 

Gang Tim Liu commented on HIVE-4298:


+1

 add tests for distincts for hive.map.groutp.sorted
 --

 Key: HIVE-4298
 URL: https://issues.apache.org/jira/browse/HIVE-4298
 Project: Hive
  Issue Type: Test
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4298.1.patch, hive.4298.2.patch





[jira] [Commented] (HIVE-4298) add tests for distincts for hive.map.groutp.sorted

2013-04-05 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624275#comment-13624275
 ] 

Gang Tim Liu commented on HIVE-4298:


Committed. Thanks, Namit.

 add tests for distincts for hive.map.groutp.sorted
 --

 Key: HIVE-4298
 URL: https://issues.apache.org/jira/browse/HIVE-4298
 Project: Hive
  Issue Type: Test
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4298.1.patch, hive.4298.2.patch





[jira] [Commented] (HIVE-4298) add tests for distincts for hive.map.groutp.sorted

2013-04-05 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624316#comment-13624316
 ] 

Gang Tim Liu commented on HIVE-4298:


Woo, thanks, Ashutosh




 add tests for distincts for hive.map.groutp.sorted
 --

 Key: HIVE-4298
 URL: https://issues.apache.org/jira/browse/HIVE-4298
 Project: Hive
  Issue Type: Test
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.11.0

 Attachments: hive.4298.1.patch, hive.4298.2.patch





[jira] [Assigned] (HIVE-4213) List bucketing error too restrictive

2013-04-03 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu reassigned HIVE-4213:
--

Assignee: Gang Tim Liu

 List bucketing error too restrictive
 

 Key: HIVE-4213
 URL: https://issues.apache.org/jira/browse/HIVE-4213
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mark Grover
Assignee: Gang Tim Liu
 Fix For: 0.11.0


 With the introduction of List bucketing, we introduced a config validation 
 step where we say:
 {code}
   SUPPORT_DIR_MUST_TRUE_FOR_LIST_BUCKETING(
   10199,
   "hive.mapred.supports.subdirectories must be true"
   + " if any one of following is true: hive.internal.ddl.list.bucketing.enable,"
   + " hive.optimize.listbucketing and mapred.input.dir.recursive"),
 {code}
 This seems overly restrictive because there are use cases where people may 
 want to set {{mapred.input.dir.recursive}} to {{true}} even when they don't 
 care about list bucketing.
 Is that not true?
 For example, here is the unit test code for {{clientpositive/recursive_dir.q}}
 {code}
 CREATE TABLE fact_daily(x int) PARTITIONED BY (ds STRING);
 CREATE TABLE fact_tz(x int) PARTITIONED BY (ds STRING, hr STRING)
 LOCATION 'pfile:${system:test.tmp.dir}/fact_tz';
 INSERT OVERWRITE TABLE fact_tz PARTITION (ds='1', hr='1')
 SELECT key+11 FROM src WHERE key=484;
 ALTER TABLE fact_daily SET TBLPROPERTIES('EXTERNAL'='TRUE');
 ALTER TABLE fact_daily ADD PARTITION (ds='1')
 LOCATION 'pfile:${system:test.tmp.dir}/fact_tz/ds=1';
 set hive.mapred.supports.subdirectories=true;
 set mapred.input.dir.recursive=true;
 set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
 SELECT * FROM fact_daily WHERE ds='1';
 SELECT count(1) FROM fact_daily WHERE ds='1';
 {code}
 The unit test doesn't seem to be concerned about list bucketing but wants to 
 set {{mapred.input.dir.recursive}} to {{true}}.
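
 To make the rule under discussion explicit, here is a small sketch of the
 condition as the error message words it. It models only the message text,
 not Hive's actual validation code; violates_subdir_check is a hypothetical
 name and the configuration is represented as a plain dict of booleans.

```python
# Settings that, per the error message, require subdirectory support.
TRIGGERS = (
    "hive.internal.ddl.list.bucketing.enable",
    "hive.optimize.listbucketing",
    "mapred.input.dir.recursive",
)


def violates_subdir_check(conf):
    """Return True when the rule in the error message would be violated."""
    needs_subdirs = any(conf.get(name, False) for name in TRIGGERS)
    supports = conf.get("hive.mapred.supports.subdirectories", False)
    return needs_subdirs and not supports


# The case questioned in this ticket: recursive input only.
print(violates_subdir_check({"mapred.input.dir.recursive": True}))

# The settings used by recursive_dir.q satisfy the rule.
print(violates_subdir_check({
    "hive.mapred.supports.subdirectories": True,
    "mapred.input.dir.recursive": True,
}))
```

 Under this reading, turning on recursive input alone trips the check even
 though neither list bucketing setting is enabled, which is the behaviour
 the ticket asks about.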


[jira] [Resolved] (HIVE-4213) List bucketing error too restrictive

2013-04-03 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu resolved HIVE-4213.


Resolution: Not A Problem

 List bucketing error too restrictive
 

 Key: HIVE-4213
 URL: https://issues.apache.org/jira/browse/HIVE-4213
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mark Grover
Assignee: Gang Tim Liu
 Fix For: 0.11.0



[jira] [Commented] (HIVE-4213) List bucketing error too restrictive

2013-04-03 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13621169#comment-13621169
 ] 

Gang Tim Liu commented on HIVE-4213:


Hi [~mgrover]

No problem.

I am not sure it is valid for mapred.input.dir.recursive to be true while 
hive.mapred.supports.subdirectories is false.

cc [~namitjain] would you please confirm?

thanks

 List bucketing error too restrictive
 

 Key: HIVE-4213
 URL: https://issues.apache.org/jira/browse/HIVE-4213
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mark Grover
Assignee: Gang Tim Liu
 Fix For: 0.11.0



[jira] [Updated] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-04-03 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3959:
---

Attachment: HIVE-3959.patch.1

 Update Partition Statistics in Metastore Layer
 --

 Key: HIVE-3959
 URL: https://issues.apache.org/jira/browse/HIVE-3959
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3959.patch.1


 When partitions are created using queries (insert overwrite and insert 
 into) then the StatsTask updates all stats. However, when partitions are 
 added directly through metadata-only partitions (either CLI or direct calls 
 to Thrift Metastore) no stats are populated even if hive.stats.reliable is 
 set to true. This puts us in a situation where we can't decide if stats are 
 truly reliable or not.
 We propose that the fast stats (numFiles and totalSize) which don't require 
 a scan of the data should always be populated and be completely reliable. For 
 now we are still excluding rowCount and rawDataSize because that will make 
 these operations very expensive. Currently they are quick metadata-only ops.  
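
To illustrate why numFiles and totalSize count as "fast" stats, here is a minimal sketch that derives both from a directory listing alone. It is not the patch attached to this issue: the class name FastStats and the method fromDirectory are hypothetical, and it assumes a local, non-recursive partition directory read through java.nio rather than Hadoop's FileSystem API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical holder for the two "fast" statistics named in the issue. */
final class FastStats {
    final long numFiles;
    final long totalSize;

    FastStats(long numFiles, long totalSize) {
        this.numFiles = numFiles;
        this.totalSize = totalSize;
    }

    /**
     * Computes numFiles and totalSize for one partition directory by listing it.
     * Only file metadata is read; no file content is opened, which is why
     * rowCount and rawDataSize cannot be derived this way.
     * Assumption: a flat local directory (subdirectories are skipped).
     */
    static FastStats fromDirectory(Path partitionDir) {
        long files = 0;
        long bytes = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(partitionDir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    files++;
                    bytes += Files.size(entry);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new FastStats(files, bytes);
    }
}
```

Calling FastStats.fromDirectory(dir) costs one listing plus one size lookup per file, so it stays a metadata-only operation regardless of how much data the partition holds.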


[jira] [Updated] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-04-03 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3959:
---

Attachment: HIVE-3959.patch.2

 Update Partition Statistics in Metastore Layer
 --

 Key: HIVE-3959
 URL: https://issues.apache.org/jira/browse/HIVE-3959
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3959.patch.1, HIVE-3959.patch.2


 When partitions are created using queries (insert overwrite and insert 
 into) then the StatsTask updates all stats. However, when partitions are 
 added directly through metadata-only partitions (either CLI or direct calls 
 to Thrift Metastore) no stats are populated even if hive.stats.reliable is 
 set to true. This puts us in a situation where we can't decide if stats are 
 truly reliable or not.
 We propose that the fast stats (numFiles and totalSize) which don't require 
 a scan of the data should always be populated and be completely reliable. For 
 now we are still excluding rowCount and rawDataSize because that will make 
 these operations very expensive. Currently they are quick metadata-only ops.  


[jira] [Updated] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-04-03 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3959:
---

Attachment: HIVE-3959.patch.2.nohcat

 Update Partition Statistics in Metastore Layer
 --

 Key: HIVE-3959
 URL: https://issues.apache.org/jira/browse/HIVE-3959
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3959.patch.1, HIVE-3959.patch.2, 
 HIVE-3959.patch.2.nohcat


 When partitions are created using queries (insert overwrite and insert 
 into) then the StatsTask updates all stats. However, when partitions are 
 added directly through metadata-only partitions (either CLI or direct calls 
 to Thrift Metastore) no stats are populated even if hive.stats.reliable is 
 set to true. This puts us in a situation where we can't decide if stats are 
 truly reliable or not.
 We propose that the fast stats (numFiles and totalSize) which don't require 
 a scan of the data should always be populated and be completely reliable. For 
 now we are still excluding rowCount and rawDataSize because that will make 
 these operations very expensive. Currently they are quick metadata-only ops.  


[jira] [Commented] (HIVE-4281) add hive.map.groupby.sorted.testmode

2013-04-03 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13621408#comment-13621408
 ] 

Gang Tim Liu commented on HIVE-4281:


+1

 add hive.map.groupby.sorted.testmode
 

 Key: HIVE-4281
 URL: https://issues.apache.org/jira/browse/HIVE-4281
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4281.1.patch, hive.4281.2.patch, 
 hive.4281.2.patch-nohcat, hive.4281.3.patch


 The idea behind this would be to test hive.map.groupby.sorted.
 Since this is a new feature, it might be a good idea to run it in test mode,
 where a query property would denote that this query plan would have changed.
 If a customer wants, they can run those queries offline, compare the results
 for correctness, and set hive.map.groupby.sorted only if all the results are
 the same.


[jira] [Commented] (HIVE-4272) partition wise metadata does not work for text files

2013-04-02 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13619856#comment-13619856
 ] 

Gang Tim Liu commented on HIVE-4272:


+1

 partition wise metadata does not work for text files
 

 Key: HIVE-4272
 URL: https://issues.apache.org/jira/browse/HIVE-4272
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4272.1.patch, hive.4272.2.patch, 
 hive.4272.2.patch-nohcat


 The following test fails:
 set hive.input.format = org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
 -- This tests that the schema can be changed for binary serde data
 create table partition_test_partitioned(key string, value string)
 partitioned by (dt string) stored as textfile;
 insert overwrite table partition_test_partitioned partition(dt='1')
 select * from src where key = 238;
 select * from partition_test_partitioned where dt is not null;
 select key+key, value from partition_test_partitioned where dt is not null;
 alter table partition_test_partitioned change key key int;
 select key+key, value from partition_test_partitioned where dt is not null;
 select * from partition_test_partitioned where dt is not null;
 It works fine for an RCFile.


[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-04-02 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620364#comment-13620364
 ] 

Gang Tim Liu commented on HIVE-3959:


rebase https://reviews.facebook.net/D9885

 Update Partition Statistics in Metastore Layer
 --

 Key: HIVE-3959
 URL: https://issues.apache.org/jira/browse/HIVE-3959
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Gang Tim Liu
Priority: Minor

 When partitions are created using queries (insert overwrite and insert 
 into) then the StatsTask updates all stats. However, when partitions are 
 added directly through metadata-only partitions (either CLI or direct calls 
 to Thrift Metastore) no stats are populated even if hive.stats.reliable is 
 set to true. This puts us in a situation where we can't decide if stats are 
 truly reliable or not.
 We propose that the fast stats (numFiles and totalSize) which don't require 
 a scan of the data should always be populated and be completely reliable. For 
 now we are still excluding rowCount and rawDataSize because that will make 
 these operations very expensive. Currently they are quick metadata-only ops.  


[jira] [Commented] (HIVE-4240) optimize hive.enforce.bucketing and hive.enforce sorting insert

2013-04-02 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620604#comment-13620604
 ] 

Gang Tim Liu commented on HIVE-4240:


+1

 optimize hive.enforce.bucketing and hive.enforce sorting insert
 ---

 Key: HIVE-4240
 URL: https://issues.apache.org/jira/browse/HIVE-4240
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4240.1.patch, hive.4240.2.patch, hive.4240.3.patch, 
 hive.4240.4.patch, hive.4240.5.patch


 Consider the following scenario:
 set hive.optimize.bucketmapjoin = true;
 set hive.optimize.bucketmapjoin.sortedmerge = true;
 set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
 set hive.enforce.bucketing=true;
 set hive.enforce.sorting=true;
 set hive.exec.reducers.max = 1;
 set hive.merge.mapfiles=false;
 set hive.merge.mapredfiles=false;
 -- Create two bucketed and sorted tables
 CREATE TABLE test_table1 (key INT, value STRING) PARTITIONED BY (ds STRING) 
 CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;
 CREATE TABLE test_table2 (key INT, value STRING) PARTITIONED BY (ds STRING) 
 CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;
 FROM src
 INSERT OVERWRITE TABLE test_table1 PARTITION (ds = '1') SELECT *;
 -- Insert data into the bucketed table by selecting from another bucketed table
 -- This should be a map-only operation
 INSERT OVERWRITE TABLE test_table2 PARTITION (ds = '1')
 SELECT a.key, a.value FROM test_table1 a WHERE a.ds = '1';
 We should not need a reducer to perform the above operation.


[jira] [Commented] (HIVE-4270) bug in hive.map.groupby.sorted in the presence of multiple input partitions

2013-04-01 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618918#comment-13618918
 ] 

Gang Tim Liu commented on HIVE-4270:


+1

 bug in hive.map.groupby.sorted in the presence of multiple input partitions
 ---

 Key: HIVE-4270
 URL: https://issues.apache.org/jira/browse/HIVE-4270
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.11.0

 Attachments: hive.4270.1.patch


 This can lead to wrong results.


[jira] [Assigned] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-03-28 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu reassigned HIVE-3959:
--

Assignee: Gang Tim Liu  (was: Bhushan Mandhani)

 Update Partition Statistics in Metastore Layer
 --

 Key: HIVE-3959
 URL: https://issues.apache.org/jira/browse/HIVE-3959
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Gang Tim Liu
Priority: Minor

 When partitions are created using queries (insert overwrite and insert 
 into) then the StatsTask updates all stats. However, when partitions are 
 added directly through metadata-only partitions (either CLI or direct calls 
 to Thrift Metastore) no stats are populated even if hive.stats.reliable is 
 set to true. This puts us in a situation where we can't decide if stats are 
 truly reliable or not.
 We propose that the fast stats (numFiles and totalSize) which don't require 
 a scan of the data should always be populated and be completely reliable. For 
 now we are still excluding rowCount and rawDataSize because that will make 
 these operations very expensive. Currently they are quick metadata-only ops.  


[jira] [Commented] (HIVE-4159) RetryingHMSHandler doesn't retry in enough cases

2013-03-28 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616502#comment-13616502
 ] 

Gang Tim Liu commented on HIVE-4159:


+1

 RetryingHMSHandler doesn't retry in enough cases
 

 Key: HIVE-4159
 URL: https://issues.apache.org/jira/browse/HIVE-4159
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4159.1.patch.txt


 HIVE-3524 introduced a change which caused JDOExceptions to be wrapped in 
 MetaExceptions.  This caused the RetryingHMSHandler to not retry on these 
 exceptions.
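
The failure mode described above can be sketched as follows: if the retry check only looks at the outermost exception type, a retryable failure hidden inside a wrapper is missed. This is an illustration, not the code in HIVE-4159.1.patch.txt. TransientStoreException and WrapperException are hypothetical stand-ins for the JDOException and MetaException roles, and callWithRetry is an assumed helper name.

```java
import java.util.concurrent.Callable;

final class RetrySketch {
    /** Hypothetical stand-in for a storage-layer exception worth retrying. */
    static class TransientStoreException extends RuntimeException {
        TransientStoreException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for a wrapper that hides the original cause. */
    static class WrapperException extends Exception {
        WrapperException(Throwable cause) { super(cause); }
    }

    /** Walks the cause chain so a wrapped transient failure is still detected. */
    static boolean isRetryable(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof TransientStoreException) {
                return true;
            }
        }
        return false;
    }

    /** Retries op up to maxAttempts times, but only for retryable failures. */
    static <T> T callWithRetry(Callable<T> op, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (!isRetryable(e)) {
                    throw e; // non-transient failures surface immediately
                }
                last = e;
            }
        }
        throw last; // every attempt failed with a retryable error
    }
}
```

Checking the whole cause chain rather than only the top-level type is one way to keep retries working after a wrapping change; the actual fix may take a different approach.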


[jira] [Commented] (HIVE-4155) Expose ORC's FileDump as a service

2013-03-28 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616529#comment-13616529
 ] 

Gang Tim Liu commented on HIVE-4155:


+1

 Expose ORC's FileDump as a service
 --

 Key: HIVE-4155
 URL: https://issues.apache.org/jira/browse/HIVE-4155
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4155.1.patch.txt


 Expose ORC's FileDump class as a service similar to RC File Cat
 e.g.
 hive --orcfiledump path_to_file
 Should run FileDump on the file.


[jira] [Commented] (HIVE-4157) ORC runs out of heap when writing

2013-03-28 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616557#comment-13616557
 ] 

Gang Tim Liu commented on HIVE-4157:


+1

 ORC runs out of heap when writing
 -

 Key: HIVE-4157
 URL: https://issues.apache.org/jira/browse/HIVE-4157
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4157.1.patch.txt


 The OutStream class used by the ORC file format seems to aggressively 
 allocate memory for ByteBuffers and doesn't seem too eager to give it back.
 This causes issues with heap space, particularly when wide tables or dynamic 
 partitions are involved.
 As a first step to resolving this problem, the OutStream class can be 
 modified to lazily allocate memory, and more actively make it available for 
 garbage collection.
 Follow ups could include checking the amount of free memory as part of 
 determining if a spill is needed.
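
A rough sketch of the "lazily allocate, release early" idea proposed above. This is not ORC's OutStream: LazyBufferSketch and its methods are hypothetical, and the sketch only counts flushed bytes instead of writing them anywhere.

```java
import java.nio.ByteBuffer;

final class LazyBufferSketch {
    private final int capacity;
    private ByteBuffer current; // stays null until the first write
    private long flushedBytes;

    LazyBufferSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Allocates the buffer only when there is actually something to write. */
    void write(byte b) {
        if (current == null) {
            current = ByteBuffer.allocate(capacity);
        }
        current.put(b);
        if (!current.hasRemaining()) {
            flush();
        }
    }

    /** Accounts for the buffered bytes and drops the reference for the GC. */
    void flush() {
        if (current != null) {
            flushedBytes += current.position();
            current = null;
        }
    }

    boolean isAllocated() {
        return current != null;
    }

    long flushedBytes() {
        return flushedBytes;
    }
}
```

With many streams open at once (wide tables, dynamic partitions), streams that have not yet received data hold no buffer at all under this scheme, which is the saving the description is after.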


[jira] [Commented] (HIVE-4159) RetryingHMSHandler doesn't retry in enough cases

2013-03-28 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616821#comment-13616821
 ] 

Gang Tim Liu commented on HIVE-4159:


Committed. thanks Kevin.

 RetryingHMSHandler doesn't retry in enough cases
 

 Key: HIVE-4159
 URL: https://issues.apache.org/jira/browse/HIVE-4159
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4159.1.patch.txt


 HIVE-3524 introduced a change which caused JDOExceptions to be wrapped in 
 MetaExceptions.  This caused the RetryingHMSHandler to not retry on these 
 exceptions.


[jira] [Updated] (HIVE-4159) RetryingHMSHandler doesn't retry in enough cases

2013-03-28 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4159:
---

Fix Version/s: 0.11.0

 RetryingHMSHandler doesn't retry in enough cases
 

 Key: HIVE-4159
 URL: https://issues.apache.org/jira/browse/HIVE-4159
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.11.0

 Attachments: HIVE-4159.1.patch.txt


 HIVE-3524 introduced a change which caused JDOExceptions to be wrapped in 
 MetaExceptions.  This caused the RetryingHMSHandler to not retry on these 
 exceptions.


[jira] [Updated] (HIVE-4159) RetryingHMSHandler doesn't retry in enough cases

2013-03-28 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4159:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 RetryingHMSHandler doesn't retry in enough cases
 

 Key: HIVE-4159
 URL: https://issues.apache.org/jira/browse/HIVE-4159
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.11.0

 Attachments: HIVE-4159.1.patch.txt


 HIVE-3524 introduced a change which caused JDOExceptions to be wrapped in 
 MetaExceptions.  This caused the RetryingHMSHandler to not retry on these 
 exceptions.


[jira] [Commented] (HIVE-4155) Expose ORC's FileDump as a service

2013-03-28 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616828#comment-13616828
 ] 

Gang Tim Liu commented on HIVE-4155:


Committed. thanks Kevin

 Expose ORC's FileDump as a service
 --

 Key: HIVE-4155
 URL: https://issues.apache.org/jira/browse/HIVE-4155
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4155.1.patch.txt


 Expose ORC's FileDump class as a service similar to RC File Cat
 e.g.
 hive --orcfiledump path_to_file
 Should run FileDump on the file.


[jira] [Updated] (HIVE-4155) Expose ORC's FileDump as a service

2013-03-28 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4155:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Expose ORC's FileDump as a service
 --

 Key: HIVE-4155
 URL: https://issues.apache.org/jira/browse/HIVE-4155
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.11.0

 Attachments: HIVE-4155.1.patch.txt


 Expose ORC's FileDump class as a service similar to RC File Cat
 e.g.
 hive --orcfiledump path_to_file
 Should run FileDump on the file.


[jira] [Updated] (HIVE-4155) Expose ORC's FileDump as a service

2013-03-28 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4155:
---

Fix Version/s: 0.11.0

 Expose ORC's FileDump as a service
 --

 Key: HIVE-4155
 URL: https://issues.apache.org/jira/browse/HIVE-4155
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.11.0

 Attachments: HIVE-4155.1.patch.txt


 Expose ORC's FileDump class as a service similar to RC File Cat
 e.g.
 hive --orcfiledump path_to_file
 Should run FileDump on the file.


[jira] [Commented] (HIVE-4157) ORC runs out of heap when writing

2013-03-28 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616898#comment-13616898
 ] 

Gang Tim Liu commented on HIVE-4157:


Committed. thanks Kevin

 ORC runs out of heap when writing
 -

 Key: HIVE-4157
 URL: https://issues.apache.org/jira/browse/HIVE-4157
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4157.1.patch.txt


 The OutStream class used by the ORC file format seems to aggressively 
 allocate memory for ByteBuffers and doesn't seem too eager to give it back.
 This causes issues with heap space, particularly when wide tables or dynamic 
 partitions are involved.
 As a first step to resolving this problem, the OutStream class can be 
 modified to lazily allocate memory, and more actively make it available for 
 garbage collection.
 Follow ups could include checking the amount of free memory as part of 
 determining if a spill is needed.


[jira] [Updated] (HIVE-4157) ORC runs out of heap when writing

2013-03-28 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4157:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 ORC runs out of heap when writing
 -

 Key: HIVE-4157
 URL: https://issues.apache.org/jira/browse/HIVE-4157
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.11.0

 Attachments: HIVE-4157.1.patch.txt


 The OutStream class used by the ORC file format seems to aggressively 
 allocate memory for ByteBuffers and doesn't seem too eager to give it back.
 This causes issues with heap space, particularly when wide tables or dynamic 
 partitions are involved.
 As a first step to resolving this problem, the OutStream class can be 
 modified to lazily allocate memory, and more actively make it available for 
 garbage collection.
 Follow ups could include checking the amount of free memory as part of 
 determining if a spill is needed.


[jira] [Updated] (HIVE-4157) ORC runs out of heap when writing

2013-03-28 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4157:
---

Fix Version/s: 0.11.0

 ORC runs out of heap when writing
 -

 Key: HIVE-4157
 URL: https://issues.apache.org/jira/browse/HIVE-4157
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.11.0

 Attachments: HIVE-4157.1.patch.txt


 The OutStream class used by the ORC file format seems to aggressively 
 allocate memory for ByteBuffers and doesn't seem too eager to give it back.
 This causes issues with heap space, particularly when wide tables or dynamic 
 partitions are involved.
 As a first step to resolving this problem, the OutStream class can be 
 modified to lazily allocate memory, and more actively make it available for 
 garbage collection.
 Follow ups could include checking the amount of free memory as part of 
 determining if a spill is needed.


[jira] [Commented] (HIVE-4157) ORC runs out of heap when writing

2013-03-28 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616901#comment-13616901
 ] 

Gang Tim Liu commented on HIVE-4157:


Forgot to mention: tests passed. sorry

 ORC runs out of heap when writing
 -

 Key: HIVE-4157
 URL: https://issues.apache.org/jira/browse/HIVE-4157
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.11.0

 Attachments: HIVE-4157.1.patch.txt


 The OutStream class used by the ORC file format seems to aggressively 
 allocate memory for ByteBuffers and doesn't seem too eager to give it back.
 This causes issues with heap space, particularly when wide tables or dynamic 
 partitions are involved.
 As a first step to resolving this problem, the OutStream class can be 
 modified to lazily allocate memory, and more actively make it available for 
 garbage collection.
 Follow ups could include checking the amount of free memory as part of 
 determining if a spill is needed.


[jira] [Commented] (HIVE-4159) RetryingHMSHandler doesn't retry in enough cases

2013-03-28 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616902#comment-13616902
 ] 

Gang Tim Liu commented on HIVE-4159:


Forgot to mention: tests passed. sorry

 RetryingHMSHandler doesn't retry in enough cases
 

 Key: HIVE-4159
 URL: https://issues.apache.org/jira/browse/HIVE-4159
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.11.0

 Attachments: HIVE-4159.1.patch.txt


 HIVE-3524 introduced a change which caused JDOExceptions to be wrapped in 
 MetaExceptions.  This caused the RetryingHMSHandler to not retry on these 
 exceptions.


[jira] [Commented] (HIVE-4155) Expose ORC's FileDump as a service

2013-03-28 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616903#comment-13616903
 ] 

Gang Tim Liu commented on HIVE-4155:


Forgot to mention: tests passed. sorry

 Expose ORC's FileDump as a service
 --

 Key: HIVE-4155
 URL: https://issues.apache.org/jira/browse/HIVE-4155
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.11.0

 Attachments: HIVE-4155.1.patch.txt


 Expose ORC's FileDump class as a service similar to RC File Cat
 e.g.
 hive --orcfiledump path_to_file
 Should run FileDump on the file.


[jira] [Commented] (HIVE-4235) CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists

2013-03-28 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616964#comment-13616964
 ] 

Gang Tim Liu commented on HIVE-4235:


Kevin, thank you very much. Tim

 CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists
 

 Key: HIVE-4235
 URL: https://issues.apache.org/jira/browse/HIVE-4235
 Project: Hive
  Issue Type: Bug
  Components: JDBC, Query Processor, SQL
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Fix For: 0.11.0

 Attachments: HIVE-4235.patch.1


 CREATE TABLE IF NOT EXISTS uses an inefficient way to check whether the table exists.
 It uses Hive.java's getTablesByPattern(...), which involves a regular expression 
 and eventually a database join. This is very inefficient: it can increase 
 database lock time and hurt database performance if many such commands hit the 
 database.
 The suggested approach is to use getTable(...), since we already know the table name.
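
The difference between the two checks can be sketched with an in-memory analogy. This is not the Hive or metastore API: TableLookupSketch, existsByPattern and existsByName are hypothetical names, and a TreeMap stands in for the metastore database.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

final class TableLookupSketch {
    // Hypothetical catalog: table name -> location.
    private final Map<String, String> tables = new TreeMap<>();

    void addTable(String name, String location) {
        tables.put(name, location);
    }

    /** Pattern-style check: compiles a regex and tests every table name. */
    boolean existsByPattern(String name) {
        Pattern p = Pattern.compile(Pattern.quote(name));
        for (String candidate : tables.keySet()) {
            if (p.matcher(candidate).matches()) {
                return true;
            }
        }
        return false;
    }

    /** Direct check: a single keyed lookup, no scan over other tables. */
    boolean existsByName(String name) {
        return tables.containsKey(name);
    }
}
```

Both methods return the same answer for an exact name, but the pattern variant does work proportional to the number of tables, which mirrors why a keyed lookup is preferable when the table name is already known.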


[jira] [Commented] (HIVE-3958) support partial scan for analyze command - RCFile

2013-03-27 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13615329#comment-13615329
 ] 

Gang Tim Liu commented on HIVE-3958:


Namit, thank you very much.

 support partial scan for analyze command - RCFile
 -

 Key: HIVE-3958
 URL: https://issues.apache.org/jira/browse/HIVE-3958
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Fix For: 0.11.0

 Attachments: HIVE-3958.patch.1, HIVE-3958.patch.2, HIVE-3958.patch.3, 
 HIVE-3958.patch.4, HIVE-3958.patch.5, HIVE-3958.patch.6


 The analyze command allows us to collect statistics on existing 
 tables/partitions. It works well but can be slow since it scans all files.
 There are 2 ways to speed it up:
 1. Collect stats without a file scan. This may not collect all stats, but it is 
 good and fast enough for some use cases. HIVE-3917 addresses this.
 2. Collect stats via a partial file scan. This reads only the part of each file 
 needed to get the file metadata rather than its full content. Some examples are 
 https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC (HIVE-3874), 
 and HFile of HBase.
 This JIRA targets #2, specifically for the RCFile format.


[jira] [Updated] (HIVE-3958) support partial scan for analyze command - RCFile

2013-03-26 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3958:
---

Attachment: HIVE-3958.patch.5

 support partial scan for analyze command - RCFile
 -

 Key: HIVE-3958
 URL: https://issues.apache.org/jira/browse/HIVE-3958
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3958.patch.1, HIVE-3958.patch.2, HIVE-3958.patch.3, 
 HIVE-3958.patch.4, HIVE-3958.patch.5



[jira] [Work stopped] (HIVE-3958) support partial scan for analyze command - RCFile

2013-03-26 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3958 stopped by Gang Tim Liu.

 support partial scan for analyze command - RCFile
 -

 Key: HIVE-3958
 URL: https://issues.apache.org/jira/browse/HIVE-3958
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3958.patch.1, HIVE-3958.patch.2, HIVE-3958.patch.3, 
 HIVE-3958.patch.4, HIVE-3958.patch.5



[jira] [Updated] (HIVE-3958) support partial scan for analyze command - RCFile

2013-03-26 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3958:
---

Status: Patch Available  (was: In Progress)

Another diff is ready. Thanks.

 support partial scan for analyze command - RCFile
 -

 Key: HIVE-3958
 URL: https://issues.apache.org/jira/browse/HIVE-3958
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3958.patch.1, HIVE-3958.patch.2, HIVE-3958.patch.3, 
 HIVE-3958.patch.4, HIVE-3958.patch.5



[jira] [Work started] (HIVE-3958) support partial scan for analyze command - RCFile

2013-03-26 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3958 started by Gang Tim Liu.

 support partial scan for analyze command - RCFile
 -

 Key: HIVE-3958
 URL: https://issues.apache.org/jira/browse/HIVE-3958
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3958.patch.1, HIVE-3958.patch.2, HIVE-3958.patch.3, 
 HIVE-3958.patch.4, HIVE-3958.patch.5



[jira] [Updated] (HIVE-3958) support partial scan for analyze command - RCFile

2013-03-26 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3958:
---

Attachment: HIVE-3958.patch.6

 support partial scan for analyze command - RCFile
 -

 Key: HIVE-3958
 URL: https://issues.apache.org/jira/browse/HIVE-3958
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3958.patch.1, HIVE-3958.patch.2, HIVE-3958.patch.3, 
 HIVE-3958.patch.4, HIVE-3958.patch.5, HIVE-3958.patch.6



[jira] [Created] (HIVE-4235) CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists

2013-03-26 Thread Gang Tim Liu (JIRA)

Gang Tim Liu created HIVE-4235:
--

 Summary: CREATE TABLE IF NOT EXISTS uses inefficient way to check 
if table exists
 Key: HIVE-4235
 URL: https://issues.apache.org/jira/browse/HIVE-4235
 Project: Hive
  Issue Type: Bug
  Components: JDBC, Query Processor, SQL
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu


CREATE TABLE IF NOT EXISTS uses an inefficient way to check whether the table exists.

It uses Hive.java's getTablesByPattern(...) to check whether the table exists. That 
involves a regular expression and eventually a database join, which is very 
inefficient. It may increase database lock time and hurt DB performance if a lot 
of such commands hit the database.

The suggested approach is to use getTable(...), since we already know the table name.
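
The difference between the two checks can be sketched in Java as follows. This is a minimal in-memory stand-in, not Hive's metastore code: the class TableLookupSketch, the map-backed catalog, and the existsByPattern/existsByName helpers are assumptions made for illustration, and returning null for a missing table is a simplification (how the real getTable(...) signals a missing table is not shown here).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Hypothetical in-memory stand-in for a metastore; not Hive's actual classes. */
final class TableLookupSketch {
    private final Map<String, String> locationsByName = new TreeMap<>();

    void addTable(String name, String location) {
        locationsByName.put(name, location);
    }

    /** Pattern-based lookup: every table name is tested against the regex. */
    List<String> getTablesByPattern(String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matches = new ArrayList<>();
        for (String name : locationsByName.keySet()) {
            if (pattern.matcher(name).matches()) {
                matches.add(name);
            }
        }
        return matches;
    }

    /** Direct lookup by exact name; returns null when the table is absent. */
    String getTable(String name) {
        return locationsByName.get(name);
    }

    /** Existence check via the pattern API: scans all names for one exact match. */
    boolean existsByPattern(String name) {
        return getTablesByPattern(Pattern.quote(name)).contains(name);
    }

    /** Existence check via a single keyed lookup. */
    boolean existsByName(String name) {
        return getTable(name) != null;
    }

    public static void main(String[] args) {
        TableLookupSketch catalog = new TableLookupSketch();
        catalog.addTable("fact_daily", "/warehouse/fact_daily");
        System.out.println(catalog.existsByPattern("fact_daily"));
        System.out.println(catalog.existsByName("fact_daily"));
    }
}
```

Both checks give the same answer; the pattern version simply does more work per call, which is the cost the issue attributes to the regular expression and the database join.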


[jira] [Updated] (HIVE-4235) CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists

2013-03-26 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4235:
---

Description: 
CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists.

It uses Hive.java's getTablesByPattern(...) to check if table exists. It 
involves regular expression and eventually database join. Very inefficient. It 
can cause database lock time increase and hurt db performance if a lot of such 
commands hit database.

The suggested approach is to use getTable(...) since we know tablename already

  was:
CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists.

It uses Hive.java's getTablesByPattern(...) to check if table exists. It 
involves regular expression and eventually database join. Very inefficient. May 
cause database lock time increases and hurt db performance if a lot of such 
commands hit database.

The suggested approach is to use getTable(...) since we know tablename already


 CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists
 

 Key: HIVE-4235
 URL: https://issues.apache.org/jira/browse/HIVE-4235
 Project: Hive
  Issue Type: Bug
  Components: JDBC, Query Processor, SQL
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu


[jira] [Work started] (HIVE-4235) CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists

2013-03-26 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-4235 started by Gang Tim Liu.

 CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists
 

 Key: HIVE-4235
 URL: https://issues.apache.org/jira/browse/HIVE-4235
 Project: Hive
  Issue Type: Bug
  Components: JDBC, Query Processor, SQL
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu


[jira] [Commented] (HIVE-4235) CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists

2013-03-26 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614649#comment-13614649
 ] 

Gang Tim Liu commented on HIVE-4235:


https://reviews.facebook.net/D9729

 CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists
 

 Key: HIVE-4235
 URL: https://issues.apache.org/jira/browse/HIVE-4235
 Project: Hive
  Issue Type: Bug
  Components: JDBC, Query Processor, SQL
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-4235.patch.1



[jira] [Updated] (HIVE-4235) CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists

2013-03-26 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4235:
---

Attachment: HIVE-4235.patch.1

 CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists
 

 Key: HIVE-4235
 URL: https://issues.apache.org/jira/browse/HIVE-4235
 Project: Hive
  Issue Type: Bug
  Components: JDBC, Query Processor, SQL
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-4235.patch.1



[jira] [Updated] (HIVE-4235) CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists

2013-03-26 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4235:
---

Status: Patch Available  (was: In Progress)

Diff is ready.

 CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists
 

 Key: HIVE-4235
 URL: https://issues.apache.org/jira/browse/HIVE-4235
 Project: Hive
  Issue Type: Bug
  Components: JDBC, Query Processor, SQL
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-4235.patch.1



[jira] [Commented] (HIVE-3958) support partial scan for analyze command - RCFile

2013-03-25 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612824#comment-13612824
 ] 

Gang Tim Liu commented on HIVE-3958:


New diff is ready. Thanks.

 support partial scan for analyze command - RCFile
 -

 Key: HIVE-3958
 URL: https://issues.apache.org/jira/browse/HIVE-3958
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3958.patch.1, HIVE-3958.patch.2, HIVE-3958.patch.3, 
 HIVE-3958.patch.4



[jira] [Updated] (HIVE-4219) explain dependency does not capture the input table

2013-03-24 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4219:
---

Attachment: hive.4219.3.patch

 explain dependency does not capture the input table
 ---

 Key: HIVE-4219
 URL: https://issues.apache.org/jira/browse/HIVE-4219
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4219.1.patch, hive.4219.2.patch, hive.4219.3.patch


 hive> explain dependency select * from srcpart where ds is not null;
 OK
 {"input_partitions":[{"partitionName":"default@srcpart@ds=2008-04-08/hr=11"},{"partitionName":"default@srcpart@ds=2008-04-08/hr=12"},{"partitionName":"default@srcpart@ds=2008-04-09/hr=11"},{"partitionName":"default@srcpart@ds=2008-04-09/hr=12"}],"input_tables":[]}
 input_tables should contain srcpart


[jira] [Work started] (HIVE-3958) support partial scan for analyze command - RCFile

2013-03-22 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3958 started by Gang Tim Liu.

 support partial scan for analyze command - RCFile
 -

 Key: HIVE-3958
 URL: https://issues.apache.org/jira/browse/HIVE-3958
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3958.patch.1, HIVE-3958.patch.2, HIVE-3958.patch.3



[jira] [Updated] (HIVE-3958) support partial scan for analyze command - RCFile

2013-03-22 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3958:
---

Attachment: HIVE-3958.patch.3

 support partial scan for analyze command - RCFile
 -

 Key: HIVE-3958
 URL: https://issues.apache.org/jira/browse/HIVE-3958
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3958.patch.1, HIVE-3958.patch.2, HIVE-3958.patch.3



[jira] [Updated] (HIVE-3958) support partial scan for analyze command - RCFile

2013-03-22 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3958:
---

Status: Patch Available  (was: In Progress)

Another diff is ready for review.

 support partial scan for analyze command - RCFile
 -

 Key: HIVE-3958
 URL: https://issues.apache.org/jira/browse/HIVE-3958
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3958.patch.1, HIVE-3958.patch.2, HIVE-3958.patch.3



[jira] [Commented] (HIVE-4219) explain dependency does not capture the input table

2013-03-22 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13611047#comment-13611047
 ] 

Gang Tim Liu commented on HIVE-4219:


+1

 explain dependency does not capture the input table
 ---

 Key: HIVE-4219
 URL: https://issues.apache.org/jira/browse/HIVE-4219
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4219.1.patch, hive.4219.2.patch



[jira] [Commented] (HIVE-4206) Sort merge join does not work for outer joins for 7 inputs

2013-03-21 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609121#comment-13609121
 ] 

Gang Tim Liu commented on HIVE-4206:


+1

 Sort merge join does not work for outer joins for 7 inputs
 --

 Key: HIVE-4206
 URL: https://issues.apache.org/jira/browse/HIVE-4206
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4206.1.patch, hive.4206.2.patch





[jira] [Commented] (HIVE-4213) List bucketing error too restrictive

2013-03-21 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609586#comment-13609586
 ] 

Gang Tim Liu commented on HIVE-4213:


[~mgrover]

I am a little confused; please correct me if I am wrong. The current logic is 
not restrictive. 

For example, the following case is legal: 
set hive.mapred.supports.subdirectories=true;
set mapred.input.dir.recursive=true;
set hive.optimize.listbucketing=false;

 List bucketing error too restrictive
 

 Key: HIVE-4213
 URL: https://issues.apache.org/jira/browse/HIVE-4213
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mark Grover
 Fix For: 0.11.0


 With the introduction of List bucketing, we introduced a config validation 
 step where we say:
 {code}
   SUPPORT_DIR_MUST_TRUE_FOR_LIST_BUCKETING(
   10199,
   "hive.mapred.supports.subdirectories must be true"
   + " if any one of following is true: hive.internal.ddl.list.bucketing.enable,"
   + " hive.optimize.listbucketing and mapred.input.dir.recursive"),
 {code}
 This seems overly restrictive because there are use cases where people may 
 want to set {{mapred.input.dir.recursive}} to {{true}} even when they don't 
 care about list bucketing.
 Is that not true?
 For example, here is the unit test code for {{clientpositive/recursive_dir.q}}
 {code}
 CREATE TABLE fact_daily(x int) PARTITIONED BY (ds STRING);
 CREATE TABLE fact_tz(x int) PARTITIONED BY (ds STRING, hr STRING)
 LOCATION 'pfile:${system:test.tmp.dir}/fact_tz';
 INSERT OVERWRITE TABLE fact_tz PARTITION (ds='1', hr='1')
 SELECT key+11 FROM src WHERE key=484;
 ALTER TABLE fact_daily SET TBLPROPERTIES('EXTERNAL'='TRUE');
 ALTER TABLE fact_daily ADD PARTITION (ds='1')
 LOCATION 'pfile:${system:test.tmp.dir}/fact_tz/ds=1';
 set hive.mapred.supports.subdirectories=true;
 set mapred.input.dir.recursive=true;
 set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
 SELECT * FROM fact_daily WHERE ds='1';
 SELECT count(1) FROM fact_daily WHERE ds='1';
 {code}
 The unit test doesn't seem to be concerned about list bucketing but wants to 
 set {{mapred.input.dir.recursive}} to {{true}}.
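
Read literally, the error message quoted above encodes one rule, sketched below in Java. This is an interpretation of the message text only, not Hive's validation code: the class ListBucketingConfigCheckSketch, the helper names, and the assumption that unset flags default to false are all made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reading of the quoted error message; not Hive's actual validator. */
final class ListBucketingConfigCheckSketch {
    static final String SUPPORTS_SUBDIRS = "hive.mapred.supports.subdirectories";
    static final String[] DEPENDENT_FLAGS = {
        "hive.internal.ddl.list.bucketing.enable",
        "hive.optimize.listbucketing",
        "mapred.input.dir.recursive"
    };

    /** Reads a boolean setting; unset keys count as false (an assumption). */
    static boolean flag(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
    }

    /** Valid unless a dependent flag is on while subdirectory support is off. */
    static boolean isValid(Map<String, String> conf) {
        boolean anyDependentFlagOn = false;
        for (String key : DEPENDENT_FLAGS) {
            anyDependentFlagOn |= flag(conf, key);
        }
        return !anyDependentFlagOn || flag(conf, SUPPORTS_SUBDIRS);
    }

    /** Builds a settings map from alternating key/value arguments. */
    static Map<String, String> conf(String... keyValues) {
        Map<String, String> conf = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            conf.put(keyValues[i], keyValues[i + 1]);
        }
        return conf;
    }

    public static void main(String[] args) {
        // The settings from the comment: recursion on, list bucketing off.
        System.out.println(isValid(conf(
            SUPPORTS_SUBDIRS, "true",
            "mapred.input.dir.recursive", "true",
            "hive.optimize.listbucketing", "false")));
    }
}
```

Under this reading, turning on mapred.input.dir.recursive without list bucketing passes the check as long as hive.mapred.supports.subdirectories is also true; it is rejected only when subdirectory support is off.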


[jira] [Commented] (HIVE-4146) bug with hive.auto.convert.join.noconditionaltask with outer joins

2013-03-20 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607672#comment-13607672
 ] 

Gang Tim Liu commented on HIVE-4146:


+1

 bug with hive.auto.convert.join.noconditionaltask with outer joins
 --

 Key: HIVE-4146
 URL: https://issues.apache.org/jira/browse/HIVE-4146
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4146.1.patch, hive.4146.2.patch, hive.4146.3.patch, 
 hive.4146.4.patch, hive.4146.5.patch, hive.4146.6.patch


 Consider the following scenario:
 create table s1 as select * from src where key = 0;
 set hive.auto.convert.join.noconditionaltask=false;
 
 SELECT * FROM s1 src1 LEFT OUTER JOIN s1 src2 ON (src1.key = src2.key AND 
 src2.key > 10);
 gives correct results
 0   val_0   NULL    NULL
 0   val_0   NULL    NULL
 0   val_0   NULL    NULL
 whereas it gives no results with hive.auto.convert.join.noconditionaltask set
 to true
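
Why the NULL-padded rows are the correct result can be sketched in Java as follows. The sketch assumes that the ON-clause comparison on src2.key is one that no row of s1 satisfies (for example src2.key > 10 when every key is 0) and that s1 holds three rows, matching the three result rows shown. The class LeftOuterJoinSketch and its method are made up for illustration and are unrelated to Hive's join operators.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy nested-loop join; illustrates LEFT OUTER JOIN semantics only. */
final class LeftOuterJoinSketch {

    /**
     * Joins on key equality plus a right-side filter (right key > minRightKey),
     * both evaluated as part of the ON clause.
     */
    static List<String> leftOuterJoin(int[] leftKeys, int[] rightKeys, int minRightKey) {
        List<String> rows = new ArrayList<>();
        for (int left : leftKeys) {
            boolean matched = false;
            for (int right : rightKeys) {
                if (left == right && right > minRightKey) {
                    rows.add(left + "\tval_" + left + "\t" + right + "\tval_" + right);
                    matched = true;
                }
            }
            if (!matched) {
                // An unmatched left row is kept and padded with NULLs, never dropped.
                rows.add(left + "\tval_" + left + "\tNULL\tNULL");
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        int[] s1 = {0, 0, 0};
        for (String row : leftOuterJoin(s1, s1, 10)) {
            System.out.println(row);
        }
    }
}
```

Because the filter sits in the ON clause rather than in a WHERE clause, it can only prevent matches; it cannot remove rows of the left table. An empty result therefore points to a bug in the converted join.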


[jira] [Commented] (HIVE-4146) bug with hive.auto.convert.join.noconditionaltask with outer joins

2013-03-20 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607670#comment-13607670
 ] 

Gang Tim Liu commented on HIVE-4146:


The comment is a false positive.

 bug with hive.auto.convert.join.noconditionaltask with outer joins
 --

 Key: HIVE-4146
 URL: https://issues.apache.org/jira/browse/HIVE-4146
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4146.1.patch, hive.4146.2.patch, hive.4146.3.patch, 
 hive.4146.4.patch, hive.4146.5.patch, hive.4146.6.patch



[jira] [Commented] (HIVE-4146) bug with hive.auto.convert.join.noconditionaltask with outer joins

2013-03-19 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607307#comment-13607307
 ] 

Gang Tim Liu commented on HIVE-4146:


A very small comment in D9327.

 bug with hive.auto.convert.join.noconditionaltask with outer joins
 --

 Key: HIVE-4146
 URL: https://issues.apache.org/jira/browse/HIVE-4146
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4146.1.patch, hive.4146.2.patch, hive.4146.3.patch, 
 hive.4146.4.patch, hive.4146.5.patch, hive.4146.6.patch



[jira] [Updated] (HIVE-3958) support partial scan for analyze command - RCFile

2013-03-18 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3958:
---

Attachment: HIVE-3958.patch.2

 support partial scan for analyze command - RCFile
 -

 Key: HIVE-3958
 URL: https://issues.apache.org/jira/browse/HIVE-3958
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3958.patch.1, HIVE-3958.patch.2


 analyze commands allows us to collect statistics on existing 
 tables/partitions. It works great but might be slow since it scans all files.
 There are 2 ways to speed it up:
 1. collect stats without file scan. It may not collect all stats but good and 
 fast enough for use case. HIVE-3917 addresses it
 2. collect stats via partial file scan. It doesn't scan all content of files 
 but part of it to get file metadata. some examples are 
 https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC ( HIVE-3874 ) 
 and HFile of Hbase
 This jira is targeted to address the #2. More specifically RCFile format.


[jira] [Commented] (HIVE-4145) Create hcatalog stub directory and add it to the build

2013-03-15 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603604#comment-13603604
 ] 

Gang Tim Liu commented on HIVE-4145:


+1

 Create hcatalog stub directory and add it to the build
 --

 Key: HIVE-4145
 URL: https://issues.apache.org/jira/browse/HIVE-4145
 Project: Hive
  Issue Type: Task
  Components: Build Infrastructure
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Attachments: HIVE-4145.1.patch.txt


 Alan has requested that we create a directory for hcatalog and give the 
 HCatalog submodule committers karma on it.


[jira] [Updated] (HIVE-3958) support partial scan for analyze command - RCFile

2013-03-15 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3958:
---

Attachment: HIVE-3958.patch.1

 support partial scan for analyze command - RCFile
 -

 Key: HIVE-3958
 URL: https://issues.apache.org/jira/browse/HIVE-3958
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3958.patch.1


 The analyze command allows us to collect statistics on existing
 tables/partitions. It works well but can be slow because it scans all files.
 There are 2 ways to speed it up:
 1. Collect stats without a file scan. This may not collect all stats, but it is
 good and fast enough for some use cases. HIVE-3917 addresses it.
 2. Collect stats via a partial file scan. It doesn't scan the full content of
 the files, only the part needed to get file metadata. Some examples are
 https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC (HIVE-3874),
 and HFile of HBase.
 This JIRA targets #2, specifically the RCFile format.


[jira] [Updated] (HIVE-3958) support partial scan for analyze command - RCFile

2013-03-15 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3958:
---

Status: Patch Available  (was: In Progress)

 support partial scan for analyze command - RCFile
 -

 Key: HIVE-3958
 URL: https://issues.apache.org/jira/browse/HIVE-3958
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3958.patch.1


 The analyze command allows us to collect statistics on existing
 tables/partitions. It works well but can be slow because it scans all files.
 There are 2 ways to speed it up:
 1. Collect stats without a file scan. This may not collect all stats, but it is
 good and fast enough for some use cases. HIVE-3917 addresses it.
 2. Collect stats via a partial file scan. It doesn't scan the full content of
 the files, only the part needed to get file metadata. Some examples are
 https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC (HIVE-3874),
 and HFile of HBase.
 This JIRA targets #2, specifically the RCFile format.


[jira] [Updated] (HIVE-3958) support partial scan for analyze command - RCFile

2013-03-14 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3958:
---

Summary: support partial scan for analyze command - RCFile  (was: support 
partial scan for analyze command)

 support partial scan for analyze command - RCFile
 -

 Key: HIVE-3958
 URL: https://issues.apache.org/jira/browse/HIVE-3958
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu

 The analyze command allows us to collect statistics on existing
 tables/partitions. It works well but can be slow because it scans all files.
 There are 2 ways to speed it up:
 1. Collect stats without a file scan. This may not collect all stats, but it is
 good and fast enough for some use cases. HIVE-3917 addresses it.
 2. Collect stats via a partial file scan. It doesn't scan the full content of
 the files, only the part needed to get file metadata. Some examples are
 https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC (HIVE-3874),
 and HFile of HBase.
 This JIRA targets #2, specifically the RCFile format.


[jira] [Created] (HIVE-4177) support partial scan for analyze command - ORC

2013-03-14 Thread Gang Tim Liu (JIRA)

Gang Tim Liu created HIVE-4177:
--

 Summary: support partial scan for analyze command - ORC
 Key: HIVE-4177
 URL: https://issues.apache.org/jira/browse/HIVE-4177
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu


This is follow up on hive 3958.

This jira will focus on ORC format


[jira] [Commented] (HIVE-3958) support partial scan for analyze command - RCFile

2013-03-14 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602827#comment-13602827
 ] 

Gang Tim Liu commented on HIVE-3958:


Submitted a follow-up, HIVE-4177, which focuses on ORC.

 support partial scan for analyze command - RCFile
 -

 Key: HIVE-3958
 URL: https://issues.apache.org/jira/browse/HIVE-3958
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu

 The analyze command allows us to collect statistics on existing
 tables/partitions. It works well but can be slow because it scans all files.
 There are 2 ways to speed it up:
 1. Collect stats without a file scan. This may not collect all stats, but it is
 good and fast enough for some use cases. HIVE-3917 addresses it.
 2. Collect stats via a partial file scan. It doesn't scan the full content of
 the files, only the part needed to get file metadata. Some examples are
 https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC (HIVE-3874),
 and HFile of HBase.
 This JIRA targets #2, specifically the RCFile format.


[jira] [Commented] (HIVE-3958) support partial scan for analyze command - RCFile

2013-03-14 Thread Gang Tim Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602828#comment-13602828
 ] 

Gang Tim Liu commented on HIVE-3958:


Initial draft: https://reviews.facebook.net/D9417

 support partial scan for analyze command - RCFile
 -

 Key: HIVE-3958
 URL: https://issues.apache.org/jira/browse/HIVE-3958
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu

 The analyze command allows us to collect statistics on existing
 tables/partitions. It works well but can be slow because it scans all files.
 There are 2 ways to speed it up:
 1. Collect stats without a file scan. This may not collect all stats, but it is
 good and fast enough for some use cases. HIVE-3917 addresses it.
 2. Collect stats via a partial file scan. It doesn't scan the full content of
 the files, only the part needed to get file metadata. Some examples are
 https://cwiki.apache.org/Hive/rcfilecat.html for RCFile, ORC (HIVE-3874),
 and HFile of HBase.
 This JIRA targets #2, specifically the RCFile format.


[jira] [Updated] (HIVE-4177) support partial scan for analyze command - ORC

2013-03-14 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4177:
---

Description: 
This is follow up on Hive-3958.

This jira will focus on ORC format  HIVE-3874

  was:
This is follow up on hive 3958.

This jira will focus on ORC format


 support partial scan for analyze command - ORC
 --

 Key: HIVE-4177
 URL: https://issues.apache.org/jira/browse/HIVE-4177
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu

 This is follow up on Hive-3958.
 This jira will focus on ORC format  HIVE-3874


[jira] [Updated] (HIVE-4177) support partial scan for analyze command - ORC

2013-03-14 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4177:
---

Description: 
This is follow up on HIVE-3958.

This jira will focus on ORC format  HIVE-3874

  was:
This is follow up on Hive-3958.

This jira will focus on ORC format  HIVE-3874


 support partial scan for analyze command - ORC
 --

 Key: HIVE-4177
 URL: https://issues.apache.org/jira/browse/HIVE-4177
 Project: Hive
  Issue Type: Improvement
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu

 This is follow up on HIVE-3958.
 This jira will focus on ORC format  HIVE-3874


[jira] [Assigned] (HIVE-4150) optimize queries like 'select count(1) from T where conditions on partition columns'

2013-03-12 Thread Gang Tim Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu reassigned HIVE-4150:
--

Assignee: Gang Tim Liu

 optimize queries like 'select count(1) from T where conditions on partition 
 columns'
 --

 Key: HIVE-4150
 URL: https://issues.apache.org/jira/browse/HIVE-4150
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Gang Tim Liu

 If accurate stats are available in the metastore, they should be used to
 optimize the above query.
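
The idea can be sketched as follows. This is only an illustration under stated assumptions, not the Hive implementation: the per-partition stats are modeled as a plain dictionary from partition value to row count (None when no accurate count is stored), and the names PartitionStats and count_from_stats are hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical model of metastore stats: partition value -> row count,
# or None when no accurate row count is recorded for that partition.
PartitionStats = Dict[str, Optional[int]]


def count_from_stats(
    stats: PartitionStats, keep: Callable[[str], bool]
) -> Optional[int]:
    """Answer count(1) from stats for the partitions selected by `keep`.

    Returns None if any selected partition lacks a row count, in which
    case the caller would have to fall back to scanning the data.
    """
    total = 0
    for partition, num_rows in stats.items():
        if not keep(partition):
            continue  # pruned by the condition on the partition column
        if num_rows is None:
            return None  # stats are not reliable for this partition
        total += num_rows
    return total
```

With stats such as {"2013-03-10": 100, "2013-03-11": 250}, a predicate that keeps both partitions yields their sum without reading any data files; a selected partition with a missing count forces the fallback.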

