[jira] [Commented] (HIVE-4018) MapJoin failing with Distributed Cache error

2013-03-28 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616123#comment-13616123
 ] 

Namit Jain commented on HIVE-4018:
--

+1

Missed this -- running tests

 MapJoin failing with Distributed Cache error
 

 Key: HIVE-4018
 URL: https://issues.apache.org/jira/browse/HIVE-4018
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.11.0
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.11.0

 Attachments: HIVE-4018.patch, hive.4018.test.2.patch, 
 HIVE-4018-test.patch


 When I'm running a star join query after HIVE-3784, it is failing with the 
 following error:
 2013-02-13 08:36:04,584 ERROR org.apache.hadoop.hive.ql.exec.MapJoinOperator: 
 Load Distributed Cache Error
 2013-02-13 08:36:04,585 FATAL ExecMapper: 
 org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:189)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:203)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1421)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:614)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
   at org.apache.hadoop.mapred.Child.main(Child.java:260)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4018) MapJoin failing with Distributed Cache error

2013-03-28 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-4018:
-

Status: Open  (was: Patch Available)

Can you refresh?
The Phabricator diff is not applying cleanly.
Can you also upload the latest patch?

 MapJoin failing with Distributed Cache error
 

 Key: HIVE-4018
 URL: https://issues.apache.org/jira/browse/HIVE-4018
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.11.0
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.11.0

 Attachments: HIVE-4018.patch, hive.4018.test.2.patch, 
 HIVE-4018-test.patch


 When I'm running a star join query after HIVE-3784, it is failing with the 
 following error:
 2013-02-13 08:36:04,584 ERROR org.apache.hadoop.hive.ql.exec.MapJoinOperator: 
 Load Distributed Cache Error
 2013-02-13 08:36:04,585 FATAL ExecMapper: 
 org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:189)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:203)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1421)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:614)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
   at org.apache.hadoop.mapred.Child.main(Child.java:260)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2905) Desc table can't read Chinese (UTF-8 character code)

2013-03-28 Thread Xiaozhe Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaozhe Wang updated HIVE-2905:
---

Affects Version/s: 0.10.0

 Desc table can't read Chinese (UTF-8 character code)
 

 Key: HIVE-2905
 URL: https://issues.apache.org/jira/browse/HIVE-2905
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.7.0, 0.10.0
 Environment: hive 0.7.0, mysql 5.1.45
Reporter: Sheng Zhou

 When describing a table via the command line or Hive JDBC, the table's comment 
 can't be read.
 1. I have updated the javax.jdo.option.ConnectionURL parameter in the 
 hive-site.xml file:
jdbc:mysql://*.*.*.*:3306/hive?characterEncoding=UTF-8
 2. In the MySQL database, the comment field of the COLUMNS table can be read normally.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2905) Desc table can't read Chinese (UTF-8 character code)

2013-03-28 Thread Xiaozhe Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaozhe Wang updated HIVE-2905:
---

Environment: 
hive 0.7.0, mysql 5.1.45
hive 0.10.0, mysql 5.5.30

  was:hive 0.7.0, mysql 5.1.45


 Desc table can't read Chinese (UTF-8 character code)
 

 Key: HIVE-2905
 URL: https://issues.apache.org/jira/browse/HIVE-2905
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.7.0, 0.10.0
 Environment: hive 0.7.0, mysql 5.1.45
 hive 0.10.0, mysql 5.5.30
Reporter: Sheng Zhou

 When describing a table via the command line or Hive JDBC, the table's comment 
 can't be read.
 1. I have updated the javax.jdo.option.ConnectionURL parameter in the 
 hive-site.xml file:
jdbc:mysql://*.*.*.*:3306/hive?characterEncoding=UTF-8
 2. In the MySQL database, the comment field of the COLUMNS table can be read normally.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4221) Stripe-level merge for ORC files

2013-03-28 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4221:
--

Attachment: HIVE-4221.HIVE-4221.HIVE-4221.HIVE-4221.D9759.1.patch

sxyuan requested code review of HIVE-4221 [jira] Stripe-level merge for ORC 
files.

Reviewers: kevinwilfong, omalley

As with RC files, we would like to be able to merge ORC files efficiently by 
reading/writing stripes without deserializing each row. Most of the logic is 
unchanged from merging for RC files, so the original code has been refactored 
for reuse.

TEST PLAN
  Copied and modified RC file merge tests to use ORC file format. Added a test 
case to TestOrcFile to make sure file level column stats are merged properly.

REVISION DETAIL
  https://reviews.facebook.net/D9759

AFFECTED FILES
  data/files/smbbucket_1.orc
  data/files/smbbucket_3.orc
  data/files/smbbucket_2.orc
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  ql/src/test/results/clientpositive/orc_createas1.q.out
  ql/src/test/results/clientpositive/orcfile_merge3.q.out
  ql/src/test/results/clientpositive/orcfile_merge2.q.out
  ql/src/test/results/clientpositive/alter_merge_orc2.q.out
  ql/src/test/results/clientpositive/alter_merge_orc.q.out
  ql/src/test/results/clientpositive/orcfile_merge1.q.out
  ql/src/test/results/clientpositive/orcfile_merge4.q.out
  ql/src/test/results/clientpositive/alter_merge_orc_stats.q.out
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
  ql/src/test/queries/clientpositive/orcfile_merge2.q
  ql/src/test/queries/clientpositive/orcfile_merge3.q
  ql/src/test/queries/clientpositive/alter_merge_orc.q
  ql/src/test/queries/clientpositive/orcfile_merge4.q
  ql/src/test/queries/clientpositive/alter_merge_orc_stats.q
  ql/src/test/queries/clientpositive/orcfile_merge1.q
  ql/src/test/queries/clientpositive/alter_merge_orc2.q
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/AlterTablePartMergeFilesDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
  ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/MergeWork.java
  
ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeOutputFormat.java
  ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java
  ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcBlockMergeRecordReader.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcBlockMergeInputFormat.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcMergeMapper.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StripeReader.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
  ql/src/java/org/apache/hadoop/hive/ql/io/merge
  ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeWork.java
  ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeMapper.java
  ql/src/java/org/apache/hadoop/hive/ql/io/merge/BlockMergeOutputFormat.java
  ql/src/java/org/apache/hadoop/hive/ql/io/merge/BlockMergeTask.java

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/23295/

To: kevinwilfong, omalley, sxyuan
Cc: JIRA


 Stripe-level merge for ORC files
 

 Key: HIVE-4221
 URL: https://issues.apache.org/jira/browse/HIVE-4221
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Samuel Yuan
Assignee: Samuel Yuan
 Attachments: HIVE-4221.HIVE-4221.HIVE-4221.HIVE-4221.D9759.1.patch


 As with RC files, we would like to be able to merge ORC files efficiently by 
 reading/writing stripes without decompressing/recompressing them. This will 
 be similar to the RC file merge, except that footers will have to be updated 
 with the stripe positions in the new file.
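A rough sketch of the merge idea described above, under assumed types: each stripe's compressed bytes are copied verbatim into the merged file while the new starting offsets are recorded, since the merged footer has to be rewritten with the stripe positions in the new file. The StripeInfo class and the stream handling here are illustrative, not the attached patch.

{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

public class StripeCopySketch {
    /** Hypothetical description of one stripe: just its byte length. */
    static class StripeInfo {
        final long length;
        StripeInfo(long length) { this.length = length; }
    }

    /**
     * Copies each stripe's compressed bytes verbatim from a source stream
     * (assumed to be positioned at the first stripe, with stripes contiguous)
     * into the merged output, recording where each stripe now starts. The
     * merged file's footer would later be written with these new offsets.
     */
    static List<Long> copyStripes(InputStream src, OutputStream dst,
                                  List<StripeInfo> stripes, long bytesAlreadyWritten)
            throws IOException {
        List<Long> newOffsets = new ArrayList<Long>();
        byte[] buffer = new byte[64 * 1024];
        long position = bytesAlreadyWritten;
        for (StripeInfo stripe : stripes) {
            newOffsets.add(position);                 // stripe's offset in the merged file
            long remaining = stripe.length;
            while (remaining > 0) {
                int read = src.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (read < 0) {
                    throw new IOException("Unexpected end of stripe data");
                }
                dst.write(buffer, 0, read);           // no decompression or re-encoding
                remaining -= read;
            }
            position += stripe.length;
        }
        return newOffsets;
    }
}
{code}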

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4221) Stripe-level merge for ORC files

2013-03-28 Thread Samuel Yuan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samuel Yuan updated HIVE-4221:
--

Status: Patch Available  (was: Open)

 Stripe-level merge for ORC files
 

 Key: HIVE-4221
 URL: https://issues.apache.org/jira/browse/HIVE-4221
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Samuel Yuan
Assignee: Samuel Yuan
 Attachments: HIVE-4221.HIVE-4221.HIVE-4221.HIVE-4221.D9759.1.patch


 As with RC files, we would like to be able to merge ORC files efficiently by 
 reading/writing stripes without decompressing/recompressing them. This will 
 be similar to the RC file merge, except that footers will have to be updated 
 with the stripe positions in the new file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4018) MapJoin failing with Distributed Cache error

2013-03-28 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616156#comment-13616156
 ] 

Amareshwari Sriramadasu commented on HIVE-4018:
---

After updating the patch to trunk, the test fails with an NPE again. Will see 
what the cause is and update.

 MapJoin failing with Distributed Cache error
 

 Key: HIVE-4018
 URL: https://issues.apache.org/jira/browse/HIVE-4018
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.11.0
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.11.0

 Attachments: HIVE-4018.patch, hive.4018.test.2.patch, 
 HIVE-4018-test.patch


 When I'm running a star join query after HIVE-3784, it is failing with the 
 following error:
 2013-02-13 08:36:04,584 ERROR org.apache.hadoop.hive.ql.exec.MapJoinOperator: 
 Load Distributed Cache Error
 2013-02-13 08:36:04,585 FATAL ExecMapper: 
 org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:189)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:203)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1421)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:614)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
   at org.apache.hadoop.mapred.Child.main(Child.java:260)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4240) optimize hive.enforce.bucketing and hive.enforce sorting insert

2013-03-28 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616174#comment-13616174
 ] 

Namit Jain commented on HIVE-4240:
--

https://reviews.facebook.net/D9765

 optimize hive.enforce.bucketing and hive.enforce sorting insert
 ---

 Key: HIVE-4240
 URL: https://issues.apache.org/jira/browse/HIVE-4240
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain

 Consider the following scenario:
 set hive.optimize.bucketmapjoin = true;
 set hive.optimize.bucketmapjoin.sortedmerge = true;
 set hive.input.format = 
 org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
 set hive.enforce.bucketing=true;
 set hive.enforce.sorting=true;
 set hive.exec.reducers.max = 1;
 set hive.merge.mapfiles=false;
 set hive.merge.mapredfiles=false;
 -- Create two bucketed and sorted tables
 CREATE TABLE test_table1 (key INT, value STRING) PARTITIONED BY (ds STRING) 
 CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;
 CREATE TABLE test_table2 (key INT, value STRING) PARTITIONED BY (ds STRING) 
 CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;
 FROM src
 INSERT OVERWRITE TABLE test_table1 PARTITION (ds = '1') SELECT *;
 -- Insert data into the bucketed table by selecting from another bucketed 
 table
 -- This should be a map-only operation
 INSERT OVERWRITE TABLE test_table2 PARTITION (ds = '1')
 SELECT a.key, a.value FROM test_table1 a WHERE a.ds = '1';
 We should not need a reducer to perform the above operation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4240) optimize hive.enforce.bucketing and hive.enforce sorting insert

2013-03-28 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-4240:
-

Attachment: hive.4240.1.patch

 optimize hive.enforce.bucketing and hive.enforce sorting insert
 ---

 Key: HIVE-4240
 URL: https://issues.apache.org/jira/browse/HIVE-4240
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4240.1.patch


 Consider the following scenario:
 set hive.optimize.bucketmapjoin = true;
 set hive.optimize.bucketmapjoin.sortedmerge = true;
 set hive.input.format = 
 org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
 set hive.enforce.bucketing=true;
 set hive.enforce.sorting=true;
 set hive.exec.reducers.max = 1;
 set hive.merge.mapfiles=false;
 set hive.merge.mapredfiles=false;
 -- Create two bucketed and sorted tables
 CREATE TABLE test_table1 (key INT, value STRING) PARTITIONED BY (ds STRING) 
 CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;
 CREATE TABLE test_table2 (key INT, value STRING) PARTITIONED BY (ds STRING) 
 CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;
 FROM src
 INSERT OVERWRITE TABLE test_table1 PARTITION (ds = '1') SELECT *;
 -- Insert data into the bucketed table by selecting from another bucketed 
 table
 -- This should be a map-only operation
 INSERT OVERWRITE TABLE test_table2 PARTITION (ds = '1')
 SELECT a.key, a.value FROM test_table1 a WHERE a.ds = '1';
 We should not need a reducer to perform the above operation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2905) Desc table can't read Chinese (UTF-8 character code)

2013-03-28 Thread Xiaozhe Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaozhe Wang updated HIVE-2905:
---

Labels: patch  (was: )
Status: Patch Available  (was: Open)

The problem is that 
org.apache.hadoop.hive.ql.metadata.formatting.TextMetaDataFormatter.describeTable()
 uses DataOutputStream.writeBytes() to output the column info string. Unfortunately, 
DataOutputStream.writeBytes() only writes out the lower byte of each character 
in the String, which causes a garbling problem when a column comment contains 
non-Latin-1 characters.

This simple patch fixes the Unicode character garbling problem when describing a 
table in the Hive client.
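For illustration, a minimal self-contained demonstration of the byte-dropping behaviour described above and of the kind of fix involved (writing the UTF-8 encoded bytes instead of calling writeBytes()); this is a sketch, not the attached patch.

{code}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;

public class WriteBytesDemo {
    public static void main(String[] args) throws IOException {
        String comment = "列注释";  // a column comment containing non-Latin-1 characters

        // DataOutputStream.writeBytes() keeps only the low byte of each char,
        // so multi-byte characters come out garbled.
        ByteArrayOutputStream broken = new ByteArrayOutputStream();
        new DataOutputStream(broken).writeBytes(comment);

        // Writing the UTF-8 encoded bytes preserves the original characters.
        ByteArrayOutputStream fixed = new ByteArrayOutputStream();
        fixed.write(comment.getBytes(Charset.forName("UTF-8")));

        System.out.println(new String(broken.toByteArray(), Charset.forName("UTF-8")));
        System.out.println(new String(fixed.toByteArray(), Charset.forName("UTF-8")));
    }
}
{code}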

 Desc table can't read Chinese (UTF-8 character code)
 

 Key: HIVE-2905
 URL: https://issues.apache.org/jira/browse/HIVE-2905
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.10.0, 0.7.0
 Environment: hive 0.7.0, mysql 5.1.45
 hive 0.10.0, mysql 5.5.30
Reporter: Sheng Zhou
  Labels: patch

 When describing a table via the command line or Hive JDBC, the table's comment 
 can't be read.
 1. I have updated the javax.jdo.option.ConnectionURL parameter in the 
 hive-site.xml file:
jdbc:mysql://*.*.*.*:3306/hive?characterEncoding=UTF-8
 2. In the MySQL database, the comment field of the COLUMNS table can be read normally.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2905) Desc table can't read Chinese (UTF-8 character code)

2013-03-28 Thread Xiaozhe Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaozhe Wang updated HIVE-2905:
---

Attachment: utf8-desc-comment.patch

Simple patch to resolve the garbling problem for column comments which contain 
Unicode characters.

 Desc table can't read Chinese (UTF-8 character code)
 

 Key: HIVE-2905
 URL: https://issues.apache.org/jira/browse/HIVE-2905
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.7.0, 0.10.0
 Environment: hive 0.7.0, mysql 5.1.45
 hive 0.10.0, mysql 5.5.30
Reporter: Sheng Zhou
  Labels: patch
 Attachments: utf8-desc-comment.patch


 When describing a table via the command line or Hive JDBC, the table's comment 
 can't be read.
 1. I have updated the javax.jdo.option.ConnectionURL parameter in the 
 hive-site.xml file:
jdbc:mysql://*.*.*.*:3306/hive?characterEncoding=UTF-8
 2. In the MySQL database, the comment field of the COLUMNS table can be read normally.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4242) Predicate push down should also be provided to InputFormats

2013-03-28 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-4242:
---

 Summary: Predicate push down should also be provided to 
InputFormats
 Key: HIVE-4242
 URL: https://issues.apache.org/jira/browse/HIVE-4242
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Currently, the push down predicate is only provided to native tables if the 
hive.optimize.index.filter configuration variable is set. There is no reason to 
prevent InputFormats from getting the required information to do predicate push 
down.

Obviously, this will be very useful for ORC.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4227) Add column level encryption to ORC files

2013-03-28 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616329#comment-13616329
 ] 

Owen O'Malley commented on HIVE-4227:
-

Supun,
  I've tagged this for Google Summer of Code. Take a look at:
http://www.google-melange.com/gsoc/homepage/google/gsoc2013

 Add column level encryption to ORC files
 

 Key: HIVE-4227
 URL: https://issues.apache.org/jira/browse/HIVE-4227
 Project: Hive
  Issue Type: New Feature
Reporter: Owen O'Malley
  Labels: gsoc, gsoc2013

 It would be useful to support column level encryption in ORC files. Since 
 each column and its associated index is stored separately, encrypting a 
 column separately isn't difficult. In terms of key distribution, it would 
 make sense to use an external server like the one in HADOOP-9331.
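As a rough illustration of why encrypting one column's streams separately is mechanically simple, the sketch below encrypts and decrypts a single column stream with a per-column AES key using the standard JCE API. The key handling is hypothetical (in practice the key would come from an external server such as the one in HADOOP-9331), and this is only a sketch, not a proposed design.

{code}
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.security.SecureRandom;

public class ColumnEncryptionSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical per-column key; in practice it would be fetched from an
        // external key server rather than generated locally.
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        SecretKey columnKey = keyGen.generateKey();

        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);

        byte[] columnStream = "raw bytes of one column's data stream".getBytes("UTF-8");

        // CTR mode keeps the ciphertext the same length as the plaintext, which
        // is convenient when stream offsets are recorded in the file footer.
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, columnKey, new IvParameterSpec(iv));
        byte[] encrypted = cipher.doFinal(columnStream);

        cipher.init(Cipher.DECRYPT_MODE, columnKey, new IvParameterSpec(iv));
        byte[] decrypted = cipher.doFinal(encrypted);
        System.out.println(new String(decrypted, "UTF-8"));
    }
}
{code}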

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4243) Fix column names in FileSinkOperator

2013-03-28 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-4243:
---

 Summary: Fix column names in FileSinkOperator
 Key: HIVE-4243
 URL: https://issues.apache.org/jira/browse/HIVE-4243
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Owen O'Malley


All of the ObjectInspectors given to SerDe's by FileSinkOperator have virtual 
column names. Since the files are part of tables, Hive knows the column names. 
For self-describing file formats like ORC, having the real column names will 
improve the understandability.
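A small sketch of the difference being described, using Hive's standard object inspector factories: the same fields described once with virtual names and once with the real column names that a self-describing format could record. The column names used here are made up for illustration.

{code}
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class ColumnNameSketch {
    public static void main(String[] args) {
        List<ObjectInspector> fieldOIs = Arrays.<ObjectInspector>asList(
            PrimitiveObjectInspectorFactory.javaIntObjectInspector,
            PrimitiveObjectInspectorFactory.javaStringObjectInspector);

        // Virtual names of the kind the SerDe sees today.
        StructObjectInspector virtualNames = ObjectInspectorFactory
            .getStandardStructObjectInspector(Arrays.asList("_col0", "_col1"), fieldOIs);

        // The real table column names a self-describing format would rather record.
        StructObjectInspector realNames = ObjectInspectorFactory
            .getStandardStructObjectInspector(Arrays.asList("key", "value"), fieldOIs);

        System.out.println(virtualNames.getTypeName());
        System.out.println(realNames.getTypeName());
    }
}
{code}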

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-2162) Upgrade dependencies to Hadoop 0.20.2 and 0.20.203.0

2013-03-28 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HIVE-2162.
-

Resolution: Duplicate

This has been fixed already.

 Upgrade dependencies to Hadoop 0.20.2 and 0.20.203.0
 

 Key: HIVE-2162
 URL: https://issues.apache.org/jira/browse/HIVE-2162
 Project: Hive
  Issue Type: Improvement
Reporter: Owen O'Malley

 Hadoop has released 0.20.203.0 and we should upgrade Hive's dependency to it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4244) Make string dictionaries adaptive in ORC

2013-03-28 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-4244:
---

 Summary: Make string dictionaries adaptive in ORC
 Key: HIVE-4244
 URL: https://issues.apache.org/jira/browse/HIVE-4244
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley


The ORC writer should adaptively switch between dictionary and direct encoding. 
I'd propose looking at the first 100,000 values in each column and decide 
whether there is sufficient loading in the dictionary to use dictionary 
encoding.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4245) Implement numeric dictionaries in ORC

2013-03-28 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-4245:
---

 Summary: Implement numeric dictionaries in ORC
 Key: HIVE-4245
 URL: https://issues.apache.org/jira/browse/HIVE-4245
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley


For many applications, especially in de-normalized data, there is a lot of 
redundancy in the numeric columns. Therefore, it would make sense to adaptively 
use dictionary encodings for numeric columns in addition to string columns.
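A minimal sketch of what dictionary-encoding a numeric column means: distinct values go into a dictionary and the data stream becomes small indices, which compress well when the column is highly redundant. This is illustrative only, not ORC's actual encoder.

{code}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NumericDictionarySketch {
    public static void main(String[] args) {
        long[] column = {42, 42, 7, 42, 7, 7, 42, 42};

        // Dictionary of distinct values (insertion order kept for readability).
        Map<Long, Integer> dictionary = new LinkedHashMap<Long, Integer>();
        List<Integer> indices = new ArrayList<Integer>();
        for (long value : column) {
            Integer idx = dictionary.get(value);
            if (idx == null) {
                idx = dictionary.size();
                dictionary.put(value, idx);
            }
            indices.add(idx);      // small, repetitive, run-length-friendly stream
        }

        System.out.println("dictionary = " + dictionary.keySet());
        System.out.println("indices    = " + indices);
    }
}
{code}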

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-4121) ORC should have optional dictionaries for both strings and numeric types

2013-03-28 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HIVE-4121.
-

Resolution: Duplicate

I forgot I had filed this, and filed the split-apart versions as HIVE-4244 and 
HIVE-4245.

 ORC should have optional dictionaries for both strings and numeric types
 

 Key: HIVE-4121
 URL: https://issues.apache.org/jira/browse/HIVE-4121
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley

 Currently string columns always have dictionaries and numerics are always 
 directly encoded. It would be better to make the encoding depend on a sample 
 of the data. Perhaps the first 100k values should be evaluated for repeated 
 values and the encoding picked for the stripe.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4227) Add column level encryption to ORC files

2013-03-28 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616393#comment-13616393
 ] 

Andrew Purtell commented on HIVE-4227:
--

So do you envision this as using the facilities provided by HADOOP-9331?

 Add column level encryption to ORC files
 

 Key: HIVE-4227
 URL: https://issues.apache.org/jira/browse/HIVE-4227
 Project: Hive
  Issue Type: New Feature
Reporter: Owen O'Malley
  Labels: gsoc, gsoc2013

 It would be useful to support column level encryption in ORC files. Since 
 each column and its associated index is stored separately, encrypting a 
 column separately isn't difficult. In terms of key distribution, it would 
 make sense to use an external server like the one in HADOOP-9331.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-03-28 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu reassigned HIVE-3959:
--

Assignee: Gang Tim Liu  (was: Bhushan Mandhani)

 Update Partition Statistics in Metastore Layer
 --

 Key: HIVE-3959
 URL: https://issues.apache.org/jira/browse/HIVE-3959
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Gang Tim Liu
Priority: Minor

 When partitions are created using queries (insert overwrite and insert 
 into), the StatsTask updates all stats. However, when partitions are 
 added directly through metadata-only operations (either the CLI or direct calls 
 to the Thrift Metastore), no stats are populated even if hive.stats.reliable is 
 set to true. This puts us in a situation where we can't decide if stats are 
 truly reliable or not.
 We propose that the fast stats (numFiles and totalSize) which don't require 
 a scan of the data should always be populated and be completely reliable. For 
 now we are still excluding rowCount and rawDataSize because that will make 
 these operations very expensive. Currently they are quick metadata-only ops.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4246) Implement predicate pushdown for ORC

2013-03-28 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-4246:
---

 Summary: Implement predicate pushdown for ORC
 Key: HIVE-4246
 URL: https://issues.apache.org/jira/browse/HIVE-4246
 Project: Hive
  Issue Type: New Feature
Reporter: Owen O'Malley
Assignee: Owen O'Malley


By using the push down predicates from the table scan operator, ORC can skip 
over 10,000 rows at a time that won't satisfy the predicate. This will help a 
lot, especially if the file is sorted by the column that is used in the 
predicate.
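A minimal sketch of the row-group skipping idea, assuming hypothetical per-row-group min/max statistics and a simple equality predicate; the real implementation would work from the table scan operator's pushdown predicate rather than a single literal.

{code}
public class RowGroupSkipSketch {
    /** Hypothetical min/max statistics kept for each group of 10,000 rows. */
    static class RowGroupStats {
        final long min, max;
        RowGroupStats(long min, long max) { this.min = min; this.max = max; }
    }

    /** A row group can be skipped when the predicate value lies outside [min, max]. */
    static boolean canSkip(RowGroupStats stats, long predicateValue) {
        return predicateValue < stats.min || predicateValue > stats.max;
    }

    public static void main(String[] args) {
        RowGroupStats[] groups = {
            new RowGroupStats(0, 999),
            new RowGroupStats(1000, 1999),
            new RowGroupStats(2000, 2999),
        };
        long lookingFor = 1500;   // e.g. WHERE key = 1500
        for (int i = 0; i < groups.length; i++) {
            System.out.println("row group " + i
                + (canSkip(groups[i], lookingFor) ? " skipped" : " read"));
        }
    }
}
{code}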

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4159) RetryingHMSHandler doesn't retry in enough cases

2013-03-28 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616502#comment-13616502
 ] 

Gang Tim Liu commented on HIVE-4159:


+1

 RetryingHMSHandler doesn't retry in enough cases
 

 Key: HIVE-4159
 URL: https://issues.apache.org/jira/browse/HIVE-4159
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4159.1.patch.txt


 HIVE-3524 introduced a change which caused JDOExceptions to be wrapped in 
 MetaExceptions.  This caused the RetryingHMSHandler to not retry on these 
 exceptions.
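A sketch of the kind of check a retrying handler could perform instead of looking only at the top-level exception type: walk the cause chain to see whether the underlying failure is retryable. This assumes the wrapping preserves the cause chain and uses stand-in exception classes; it is not the attached patch.

{code}
public class RetryCheckSketch {
    /**
     * Returns true if the exception, or anything in its cause chain, is an
     * instance of the target class (e.g. a JDOException wrapped inside a
     * MetaException), rather than only checking the outermost type.
     */
    static boolean causedBy(Throwable t, Class<? extends Throwable> target) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (target.isInstance(cur)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-ins: RuntimeException plays the wrapper, IllegalStateException
        // plays the retryable datastore exception.
        RuntimeException wrapped =
            new RuntimeException("wrapper", new IllegalStateException("datastore failure"));
        System.out.println(causedBy(wrapped, IllegalStateException.class)); // true -> retry
    }
}
{code}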

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4155) Expose ORC's FileDump as a service

2013-03-28 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616529#comment-13616529
 ] 

Gang Tim Liu commented on HIVE-4155:


+1

 Expose ORC's FileDump as a service
 --

 Key: HIVE-4155
 URL: https://issues.apache.org/jira/browse/HIVE-4155
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4155.1.patch.txt


 Expose ORC's FileDump class as a service similar to RC File Cat,
 e.g.
 hive --orcfiledump path_to_file
 should run FileDump on the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HIVE-4244) Make string dictionaries adaptive in ORC

2013-03-28 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong reassigned HIVE-4244:
---

Assignee: Kevin Wilfong  (was: Owen O'Malley)

 Make string dictionaries adaptive in ORC
 

 Key: HIVE-4244
 URL: https://issues.apache.org/jira/browse/HIVE-4244
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Kevin Wilfong

 The ORC writer should adaptively switch between dictionary and direct 
 encoding. I'd propose looking at the first 100,000 values in each column and 
 decide whether there is sufficient loading in the dictionary to use 
 dictionary encoding.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HIVE-4245) Implement numeric dictionaries in ORC

2013-03-28 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata reassigned HIVE-4245:
---

Assignee: Pamela Vagata  (was: Owen O'Malley)

 Implement numeric dictionaries in ORC
 -

 Key: HIVE-4245
 URL: https://issues.apache.org/jira/browse/HIVE-4245
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Pamela Vagata

 For many applications, especially in de-normalized data, there is a lot of 
 redundancy in the numeric columns. Therefore, it would make sense to 
 adaptively use dictionary encodings for numeric columns in addition to string 
 columns.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4244) Make string dictionaries adaptive in ORC

2013-03-28 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616547#comment-13616547
 ] 

Kevin Wilfong commented on HIVE-4244:
-

Some initial thoughts based on some experiments.

Dictionary encoding seems to be less effective than just Zlib at compressing 
values if the number of distinct values is greater than roughly 80% of the total 
number of values. This number can be made configurable. The dictionary is still 
smaller in memory, so we may be able to get away with deciding when writing the 
stripe and writing the data out directly at that point. This should be comparable 
in performance to the conversion of the dictionary index that is already done.

Also, if the uncompressed (but encoded) size of the dictionary + index (data 
stream) is greater than the uncompressed size of the original data, 
the compressed data tends to be larger as well despite the sorting. This will 
be more expensive to figure out, as we don't know the size of the index until it 
has been run-length encoded.
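A sketch of the heuristic being discussed, assuming an illustrative 80% distinct-value cutoff applied to the first 100,000 values of a column; the actual threshold and sampling are not settled here and would be configurable.

{code}
import java.util.HashSet;
import java.util.Set;

public class DictionaryHeuristicSketch {
    // Illustrative cutoff: if more than ~80% of the sampled values are distinct,
    // dictionary encoding is unlikely to beat direct encoding plus Zlib.
    private static final double DISTINCT_RATIO_CUTOFF = 0.8;
    private static final int SAMPLE_SIZE = 100000;

    static boolean useDictionary(Iterable<String> columnValues) {
        Set<String> distinct = new HashSet<String>();
        int total = 0;
        for (String value : columnValues) {
            distinct.add(value);
            if (++total >= SAMPLE_SIZE) {
                break;              // only the first 100,000 values are examined
            }
        }
        if (total == 0) {
            return true;            // nothing sampled; keep the default encoding
        }
        double distinctRatio = (double) distinct.size() / total;
        return distinctRatio <= DISTINCT_RATIO_CUTOFF;
    }
}
{code}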

 Make string dictionaries adaptive in ORC
 

 Key: HIVE-4244
 URL: https://issues.apache.org/jira/browse/HIVE-4244
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Kevin Wilfong

 The ORC writer should adaptively switch between dictionary and direct 
 encoding. I'd propose looking at the first 100,000 values in each column and 
 decide whether there is sufficient loading in the dictionary to use 
 dictionary encoding.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4157) ORC runs out of heap when writing

2013-03-28 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616557#comment-13616557
 ] 

Gang Tim Liu commented on HIVE-4157:


+1

 ORC runs out of heap when writing
 -

 Key: HIVE-4157
 URL: https://issues.apache.org/jira/browse/HIVE-4157
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4157.1.patch.txt


 The OutStream class used by the ORC file format seems to aggressively 
 allocate memory for ByteBuffers and doesn't seem too eager to give it back.
 This causes issues with heap space, particularly when wide tables or dynamic 
 partitions are involved.
 As a first step to resolving this problem, the OutStream class can be 
 modified to lazily allocate memory, and more actively make it available for 
 garbage collection.
 Follow ups could include checking the amount of free memory as part of 
 determining if a spill is needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Do we know the release date for Hive 0.11? (EOM)

2013-03-28 Thread ur lops



[jira] [Commented] (HIVE-3464) Merging join tree may reorder joins which could be invalid

2013-03-28 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616598#comment-13616598
 ] 

Phabricator commented on HIVE-3464:
---

vikram has commented on the revision HIVE-3464 [jira] Merging join tree may 
reorder joins which could be invalid.

  Comments.

INLINE COMMENTS
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinResolver.java:332
  I am not sure these changes are relevant to this JIRA. There are already other 
JIRAs (HIVE-3996 and HIVE-4071) raised for issues in this section of code, currently 
blocked on HIVE-3891, which moves these changes into a different class.
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinResolver.java:357
 Same comment as above.
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinResolver.java:369
 Same as above.
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinResolver.java:381
 Same as above.

REVISION DETAIL
  https://reviews.facebook.net/D5409

To: JIRA, navis
Cc: njain, vikram


 Merging join tree may reorder joins which could be invalid
 --

 Key: HIVE-3464
 URL: https://issues.apache.org/jira/browse/HIVE-3464
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Navis
Assignee: Navis
 Attachments: HIVE-3464.D5409.2.patch, HIVE-3464.D5409.3.patch, 
 HIVE-3464.D5409.4.patch, HIVE-3464.D5409.5.patch


 Currently, hive merges join tree from right to left regardless of join types, 
 which may introduce join reordering. For example,
 select * from a join a b on a.key=b.key join a c on b.key=c.key join a d on 
 a.key=d.key; 
 Hive tries to merge join tree in a-d=b-d, a-d=a-b, b-c=a-b order and a-d=a-b 
 and b-c=a-b will be merged. Final join tree is a-(bdc).
 With this, ab-d join will be executed prior to ab-c. But if join type of -c 
 and -d is different, this is not valid.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4247) Filtering on a hbase row key duplicates results across multiple mappers

2013-03-28 Thread Karthik Kumara (JIRA)
Karthik Kumara created HIVE-4247:


 Summary: Filtering on a hbase row key duplicates results across 
multiple mappers
 Key: HIVE-4247
 URL: https://issues.apache.org/jira/browse/HIVE-4247
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.9.0
 Environment: All Platforms
Reporter: Karthik Kumara


Steps to reproduce
1. Create a Hive external table with HiveHbaseHandler with enough data in the 
hbase table to spawn multiple mappers for the hive query.
2. Write a query which has a filter (in the where clause) based on the hbase 
row key. 
3. Running the map reduce job leads to each mapper querying the entire data 
set, duplicating the data for each mapper. Each mapper processes the entire 
filtered range, and the results get multiplied by the number of mappers run.

Expected behavior:
Each mapper should process a different part of the data and should not 
duplicate.


Cause:
The cause seems to be the convertFilter method in HiveHBaseTableInputFormat. 
convertFilter has this piece of code, which rewrites the start and the stop row 
for each split and leads each mapper to process the entire range:

if (tableSplit != null) {
  tableSplit = new TableSplit(
      tableSplit.getTableName(),
      startRow,
      stopRow,
      tableSplit.getRegionLocation());
}

The scan already has the start and stop row set when the splits are created. So 
this piece of code is probably redundant.
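For illustration, a sketch of what preserving per-split boundaries could look like: narrow each split's own row range by the filter range instead of replacing it, so mappers no longer all scan the same full range. The helper below is a stand-in, not the attached patch (which may simply drop the rewrite).

{code}
public class SplitRangeSketch {
    /** Unsigned lexicographic comparison, the ordering HBase uses for row keys. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /**
     * Keeps each split's own boundaries and only narrows them by the filter's
     * [startRow, stopRow), instead of replacing them outright. Assumes non-empty
     * boundaries; HBase uses empty arrays for unbounded ranges, which would need
     * special-casing.
     */
    static byte[][] intersect(byte[] splitStart, byte[] splitStop,
                              byte[] filterStart, byte[] filterStop) {
        byte[] start = compare(splitStart, filterStart) >= 0 ? splitStart : filterStart;
        byte[] stop = compare(splitStop, filterStop) <= 0 ? splitStop : filterStop;
        return new byte[][] { start, stop };
    }
}
{code}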
 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4245) Implement numeric dictionaries in ORC

2013-03-28 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616613#comment-13616613
 ] 

Owen O'Malley commented on HIVE-4245:
-

If you look at the original ORC GitHub repository, you can see a float and double 
red-black tree that I pulled out while getting it ready for the initial push into Apache. 

https://github.com/hortonworks/orc/tree/9cdb2e88d377c801655fbb9015938ea3a93e12ca/src/main/java/org/apache/hadoop/hive/ql/io/orc

 Implement numeric dictionaries in ORC
 -

 Key: HIVE-4245
 URL: https://issues.apache.org/jira/browse/HIVE-4245
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Pamela Vagata

 For many applications, especially in de-normalized data, there is a lot of 
 redundancy in the numeric columns. Therefore, it would make sense to 
 adaptively use dictionary encodings for numeric columns in addition to string 
 columns.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4247) Filtering on a hbase row key duplicates results across multiple mappers

2013-03-28 Thread Karthik Kumara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kumara updated HIVE-4247:
-

Attachment: HiveHBaseTableInputFormat.patch

Suggested patch

 Filtering on a hbase row key duplicates results across multiple mappers
 ---

 Key: HIVE-4247
 URL: https://issues.apache.org/jira/browse/HIVE-4247
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.9.0
 Environment: All Platforms
Reporter: Karthik Kumara
  Labels: patch
 Attachments: HiveHBaseTableInputFormat.patch


 Steps to reproduce
 1. Create a Hive external table with HiveHbaseHandler with enough data in the 
 hbase table to spawn multiple mappers for the hive query.
 2. Write a query which has a filter (in the where clause) based on the hbase 
 row key. 
 3. Running the map reduce job leads to each mapper querying the entire data 
 set, duplicating the data for each mapper. Each mapper processes the entire 
 filtered range, and the results get multiplied by the number of mappers run.
 Expected behavior:
 Each mapper should process a different part of the data and should not 
 duplicate.
 Cause:
 The cause seems to be the convertFilter method in HiveHBaseTableInputFormat. 
 convertFilter has this piece of code, which rewrites the start and the stop 
 row for each split and leads each mapper to process the entire range:
 if (tableSplit != null) {
   tableSplit = new TableSplit(
       tableSplit.getTableName(),
       startRow,
       stopRow,
       tableSplit.getRegionLocation());
 }
 The scan already has the start and stop row set when the splits are created. 
 So this piece of code is probably redundant.
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4244) Make string dictionaries adaptive in ORC

2013-03-28 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616657#comment-13616657
 ] 

Owen O'Malley commented on HIVE-4244:
-

We should play with different values, but I was guessing the right cutover 
point for the heuristic was at a loading of 2 to 3 (50% to 33% distinct values).

We aren't really going to know whether the heuristic is right or wrong unless 
we compare both encodings, which is much too expensive. By taking a good guess 
after looking at the start of the stripe, we can get good performance most of 
the time.

 Make string dictionaries adaptive in ORC
 

 Key: HIVE-4244
 URL: https://issues.apache.org/jira/browse/HIVE-4244
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Kevin Wilfong

 The ORC writer should adaptively switch between dictionary and direct 
 encoding. I'd propose looking at the first 100,000 values in each column and 
 decide whether there is sufficient loading in the dictionary to use 
 dictionary encoding.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4248) Implement a memory manager for ORC

2013-03-28 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-4248:
---

 Summary: Implement a memory manager for ORC
 Key: HIVE-4248
 URL: https://issues.apache.org/jira/browse/HIVE-4248
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley


With the large default stripe size (256MB) and dynamic partitions, it is quite 
easy for users to run out of memory when writing ORC files. We probably need a 
solution that keeps track of the total number of concurrent ORC writers and 
divides the available heap space between them. 
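A sketch of the kind of bookkeeping such a memory manager could do: writers register and unregister themselves, and a fixed fraction of the heap is divided evenly among whoever is currently open. The pool fraction and the API shown here are illustrative assumptions, not a settled design.

{code}
import java.util.HashSet;
import java.util.Set;

public class OrcMemoryManagerSketch {
    // Illustrative: fraction of the heap that ORC writers may share.
    private static final double WRITER_POOL_FRACTION = 0.5;

    private final long poolSize =
        (long) (Runtime.getRuntime().maxMemory() * WRITER_POOL_FRACTION);
    private final Set<Object> openWriters = new HashSet<Object>();

    /** Registers a writer and returns the per-writer budget after the change. */
    synchronized long register(Object writer) {
        openWriters.add(writer);
        return budgetPerWriter();
    }

    /** Unregisters a writer (e.g. when its file is closed). */
    synchronized long unregister(Object writer) {
        openWriters.remove(writer);
        return budgetPerWriter();
    }

    /** Available pool divided evenly among all concurrently open writers. */
    synchronized long budgetPerWriter() {
        return openWriters.isEmpty() ? poolSize : poolSize / openWriters.size();
    }
}
{code}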

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4248) Implement a memory manager for ORC

2013-03-28 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616691#comment-13616691
 ] 

Owen O'Malley commented on HIVE-4248:
-

This may result in ORC files with smaller stripes, but that seems far better 
than letting users hit out-of-memory exceptions.

 Implement a memory manager for ORC
 --

 Key: HIVE-4248
 URL: https://issues.apache.org/jira/browse/HIVE-4248
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley

 With the large default stripe size (256MB) and dynamic partitions, it is 
 quite easy for users to run out of memory when writing ORC files. We probably 
 need a solution that keeps track of the total number of concurrent ORC 
 writers and divides the available heap space between them. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-4197) Bring windowing support inline with SQL Standard

2013-03-28 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-4197.


Resolution: Fixed
  Assignee: Harish Butani

Committed to branch. Thanks, Harish!

 Bring windowing support inline with SQL Standard
 

 Key: HIVE-4197
 URL: https://issues.apache.org/jira/browse/HIVE-4197
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: WindowingSpecification.pdf


 The current behavior differs from the Standard in several significant places.
 Please review the attached doc; there are still a few open issues. Once we agree 
 on the behavior, we can proceed with fixing the implementation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-4190) OVER clauses with ORDER BY not getting windowing set properly

2013-03-28 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-4190.


Resolution: Fixed

This patch is subsumed in HIVE-4197 which is now fixed.

 OVER clauses with ORDER BY not getting windowing set properly
 -

 Key: HIVE-4190
 URL: https://issues.apache.org/jira/browse/HIVE-4190
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Affects Versions: 0.11.0
Reporter: Alan Gates

 Given a query like:
 select s, avg(f) over (partition by si order by d) from over100k;
 Hive is not setting the window frame properly.  The order by creates an 
 implicit window frame of 'unbounded preceding' but Hive is treating the above 
 query as if it has no window.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4199) ORC writer doesn't handle non-UTF8 encoded Text properly

2013-03-28 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4199:
--

Attachment: HIVE-4199.HIVE-4199.HIVE-4199.D9501.4.patch

sxyuan updated the revision HIVE-4199 [jira] ORC writer doesn't handle 
non-UTF8 encoded Text properly.

  Updated test case to clarify the expected behaviour.

Reviewers: kevinwilfong

REVISION DETAIL
  https://reviews.facebook.net/D9501

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D9501?vs=30009&id=30675#toc

AFFECTED FILES
  data/files/nonutf8.txt
  ql/src/test/results/clientpositive/orc_nonutf8.q.out
  ql/src/test/queries/clientpositive/orc_nonutf8.q
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java

To: kevinwilfong, sxyuan
Cc: JIRA


 ORC writer doesn't handle non-UTF8 encoded Text properly
 

 Key: HIVE-4199
 URL: https://issues.apache.org/jira/browse/HIVE-4199
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Samuel Yuan
Assignee: Samuel Yuan
Priority: Minor
 Attachments: HIVE-4199.HIVE-4199.HIVE-4199.D9501.1.patch, 
 HIVE-4199.HIVE-4199.HIVE-4199.D9501.2.patch, 
 HIVE-4199.HIVE-4199.HIVE-4199.D9501.3.patch, 
 HIVE-4199.HIVE-4199.HIVE-4199.D9501.4.patch


 StringTreeWriter currently converts fields stored as Text objects into 
 Strings. This can lose information (see 
 http://en.wikipedia.org/wiki/Replacement_character#Replacement_character), 
 and is also unnecessary since the dictionary stores Text objects.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4249) current database is retained between sessions in hive server2

2013-03-28 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-4249:
---

 Summary: current database is retained between sessions in hive 
server2
 Key: HIVE-4249
 URL: https://issues.apache.org/jira/browse/HIVE-4249
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.11.0


current database is retained between sessions in hive server2.

To reproduce -

Run this serveral times -
bin/beeline  -e '!connect jdbc:hive2://localhost:1 scott tiger 
org.apache.hive.jdbc.HiveDriver' -e 'show tables;' -e ' use newdb;' -e ' show 
tables;'

table ab is a table in default database, newtab is a table in newdb database.
Expected result is 
{code}
+---+
| tab_name  |
+---+
| ab|
+---+
1 row selected (0.457 seconds)
No rows affected (0.039 seconds)
+---+
| tab_name  |
+---+
| newtab|
+---+
{code}

But after running it several times, you see threads having newdb as the default 
database, i.e. the output of the above command becomes -

{code}
+---+
| tab_name  |
+---+
| newtab|
+---+
1 row selected (0.518 seconds)
No rows affected (0.052 seconds)
+---+
| tab_name  |
+---+
| newtab|
+---+
1 row selected (0.232 seconds)

{code}


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4250) Closing lots of RecordWriters is slow

2013-03-28 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-4250:
---

 Summary: Closing lots of RecordWriters is slow
 Key: HIVE-4250
 URL: https://issues.apache.org/jira/browse/HIVE-4250
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley


In FileSinkOperator, all of the RecordWriters are closed sequentially. For 
queries with a lot of dynamic partitions this can add substantially to the task 
time. For one query in particular, after processing all of the records in a few 
minutes the reduces spend 15 minutes closing all of the RC files.
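
One plausible direction, sketched below with plain java.io.Closeable as a stand-in for Hive's RecordWriter (this is not the FileSinkOperator code itself), is to hand the close calls to a small thread pool and wait for all of them, so the per-writer close latency overlaps instead of adding up.
{code}
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelClose {
  // Close all writers concurrently and surface the first failure, if any.
  static void closeAll(List<? extends Closeable> writers, int threads)
      throws IOException, InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    List<Future<?>> pending = new ArrayList<>();
    for (Closeable w : writers) {
      pending.add(pool.submit(() -> { w.close(); return null; }));
    }
    pool.shutdown();
    try {
      for (Future<?> f : pending) {
        f.get();                       // propagates any close() exception
      }
    } catch (ExecutionException e) {
      throw new IOException("Failed to close a writer", e.getCause());
    }
  }
}
{code}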

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4249) current database is retained between sessions in hive server2

2013-03-28 Thread Prasad Mujumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616754#comment-13616754
 ] 

Prasad Mujumdar commented on HIVE-4249:
---

Looks like duplicate of 
[HIVE-4171|https://issues.apache.org/jira/browse/HIVE-4171]

 current database is retained between sessions in hive server2
 ---

 Key: HIVE-4249
 URL: https://issues.apache.org/jira/browse/HIVE-4249
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.11.0


 current database is retained between sessions in hive server2.
 To reproduce -
 Run this several times - 
 bin/beeline  -e '!connect jdbc:hive2://localhost:1 scott tiger 
 org.apache.hive.jdbc.HiveDriver' -e 'show tables;' -e ' use newdb;' -e ' show 
 tables;'
 table ab is a table in default database, newtab is a table in newdb database.
 Expected result is 
 {code}
 +---+
 | tab_name  |
 +---+
 | ab|
 +---+
 1 row selected (0.457 seconds)
 No rows affected (0.039 seconds)
 +---+
 | tab_name  |
 +---+
 | newtab|
 +---+
 {code}
 But after running it several times, you see threads having newdb as the default 
 database, i.e. the output of the above command becomes -
 {code}
 +---+
 | tab_name  |
 +---+
 | newtab|
 +---+
 1 row selected (0.518 seconds)
 No rows affected (0.052 seconds)
 +---+
 | tab_name  |
 +---+
 | newtab|
 +---+
 1 row selected (0.232 seconds)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4159) RetryingHMSHandler doesn't retry in enough cases

2013-03-28 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616821#comment-13616821
 ] 

Gang Tim Liu commented on HIVE-4159:


Committed. thanks Kevin.

 RetryingHMSHandler doesn't retry in enough cases
 

 Key: HIVE-4159
 URL: https://issues.apache.org/jira/browse/HIVE-4159
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4159.1.patch.txt


 HIVE-3524 introduced a change which caused JDOExceptions to be wrapped in 
 MetaExceptions.  This caused the RetryingHMSHandler to not retry on these 
 exceptions.
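
A minimal sketch of the kind of check a retrying wrapper could apply (this is not the actual RetryingHMSHandler patch): walk the cause chain so a JDO-layer failure is still recognised as retriable even after being wrapped in a MetaException.
{code}
public final class RetryCheck {
  private RetryCheck() {}

  // Returns true if any exception in the cause chain comes from the JDO layer.
  public static boolean causedByJdo(Throwable t) {
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      if (cur.getClass().getName().startsWith("javax.jdo.")) {
        return true;
      }
    }
    return false;
  }
}
{code}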

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4159) RetryingHMSHandler doesn't retry in enough cases

2013-03-28 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4159:
---

Fix Version/s: 0.11.0

 RetryingHMSHandler doesn't retry in enough cases
 

 Key: HIVE-4159
 URL: https://issues.apache.org/jira/browse/HIVE-4159
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.11.0

 Attachments: HIVE-4159.1.patch.txt


 HIVE-3524 introduced a change which caused JDOExceptions to be wrapped in 
 MetaExceptions.  This caused the RetryingHMSHandler to not retry on these 
 exceptions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4159) RetryingHMSHandler doesn't retry in enough cases

2013-03-28 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4159:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 RetryingHMSHandler doesn't retry in enough cases
 

 Key: HIVE-4159
 URL: https://issues.apache.org/jira/browse/HIVE-4159
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.11.0

 Attachments: HIVE-4159.1.patch.txt


 HIVE-3524 introduced a change which caused JDOExceptions to be wrapped in 
 MetaExceptions.  This caused the RetryingHMSHandler to not retry on these 
 exceptions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4155) Expose ORC's FileDump as a service

2013-03-28 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616828#comment-13616828
 ] 

Gang Tim Liu commented on HIVE-4155:


Committed. thanks Kevin

 Expose ORC's FileDump as a service
 --

 Key: HIVE-4155
 URL: https://issues.apache.org/jira/browse/HIVE-4155
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4155.1.patch.txt


 Expose ORC's FileDump class as a service similar to RC File Cat
 e.g.
 hive --orcfiledump path_to_file
 Should run FileDump on the file.
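
Assuming FileDump keeps its usual main(String[]) entry point, the new command only needs to pass the file path straight through to it; a minimal sketch, with the path as a placeholder:
{code}
public class OrcFileDumpExample {
  public static void main(String[] args) throws Exception {
    // Rough equivalent of `hive --orcfiledump /path/to/file.orc`: delegate to the dump tool.
    org.apache.hadoop.hive.ql.io.orc.FileDump.main(new String[] {"/path/to/file.orc"});
  }
}
{code}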

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4155) Expose ORC's FileDump as a service

2013-03-28 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4155:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Expose ORC's FileDump as a service
 --

 Key: HIVE-4155
 URL: https://issues.apache.org/jira/browse/HIVE-4155
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.11.0

 Attachments: HIVE-4155.1.patch.txt


 Expose ORC's FileDump class as a service similar to RC File Cat
 e.g.
 hive --orcfiledump path_to_file
 Should run FileDump on the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4155) Expose ORC's FileDump as a service

2013-03-28 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4155:
---

Fix Version/s: 0.11.0

 Expose ORC's FileDump as a service
 --

 Key: HIVE-4155
 URL: https://issues.apache.org/jira/browse/HIVE-4155
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.11.0

 Attachments: HIVE-4155.1.patch.txt


 Expose ORC's FileDump class as a service similar to RC File Cat
 e.g.
 hive --orcfiledump path_to_file
 Should run FileDump on the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4119) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table is empty

2013-03-28 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616855#comment-13616855
 ] 

Carl Steinbach commented on HIVE-4119:
--

+1. Will commit if tests pass.

 ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table 
 is empty
 -

 Key: HIVE-4119
 URL: https://issues.apache.org/jira/browse/HIVE-4119
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.10.0
Reporter: Lenni Kuff
Assignee: Shreepadma Venugopalan
Priority: Critical
 Attachments: HIVE-4119.1.patch, HIVE-4119.2.patch


 ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails with NPE if the table 
 is empty
 {code}
 hive -e "create table empty_table (i int); select compute_stats(i, 16) from 
 empty_table"
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:35)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getInt(PrimitiveObjectInspectorUtils.java:535)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeStats$GenericUDAFLongStatsEvaluator.iterate(GenericUDAFComputeStats.java:477)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:139)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1099)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:558)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:231)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1132)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:558)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:567)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:193)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:231)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:35)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getInt(PrimitiveObjectInspectorUtils.java:535)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeStats$GenericUDAFLongStatsEvaluator.iterate(GenericUDAFComputeStats.java:477)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:139)
   at 
 org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1099)
   ... 15 more
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.NullPointerException
   at 
 

[jira] [Created] (HIVE-4251) Indices can't be built on tables whose schema info comes from SerDe

2013-03-28 Thread Mark Wagner (JIRA)
Mark Wagner created HIVE-4251:
-

 Summary: Indices can't be built on tables whose schema info comes 
from SerDe
 Key: HIVE-4251
 URL: https://issues.apache.org/jira/browse/HIVE-4251
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0, 0.10.1
Reporter: Mark Wagner
Assignee: Mark Wagner


Building indices on tables that get their schema information from the deserializer 
(e.g. Avro-backed tables) doesn't work because, when the column is checked for 
existence, the correct API isn't used.

{code}
hive> describe doctors;
OK
# col_name            data_type            comment

number                int                  from deserializer
first_name            string               from deserializer
last_name             string               from deserializer
Time taken: 0.215 seconds, Fetched: 5 row(s)
hive> create index doctors_index on table doctors(number) as 'compact' with deferred rebuild;
FAILED: Error in metadata: java.lang.RuntimeException: Check the index columns, 
they should appear in the table being indexed.
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask
{code}
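
A hedged sketch of the intended check (not the attached patch itself): look the column up in the table's effective schema. The assumption here is that Table.getCols() consults the deserializer when the schema comes from the SerDe, which is exactly what a metastore-only column list misses for Avro-backed tables.
{code}
import java.util.List;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.ql.metadata.Table;

public class IndexColumnCheck {
  // True if the table's effective schema (including SerDe-provided columns)
  // contains the given column name.
  public static boolean hasColumn(Table table, String columnName) {
    List<FieldSchema> cols = table.getCols();
    for (FieldSchema fs : cols) {
      if (fs.getName().equalsIgnoreCase(columnName)) {
        return true;
      }
    }
    return false;
  }
}
{code}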

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4251) Indices can't be built on tables whose schema info comes from SerDe

2013-03-28 Thread Mark Wagner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Wagner updated HIVE-4251:
--

Attachment: HIVE-4251.1.patch

The attached patch fixes this for both the 0.10 branch and trunk.

 Indices can't be built on tables whose schema info comes from SerDe
 ---

 Key: HIVE-4251
 URL: https://issues.apache.org/jira/browse/HIVE-4251
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0, 0.10.1
Reporter: Mark Wagner
Assignee: Mark Wagner
 Attachments: HIVE-4251.1.patch


 Building indices on tables that get their schema information from the 
 deserializer (e.g. Avro-backed tables) doesn't work because, when the column 
 is checked for existence, the correct API isn't used.
 {code}
 hive> describe doctors;
 OK
 # col_name            data_type            comment

 number                int                  from deserializer
 first_name            string               from deserializer
 last_name             string               from deserializer
 Time taken: 0.215 seconds, Fetched: 5 row(s)
 hive> create index doctors_index on table doctors(number) as 'compact' with deferred rebuild;
 FAILED: Error in metadata: java.lang.RuntimeException: Check the index 
 columns, they should appear in the table being indexed.
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4251) Indices can't be built on tables whose schema info comes from SerDe

2013-03-28 Thread Mark Wagner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Wagner updated HIVE-4251:
--

Fix Version/s: 0.10.1
   0.11.0
Affects Version/s: 0.10.0
   Status: Patch Available  (was: Open)

 Indices can't be built on tables whose schema info comes from SerDe
 ---

 Key: HIVE-4251
 URL: https://issues.apache.org/jira/browse/HIVE-4251
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0, 0.11.0, 0.10.1
Reporter: Mark Wagner
Assignee: Mark Wagner
 Fix For: 0.11.0, 0.10.1

 Attachments: HIVE-4251.1.patch


 Building indices on tables that get their schema information from the 
 deserializer (e.g. Avro-backed tables) doesn't work because, when the column 
 is checked for existence, the correct API isn't used.
 {code}
 hive> describe doctors;
 OK
 # col_name            data_type            comment

 number                int                  from deserializer
 first_name            string               from deserializer
 last_name             string               from deserializer
 Time taken: 0.215 seconds, Fetched: 5 row(s)
 hive> create index doctors_index on table doctors(number) as 'compact' with deferred rebuild;
 FAILED: Error in metadata: java.lang.RuntimeException: Check the index 
 columns, they should appear in the table being indexed.
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4157) ORC runs out of heap when writing

2013-03-28 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616898#comment-13616898
 ] 

Gang Tim Liu commented on HIVE-4157:


Committed. thanks Kevin

 ORC runs out of heap when writing
 -

 Key: HIVE-4157
 URL: https://issues.apache.org/jira/browse/HIVE-4157
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4157.1.patch.txt


 The OutStream class used by the ORC file format seems to aggressively 
 allocate memory for ByteBuffers and doesn't seem too eager to give it back.
 This causes issues with heap space, particularly when wide tables/dynamic 
 partitions are involved.
 As a first step to resolving this problem, the OutStream class can be 
 modified to lazily allocate memory, and more actively make it available for 
 garbage collection.
 Follow ups could include checking the amount of free memory as part of 
 determining if a spill is needed.
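
The lazy-allocation idea can be illustrated with a small stand-alone buffer class (a sketch, not the actual OutStream patch): nothing is allocated until the first byte arrives, and the reference is dropped after a flush so the memory can be collected.
{code}
import java.nio.ByteBuffer;

public class LazyBuffer {
  private final int capacity;
  private ByteBuffer current;          // stays null until the first write

  public LazyBuffer(int capacity) {
    this.capacity = capacity;
  }

  public void write(byte b) {
    if (current == null) {
      current = ByteBuffer.allocate(capacity);   // allocate only when needed
    }
    current.put(b);                    // overflow/spill handling omitted in this sketch
  }

  public byte[] flush() {
    if (current == null) {
      return new byte[0];              // nothing was ever written
    }
    current.flip();
    byte[] out = new byte[current.remaining()];
    current.get(out);
    current = null;                    // release the buffer for garbage collection
    return out;
  }
}
{code}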

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4157) ORC runs out of heap when writing

2013-03-28 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4157:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 ORC runs out of heap when writing
 -

 Key: HIVE-4157
 URL: https://issues.apache.org/jira/browse/HIVE-4157
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.11.0

 Attachments: HIVE-4157.1.patch.txt


 The OutStream class used by the ORC file format seems to aggressively 
 allocate memory for ByteBuffers and doesn't seem too eager to give it back.
 This causes issues with heap space, particularly when wide tables/dynamic 
 partitions are involved.
 As a first step to resolving this problem, the OutStream class can be 
 modified to lazily allocate memory, and more actively make it available for 
 garbage collection.
 Follow ups could include checking the amount of free memory as part of 
 determining if a spill is needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4157) ORC runs out of heap when writing

2013-03-28 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-4157:
---

Fix Version/s: 0.11.0

 ORC runs out of heap when writing
 -

 Key: HIVE-4157
 URL: https://issues.apache.org/jira/browse/HIVE-4157
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.11.0

 Attachments: HIVE-4157.1.patch.txt


 The OutStream class used by the ORC file format seems to aggressively 
 allocate memory for ByteBuffers and doesn't seem too eager to give it back.
 This causes issues with heap space, particularly when wide tables/dynamic 
 partitions are involved.
 As a first step to resolving this problem, the OutStream class can be 
 modified to lazily allocate memory, and more actively make it available for 
 garbage collection.
 Follow ups could include checking the amount of free memory as part of 
 determining if a spill is needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4157) ORC runs out of heap when writing

2013-03-28 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616901#comment-13616901
 ] 

Gang Tim Liu commented on HIVE-4157:


Forgot to mention: tests passed. sorry

 ORC runs out of heap when writing
 -

 Key: HIVE-4157
 URL: https://issues.apache.org/jira/browse/HIVE-4157
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.11.0

 Attachments: HIVE-4157.1.patch.txt


 The OutStream class used by the ORC file format seems to aggressively 
 allocate memory for ByteBuffers and doesn't seem too eager to give it back.
 This causes issues with heap space, particularly when wide tables/dynamic 
 partitions are involved.
 As a first step to resolving this problem, the OutStream class can be 
 modified to lazily allocate memory, and more actively make it available for 
 garbage collection.
 Follow ups could include checking the amount of free memory as part of 
 determining if a spill is needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4159) RetryingHMSHandler doesn't retry in enough cases

2013-03-28 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616902#comment-13616902
 ] 

Gang Tim Liu commented on HIVE-4159:


Forgot to mention: tests passed. sorry

 RetryingHMSHandler doesn't retry in enough cases
 

 Key: HIVE-4159
 URL: https://issues.apache.org/jira/browse/HIVE-4159
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.11.0

 Attachments: HIVE-4159.1.patch.txt


 HIVE-3524 introduced a change which caused JDOExceptions to be wrapped in 
 MetaExceptions.  This caused the RetryingHMSHandler to not retry on these 
 exceptions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4155) Expose ORC's FileDump as a service

2013-03-28 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616903#comment-13616903
 ] 

Gang Tim Liu commented on HIVE-4155:


Forgot to mention: tests passed. sorry

 Expose ORC's FileDump as a service
 --

 Key: HIVE-4155
 URL: https://issues.apache.org/jira/browse/HIVE-4155
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.11.0

 Attachments: HIVE-4155.1.patch.txt


 Expose ORC's FileDump class as a service similar to RC File Cat
 e.g.
 hive --orcfiledump path_to_file
 Should run FileDump on the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3464) Merging join tree may reorder joins which could be invalid

2013-03-28 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616932#comment-13616932
 ] 

Phabricator commented on HIVE-3464:
---

navis has commented on the revision HIVE-3464 [jira] Merging join tree may 
reorder joins which could be invalid.

INLINE COMMENTS
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinResolver.java:332
 Ok, it's on another issue. Will be removed.

  Any other comments on changes?

REVISION DETAIL
  https://reviews.facebook.net/D5409

To: JIRA, navis
Cc: njain, vikram


 Merging join tree may reorder joins which could be invalid
 --

 Key: HIVE-3464
 URL: https://issues.apache.org/jira/browse/HIVE-3464
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Navis
Assignee: Navis
 Attachments: HIVE-3464.D5409.2.patch, HIVE-3464.D5409.3.patch, 
 HIVE-3464.D5409.4.patch, HIVE-3464.D5409.5.patch


 Currently, hive merges join tree from right to left regardless of join types, 
 which may introduce join reordering. For example,
 select * from a join a b on a.key=b.key join a c on b.key=c.key join a d on 
 a.key=d.key; 
 Hive tries to merge join tree in a-d=b-d, a-d=a-b, b-c=a-b order and a-d=a-b 
 and b-c=a-b will be merged. Final join tree is a-(bdc).
 With this, ab-d join will be executed prior to ab-c. But if join type of -c 
 and -d is different, this is not valid.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-4249) current database is retained between sessions in hive server2

2013-03-28 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair resolved HIVE-4249.
-

Resolution: Duplicate

Thanks Prasad for pointing that out!
Marking as duplicate.


 current database is retained between sessions in hive server2
 ---

 Key: HIVE-4249
 URL: https://issues.apache.org/jira/browse/HIVE-4249
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.11.0


 current database is retained between sessions in hive server2.
 To reproduce -
 Run this several times - 
 bin/beeline  -e '!connect jdbc:hive2://localhost:1 scott tiger 
 org.apache.hive.jdbc.HiveDriver' -e 'show tables;' -e ' use newdb;' -e ' show 
 tables;'
 table ab is a table in default database, newtab is a table in newdb database.
 Expected result is 
 {code}
 +---+
 | tab_name  |
 +---+
 | ab|
 +---+
 1 row selected (0.457 seconds)
 No rows affected (0.039 seconds)
 +---+
 | tab_name  |
 +---+
 | newtab|
 +---+
 {code}
 But after running it several times, you see threads having newdb as the default 
 database, i.e. the output of the above command becomes -
 {code}
 +---+
 | tab_name  |
 +---+
 | newtab|
 +---+
 1 row selected (0.518 seconds)
 No rows affected (0.052 seconds)
 +---+
 | tab_name  |
 +---+
 | newtab|
 +---+
 1 row selected (0.232 seconds)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4179) NonBlockingOpDeDup does not merge SEL operators correctly

2013-03-28 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616938#comment-13616938
 ] 

Navis commented on HIVE-4179:
-

minor comments on phabricator

 NonBlockingOpDeDup does not merge SEL operators correctly
 -

 Key: HIVE-4179
 URL: https://issues.apache.org/jira/browse/HIVE-4179
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
Priority: Critical
 Fix For: 0.11.0

 Attachments: HIVE-4179.1.patch, HIVE-4179.2.patch, HIVE-4179.3.patch


 The input columns list for SEL operations isn't merged properly in the 
 optimization. The best way to see this is running union_remove_22.q with 
 -Dhadoop.mr.rev=23. The plan shows lost UDFs and a broken lineage for one 
 column.
 Note: union_remove tests do not run on hadoop 1 or 0.20.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4171) Current database in metastore.Hive is not consistent with SessionState

2013-03-28 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4171:


Attachment: HIVE-4171.3.patch

I think it would be better to store this information in HiveConf and remove the 
member from the Hive class. This would mean that there is only one source of 
truth for this information (instead of having it in both the Hive and SessionState 
classes).
I can submit another patch with a fix for the TODO in the patch and unit tests if you 
agree. HIVE-4171.3.patch (also in https://reviews.apache.org/r/10180/ )
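
A rough sketch of the single-source-of-truth idea (the property name below is hypothetical, not an existing HiveConf variable): both SessionState and metastore.Hive would read and write the current database through the session's HiveConf instead of keeping their own copies.
{code}
import org.apache.hadoop.hive.conf.HiveConf;

public final class CurrentDatabase {
  // Hypothetical key; the real patch may use a different name or a HiveConf.ConfVars entry.
  private static final String CURRENT_DB_KEY = "hive.current.database";

  private CurrentDatabase() {}

  public static void set(HiveConf conf, String db) {
    conf.set(CURRENT_DB_KEY, db);
  }

  public static String get(HiveConf conf) {
    return conf.get(CURRENT_DB_KEY, "default");
  }
}
{code}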

 Current database in metastore.Hive is not consistent with SessionState
 --

 Key: HIVE-4171
 URL: https://issues.apache.org/jira/browse/HIVE-4171
 Project: Hive
  Issue Type: Bug
  Components: CLI
Reporter: Navis
Assignee: Navis
  Labels: HiveServer2
 Attachments: HIVE-4171.3.patch, HIVE-4171.D9399.1.patch, 
 HIVE-4171.D9399.2.patch


 metastore.Hive is thread local instance, which can have different status with 
 SessionState. Currently the only status in metastore.Hive is database name in 
 use.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4235) CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists

2013-03-28 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-4235:


   Resolution: Fixed
Fix Version/s: 0.11.0
   Status: Resolved  (was: Patch Available)

Committed, thanks Tim.

 CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists
 

 Key: HIVE-4235
 URL: https://issues.apache.org/jira/browse/HIVE-4235
 Project: Hive
  Issue Type: Bug
  Components: JDBC, Query Processor, SQL
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Fix For: 0.11.0

 Attachments: HIVE-4235.patch.1


 CREATE TABLE IF NOT EXISTS uses an inefficient way to check if a table exists.
 It uses Hive.java's getTablesByPattern(...) to check if the table exists. It 
 involves a regular expression and eventually a database join. Very inefficient. It 
 can cause database lock time to increase and hurt db performance if a lot of 
 such commands hit the database.
 The suggested approach is to use getTable(...) since we know the table name already
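
A minimal sketch of the point lookup being suggested (assuming the getTable overload that takes a throwException flag and returns null when the table is absent; this is not the attached patch):
{code}
import org.apache.hadoop.hive.ql.metadata.Hive;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.metadata.Table;

public class TableExistsCheck {
  // Direct lookup by name: no pattern matching and no extra metastore join.
  public static boolean tableExists(String dbName, String tableName) throws HiveException {
    Table t = Hive.get().getTable(dbName, tableName, false);
    return t != null;
  }
}
{code}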

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4235) CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists

2013-03-28 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616964#comment-13616964
 ] 

Gang Tim Liu commented on HIVE-4235:


Kevin, thank you very much. Tim





 CREATE TABLE IF NOT EXISTS uses inefficient way to check if table exists
 

 Key: HIVE-4235
 URL: https://issues.apache.org/jira/browse/HIVE-4235
 Project: Hive
  Issue Type: Bug
  Components: JDBC, Query Processor, SQL
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Fix For: 0.11.0

 Attachments: HIVE-4235.patch.1


 CREATE TABLE IF NOT EXISTS uses an inefficient way to check if a table exists.
 It uses Hive.java's getTablesByPattern(...) to check if the table exists. It 
 involves a regular expression and eventually a database join. Very inefficient. It 
 can cause database lock time to increase and hurt db performance if a lot of 
 such commands hit the database.
 The suggested approach is to use getTable(...) since we know the table name already

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4194) JDBC2: HiveDriver should not throw RuntimeException when passed an invalid URL

2013-03-28 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616988#comment-13616988
 ] 

Thejas M Nair commented on HIVE-4194:
-

Another non-binding +1 .
(do non-binding +1's add up :) )


 JDBC2: HiveDriver should not throw RuntimeException when passed an invalid URL
 --

 Key: HIVE-4194
 URL: https://issues.apache.org/jira/browse/HIVE-4194
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 0.11.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.11.0

 Attachments: HIVE-4194.patch


 As per JDBC 3.0 Spec (section 9.2)
 If the Driver implementation understands the URL, it will return a 
 Connection object; otherwise it returns null
 Currently HiveConnection constructor will throw IllegalArgumentException if 
 url string doesn't start with jdbc:hive2. This exception should be caught 
 by HiveDriver.connect and return null.
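
A sketch of the driver-side behaviour the spec asks for (not the actual HiveDriver code; the prefix constant and helper method are illustrative): an unrecognised URL yields null from connect() so DriverManager can try the next registered driver, instead of an exception escaping.
{code}
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Properties;

public class DriverConnectSketch {
  private static final String URL_PREFIX = "jdbc:hive2://";

  public Connection connect(String url, Properties info) throws SQLException {
    if (url == null || !url.startsWith(URL_PREFIX)) {
      return null;                 // not our URL: per JDBC 3.0 section 9.2, return null
    }
    return openConnection(url, info);
  }

  // Stand-in for the real connection setup, which is out of scope for this sketch.
  private Connection openConnection(String url, Properties info) throws SQLException {
    throw new SQLException("connection setup elided in this sketch");
  }
}
{code}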

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4252) hiveserver2 string representation of complex types are inconsistent with cli

2013-03-28 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-4252:
---

 Summary: hiveserver2 string representation of complex types are 
inconsistent with cli
 Key: HIVE-4252
 URL: https://issues.apache.org/jira/browse/HIVE-4252
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair


For example, it prints struct as [null, null, null] instead of 
{"r":null,"s":null,"t":null}
And for maps it is printing it as {k=v} instead of {"k":"v"}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4252) hiveserver2 string representation of complex types are inconsistent with cli

2013-03-28 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4252:


Attachment: HIVE-4252.1.patch

 hiveserver2 string representation of complex types are inconsistent with cli
 

 Key: HIVE-4252
 URL: https://issues.apache.org/jira/browse/HIVE-4252
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4252.1.patch


 For example, it prints struct as [null, null, null] instead of 
 {"r":null,"s":null,"t":null}
 And for maps it is printing it as {k=v} instead of {"k":"v"}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4252) hiveserver2 string representation of complex types are inconsistent with cli

2013-03-28 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4252:


Status: Patch Available  (was: Open)

 hiveserver2 string representation of complex types are inconsistent with cli
 

 Key: HIVE-4252
 URL: https://issues.apache.org/jira/browse/HIVE-4252
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4252.1.patch


 For example, it prints struct as [null, null, null] instead of 
 {"r":null,"s":null,"t":null}
 And for maps it is printing it as {k=v} instead of {"k":"v"}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4253) use jdbc complex types for hive complex types

2013-03-28 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-4253:
---

 Summary: use jdbc complex types for hive complex types 
 Key: HIVE-4253
 URL: https://issues.apache.org/jira/browse/HIVE-4253
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Thejas M Nair


The hiveserver2 jdbc driver is converting the complex types into strings. It 
would be better to use suitable Java objects as per the JDBC spec.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4109) Partition by column does not have to be in order by

2013-03-28 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617045#comment-13617045
 ] 

Harish Butani commented on HIVE-4109:
-

This should be fixed with HIVE-4197.

 Partition by column does not have to be in order by
 ---

 Key: HIVE-4109
 URL: https://issues.apache.org/jira/browse/HIVE-4109
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Brock Noland

 Came up in the review of HIVE-4093.
 Ashutosh
 {noformat}
 I am not sure if this is illegal query. I tried following two queries in 
 postgres, both of them succeeded.
 select p_mfgr, avg(p_retailprice) over(partition by p_mfgr, p_type order by 
 p_mfgr) from part;
 select p_mfgr, avg(p_retailprice) over(partition by p_mfgr order by 
 p_type,p_mfgr) from part;
 {noformat}
 Harish
 {noformat}
 The first one doesn't make sense, right? Order on a subset of the partition 
 columns
 The second one: Can we do this with the Hive ReduceOp have the orderColumns 
 be in a different order than the key columns?
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4254) Code cleanup : debug methods, having clause associated with Windowing

2013-03-28 Thread Harish Butani (JIRA)
Harish Butani created HIVE-4254:
---

 Summary: Code cleanup : debug methods, having clause associated 
with Windowing
 Key: HIVE-4254
 URL: https://issues.apache.org/jira/browse/HIVE-4254
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani


- remove debug functions in SemanticAnalyzer
- remove code dealing with having clause associated with Windowing

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4194) JDBC2: HiveDriver should not throw RuntimeException when passed an invalid URL

2013-03-28 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-4194:
-

Status: Open  (was: Patch Available)

We shouldn't let people instantiate malformed HiveConnection objects. Please 
make the HiveConnection constructor private and add static builder methods to 
HiveConnection (e.g. HiveConnection.newConnection(String url, Properties info)) 
that validate the input URL and return null if it's invalid. Please also 
relocate acceptsURL() to HiveConnection and make it private. Thanks.

 JDBC2: HiveDriver should not throw RuntimeException when passed an invalid URL
 --

 Key: HIVE-4194
 URL: https://issues.apache.org/jira/browse/HIVE-4194
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 0.11.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.11.0

 Attachments: HIVE-4194.patch


 As per JDBC 3.0 Spec (section 9.2)
 If the Driver implementation understands the URL, it will return a 
 Connection object; otherwise it returns null
 Currently HiveConnection constructor will throw IllegalArgumentException if 
 url string doesn't start with jdbc:hive2. This exception should be caught 
 by HiveDriver.connect and return null.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2264) Hive server is SHUTTING DOWN when invalid queries are being executed.

2013-03-28 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617075#comment-13617075
 ] 

Carl Steinbach commented on HIVE-2264:
--

@Navis: we relaxed that rule. You can commit your own patches as long as you 
get a +1 from another committer. You're good to go.

 Hive server is SHUTTING DOWN when invalid queries are being executed.
 --

 Key: HIVE-2264
 URL: https://issues.apache.org/jira/browse/HIVE-2264
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Query Processor
Affects Versions: 0.9.0
 Environment: SuSE-Linux-11
Reporter: rohithsharma
Assignee: Navis
Priority: Blocker
 Fix For: 0.11.0

 Attachments: HIVE-2264.1.patch.txt, HIVE-2264-2.patch, 
 HIVE-2264.D9489.1.patch


 When an invalid query is being executed, the Hive server shuts down.
 {noformat}
 CREATE TABLE SAMPLETABLE(IP STRING , showtime BIGINT ) partitioned by (ds 
 string,ipz int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\040'
 ALTER TABLE SAMPLETABLE add Partition(ds='sf') location 
 '/user/hive/warehouse' Partition(ipz=100) location '/user/hive/warehouse'
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4254) Code cleanup : debug methods, having clause associated with Windowing

2013-03-28 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4254:
--

Attachment: HIVE-4254.D9795.1.patch

hbutani requested code review of HIVE-4254 [jira] Code cleanup : debug 
methods, having clause associated with Windowing.

Reviewers: JIRA, ashutoshc

cleanup code

remove debug functions in SemanticAnalyzer
remove code dealing with having clause associated with Windowing

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D9795

AFFECTED FILES
  data/files/flights_tiny.txt
  data/files/part.rc
  data/files/part.seq
  ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingComponentizer.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingSpec.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/23361/

To: JIRA, ashutoshc, hbutani


 Code cleanup : debug methods, having clause associated with Windowing
 -

 Key: HIVE-4254
 URL: https://issues.apache.org/jira/browse/HIVE-4254
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-4254.D9795.1.patch


 - remove debug functions in SemanticAnalyzer
 - remove code dealing with having clause associated with Windowing

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4255) update show_functions.q.out for functions added for windowing

2013-03-28 Thread Harish Butani (JIRA)
Harish Butani created HIVE-4255:
---

 Summary: update show_functions.q.out for functions added for 
windowing
 Key: HIVE-4255
 URL: https://issues.apache.org/jira/browse/HIVE-4255
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4255) update show_functions.q.out for functions added for windowing

2013-03-28 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617093#comment-13617093
 ] 

Harish Butani commented on HIVE-4255:
-

patch is attached.

 update show_functions.q.out for functions added for windowing
 -

 Key: HIVE-4255
 URL: https://issues.apache.org/jira/browse/HIVE-4255
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-4255.1.patch.txt




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4255) update show_functions.q.out for functions added for windowing

2013-03-28 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-4255:


Attachment: HIVE-4255.1.patch.txt

 update show_functions.q.out for functions added for windowing
 -

 Key: HIVE-4255
 URL: https://issues.apache.org/jira/browse/HIVE-4255
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-4255.1.patch.txt




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2264) Hive server is SHUTTING DOWN when invalid queries are being executed.

2013-03-28 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-2264:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed and was my first commit. Thanks to all.

 Hive server is SHUTTING DOWN when invalid queries are being executed.
 --

 Key: HIVE-2264
 URL: https://issues.apache.org/jira/browse/HIVE-2264
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Query Processor
Affects Versions: 0.9.0
 Environment: SuSE-Linux-11
Reporter: rohithsharma
Assignee: Navis
Priority: Blocker
 Fix For: 0.11.0

 Attachments: HIVE-2264.1.patch.txt, HIVE-2264-2.patch, 
 HIVE-2264.D9489.1.patch


 When an invalid query is being executed, the Hive server shuts down.
 {noformat}
 CREATE TABLE SAMPLETABLE(IP STRING , showtime BIGINT ) partitioned by (ds 
 string,ipz int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\040'
 ALTER TABLE SAMPLETABLE add Partition(ds='sf') location 
 '/user/hive/warehouse' Partition(ipz=100) location '/user/hive/warehouse'
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2264) Hive server is SHUTTING DOWN when invalid queries are being executed.

2013-03-28 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617110#comment-13617110
 ] 

Phabricator commented on HIVE-2264:
---

navis has abandoned the revision HIVE-2264 [jira] Hive server is SHUTTING DOWN 
when invalid queries are being executed..

  Committed

REVISION DETAIL
  https://reviews.facebook.net/D9489

To: JIRA, navis


 Hive server is SHUTTING DOWN when invalid queries are being executed.
 --

 Key: HIVE-2264
 URL: https://issues.apache.org/jira/browse/HIVE-2264
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Query Processor
Affects Versions: 0.9.0
 Environment: SuSE-Linux-11
Reporter: rohithsharma
Assignee: Navis
Priority: Blocker
 Fix For: 0.11.0

 Attachments: HIVE-2264.1.patch.txt, HIVE-2264-2.patch, 
 HIVE-2264.D9489.1.patch


 When an invalid query is being executed, the Hive server shuts down.
 {noformat}
 CREATE TABLE SAMPLETABLE(IP STRING , showtime BIGINT ) partitioned by (ds 
 string,ipz int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\040'
 ALTER TABLE SAMPLETABLE add Partition(ds='sf') location 
 '/user/hive/warehouse' Partition(ipz=100) location '/user/hive/warehouse'
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira