date:20130328


[ 
https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616123#comment-13616123
 ] 

Namit Jain commented on HIVE-4018:
--

+1

Missed this -- running tests

 MapJoin failing with Distributed Cache error
 

 Key: HIVE-4018
 URL: https://issues.apache.org/jira/browse/HIVE-4018
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.11.0
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.11.0

 Attachments: HIVE-4018.patch, hive.4018.test.2.patch, 
 HIVE-4018-test.patch


 When I'm a running a star join query after HIVE-3784, it is failing with 
 following error:
 2013-02-13 08:36:04,584 ERROR org.apache.hadoop.hive.ql.exec.MapJoinOperator: 
 Load Distributed Cache Error
 2013-02-13 08:36:04,585 FATAL ExecMapper: 
 org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:189)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:203)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1421)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:614)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
   at org.apache.hadoop.mapred.Child.main(Child.java:260)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-4018) MapJoin failing with Distributed Cache error


 [ 
https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-4018:
-

Status: Open  (was: Patch Available)

Can you refresh ?
phabricator diff is not applying cleanly.
Can you also load the latest patch ?

 MapJoin failing with Distributed Cache error
 

 Key: HIVE-4018
 URL: https://issues.apache.org/jira/browse/HIVE-4018
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.11.0
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.11.0

 Attachments: HIVE-4018.patch, hive.4018.test.2.patch, 
 HIVE-4018-test.patch


 When I'm a running a star join query after HIVE-3784, it is failing with 
 following error:
 2013-02-13 08:36:04,584 ERROR org.apache.hadoop.hive.ql.exec.MapJoinOperator: 
 Load Distributed Cache Error
 2013-02-13 08:36:04,585 FATAL ExecMapper: 
 org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:189)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:203)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1421)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:614)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
   at org.apache.hadoop.mapred.Child.main(Child.java:260)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2905) Desc table can't read Chinese (UTF-8 character code)


 [ 
https://issues.apache.org/jira/browse/HIVE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaozhe Wang updated HIVE-2905:
---

Affects Version/s: 0.10.0

 Desc table can't read Chinese (UTF-8 character code)
 

 Key: HIVE-2905
 URL: https://issues.apache.org/jira/browse/HIVE-2905
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.7.0, 0.10.0
 Environment: hive 0.7.0, mysql 5.1.45
Reporter: Sheng Zhou

 When desc a table with command line or hive jdbc way, the table's comment 
 can't be read.
 1. I have updated javax.jdo.option.ConnectionURL parameter in hive-site.xml 
 file.
jdbc:mysql://*.*.*.*:3306/hive?characterEncoding=UTF-8
 2. In mysql database, the comment field of COLUMNS table can be read normally.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2905) Desc table can't read Chinese (UTF-8 character code)


 [ 
https://issues.apache.org/jira/browse/HIVE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaozhe Wang updated HIVE-2905:
---

Environment: 
hive 0.7.0, mysql 5.1.45
hive 0.10.0, mysql 5.5.30

  was:hive 0.7.0, mysql 5.1.45


 Desc table can't read Chinese (UTF-8 character code)
 

 Key: HIVE-2905
 URL: https://issues.apache.org/jira/browse/HIVE-2905
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.7.0, 0.10.0
 Environment: hive 0.7.0, mysql 5.1.45
 hive 0.10.0, mysql 5.5.30
Reporter: Sheng Zhou

 When desc a table with command line or hive jdbc way, the table's comment 
 can't be read.
 1. I have updated javax.jdo.option.ConnectionURL parameter in hive-site.xml 
 file.
jdbc:mysql://*.*.*.*:3306/hive?characterEncoding=UTF-8
 2. In mysql database, the comment field of COLUMNS table can be read normally.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-4221) Stripe-level merge for ORC files

2013-03-28 Thread Phabricator (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4221:
--

Attachment: HIVE-4221.HIVE-4221.HIVE-4221.HIVE-4221.D9759.1.patch

sxyuan requested code review of HIVE-4221 [jira] Stripe-level merge for ORC 
files
.

Reviewers: kevinwilfong, omalley

As with RC files, we would like to be able to merge ORC files efficiently by 
reading/writing stripes without deserializing each row. Most of the logic is 
unchanged from merging for RC files, so the original code has been refactored 
for reuse.

TEST PLAN
  Copied and modified RC file merge tests to use ORC file format. Added a test 
case to TestOrcFile to make sure file level column stats are merged properly.

REVISION DETAIL
  https://reviews.facebook.net/D9759

AFFECTED FILES
  data/files/smbbucket_1.orc
  data/files/smbbucket_3.orc
  data/files/smbbucket_2.orc
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  ql/src/test/results/clientpositive/orc_createas1.q.out
  ql/src/test/results/clientpositive/orcfile_merge3.q.out
  ql/src/test/results/clientpositive/orcfile_merge2.q.out
  ql/src/test/results/clientpositive/alter_merge_orc2.q.out
  ql/src/test/results/clientpositive/alter_merge_orc.q.out
  ql/src/test/results/clientpositive/orcfile_merge1.q.out
  ql/src/test/results/clientpositive/orcfile_merge4.q.out
  ql/src/test/results/clientpositive/alter_merge_orc_stats.q.out
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
  ql/src/test/queries/clientpositive/orcfile_merge2.q
  ql/src/test/queries/clientpositive/orcfile_merge3.q
  ql/src/test/queries/clientpositive/alter_merge_orc.q
  ql/src/test/queries/clientpositive/orcfile_merge4.q
  ql/src/test/queries/clientpositive/alter_merge_orc_stats.q
  ql/src/test/queries/clientpositive/orcfile_merge1.q
  ql/src/test/queries/clientpositive/alter_merge_orc2.q
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/AlterTablePartMergeFilesDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
  ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/MergeWork.java
  
ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeOutputFormat.java
  ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java
  ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcBlockMergeRecordReader.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcBlockMergeInputFormat.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcMergeMapper.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StripeReader.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
  ql/src/java/org/apache/hadoop/hive/ql/io/merge
  ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeWork.java
  ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeMapper.java
  ql/src/java/org/apache/hadoop/hive/ql/io/merge/BlockMergeOutputFormat.java
  ql/src/java/org/apache/hadoop/hive/ql/io/merge/BlockMergeTask.java

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/23295/

To: kevinwilfong, omalley, sxyuan
Cc: JIRA


 Stripe-level merge for ORC files
 

 Key: HIVE-4221
 URL: https://issues.apache.org/jira/browse/HIVE-4221
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Samuel Yuan
Assignee: Samuel Yuan
 Attachments: HIVE-4221.HIVE-4221.HIVE-4221.HIVE-4221.D9759.1.patch


 As with RC files, we would like to be able to merge ORC files efficiently by 
 reading/writing stripes without decompressing/recompressing them. This will 
 be similar to the RC file merge, except that footers will have to be updated 
 with the stripe positions in the new file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-4221) Stripe-level merge for ORC files

2013-03-28 Thread Samuel Yuan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samuel Yuan updated HIVE-4221:
--

Status: Patch Available  (was: Open)

 Stripe-level merge for ORC files
 

 Key: HIVE-4221
 URL: https://issues.apache.org/jira/browse/HIVE-4221
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Samuel Yuan
Assignee: Samuel Yuan
 Attachments: HIVE-4221.HIVE-4221.HIVE-4221.HIVE-4221.D9759.1.patch


 As with RC files, we would like to be able to merge ORC files efficiently by 
 reading/writing stripes without decompressing/recompressing them. This will 
 be similar to the RC file merge, except that footers will have to be updated 
 with the stripe positions in the new file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-4018) MapJoin failing with Distributed Cache error

2013-03-28 Thread Amareshwari Sriramadasu (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616156#comment-13616156
 ] 

Amareshwari Sriramadasu commented on HIVE-4018:
---

After Updating the patch to trunk, the test fails with NPE again. Will see 
whats the cause and update.

 MapJoin failing with Distributed Cache error
 

 Key: HIVE-4018
 URL: https://issues.apache.org/jira/browse/HIVE-4018
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 0.11.0
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.11.0

 Attachments: HIVE-4018.patch, hive.4018.test.2.patch, 
 HIVE-4018-test.patch


 When I'm a running a star join query after HIVE-3784, it is failing with 
 following error:
 2013-02-13 08:36:04,584 ERROR org.apache.hadoop.hive.ql.exec.MapJoinOperator: 
 Load Distributed Cache Error
 2013-02-13 08:36:04,585 FATAL ExecMapper: 
 org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:189)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:203)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1421)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1425)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:614)
   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
   at org.apache.hadoop.mapred.Child.main(Child.java:260)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-4240) optimize hive.enforce.bucketing and hive.enforce sorting insert


[ 
https://issues.apache.org/jira/browse/HIVE-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616174#comment-13616174
 ] 

Namit Jain commented on HIVE-4240:
--

https://reviews.facebook.net/D9765

 optimize hive.enforce.bucketing and hive.enforce sorting insert
 ---

 Key: HIVE-4240
 URL: https://issues.apache.org/jira/browse/HIVE-4240
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain

 Consider the following scenario:
 set hive.optimize.bucketmapjoin = true;
 set hive.optimize.bucketmapjoin.sortedmerge = true;
 set hive.input.format = 
 org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
 set hive.enforce.bucketing=true;
 set hive.enforce.sorting=true;
 set hive.exec.reducers.max = 1;
 set hive.merge.mapfiles=false;
 set hive.merge.mapredfiles=false;
 -- Create two bucketed and sorted tables
 CREATE TABLE test_table1 (key INT, value STRING) PARTITIONED BY (ds STRING) 
 CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;
 CREATE TABLE test_table2 (key INT, value STRING) PARTITIONED BY (ds STRING) 
 CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;
 FROM src
 INSERT OVERWRITE TABLE test_table1 PARTITION (ds = '1') SELECT *;
 -- Insert data into the bucketed table by selecting from another bucketed 
 table
 -- This should be a map-only operation
 INSERT OVERWRITE TABLE test_table2 PARTITION (ds = '1')
 SELECT a.key, a.value FROM test_table1 a WHERE a.ds = '1';
 We should not need a reducer to perform the above operation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-4240) optimize hive.enforce.bucketing and hive.enforce sorting insert


 [ 
https://issues.apache.org/jira/browse/HIVE-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-4240:
-

Attachment: hive.4240.1.patch

 optimize hive.enforce.bucketing and hive.enforce sorting insert
 ---

 Key: HIVE-4240
 URL: https://issues.apache.org/jira/browse/HIVE-4240
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4240.1.patch


 Consider the following scenario:
 set hive.optimize.bucketmapjoin = true;
 set hive.optimize.bucketmapjoin.sortedmerge = true;
 set hive.input.format = 
 org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
 set hive.enforce.bucketing=true;
 set hive.enforce.sorting=true;
 set hive.exec.reducers.max = 1;
 set hive.merge.mapfiles=false;
 set hive.merge.mapredfiles=false;
 -- Create two bucketed and sorted tables
 CREATE TABLE test_table1 (key INT, value STRING) PARTITIONED BY (ds STRING) 
 CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;
 CREATE TABLE test_table2 (key INT, value STRING) PARTITIONED BY (ds STRING) 
 CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;
 FROM src
 INSERT OVERWRITE TABLE test_table1 PARTITION (ds = '1') SELECT *;
 -- Insert data into the bucketed table by selecting from another bucketed 
 table
 -- This should be a map-only operation
 INSERT OVERWRITE TABLE test_table2 PARTITION (ds = '1')
 SELECT a.key, a.value FROM test_table1 a WHERE a.ds = '1';
 We should not need a reducer to perform the above operation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2905) Desc table can't read Chinese (UTF-8 character code)

[
https://issues.apache.org/jira/browse/HIVE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Xiaozhe Wang updated HIVE-2905:
---

Labels: patch (was: )
Status: Patch Available (was: Open)

The problem is
org.apache.hadoop.hive.ql.metadata.formatting.TextMetaDataFormatter.describeTable()
use DataOutputStream.writeBytes() to output column info string. Unfortunately,
DataOutputStream.writeBytes() will only write out lower byte of each character
in the String, which cause garbling problem when column comment contains
non-latin1 characters.

This simple patch solved Unicode character garbling problem when describe table
in Hive client.

Desc table can't read Chinese (UTF-8 character code)

Key: HIVE-2905
URL: https://issues.apache.org/jira/browse/HIVE-2905
Project: Hive
Issue Type: Bug
Components: CLI
Affects Versions: 0.10.0, 0.7.0
Environment: hive 0.7.0, mysql 5.1.45
hive 0.10.0, mysql 5.5.30
Reporter: Sheng Zhou
Labels: patch

When desc a table with command line or hive jdbc way, the table's comment
can't be read.
1. I have updated javax.jdo.option.ConnectionURL parameter in hive-site.xml
file.
jdbc:mysql://*.*.*.*:3306/hive?characterEncoding=UTF-8
2. In mysql database, the comment field of COLUMNS table can be read normally.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2905) Desc table can't read Chinese (UTF-8 character code)


 [ 
https://issues.apache.org/jira/browse/HIVE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaozhe Wang updated HIVE-2905:
---

Attachment: utf8-desc-comment.patch

Simple patch to resolve the garbling problem of column comment which contains 
unicode characters.

 Desc table can't read Chinese (UTF-8 character code)
 

 Key: HIVE-2905
 URL: https://issues.apache.org/jira/browse/HIVE-2905
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.7.0, 0.10.0
 Environment: hive 0.7.0, mysql 5.1.45
 hive 0.10.0, mysql 5.5.30
Reporter: Sheng Zhou
  Labels: patch
 Attachments: utf8-desc-comment.patch


 When desc a table with command line or hive jdbc way, the table's comment 
 can't be read.
 1. I have updated javax.jdo.option.ConnectionURL parameter in hive-site.xml 
 file.
jdbc:mysql://*.*.*.*:3306/hive?characterEncoding=UTF-8
 2. In mysql database, the comment field of COLUMNS table can be read normally.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HIVE-4242) Predicate push down should also be provided to InputFormats

Owen O'Malley created HIVE-4242:
---

 Summary: Predicate push down should also be provided to 
InputFormats
 Key: HIVE-4242
 URL: https://issues.apache.org/jira/browse/HIVE-4242
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Currently, the push down predicate is only provided to native tables if the 
hive.optimize.index.filter configuration variable is set. There is no reason to 
prevent InputFormats from getting the required information to do predicate push 
down.

Obviously, this will be very useful for ORC.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-4227) Add column level encryption to ORC files


[ 
https://issues.apache.org/jira/browse/HIVE-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616329#comment-13616329
 ] 

Owen O'Malley commented on HIVE-4227:
-

Supun,
  I've tagged this for Google Summer of Code. Take a look at:
http://www.google-melange.com/gsoc/homepage/google/gsoc2013

 Add column level encryption to ORC files
 

 Key: HIVE-4227
 URL: https://issues.apache.org/jira/browse/HIVE-4227
 Project: Hive
  Issue Type: New Feature
Reporter: Owen O'Malley
  Labels: gsoc, gsoc2013

 It would be useful to support column level encryption in ORC files. Since 
 each column and its associated index is stored separately, encrypting a 
 column separately isn't difficult. In terms of key distribution, it would 
 make sense to use an external server like the one in HADOOP-9331.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HIVE-4243) Fix column names in FileSinkOperator

Owen O'Malley created HIVE-4243:
---

 Summary: Fix column names in FileSinkOperator
 Key: HIVE-4243
 URL: https://issues.apache.org/jira/browse/HIVE-4243
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Owen O'Malley


All of the ObjectInspectors given to SerDe's by FileSinkOperator have virtual 
column names. Since the files are part of tables, Hive knows the column names. 
For self-describing file formats like ORC, having the real column names will 
improve the understandability.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (HIVE-2162) Upgrade dependencies to Hadoop 0.20.2 and 0.20.203.0


 [ 
https://issues.apache.org/jira/browse/HIVE-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HIVE-2162.
-

Resolution: Duplicate

This has been fixed already.

 Upgrade dependencies to Hadoop 0.20.2 and 0.20.203.0
 

 Key: HIVE-2162
 URL: https://issues.apache.org/jira/browse/HIVE-2162
 Project: Hive
  Issue Type: Improvement
Reporter: Owen O'Malley

 Hadoop has released 0.20.203.0 and we should upgrade Hive's dependency to it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HIVE-4244) Make string dictionaries adaptive in ORC

Owen O'Malley created HIVE-4244:
---

 Summary: Make string dictionaries adaptive in ORC
 Key: HIVE-4244
 URL: https://issues.apache.org/jira/browse/HIVE-4244
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley


The ORC writer should adaptively switch between dictionary and direct encoding. 
I'd propose looking at the first 100,000 values in each column and decide 
whether there is sufficient loading in the dictionary to use dictionary 
encoding.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HIVE-4245) Implement numeric dictionaries in ORC

Owen O'Malley created HIVE-4245:
---

 Summary: Implement numeric dictionaries in ORC
 Key: HIVE-4245
 URL: https://issues.apache.org/jira/browse/HIVE-4245
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley


For many applications, especially in de-normalized data, there is a lot of 
redundancy in the numeric columns. Therefore, it would make sense to adaptively 
use dictionary encodings for numeric columns in addition to string columns.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (HIVE-4121) ORC should have optional dictionaries for both strings and numeric types