[jira] [Commented] (HIVE-3709) Stop storing default ConfVars in temp file

2012-11-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504437#comment-13504437
 ] 

Ashutosh Chauhan commented on HIVE-3709:


Kevin, Will HADOOP-8573 fix this?

 Stop storing default ConfVars in temp file
 --

 Key: HIVE-3709
 URL: https://issues.apache.org/jira/browse/HIVE-3709
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3709.1.patch.txt, HIVE-3709.2.patch.txt, 
 HIVE-3709.3.patch.txt


 To work around issues with Hadoop's Configuration object, specifically its 
 addResource(InputStream), default configurations are written to a temp file 
 (I think HIVE-2362 introduced this).
 This, however, introduces the problem that once that file is deleted from 
 /tmp the client crashes.  This is particularly problematic for long-running 
 services like the metastore server.
 Writing a custom InputStream to deal with the problems in the Configuration 
 object should provide a workaround which does not introduce a time bomb 
 into Hive.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2266) Fix compression parameters

2012-11-27 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504479#comment-13504479
 ] 

Harsh J commented on HIVE-2266:
---

bq. Hadoop loads native compression libraries. I believe that they are platform 
dependent hence I do not assume that they always have same compression ratio. 
Please correct me if I am wrong here.

Compression is based on standard algorithms, which are platform independent. The 
native code is platform dependent because of the library references it has.

 Fix compression parameters
 --

 Key: HIVE-2266
 URL: https://issues.apache.org/jira/browse/HIVE-2266
 Project: Hive
  Issue Type: Bug
Reporter: Vaibhav Aggarwal
Assignee: Vaibhav Aggarwal
 Attachments: HIVE-2266-2.patch, HIVE-2266.patch


 There are a number of places where compression values are not set correctly 
 in FileSinkOperator. This results in uncompressed files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3633) sort-merge join does not work with sub-queries

2012-11-27 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3633:
-

Status: Patch Available  (was: Open)

comments addressed -- all tests passed

 sort-merge join does not work with sub-queries
 --

 Key: HIVE-3633
 URL: https://issues.apache.org/jira/browse/HIVE-3633
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3633.1.patch, hive.3633.2.patch, hive.3633.3.patch, 
 hive.3633.4.patch, hive.3633.5.patch, hive.3633.6.patch, hive.3633.7.patch


 Consider the following query:
 create table smb_bucket_1(key int, value string) CLUSTERED BY (key) SORTED BY 
 (key) INTO 6 BUCKETS STORED AS TEXTFILE;
 create table smb_bucket_2(key int, value string) CLUSTERED BY (key) SORTED BY 
 (key) INTO 6 BUCKETS STORED AS TEXTFILE;
 -- load the above tables
 set hive.optimize.bucketmapjoin = true;
 set hive.optimize.bucketmapjoin.sortedmerge = true;
 set hive.input.format = 
 org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
 explain
 select count(*) from
 (
 select /*+mapjoin(a)*/ a.key as key1, b.key as key2, a.value as value1, 
 b.value as value2
 from smb_bucket_1 a join smb_bucket_2 b on a.key = b.key)
 subq;
 The above query does not use a sort-merge join. Supporting this would be very 
 useful, since we automatically convert queries to use the sorting and bucketing 
 properties for joins.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false #212

2012-11-27 Thread Apache Jenkins Server
See 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/212/

--
[...truncated 9912 lines...]

compile-test:
 [echo] Project: serde
[javac] Compiling 26 source files to 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/212/artifact/hive/build/serde/test/classes
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

create-dirs:
 [echo] Project: service
 [copy] Warning: 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/service/src/test/resources
 does not exist.

init:
 [echo] Project: service

ivy-init-settings:
 [echo] Project: service

ivy-resolve:
 [echo] Project: service
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/ivy/ivysettings.xml
[ivy:report] Processing 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/212/artifact/hive/build/ivy/resolution-cache/org.apache.hive-hive-service-default.xml
 to 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/212/artifact/hive/build/ivy/report/org.apache.hive-hive-service-default.html

ivy-retrieve:
 [echo] Project: service

compile:
 [echo] Project: service

ivy-resolve-test:
 [echo] Project: service

ivy-retrieve-test:
 [echo] Project: service

compile-test:
 [echo] Project: service
[javac] Compiling 2 source files to 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/212/artifact/hive/build/service/test/classes

test:
 [echo] Project: hive

test-shims:
 [echo] Project: hive

test-conditions:
 [echo] Project: shims

gen-test:
 [echo] Project: shims

create-dirs:
 [echo] Project: shims
 [copy] Warning: 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/shims/src/test/resources
 does not exist.

init:
 [echo] Project: shims

ivy-init-settings:
 [echo] Project: shims

ivy-resolve:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/ivy/ivysettings.xml
[ivy:report] Processing 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/212/artifact/hive/build/ivy/resolution-cache/org.apache.hive-hive-shims-default.xml
 to 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/212/artifact/hive/build/ivy/report/org.apache.hive-hive-shims-default.html

ivy-retrieve:
 [echo] Project: shims

compile:
 [echo] Project: shims
 [echo] Building shims 0.20

build_shims:
 [echo] Project: shims
 [echo] Compiling 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/shims/src/common/java;/home/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/0.20/java
 against hadoop 0.20.2 
(https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/212/artifact/hive/build/hadoopcore/hadoop-0.20.2)

ivy-init-settings:
 [echo] Project: shims

ivy-resolve-hadoop-shim:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/ivy/ivysettings.xml

ivy-retrieve-hadoop-shim:
 [echo] Project: shims
 [echo] Building shims 0.20S

build_shims:
 [echo] Project: shims
 [echo] Compiling 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/shims/src/common/java;/home/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/common-secure/java;/home/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/0.20S/java
 against hadoop 1.0.0 
(https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/212/artifact/hive/build/hadoopcore/hadoop-1.0.0)

ivy-init-settings:
 [echo] Project: shims

ivy-resolve-hadoop-shim:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/ivy/ivysettings.xml

ivy-retrieve-hadoop-shim:
 [echo] Project: shims
 [echo] Building shims 0.23

build_shims:
 [echo] Project: shims
 [echo] Compiling 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/shims/src/common/java;/home/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/common-secure/java;/home/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/0.23/java
 against hadoop 0.23.3 
(https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/212/artifact/hive/build/hadoopcore/hadoop-0.23.3)


[jira] [Resolved] (HIVE-3234) getting the reporter in the recordwriter

2012-11-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-3234.


   Resolution: Fixed
Fix Version/s: (was: 0.9.1)
   0.10.0

Committed to trunk and 0.10. Thanks, Owen!

 getting the reporter in the recordwriter
 

 Key: HIVE-3234
 URL: https://issues.apache.org/jira/browse/HIVE-3234
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.9.1
 Environment: any
Reporter: Jimmy Hu
Assignee: Owen O'Malley
  Labels: newbie
 Fix For: 0.10.0

 Attachments: HIVE-3234.D6699.1.patch, HIVE-3234.D6699.2.patch, 
 HIVE-3234.D6987.1.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 We would like to generate some custom statistics and report them back to 
 map/reduce when we implement the FileSinkOperator.RecordWriter interface. 
 However, the current interface design doesn't allow us to get the map/reduce 
 Reporter object. Please extend the current FileSinkOperator.RecordWriter 
 interface so that its close() method is passed a map/reduce Reporter object. 
 For the same reason, please also extend the RecordReader interface to include 
 a Reporter object so that users can pass in custom map/reduce counters.
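 A hypothetical sketch of the requested shape of the interface (this is not the 
 committed HIVE-3234 change; the signatures below are assumptions for 
 illustration only):
 {code}
 import java.io.IOException;
 import org.apache.hadoop.io.Writable;
 import org.apache.hadoop.mapred.Reporter;

 // Hypothetical sketch only: a RecordWriter whose close() receives the task's
 // Reporter, so implementations can update custom map/reduce counters.
 public interface RecordWriterWithReporter {
   void write(Writable row) throws IOException;
   void close(boolean abort, Reporter reporter) throws IOException;
 }
 {code}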

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-3723) Hive Driver leaks ZooKeeper connections

2012-11-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-3723.


   Resolution: Fixed
Fix Version/s: 0.10.0

Committed to trunk and 0.10. Thanks, Gunther!

 Hive Driver leaks ZooKeeper connections
 ---

 Key: HIVE-3723
 URL: https://issues.apache.org/jira/browse/HIVE-3723
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: 0.10.0

 Attachments: HIVE-3723.1-r1411423.patch


 In certain error cases (e.g. a statement fails to compile, or there are 
 semantic errors) the Hive driver leaks ZooKeeper connections.
 This can be seen in the TestNegativeCliDriver test, which accumulates a large 
 number of open file handles and fails if the maximum allowed number of file 
 handles isn't at least 2048.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Reopened] (HIVE-3676) INSERT INTO regression caused by HIVE-3465

2012-11-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan reopened HIVE-3676:



Carl / Navis, 
After this commit ant test -Dtestcase=TestCliDriver -Dqfile=insert1.q is 
failing consistently on trunk. First failure was reported on 
https://builds.apache.org/job/Hive-trunk-h0.21/1805/

Can you take a look?

 INSERT INTO regression caused by HIVE-3465
 --

 Key: HIVE-3676
 URL: https://issues.apache.org/jira/browse/HIVE-3676
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Carl Steinbach
Assignee: Navis
 Fix For: 0.10.0

 Attachments: HIVE-3676.D6741.1.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3645) RCFileWriter does not implement the right function to support Federation

2012-11-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3645:
---

   Resolution: Fixed
Fix Version/s: 0.11
 Assignee: Arup Malakar
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Arup!

 RCFileWriter does not implement the right function to support Federation
 

 Key: HIVE-3645
 URL: https://issues.apache.org/jira/browse/HIVE-3645
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.9.0, 0.10.0
 Environment: Hadoop 0.23.3 federation, Hive 0.9 and Pig 0.10
Reporter: Viraj Bhat
Assignee: Arup Malakar
 Fix For: 0.11

 Attachments: HIVE_3645_branch_0.patch, HIVE_3645_trunk_0.patch


 Create a table using Hive DDL
 {code}
 CREATE TABLE tmp_hcat_federated_numbers_part_1 (
   id   int,  
   intnum   int,
   floatnum float
 )partitioned by (
   part1 string,
   part2 string
 )
 STORED AS rcfile
 LOCATION 'viewfs:///database/tmp_hcat_federated_numbers_part_1';
 {code}
 Populate it using Pig:
 {code}
 A = load 'default.numbers_pig' using org.apache.hcatalog.pig.HCatLoader();
 B = filter A by id =  500;
 C = foreach B generate (int)id, (int)intnum, (float)floatnum;
 store C into
 'default.tmp_hcat_federated_numbers_part_1'
 using org.apache.hcatalog.pig.HCatStorer
('part1=pig, part2=hcat_pig_insert',
 'id: int,intnum: int,floatnum: float');
 {code}
 Generates the following error when running on a Federated Cluster:
 {quote}
 2012-10-29 20:40:25,011 [main] ERROR
 org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate
 exception from backed error: AttemptID:attempt_1348522594824_0846_m_00_3
 Info:Error: org.apache.hadoop.fs.viewfs.NotInMountpointException:
 getDefaultReplication on empty path is invalid
 at
 org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:479)
 at org.apache.hadoop.hive.ql.io.RCFile$Writer.init(RCFile.java:723)
 at org.apache.hadoop.hive.ql.io.RCFile$Writer.init(RCFile.java:705)
 at
 org.apache.hadoop.hive.ql.io.RCFileOutputFormat.getRecordWriter(RCFileOutputFormat.java:86)
 at
 org.apache.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:100)
 at
 org.apache.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:228)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:84)
 at
 org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.init(MapTask.java:587)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:706)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
 {quote}
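 The NotInMountpointException above comes from calling the no-argument 
 getDefaultReplication() against the viewfs root. A minimal sketch of the 
 path-aware call that federation needs (illustrative only, not the attached 
 patch; the Path-taking overload is available in Hadoop 0.23.3 and later):
 {code}
 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;

 // Sketch only: resolve default replication against the actual output path so
 // viewfs can map it to the underlying mount point instead of failing on "/".
 public class FederationAwareDefaults {
   public static short defaultReplication(Configuration conf, Path file)
       throws IOException {
     FileSystem fs = file.getFileSystem(conf);
     return fs.getDefaultReplication(file);
   }
 }
 {code}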

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3648) HiveMetaStoreFsImpl is not compatible with hadoop viewfs

2012-11-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3648:
---

Assignee: Arup Malakar
  Status: Open  (was: Patch Available)

Arup, all the tests passed, but the patch now conflicts because of the HIVE-3645 
commit. Can you refresh the patch on trunk?

 HiveMetaStoreFsImpl is not compatible with hadoop viewfs
 

 Key: HIVE-3648
 URL: https://issues.apache.org/jira/browse/HIVE-3648
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.9.0, 0.10.0
Reporter: Kihwal Lee
Assignee: Arup Malakar
 Attachments: HIVE_3648_branch_0.patch, HIVE-3648-trunk-0.patch, 
 HIVE_3648_trunk_1.patch


 HiveMetaStoreFsImpl#deleteDir() method calls Trash#moveToTrash(). This may 
 not work when viewfs is used. It needs to call Trash#moveToAppropriateTrash() 
 instead.  Please note that this method is not available in hadoop versions 
 earlier than 0.23.
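 For illustration, a minimal sketch of the viewfs-safe call (a hypothetical 
 wrapper, not the attached patch; Trash.moveToAppropriateTrash exists only in 
 Hadoop 0.23 and later):
 {code}
 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.Trash;

 // Sketch only: moveToAppropriateTrash resolves the trash directory on the file
 // system that actually owns the path, which is what viewfs requires.
 public class ViewFsSafeDelete {
   public static boolean deleteDir(FileSystem fs, Path dir, Configuration conf)
       throws IOException {
     return Trash.moveToAppropriateTrash(fs, dir, conf);
   }
 }
 {code}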

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-3742) The derby metastore schema script for 0.10.0 doesn't run

2012-11-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-3742.


   Resolution: Fixed
Fix Version/s: 0.10.0

Committed to trunk and 0.10. Thanks, Prasad!

 The derby metastore schema script for 0.10.0 doesn't run
 

 Key: HIVE-3742
 URL: https://issues.apache.org/jira/browse/HIVE-3742
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
 Fix For: 0.10.0

 Attachments: HIVE-3742-2.patch, HIVE-3742.patch


 The hive-schema-0.10.0.derby.sql script contains an incorrect ALTER statement 
 for SKEWED_STRING_LIST, which causes the script execution to fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3709) Stop storing default ConfVars in temp file

2012-11-27 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504814#comment-13504814
 ] 

Kevin Wilfong commented on HIVE-3709:
-

It looks like that fixes the issue on a single thread, where it ends up reading 
from the same InputStream repeatedly; that is why I overrode the close method 
to reset the InputStream.

It does not look like it will fix the multi-threaded issue.  If two threads get 
Configuration objects constructed using the copy constructor (and hence share 
the same InputStream, since the resources themselves are not cloned), and 
loadResources has not been called before the copy constructor, it seems possible 
that both threads call loadResources at about the same time, causing the issues 
Carl was seeing in TestHiveServerSessions.
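
For reference, a minimal sketch of the reset-on-close stream described above 
(the class name and wiring are assumptions, not the attached patch):
{code}
import java.io.ByteArrayInputStream;

// Sketch only: an in-memory stream holding the serialized default ConfVars.
// Configuration.addResource(InputStream) closes the stream after parsing it;
// overriding close() to rewind lets the same resource be parsed again later
// (e.g. by a Configuration built with the copy constructor) without a temp file.
public class ResettableConfStream extends ByteArrayInputStream {

  public ResettableConfStream(byte[] defaultConfXml) {
    super(defaultConfXml);
  }

  @Override
  public void close() {
    reset();  // rewind to the start instead of closing
  }
}
{code}
As the comment above notes, this alone does not make two threads sharing one 
stream safe; the concurrent loadResources case needs separate handling.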

 Stop storing default ConfVars in temp file
 --

 Key: HIVE-3709
 URL: https://issues.apache.org/jira/browse/HIVE-3709
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3709.1.patch.txt, HIVE-3709.2.patch.txt, 
 HIVE-3709.3.patch.txt


 To work around issues with Hadoop's Configuration object, specifically its 
 addResource(InputStream), default configurations are written to a temp file 
 (I think HIVE-2362 introduced this).
 This, however, introduces the problem that once that file is deleted from 
 /tmp the client crashes.  This is particularly problematic for long-running 
 services like the metastore server.
 Writing a custom InputStream to deal with the problems in the Configuration 
 object should provide a workaround which does not introduce a time bomb 
 into Hive.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3648) HiveMetaStoreFsImpl is not compatible with hadoop viewfs

2012-11-27 Thread Arup Malakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arup Malakar updated HIVE-3648:
---

Attachment: HIVE-3648-trunk-1.patch

Thanks Ashutosh for looking into the patch. I have updated the patch to reflect 
the last commit.

 HiveMetaStoreFsImpl is not compatible with hadoop viewfs
 

 Key: HIVE-3648
 URL: https://issues.apache.org/jira/browse/HIVE-3648
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.9.0, 0.10.0
Reporter: Kihwal Lee
Assignee: Arup Malakar
 Attachments: HIVE_3648_branch_0.patch, HIVE-3648-trunk-0.patch, 
 HIVE_3648_trunk_1.patch, HIVE-3648-trunk-1.patch


 HiveMetaStoreFsImpl#deleteDir() method calls Trash#moveToTrash(). This may 
 not work when viewfs is used. It needs to call Trash#moveToAppropriateTrash() 
 instead.  Please note that this method is not available in hadoop versions 
 earlier than 0.23.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3645) RCFileWriter does not implement the right function to support Federation

2012-11-27 Thread Arup Malakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504846#comment-13504846
 ] 

Arup Malakar commented on HIVE-3645:


Thanks Ashutosh for looking into the patch. If the branch patch looks fine, can 
you please commit this to the 0.9 branch as well?

 RCFileWriter does not implement the right function to support Federation
 

 Key: HIVE-3645
 URL: https://issues.apache.org/jira/browse/HIVE-3645
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.9.0, 0.10.0
 Environment: Hadoop 0.23.3 federation, Hive 0.9 and Pig 0.10
Reporter: Viraj Bhat
Assignee: Arup Malakar
 Fix For: 0.11

 Attachments: HIVE_3645_branch_0.patch, HIVE_3645_trunk_0.patch


 Create a table using Hive DDL
 {code}
 CREATE TABLE tmp_hcat_federated_numbers_part_1 (
   id   int,  
   intnum   int,
   floatnum float
 )partitioned by (
   part1 string,
   part2 string
 )
 STORED AS rcfile
 LOCATION 'viewfs:///database/tmp_hcat_federated_numbers_part_1';
 {code}
 Populate it using Pig:
 {code}
 A = load 'default.numbers_pig' using org.apache.hcatalog.pig.HCatLoader();
 B = filter A by id =  500;
 C = foreach B generate (int)id, (int)intnum, (float)floatnum;
 store C into
 'default.tmp_hcat_federated_numbers_part_1'
 using org.apache.hcatalog.pig.HCatStorer
('part1=pig, part2=hcat_pig_insert',
 'id: int,intnum: int,floatnum: float');
 {code}
 Generates the following error when running on a Federated Cluster:
 {quote}
 2012-10-29 20:40:25,011 [main] ERROR
 org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate
 exception from backed error: AttemptID:attempt_1348522594824_0846_m_00_3
 Info:Error: org.apache.hadoop.fs.viewfs.NotInMountpointException:
 getDefaultReplication on empty path is invalid
 at
 org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:479)
 at org.apache.hadoop.hive.ql.io.RCFile$Writer.init(RCFile.java:723)
 at org.apache.hadoop.hive.ql.io.RCFile$Writer.init(RCFile.java:705)
 at
 org.apache.hadoop.hive.ql.io.RCFileOutputFormat.getRecordWriter(RCFileOutputFormat.java:86)
 at
 org.apache.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:100)
 at
 org.apache.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:228)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:84)
 at
 org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.init(MapTask.java:587)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:706)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-27 Thread Shreepadma Venugopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shreepadma Venugopalan updated HIVE-3678:
-

Attachment: HIVE-3678.4.patch.txt

 Add metastore upgrade scripts for column stats schema changes
 -

 Key: HIVE-3678
 URL: https://issues.apache.org/jira/browse/HIVE-3678
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, 
 HIVE-3678.3.patch.txt, HIVE-3678.4.patch.txt


 Add upgrade script for column statistics schema changes for 
 Postgres/MySQL/Oracle/Derby

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-27 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504852#comment-13504852
 ] 

Shreepadma Venugopalan commented on HIVE-3678:
--

Uploaded patch rebased off tip of trunk. Thanks.

 Add metastore upgrade scripts for column stats schema changes
 -

 Key: HIVE-3678
 URL: https://issues.apache.org/jira/browse/HIVE-3678
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, 
 HIVE-3678.3.patch.txt, HIVE-3678.4.patch.txt


 Add upgrade script for column statistics schema changes for 
 Postgres/MySQL/Oracle/Derby

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-3748) QTestUtil should correctly find data files when running in the build directory

2012-11-27 Thread Mikhail Bautin (JIRA)
Mikhail Bautin created HIVE-3748:


 Summary: QTestUtil should correctly find data files when running 
in the build directory
 Key: HIVE-3748
 URL: https://issues.apache.org/jira/browse/HIVE-3748
 Project: Hive
  Issue Type: Improvement
Reporter: Mikhail Bautin
Priority: Minor


Some parts of the TestCliDriver test suite (e.g. some jar lookups) require 
that the current directory be set to the build directory. This change makes 
QTestUtil correctly find data files when running either in the Hive source root 
or in the build directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3748) QTestUtil should correctly find data files when running in the build directory

2012-11-27 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-3748:
--

Attachment: D7005.1.patch

mbautin requested code review of [jira] [HIVE-3748] QTestUtil should correctly 
find data files when running in the build directory.
Reviewers: ashutoshc, JIRA, njain

  Some parts of the TestCliDriver test suite (e.g. some jar lookups) 
require that the current directory be set to the build directory. This change 
makes QTestUtil correctly find data files when running either in the Hive source 
root or in the build directory.


TEST PLAN
  Run TestCliDriver

REVISION DETAIL
  https://reviews.facebook.net/D7005

AFFECTED FILES
  ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/16521/

To: ashutoshc, JIRA, njain, mbautin


 QTestUtil should correctly find data files when running in the build directory
 --

 Key: HIVE-3748
 URL: https://issues.apache.org/jira/browse/HIVE-3748
 Project: Hive
  Issue Type: Improvement
Reporter: Mikhail Bautin
Priority: Minor
 Attachments: D7005.1.patch


 Some parts of the TestCliDriver test suite (e.g. some jar lookups) 
 require that the current directory be set to the build directory. This change 
 makes QTestUtil correctly find data files when running either in the Hive 
 source root or in the build directory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504869#comment-13504869
 ] 

Ashutosh Chauhan commented on HIVE-3678:


Thanks, Shreepadma, for updating the patch. Running tests now.

 Add metastore upgrade scripts for column stats schema changes
 -

 Key: HIVE-3678
 URL: https://issues.apache.org/jira/browse/HIVE-3678
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, 
 HIVE-3678.3.patch.txt, HIVE-3678.4.patch.txt


 Add upgrade script for column statistics schema changes for 
 Postgres/MySQL/Oracle/Derby

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-3648) HiveMetaStoreFsImpl is not compatible with hadoop viewfs

2012-11-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-3648.


   Resolution: Fixed
Fix Version/s: 0.11

Committed to trunk. Thanks, Arup!

 HiveMetaStoreFsImpl is not compatible with hadoop viewfs
 

 Key: HIVE-3648
 URL: https://issues.apache.org/jira/browse/HIVE-3648
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.9.0, 0.10.0
Reporter: Kihwal Lee
Assignee: Arup Malakar
 Fix For: 0.11

 Attachments: HIVE_3648_branch_0.patch, HIVE-3648-trunk-0.patch, 
 HIVE_3648_trunk_1.patch, HIVE-3648-trunk-1.patch


 HiveMetaStoreFsImpl#deleteDir() method calls Trash#moveToTrash(). This may 
 not work when viewfs is used. It needs to call Trash#moveToAppropriateTrash() 
 instead.  Please note that this method is not available in hadoop versions 
 earlier than 0.23.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3665) Allow URIs without port to be specified in metatool

2012-11-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504890#comment-13504890
 ] 

Ashutosh Chauhan commented on HIVE-3665:


+1

 Allow URIs without port to be specified in metatool
 ---

 Key: HIVE-3665
 URL: https://issues.apache.org/jira/browse/HIVE-3665
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Attachments: HIVE-3665.1.patch.txt


 Metatool should accept input URIs where one URI contains a port and the other 
 doesn't. While metatool today accepts input URIs without the port when both 
 the input URIs (oldLoc and newLoc) don't contain the port, we should make the 
 tool a little more flexible to allow for the case where one URI contains a 
 valid port and the other input URI doesn't. This makes more sense when 
 transitioning to HA and a user chooses to specify the port as part of the 
 oldLoc, but the port doesn't mean much for the newLoc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21 #212

2012-11-27 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/212/

--
[...truncated 36470 lines...]
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/jenkins/hive_2012-11-27_12-43-58_494_8381877270687109129/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201211271244_1754915096.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] Copying file: 
file:/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 
file:/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/jenkins/hive_2012-11-27_12-44-02_388_4658100672971353387/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/jenkins/hive_2012-11-27_12-44-02_388_4658100672971353387/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201211271244_1481136741.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201211271244_1247248829.txt
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201211271244_1449552180.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] Copying file: 

[jira] [Commented] (HIVE-3746) TRowSet resultset structure should be column-oriented

2012-11-27 Thread Phil Prudich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504959#comment-13504959
 ] 

Phil Prudich commented on HIVE-3746:


To make sure I'm reading the new thrift definitions correctly -- does this mean 
that all rows' column 1 values will come first on the wire, and then be 
followed by all rows' values for column 2, and so on?  I clearly see how this 
would save bytes on the wire.

However, any client trying to return rows one at a time to an application would 
be required to read, process, and buffer almost an entire reply's worth of data 
before being able to return the first complete row.

I'm unfamiliar with the server code, but similar buffering may be needed there 
as well.

Is my understanding of the issue correct?
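
For illustration (the type names below are hypothetical, not the actual Thrift 
definitions), this is roughly the transposition a row-at-a-time client would 
have to do, which is why it must buffer a full batch before returning the 
first row:
{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: with a column-oriented result set, the client gets
// one list per column and must transpose a whole batch back into rows before it
// can hand row 1 to the application.
public class ColumnarToRows {
  public static List<List<Object>> toRows(List<List<Object>> columns) {
    List<List<Object>> rows = new ArrayList<List<Object>>();
    if (columns.isEmpty()) {
      return rows;
    }
    int numRows = columns.get(0).size();
    for (int r = 0; r < numRows; r++) {
      List<Object> row = new ArrayList<Object>(columns.size());
      for (List<Object> column : columns) {
        row.add(column.get(r));  // every column list is already fully in memory
      }
      rows.add(row);
    }
    return rows;
  }
}
{code}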

 TRowSet resultset structure should be column-oriented
 -

 Key: HIVE-3746
 URL: https://issues.apache.org/jira/browse/HIVE-3746
 Project: Hive
  Issue Type: Sub-task
  Components: Server Infrastructure
Reporter: Carl Steinbach
Assignee: Carl Steinbach



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3734) Static partition DML create duplicate files and records

2012-11-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504962#comment-13504962
 ] 

Ashutosh Chauhan commented on HIVE-3734:


Gang,
I fail to see a bug here. You didn't show how you created srcpart, but I 
assume you did something similar to the following: 
{code}
create table srcpart (key string, value string) partitioned by (ds string, hr 
string);
load data local inpath '/home/ashutosh/workspace/hive/data/files/kv1.txt' 
overwrite into table srcpart partition (ds='2008-04-08', hr='11');
load data local inpath '/home/ashutosh/workspace/hive/data/files/kv1.txt' 
overwrite into table srcpart partition (ds='2008-04-08', hr='12');
load data local inpath '/home/ashutosh/workspace/hive/data/files/kv1.txt' 
overwrite into table srcpart partition (ds='2008-04-09', hr='11');
load data local inpath '/home/ashutosh/workspace/hive/data/files/kv1.txt' 
overwrite into table srcpart partition (ds='2008-04-09', hr='12');
{code}

If so, in your insert statement you are selecting all the rows from srcpart 
corresponding to ds=2008-04-08, which includes rows for both hr=11 and hr=12, 
and then inserting them into testtable in partition ds='2008-04-08', hr='11'. 
This implies rows corresponding to hr=12 in srcpart will end up in hr=11 in 
testtable. Then if you run select key, value from testtable where 
ds='2008-04-08' and hr='11' and key = 484; you will get two rows, since hr='11' 
in testtable also contains the rows from hr='12' of srcpart. This is expected; 
it is how partitioning has always worked in Hive. To be doubly sure, I also 
checked on hive-0.9 and it has the same behavior. 
Though I agree it is a bit confusing.

 Static partition DML create duplicate files and records
 ---

 Key: HIVE-3734
 URL: https://issues.apache.org/jira/browse/HIVE-3734
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Gang Tim Liu

 Static partition DML creates duplicate files and records.
 Given the following test case, Hive will return 2 records:
 484   val_484
 484   val_484
 but srcpart returns one record:
 484   val_484
 If you look at the file system, the DML generates duplicate files with the same 
 content:
 -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0
 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0
 Test Case
 ===
 set hive.mapred.supports.subdirectories=true;
 set hive.exec.dynamic.partition=true;
 set hive.exec.dynamic.partition.mode=nonstrict;
 set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
 set hive.merge.mapfiles=false;
 set hive.merge.mapredfiles=false;
 set mapred.input.dir.recursive=true;
 create table testtable (key String, value String) partitioned by (ds String, 
 hr String) ;
 explain extended
 insert overwrite table testtable partition (ds='2008-04-08', hr='11') select 
 key, value from srcpart where ds='2008-04-08';
 insert overwrite table testtable partition (ds='2008-04-08', hr='11') select 
 key, value from srcpart where ds='2008-04-08';
 desc formatted testtable partition (ds='2008-04-08', hr='11');
 select count(1) from srcpart where ds='2008-04-08';
 select count(1) from testtable where ds='2008-04-08';
 select key, value from srcpart where ds='2008-04-08' and hr='11' and key = 
 484;
 explain extended
 select key, value from testtable where ds='2008-04-08' and hr='11' and key = 
 484;
 select key, value from testtable where ds='2008-04-08' and hr='11' and key = 
 484;
 ===

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-3734) Static partition DML create duplicate files and records

2012-11-27 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu resolved HIVE-3734.


Resolution: Not A Problem

 Static partition DML create duplicate files and records
 ---

 Key: HIVE-3734
 URL: https://issues.apache.org/jira/browse/HIVE-3734
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Gang Tim Liu

 Static partition DML creates duplicate files and records.
 Given the following test case, Hive will return 2 records:
 484   val_484
 484   val_484
 but srcpart returns one record:
 484   val_484
 If you look at the file system, the DML generates duplicate files with the same 
 content:
 -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 00_0
 -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 01_0
 Test Case
 ===
 set hive.mapred.supports.subdirectories=true;
 set hive.exec.dynamic.partition=true;
 set hive.exec.dynamic.partition.mode=nonstrict;
 set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
 set hive.merge.mapfiles=false;
 set hive.merge.mapredfiles=false;
 set mapred.input.dir.recursive=true;
 create table testtable (key String, value String) partitioned by (ds String, 
 hr String) ;
 explain extended
 insert overwrite table testtable partition (ds='2008-04-08', hr='11') select 
 key, value from srcpart where ds='2008-04-08';
 insert overwrite table testtable partition (ds='2008-04-08', hr='11') select 
 key, value from srcpart where ds='2008-04-08';
 desc formatted testtable partition (ds='2008-04-08', hr='11');
 select count(1) from srcpart where ds='2008-04-08';
 select count(1) from testtable where ds='2008-04-08';
 select key, value from srcpart where ds='2008-04-08' and hr='11' and key = 
 484;
 explain extended
 select key, value from testtable where ds='2008-04-08' and hr='11' and key = 
 484;
 select key, value from testtable where ds='2008-04-08' and hr='11' and key = 
 484;
 ===

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3724) Metastore tests use hardcoded ports

2012-11-27 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3724:


Status: Patch Available  (was: Open)

 Metastore tests use hardcoded ports
 ---

 Key: HIVE-3724
 URL: https://issues.apache.org/jira/browse/HIVE-3724
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
Priority: Minor
 Attachments: HIVE-3724.1.patch.txt


 Several of the metastore tests use hardcoded ports for remote metastore 
 Thrift servers.  This is causing transient failures in Jenkins, e.g. 
 https://builds.apache.org/job/Hive-trunk-h0.21/1804/
 A few tests already dynamically determine free ports, and this logic can be 
 shared.
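 A minimal sketch of the usual way to pick a port dynamically (the helper below 
 is illustrative only, not the shared logic referenced above):
 {code}
 import java.io.IOException;
 import java.net.ServerSocket;

 // Sketch only: ask the OS for an ephemeral port instead of hardcoding one,
 // so concurrent test runs on the same Jenkins host cannot collide.
 public class FreePortFinder {
   public static int findFreePort() throws IOException {
     ServerSocket socket = new ServerSocket(0);  // port 0 = any free port
     try {
       return socket.getLocalPort();
     } finally {
       socket.close();
     }
   }
 }
 {code}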

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3726) History file closed in finalize method

2012-11-27 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505047#comment-13505047
 ] 

Gunther Hagleitner commented on HIVE-3726:
--

Good points. I'll look into it.

 History file closed in finalize method
 --

 Key: HIVE-3726
 URL: https://issues.apache.org/jira/browse/HIVE-3726
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.9.0, 0.10.0
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-3726.2-r1411423.patch, HIVE-3736.1-r1411423.patch


 TestCliNegative fails intermittently because it's up to the garbage collector 
 to close History files. This is only a problem if you deal with a lot of 
 SessionState objects.
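 A sketch of the general shape of such a fix (the names below are assumptions, 
 not the attached patch): close the history file explicitly when the session is 
 torn down instead of waiting for finalize():
 {code}
 import java.io.Closeable;
 import java.io.IOException;
 import java.io.PrintWriter;

 // Sketch only: a history writer closed deterministically by the owning session.
 // Relying on finalize() leaves file handles open until a GC happens, which is
 // how TestCliNegative runs out of file descriptors.
 public class HistoryFileHolder implements Closeable {
   private final PrintWriter writer;

   public HistoryFileHolder(String historyFileName) throws IOException {
     this.writer = new PrintWriter(historyFileName, "UTF-8");
   }

   public void log(String line) {
     writer.println(line);
   }

   @Override
   public void close() {  // call from session cleanup, not from finalize()
     writer.close();
   }
 }
 {code}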

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3648) HiveMetaStoreFsImpl is not compatible with hadoop viewfs

2012-11-27 Thread Arup Malakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505061#comment-13505061
 ] 

Arup Malakar commented on HIVE-3648:


Ashutosh, thanks for committing to trunk. Can you commit it to branch-0.9 as 
well? I will provide the rebased patch once HIVE-3645 is committed to the branch.

 HiveMetaStoreFsImpl is not compatible with hadoop viewfs
 

 Key: HIVE-3648
 URL: https://issues.apache.org/jira/browse/HIVE-3648
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.9.0, 0.10.0
Reporter: Kihwal Lee
Assignee: Arup Malakar
 Fix For: 0.11

 Attachments: HIVE_3648_branch_0.patch, HIVE-3648-trunk-0.patch, 
 HIVE_3648_trunk_1.patch, HIVE-3648-trunk-1.patch


 HiveMetaStoreFsImpl#deleteDir() method calls Trash#moveToTrash(). This may 
 not work when viewfs is used. It needs to call Trash#moveToAppropriateTrash() 
 instead.  Please note that this method is not available in hadoop versions 
 earlier than 0.23.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3676) INSERT INTO regression caused by HIVE-3465

2012-11-27 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505069#comment-13505069
 ] 

Navis commented on HIVE-3676:
-

@Ashutosh,
The newly added test case seems to be non-deterministic. I'll fix this in 
another issue. Sorry.

 INSERT INTO regression caused by HIVE-3465
 --

 Key: HIVE-3676
 URL: https://issues.apache.org/jira/browse/HIVE-3676
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Carl Steinbach
Assignee: Navis
 Fix For: 0.10.0

 Attachments: HIVE-3676.D6741.1.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-3749) New test cases added by HIVE-3676 in insert1.q is not deterministic

2012-11-27 Thread Navis (JIRA)
Navis created HIVE-3749:
---

 Summary: New test cases added by HIVE-3676 in insert1.q is not 
deterministic
 Key: HIVE-3749
 URL: https://issues.apache.org/jira/browse/HIVE-3749
 Project: Hive
  Issue Type: Test
  Components: Tests
Reporter: Navis
Assignee: Navis


The test case inserts two rows and selects them all, but the display order 
can differ from environment to environment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3676) INSERT INTO regression caused by HIVE-3465

2012-11-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505076#comment-13505076
 ] 

Ashutosh Chauhan commented on HIVE-3676:


Thanks, Navis, for taking a look. Yeah, it seems non-deterministic. It passed 
for me on one machine, but failed on the other two. Appreciate your help. If you 
are going to open a new jira to fix this, feel free to resolve this one.

 INSERT INTO regression caused by HIVE-3465
 --

 Key: HIVE-3676
 URL: https://issues.apache.org/jira/browse/HIVE-3676
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Carl Steinbach
Assignee: Navis
 Fix For: 0.10.0

 Attachments: HIVE-3676.D6741.1.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Hive-trunk-h0.21 - Build # 1820 - Still Failing

2012-11-27 Thread Apache Jenkins Server
Changes for Build #1775
[namit] HIVE-3673 Sort merge join not used when join columns have different 
names
(Kevin Wilfong via namit)


Changes for Build #1776
[kevinwilfong] HIVE-3627. eclipse misses library: 
javolution-@javolution-version@.jar. (Gang Tim Liu via kevinwilfong)


Changes for Build #1777
[kevinwilfong] HIVE-3524. Storing certain Exception objects thrown in 
HiveMetaStore.java in MetaStoreEndFunctionContext. (Maheshwaran Srinivasan via 
kevinwilfong)

[cws] HIVE-1977. DESCRIBE TABLE syntax doesn't support specifying a database 
qualified table name (Zhenxiao Luo via cws)

[cws] HIVE-3674. Test case TestParse broken after recent checkin (Sambavi 
Muthukrishnan via cws)


Changes for Build #1778
[cws] HIVE-1362. Column level scalar valued statistics on Tables and Partitions 
(Shreepadma Venugopalan via cws)


Changes for Build #1779

Changes for Build #1780
[kevinwilfong] HIVE-3686. Fix compile errors introduced by the interaction of 
HIVE-1362 and HIVE-3524. (Shreepadma Venugopalan via kevinwilfong)


Changes for Build #1781
[namit] HIVE-3687 smb_mapjoin_13.q is nondeterministic
(Kevin Wilfong via namit)


Changes for Build #1782
[hashutosh] HIVE-2715: Upgrade Thrift dependency to 0.9.0 (Ashutosh Chauhan)


Changes for Build #1783
[kevinwilfong] HIVE-3654. block relative path access in hive. (njain via 
kevinwilfong)

[hashutosh] HIVE-3658 : Unable to generate the Hbase related unit tests using 
velocity templates on Windows (Kanna Karanam via Ashutosh Chauhan)

[hashutosh] HIVE-3661 : Remove the Windows specific = related swizzle path 
changes from Proxy FileSystems (Kanna Karanam via Ashutosh Chauhan)

[hashutosh] HIVE-3480 : Resource leak: Fix the file handle leaks in Symbolic 
& Symlink related input formats. (Kanna Karanam via Ashutosh Chauhan)


Changes for Build #1784
[kevinwilfong] HIVE-3675. NaN does not work correctly for round(n). (njain via 
kevinwilfong)

[cws] HIVE-3651. bucketmapjoin?.q tests fail with hadoop 0.23 (Prasad Mujumdar 
via cws)


Changes for Build #1785
[namit] HIVE-3613 Implement grouping_id function
(Ian Gorbachev via namit)

[namit] HIVE-3692 Update parallel test documentation
(Ivan Gorbachev via namit)

[namit] HIVE-3649 Hive List Bucketing - enhance DDL to specify list bucketing 
table
(Gang Tim Liu via namit)


Changes for Build #1786
[namit] HIVE-3696 Revert HIVE-3483 which causes performance regression
(Gang Tim Liu via namit)


Changes for Build #1787
[kevinwilfong] HIVE-3621. Make prompt in Hive CLI configurable. (Jingwei Lu via 
kevinwilfong)

[kevinwilfong] HIVE-3695. TestParse breaks due to HIVE-3675. (njain via 
kevinwilfong)


Changes for Build #1788
[kevinwilfong] HIVE-3557. Access to external URLs in hivetest.py. (Ivan 
Gorbachev via kevinwilfong)


Changes for Build #1789
[hashutosh] HIVE-3662 : TestHiveServer: testScratchDirShouldClearWhileStartup 
is failing on Windows (Kanna Karanam via Ashutosh Chauhan)

[hashutosh] HIVE-3659 : TestHiveHistory::testQueryloglocParentDirNotExist Test 
fails on Windows because of some resource leaks in ZK (Kanna Karanam via 
Ashutosh Chauhan)

[hashutosh] HIVE-3663 Unable to display the MR Job file path on Windows in case 
of MR job failures.  (Kanna Karanam via Ashutosh Chauhan)


Changes for Build #1790

Changes for Build #1791

Changes for Build #1792

Changes for Build #1793
[hashutosh] HIVE-3704 : name of some metastore scripts are not per convention 
(Ashutosh Chauhan)


Changes for Build #1794
[hashutosh] HIVE-3243 : ignore white space between entries of hive/hbase table 
mapping (Shengsheng Huang via Ashutosh Chauhan)

[hashutosh] HIVE-3215 : JobDebugger should use RunningJob.getTrackingURL 
(Bhushan Mandhani via Ashutosh Chauhan)


Changes for Build #1795
[cws] HIVE-3437. 0.23 compatibility: fix unit tests when building against 0.23 
(Chris Drome via cws)

[hashutosh] HIVE-3626 : RetryingHMSHandler should wrap JDOException inside 
MetaException (Bhushan Mandhani via Ashutosh Chauhan)

[hashutosh] HIVE-3560 : Hive always prints a warning message when using remote 
metastore (Travis Crawford via Ashutosh Chauhan)


Changes for Build #1796

Changes for Build #1797
[hashutosh] HIVE-3664 : Avoid to create a symlink for hive-contrib.jar file in 
dist\lib folder. (Kanna Karanam via Ashutosh Chauhan)


Changes for Build #1798
[namit] HIVE-3706 getBoolVar in FileSinkOperator can be optimized
(Kevin Wilfong via namit)

[namit] HIVE-3707 Round map/reduce progress down when it is in the range [99.5, 
100)
(Kevin Wilfong via namit)

[namit] HIVE-3471 Implement grouping sets in hive
(Ivan Gorbachev via namit)


Changes for Build #1799
[hashutosh] HIVE-3291 : fix fs resolvers (Ashish Singh via Ashutosh Chauhan)

[hashutosh] HIVE-3680 : Include Table information in Hive's AddPartitionEvent. 
(Mithun Radhakrishnan via Ashutosh Chauhan)


Changes for Build #1800
[hashutosh] HIVE-3520 : ivysettings.xml does not let you override 
.m2/repository (Raja Aluri via Ashutosh Chauhan)

[hashutosh] HIVE-3435 : Get pdk pluginTest 

[jira] [Updated] (HIVE-3749) New test cases added by HIVE-3676 in insert1.q is not deterministic

2012-11-27 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-3749:


Status: Patch Available  (was: Open)

 New test cases added by HIVE-3676 in insert1.q is not deterministic
 ---

 Key: HIVE-3749
 URL: https://issues.apache.org/jira/browse/HIVE-3749
 Project: Hive
  Issue Type: Test
  Components: Tests
Reporter: Navis
Assignee: Navis
 Attachments: HIVE-3749.D7011.1.patch


 The test case inserts two rows and selects them all, but the display order 
 can differ from environment to environment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3749) New test cases added by HIVE-3676 in insert1.q is not deterministic

2012-11-27 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-3749:
--

Attachment: HIVE-3749.D7011.1.patch

navis requested code review of HIVE-3749 [jira] New test cases added by 
HIVE-3676 in insert1.q is not deterministic.
Reviewers: JIRA

  DPAL-1933 New test cases added by HIVE-3676 in insert1.q is not deterministic

  The test case inserts two rows and selects them all, but the display order 
 can differ from environment to environment.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D7011

AFFECTED FILES
  ql/src/test/queries/clientpositive/insert1.q
  ql/src/test/results/clientpositive/insert1.q.out

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/16533/

To: JIRA, navis


 New test cases added by HIVE-3676 in insert1.q is not deterministic
 ---

 Key: HIVE-3749
 URL: https://issues.apache.org/jira/browse/HIVE-3749
 Project: Hive
  Issue Type: Test
  Components: Tests
Reporter: Navis
Assignee: Navis
 Attachments: HIVE-3749.D7011.1.patch


 The test case inserts two rows and selects them all, but the display order 
 can differ from environment to environment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-3750) JDBCStatsPublisher fails when ID length exceeds length of ID column

2012-11-27 Thread Kevin Wilfong (JIRA)
Kevin Wilfong created HIVE-3750:
---

 Summary: JDBCStatsPublisher fails when ID length exceeds length of 
ID column
 Key: HIVE-3750
 URL: https://issues.apache.org/jira/browse/HIVE-3750
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.11
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong


When the length of the ID field passed to JDBCStatsPublisher exceeds the length 
of the column in the table (currently 255 characters), stats collection fails. 
This causes the entire query to fail when hive.stats.reliable is set to true.

One way to prevent this would be to calculate a deterministic, very-low-collision 
hash of the ID prefix used for aggregation and use that when the 
length of the ID is too long.
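
A minimal sketch of the hashing idea in Java (illustrative only, not the actual 
JDBCStatsPublisher change; the class name, the SHA-256 choice, and the 
255-character limit below are assumptions):

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public final class StatsIdTruncator {
      // Assumed column width; the JIRA mentions 255 characters.
      private static final int MAX_ID_LENGTH = 255;

      // Returns the ID unchanged when it fits; otherwise builds a deterministic,
      // low-collision replacement from a readable prefix plus a SHA-256 digest.
      public static String truncateId(String id) {
        if (id == null || id.length() <= MAX_ID_LENGTH) {
          return id;
        }
        try {
          MessageDigest md = MessageDigest.getInstance("SHA-256");
          byte[] digest = md.digest(id.getBytes(StandardCharsets.UTF_8));
          StringBuilder hex = new StringBuilder(digest.length * 2);
          for (byte b : digest) {
            hex.append(String.format("%02x", b & 0xff));
          }
          // Keep a readable prefix and append the digest so the result stays
          // deterministic and under the column limit.
          int prefixLen = MAX_ID_LENGTH - hex.length() - 1;
          return id.substring(0, prefixLen) + "_" + hex;
        } catch (NoSuchAlgorithmException e) {
          throw new IllegalStateException("SHA-256 not available", e);
        }
      }
    }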

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3749) New test cases added by HIVE-3676 in insert1.q is not deterministic

2012-11-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505113#comment-13505113
 ] 

Ashutosh Chauhan commented on HIVE-3749:


+1

 New test cases added by HIVE-3676 in insert1.q is not deterministic
 ---

 Key: HIVE-3749
 URL: https://issues.apache.org/jira/browse/HIVE-3749
 Project: Hive
  Issue Type: Test
  Components: Tests
Reporter: Navis
Assignee: Navis
 Attachments: HIVE-3749.D7011.1.patch


 The test case inserts two rows and selects them all, but the display order 
 can differ from environment to environment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-3676) INSERT INTO regression caused by HIVE-3465

2012-11-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-3676.


Resolution: Fixed

This is getting taken care of in HIVE-3749

 INSERT INTO regression caused by HIVE-3465
 --

 Key: HIVE-3676
 URL: https://issues.apache.org/jira/browse/HIVE-3676
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Carl Steinbach
Assignee: Navis
 Fix For: 0.10.0

 Attachments: HIVE-3676.D6741.1.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3749) New test cases added by HIVE-3676 in insert1.q is not deterministic

2012-11-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3749:
---

   Resolution: Fixed
Fix Version/s: 0.10.0
   Status: Resolved  (was: Patch Available)

Patch committed to trunk and 0.10. Thanks, Navis!

 New test cases added by HIVE-3676 in insert1.q is not deterministic
 ---

 Key: HIVE-3749
 URL: https://issues.apache.org/jira/browse/HIVE-3749
 Project: Hive
  Issue Type: Test
  Components: Tests
Reporter: Navis
Assignee: Navis
 Fix For: 0.10.0

 Attachments: HIVE-3749.D7011.1.patch


 The test case inserts two rows and selects them all, but the display order 
 can differ from environment to environment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3400) Add Retries to Hive MetaStore Connections

2012-11-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505174#comment-13505174
 ] 

Ashutosh Chauhan commented on HIVE-3400:


Bhushan, can you upload the latest patch to the JIRA as well?

 Add Retries to Hive MetaStore Connections
 -

 Key: HIVE-3400
 URL: https://issues.apache.org/jira/browse/HIVE-3400
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Priority: Minor
 Attachments: HIVE-3400.1.patch.txt


 Currently, when using Thrift to access the MetaStore, if the Thrift host 
 dies, there is no mechanism to reconnect to some other host even if the 
 MetaStore URIs variable in the Conf contains multiple hosts. Hive should 
 retry and reconnect rather than throwing a communication link error.
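
Sketched in Java, the retry idea could look roughly like this (the Connector hook 
and the retry/back-off parameters are placeholders, not the actual 
HiveMetaStoreClient code):

    import java.net.URI;
    import java.util.List;

    // Illustrative retry loop over multiple metastore URIs.
    public final class MetaStoreRetryExample {

      interface Connector {
        void connect(URI uri) throws Exception;   // throws when the host is unreachable
      }

      static URI connectWithRetries(List<URI> uris, Connector connector,
          int maxAttempts, long delayMillis) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
          for (URI uri : uris) {                  // cycle through every configured host
            try {
              connector.connect(uri);
              return uri;                         // connected successfully
            } catch (Exception e) {
              last = e;                           // remember the failure, try the next host
            }
          }
          Thread.sleep(delayMillis);              // back off before the next round
        }
        throw last != null ? last
            : new IllegalStateException("no metastore URIs configured");
      }
    }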

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3400) Add Retries to Hive MetaStore Connections

2012-11-27 Thread Bhushan Mandhani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505175#comment-13505175
 ] 

Bhushan Mandhani commented on HIVE-3400:


Yes, I'll do Submit Patch after making one key change that Carl pointed out. 
Working on that now.

 Add Retries to Hive MetaStore Connections
 -

 Key: HIVE-3400
 URL: https://issues.apache.org/jira/browse/HIVE-3400
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Bhushan Mandhani
Assignee: Bhushan Mandhani
Priority: Minor
 Attachments: HIVE-3400.1.patch.txt


 Currently, when using Thrift to access the MetaStore, if the Thrift host 
 dies, there is no mechanism to reconnect to some other host even if the 
 MetaStore URIs variable in the Conf contains multiple hosts. Hive should 
 retry and reconnect rather than throwing a communication link error.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3678:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk and 0.10. Thanks, Shreepadma!

 Add metastore upgrade scripts for column stats schema changes
 -

 Key: HIVE-3678
 URL: https://issues.apache.org/jira/browse/HIVE-3678
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-3678.1.patch.txt, HIVE-3678.2.patch.txt, 
 HIVE-3678.3.patch.txt, HIVE-3678.4.patch.txt


 Add upgrade script for column statistics schema changes for 
 Postgres/MySQL/Oracle/Derby

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3712) Use varbinary instead of longvarbinary to store min and max column values in column stats schema

2012-11-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3712:
---

   Resolution: Fixed
Fix Version/s: 0.10.0
   Status: Resolved  (was: Patch Available)

Taken care of in HIVE-3678

 Use varbinary instead of longvarbinary to store min and max column values in 
 column stats schema
 

 Key: HIVE-3712
 URL: https://issues.apache.org/jira/browse/HIVE-3712
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Statistics
Affects Versions: 0.9.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0


 JDBC type longvarbinary maps to BLOB SQL type in some databases. Storing min 
 and max column values for numeric types takes up 8 bytes and hence doesn't 
 require a BLOB. Storing these values in a BLOB will impact performance 
 without providing much benefit. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3665) Allow URIs without port to be specified in metatool

2012-11-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3665:
---

   Resolution: Fixed
Fix Version/s: 0.11
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Shreepadma!

 Allow URIs without port to be specified in metatool
 ---

 Key: HIVE-3665
 URL: https://issues.apache.org/jira/browse/HIVE-3665
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.10.0
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.11

 Attachments: HIVE-3665.1.patch.txt


 Metatool should accept input URIs where one URI contains a port and the other 
 doesn't. While metatool today accepts input URIs without the port when both 
 the input URIs (oldLoc and newLoc) don't contain the port, we should make the 
 tool a little more flexible to allow for the case where one URI contains a 
 valid port and the other input URI doesn't. This makes more sense when 
 transitioning to HA and a user chooses to specify the port as part of the 
 oldLoc, but the port doesn't mean much for the newLoc.
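
 Conceptually, the relaxed matching could look like the following hypothetical 
 helper (not the actual Hive metatool code; it relies on java.net.URI reporting a 
 missing port as -1 and assumes hierarchical URIs such as hdfs://host:port/path):

    import java.net.URI;

    // Hypothetical helper: a URI with no port (-1) matches any port on the
    // same scheme and host, so oldLoc may carry a port while newLoc omits it.
    public final class UriMatchExample {
      static boolean sameLocation(URI oldLoc, URI newLoc) {
        if (!oldLoc.getScheme().equalsIgnoreCase(newLoc.getScheme())) {
          return false;
        }
        if (!oldLoc.getHost().equalsIgnoreCase(newLoc.getHost())) {
          return false;
        }
        // -1 means "no port specified"; only compare ports when both are present.
        return oldLoc.getPort() == -1 || newLoc.getPort() == -1
            || oldLoc.getPort() == newLoc.getPort();
      }
    }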

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Review request: JIRAs useful for the Shark project

2012-11-27 Thread Ashutosh Chauhan
Hi Mikhail,

I will take a look into those jiras.

Thanks,
Ashutosh

On Tue, Nov 27, 2012 at 11:43 AM, Mikhail Bautin 
bautin.mailing.li...@gmail.com wrote:

 Hello,

 The following review requests are pending and are very useful for
 the Shark project (http://shark.cs.berkeley.edu/). It would be great if
 someone could take a look and help us get these JIRAs committed.

 - https://reviews.facebook.net/D6879 (HIVE-3731,
   https://issues.apache.org/jira/browse/HIVE-3731):
   adding an Ant target to create a Debian package, which allows deploying the
   patched version of Hive alongside Shark on Debian systems.
 - https://reviews.facebook.net/D7005 (HIVE-3748,
   https://issues.apache.org/jira/browse/HIVE-3748):
   making QTestUtil work correctly when running the test suite, which helps
   with running Hive/Shark unit tests from Maven.

 In addition, the following JIRA would make it a lot easier to work with Hive
 for anyone who is using JDK 1.7:

 - https://reviews.facebook.net/D6873 (HIVE-3384,
   https://issues.apache.org/jira/browse/HIVE-3384):
   HIVE JDBC module won't compile under JDK1.7 as new methods were added in the
   JDBC specification.

 Your help in reviewing/committing these patches is greatly appreciated!

 Thanks,
 Mikhail



Transform.java Vs. PhysicalPlanResolver.java

2012-11-27 Thread Mahsa Mofidpoor
Hello,

1) Does Hive make a clear-cut distinction between compile-time
optimization and run-time optimization?
2) Does anybody know the difference between the optimizations implementing
Transform and the ones implementing PhysicalPlanResolver? Why are such
optimizations in separate packages?

Thanks and Regards,
Mahsa


Re: Transform.java Vs. PhysicalPlanResolver.java

2012-11-27 Thread Namit Jain

Optimizations implementing Transform take the operator tree and transform it
into a new operator tree.
The operator tree is then broken into various tasks.
The physical optimizer takes a task and optimizes/changes that task.

Both of these optimizations are done at compile time.

There is no runtime optimization right now; the plan does not
change dynamically.
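
A compact sketch of that distinction, using placeholder types (the real hooks are 
Hive's Transform and PhysicalPlanResolver interfaces, which work on ParseContext 
and PhysicalContext; the names below are illustrative only):

    // Toy model of the two compile-time optimization layers; placeholder types,
    // not the real interfaces under org.apache.hadoop.hive.ql.
    public class OptimizerLayersSketch {

      static class OperatorTree { }   // stands in for the logical operator tree
      static class Task { }           // stands in for one execution task

      // Transform-style hook: whole operator tree in, rewritten tree out,
      // applied before the tree is broken into tasks.
      interface LogicalTransform {
        OperatorTree transform(OperatorTree tree);
      }

      // PhysicalPlanResolver-style hook: runs after task generation and
      // rewrites or replaces individual tasks.
      interface PhysicalResolver {
        Task resolve(Task task);
      }
    }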



On 11/28/12 8:54 AM, Mahsa Mofidpoor mofidp...@gmail.com wrote:

Hello,

1) Does Hive make a clear-cut distinction between compile-time
optimization and run-time optimization?
2) Does anybody know the difference between the optimizations implementing
Transform and the ones implementing PhysicalPlanResolver? Why are such
optimizations in separate packages?

Thanks and Regards,
Mahsa



[jira] [Commented] (HIVE-3552) performant manner for performing cubes and rollups in case of less aggretation

2012-11-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505230#comment-13505230
 ] 

Namit Jain commented on HIVE-3552:
--

This approach won't work for distincts.

 performant manner for performing cubes and rollups in case of less aggretation
 --

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain

 This is a follow up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say 8 columns. So, each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional stage - in the first 
 stage perform the group by assuming there was no cube. Add another stage, where
 you would perform the cube. The assumption is that the group by would have 
 decreased the output data significantly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3552) performant manner for performing cubes and rollups in case of less aggretation

2012-11-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505231#comment-13505231
 ] 

Namit Jain commented on HIVE-3552:
--

https://reviews.facebook.net/D7029

 performant manner for performing cubes and rollups in case of less aggretation
 --

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain

 This is a follow up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say 8 columns. So, each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional stage - in the first 
 stage perform the group by assuming there was no cube. Add another stage, where
 you would perform the cube. The assumption is that the group by would have 
 decreased the output data significantly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3552) performant manner for performing cubes and rollups in case of less aggregation

2012-11-27 Thread Mark Grover (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Grover updated HIVE-3552:
--

Summary: performant manner for performing cubes and rollups in case of less 
aggregation  (was: performant manner for performing cubes and rollups in case 
of less aggretation)

 performant manner for performing cubes and rollups in case of less aggregation
 --

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain

 This is a follow up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say 8 columns. So, each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional stage - in the first 
 stage perform the group by assuming there was no cube. Add another stage, where
 you would perform the cube. The assumption is that the group by would have 
 decreased the output data significantly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3596) Regression - HiveConf static variable causes issues in long running JVM instances with /tmp/ data

2012-11-27 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505273#comment-13505273
 ] 

Carl Steinbach commented on HIVE-3596:
--

@Chris: Turns out that Kevin has been working on this same problem in 
HIVE-3709. He was able to get a little bit farther, but his solution seems to 
have some concurrency issues. If you have time, it may be worth looking at his 
solution and seeing if you can spot the threading problem.

 Regression - HiveConf static variable causes issues in long running JVM 
 instances with /tmp/ data
 -

 Key: HIVE-3596
 URL: https://issues.apache.org/jira/browse/HIVE-3596
 Project: Hive
  Issue Type: Bug
  Components: Configuration
Affects Versions: 0.8.0, 0.8.1, 0.9.0
Reporter: Chris McConnell
Assignee: Chris McConnell
 Fix For: 0.8.1, 0.9.0, 0.10.0

 Attachments: HIVE-3596.patch


 With Hive 0.8.x, HiveConf was changed to use the private, static member 
 confVarURL, which points to /tmp/hive-user-tmp_number.xml for job 
 configuration settings. 
 In long-running JVMs, such as a Beeswax server, which create multiple 
 HiveConf objects over time, this variable does not get updated properly 
 between jobs and can cause job failures if the OS cleans /tmp/ during a cron 
 job. 
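
 One direction, sketched here as an assumption rather than as what HIVE-3709 
 actually does, is to keep the serialized defaults in memory and hand out a fresh 
 stream on every read (uses Hadoop's Configuration.writeXml; the class name is 
 hypothetical):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;

    import org.apache.hadoop.conf.Configuration;

    // Illustrative only: serialize the default configuration once into memory
    // and serve a new InputStream per request, so nothing under /tmp is needed.
    public final class InMemoryDefaults {
      private final byte[] defaultsXml;

      public InMemoryDefaults(Configuration defaults) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        defaults.writeXml(out);            // Configuration can serialize itself as XML
        this.defaultsXml = out.toByteArray();
      }

      // Each caller gets an independent stream over the same cached bytes.
      public InputStream newStream() {
        return new ByteArrayInputStream(defaultsXml);
      }
    }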

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira