[jira] [Commented] (HIVE-4773) Templeton intermittently fails to commit output to file system

2013-08-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755443#comment-13755443
 ] 

Hive QA commented on HIVE-4773:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12588951/HIVE-4773.1.patch

{color:green}SUCCESS:{color} +1 2902 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/583/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/583/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

 Templeton intermittently fails to commit output to file system
 -

 Key: HIVE-4773
 URL: https://issues.apache.org/jira/browse/HIVE-4773
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Reporter: Shuaishuai Nie
Assignee: Shuaishuai Nie
 Attachments: HIVE-4773.1.patch


 With ASV as a default FS, we saw instances where output is not fully flushed 
 to storage before the Templeton controller process exits. This results in 
 stdout and stderr being empty even though the job completed successfully.
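
 A minimal sketch of the kind of fix this implies, assuming the controller 
 writes stdout/stderr through Hadoop's FileSystem API (the path and class 
 name below are placeholders, not the attached patch):
 {code:java}
// Illustrative sketch only: explicitly flush and close the job's output
// stream before the controller JVM exits, so stores like ASV see the
// complete stdout/stderr. The path and class name are placeholders.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FlushBeforeExit {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/templeton/jobs/1/stdout"));
    try {
      out.writeBytes("captured job output...\n");
      out.flush();  // push buffered bytes down the stream
    } finally {
      out.close();  // close() is what commits the data on object stores
    }
  }
}
 {code}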

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4789) FetchOperator fails on partitioned Avro data

2013-08-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755456#comment-13755456
 ] 

Hudson commented on HIVE-4789:
--

FAILURE: Integrated in Hive-trunk-hadoop2 #392 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/392/])
HIVE-4789 : FetchOperator fails on partitioned Avro data (Sean Busbey via 
Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519132)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
* /hive/trunk/ql/src/test/queries/clientpositive/avro_partitioned.q
* /hive/trunk/ql/src/test/results/clientpositive/avro_partitioned.q.out


 FetchOperator fails on partitioned Avro data
 

 Key: HIVE-4789
 URL: https://issues.apache.org/jira/browse/HIVE-4789
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0, 0.12.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Blocker
 Fix For: 0.12.0

 Attachments: HIVE-4789.1.patch.txt, HIVE-4789.2.patch.txt, 
 HIVE-4789.3.patch.txt


 HIVE-3953 fixed using partitioned avro tables for anything that used the 
 MapOperator, but those that rely on FetchOperator still fail with the same 
 error.
 e.g.
 {code}
   SELECT * FROM partitioned_avro LIMIT 5;
   SELECT * FROM partitioned_avro WHERE partition_col=value;
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5182) log more stuff via PerfLogger

2013-08-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755458#comment-13755458
 ] 

Hive QA commented on HIVE-5182:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12600906/HIVE-5182.D12639.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2902 tests executed
*Failed tests:*
{noformat}
org.apache.hcatalog.mapreduce.TestHCatExternalDynamicPartitioned.testHCatDynamicPartitionedTableMultipleTask
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/584/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/584/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

 log more stuff via PerfLogger
 -

 Key: HIVE-5182
 URL: https://issues.apache.org/jira/browse/HIVE-5182
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-5182.D12639.1.patch


 PerfLogger output is useful in understanding performance. There are large 
 gaps in it, however, and it's not clear what is going on during them. Some 
 sections are large and have no breakdown. It would be nice to add more 
 stuff. At this point I'm not certain where exactly; whoever makes the patch 
 (me?) will just need to look at the above gaps and fill them in.
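
 For reference, a sketch of the begin/end bracketing pattern that PerfLogger 
 sections use (0.12-era API); the section name below is hypothetical:
 {code:java}
// Sketch of the PerfLogger bracketing pattern. "MY_NEW_SECTION" is a
// hypothetical name; real call sites use the constants defined on
// PerfLogger (e.g. PerfLogger.COMPILE).
import org.apache.hadoop.hive.ql.log.PerfLogger;

public class PerfLoggerSketch {
  private static final String CLASS_NAME = PerfLoggerSketch.class.getName();

  public void timedWork() {
    PerfLogger perfLogger = PerfLogger.getPerfLogger();
    perfLogger.PerfLogBegin(CLASS_NAME, "MY_NEW_SECTION");
    try {
      // ... the code whose wall-clock time should show up in the log ...
    } finally {
      perfLogger.PerfLogEnd(CLASS_NAME, "MY_NEW_SECTION");
    }
  }
}
 {code}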

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1511) Hive plan serialization is slow

2013-08-31 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755460#comment-13755460
 ] 

Mohammad Kamrul Islam commented on HIVE-1511:
-

Running into another probable Kryo bug. I got the following exception when 
running with the attached plan XML file.

Exception in thread "main" com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 115
Serialization trace:
opParseCtxMap (org.apache.hadoop.hive.ql.plan.MapWork)
mapWork (org.apache.hadoop.hive.ql.plan.MapredWork)
    at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:119)
    at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:642)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:753)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:131)
    at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:680)
    at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:485)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:680)
    at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:485)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:658)
    at KryoHiveTest.fun(KryoHiveTest.java:54)
    at KryoHiveTest.main(KryoHiveTest.java:27)
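
The attached KryoHiveTest.java is the authoritative repro; for readers, a 
minimal round-trip sketch of the same shape (plan contents elided; the 
failure mode described in the comments is my reading of the trace):
{code:java}
// Round-trip sketch consistent with the trace above. "Encountered
// unregistered class ID" on the read side usually means the reading Kryo
// instance is configured (registrations/class IDs) differently from the
// one that wrote the bytes.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import org.apache.hadoop.hive.ql.plan.MapredWork;

public class KryoHiveSketch {
  public static void main(String[] args) {
    Kryo kryo = new Kryo();
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();

    // Serialize a (here: empty) plan; the real test loads the attached XML.
    Output output = new Output(bytes);
    kryo.writeObject(output, new MapredWork());
    output.close();

    // Deserialize it back; this is the step that fails in the trace above.
    Input input = new Input(new ByteArrayInputStream(bytes.toByteArray()));
    MapredWork work = kryo.readObject(input, MapredWork.class);
    input.close();
    System.out.println("Round-tripped plan: " + work);
  }
}
{code}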


 Hive plan serialization is slow
 ---

 Key: HIVE-1511
 URL: https://issues.apache.org/jira/browse/HIVE-1511
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.7.0, 0.11.0
Reporter: Ning Zhang
Assignee: Mohammad Kamrul Islam
 Attachments: generated_plan.xml, HIVE-1511.4.patch, 
 HIVE-1511.5.patch, HIVE-1511.6.patch, HIVE-1511.7.patch, HIVE-1511.8.patch, 
 HIVE-1511.patch, HIVE-1511-wip2.patch, HIVE-1511-wip3.patch, 
 HIVE-1511-wip4.patch, HIVE-1511.wip.9.patch, HIVE-1511-wip.patch, 
 KryoHiveTest.java, run.sh


 As reported by Edward Capriolo:
 For reference I did this as a test case
 SELECT * FROM src where
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 ...(100 more of these)
 No OOM but I gave up after the test case did not go anywhere for about
 2 minutes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-1511) Hive plan serialization is slow

2013-08-31 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-1511:


Attachment: failedPlan.xml

 Hive plan serialization is slow
 ---

 Key: HIVE-1511
 URL: https://issues.apache.org/jira/browse/HIVE-1511
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.7.0, 0.11.0
Reporter: Ning Zhang
Assignee: Mohammad Kamrul Islam
 Attachments: failedPlan.xml, generated_plan.xml, HIVE-1511.4.patch, 
 HIVE-1511.5.patch, HIVE-1511.6.patch, HIVE-1511.7.patch, HIVE-1511.8.patch, 
 HIVE-1511.patch, HIVE-1511-wip2.patch, HIVE-1511-wip3.patch, 
 HIVE-1511-wip4.patch, HIVE-1511.wip.9.patch, HIVE-1511-wip.patch, 
 KryoHiveTest.java, run.sh


 As reported by Edward Capriolo:
 For reference I did this as a test case
 SELECT * FROM src where
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 ...(100 more of these)
 No OOM but I gave up after the test case did not go anywhere for about
 2 minutes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4789) FetchOperator fails on partitioned Avro data

2013-08-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755471#comment-13755471
 ] 

Hudson commented on HIVE-4789:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #79 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/79/])
HIVE-4789 : FetchOperator fails on partitioned Avro data (Sean Busbey via 
Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519132)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
* /hive/trunk/ql/src/test/queries/clientpositive/avro_partitioned.q
* /hive/trunk/ql/src/test/results/clientpositive/avro_partitioned.q.out


 FetchOperator fails on partitioned Avro data
 

 Key: HIVE-4789
 URL: https://issues.apache.org/jira/browse/HIVE-4789
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0, 0.12.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Blocker
 Fix For: 0.12.0

 Attachments: HIVE-4789.1.patch.txt, HIVE-4789.2.patch.txt, 
 HIVE-4789.3.patch.txt


 HIVE-3953 fixed using partitioned avro tables for anything that used the 
 MapOperator, but those that rely on FetchOperator still fail with the same 
 error.
 e.g.
 {code}
   SELECT * FROM partitioned_avro LIMIT 5;
   SELECT * FROM partitioned_avro WHERE partition_col=value;
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4789) FetchOperator fails on partitioned Avro data

2013-08-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755490#comment-13755490
 ] 

Hudson commented on HIVE-4789:
--

FAILURE: Integrated in Hive-trunk-hadoop1-ptest #146 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/146/])
HIVE-4789 : FetchOperator fails on partitioned Avro data (Sean Busbey via 
Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519132)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
* /hive/trunk/ql/src/test/queries/clientpositive/avro_partitioned.q
* /hive/trunk/ql/src/test/results/clientpositive/avro_partitioned.q.out


 FetchOperator fails on partitioned Avro data
 

 Key: HIVE-4789
 URL: https://issues.apache.org/jira/browse/HIVE-4789
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0, 0.12.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Blocker
 Fix For: 0.12.0

 Attachments: HIVE-4789.1.patch.txt, HIVE-4789.2.patch.txt, 
 HIVE-4789.3.patch.txt


 HIVE-3953 fixed using partitioned avro tables for anything that used the 
 MapOperator, but those that rely on FetchOperator still fail with the same 
 error.
 e.g.
 {code}
   SELECT * FROM partitioned_avro LIMIT 5;
   SELECT * FROM partitioned_avro WHERE partition_col=value;
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-1511) Hive plan serialization is slow

2013-08-31 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-1511:
---

Attachment: HIVE-1511.9.patch

Hi,

I am re-uploading your patch HIVE-1511.wip.9.patch as HIVE-1511.9.patch so the 
[tests will 
execute|https://cwiki.apache.org/confluence/display/Hive/Hive+PreCommit+Patch+Testing].

 Hive plan serialization is slow
 ---

 Key: HIVE-1511
 URL: https://issues.apache.org/jira/browse/HIVE-1511
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.7.0, 0.11.0
Reporter: Ning Zhang
Assignee: Mohammad Kamrul Islam
 Attachments: failedPlan.xml, generated_plan.xml, HIVE-1511.4.patch, 
 HIVE-1511.5.patch, HIVE-1511.6.patch, HIVE-1511.7.patch, HIVE-1511.8.patch, 
 HIVE-1511.9.patch, HIVE-1511.patch, HIVE-1511-wip2.patch, 
 HIVE-1511-wip3.patch, HIVE-1511-wip4.patch, HIVE-1511.wip.9.patch, 
 HIVE-1511-wip.patch, KryoHiveTest.java, run.sh


 As reported by Edward Capriolo:
 For reference I did this as a test case
 SELECT * FROM src where
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 ...(100 more of these)
 No OOM but I gave up after the test case did not go anywhere for about
 2 minutes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1511) Hive plan serialization is slow

2013-08-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755497#comment-13755497
 ] 

Hive QA commented on HIVE-1511:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12600944/HIVE-1511.9.patch

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/587/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/587/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests failed with: NonZeroExitCodeException: Command 'bash 
/data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and 
output '+ [[ -n '' ]]
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-587/source-prep.txt
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf build hcatalog/build hcatalog/core/build 
hcatalog/storage-handlers/hbase/build hcatalog/server-extensions/build 
hcatalog/webhcat/svr/build hcatalog/webhcat/java-client/build 
hcatalog/hcatalog-pig-adapter/build common/src/gen
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1519174.

At revision 1519174.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0 to p2
+ exit 1
'
{noformat}

This message is automatically generated.

 Hive plan serialization is slow
 ---

 Key: HIVE-1511
 URL: https://issues.apache.org/jira/browse/HIVE-1511
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.7.0, 0.11.0
Reporter: Ning Zhang
Assignee: Mohammad Kamrul Islam
 Attachments: failedPlan.xml, generated_plan.xml, HIVE-1511.4.patch, 
 HIVE-1511.5.patch, HIVE-1511.6.patch, HIVE-1511.7.patch, HIVE-1511.8.patch, 
 HIVE-1511.9.patch, HIVE-1511.patch, HIVE-1511-wip2.patch, 
 HIVE-1511-wip3.patch, HIVE-1511-wip4.patch, HIVE-1511.wip.9.patch, 
 HIVE-1511-wip.patch, KryoHiveTest.java, run.sh


 As reported by Edward Capriolo:
 For reference I did this as a test case
 SELECT * FROM src where
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 ...(100 more of these)
 No OOM but I gave up after the test case did not go anywhere for about
 2 minutes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-1511) Hive plan serialization is slow

2013-08-31 Thread Leo Romanoff (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755500#comment-13755500
 ] 

Leo Romanoff commented on HIVE-1511:


This XML file can be read properly by the latest Kryo snapshot without any 
additional changes on my side. Are you sure you are using the latest build of 
the Kryo snapshot (2.22-SNAPSHOT)?

-Leo

 Hive plan serialization is slow
 ---

 Key: HIVE-1511
 URL: https://issues.apache.org/jira/browse/HIVE-1511
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.7.0, 0.11.0
Reporter: Ning Zhang
Assignee: Mohammad Kamrul Islam
 Attachments: failedPlan.xml, generated_plan.xml, HIVE-1511.4.patch, 
 HIVE-1511.5.patch, HIVE-1511.6.patch, HIVE-1511.7.patch, HIVE-1511.8.patch, 
 HIVE-1511.9.patch, HIVE-1511.patch, HIVE-1511-wip2.patch, 
 HIVE-1511-wip3.patch, HIVE-1511-wip4.patch, HIVE-1511.wip.9.patch, 
 HIVE-1511-wip.patch, KryoHiveTest.java, run.sh


 As reported by Edward Capriolo:
 For reference I did this as a test case
 SELECT * FROM src where
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 ...(100 more of these)
 No OOM but I gave up after the test case did not go anywhere for about
 2 minutes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5185) test query files reduce_deduplicate_exclude_gby.q and reducesink_dedup.q are useless

2013-08-31 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-5185:
---

Priority: Trivial  (was: Minor)

 test query files reduce_deduplicate_exclude_gby.q and reducesink_dedup.q are 
 useless
 

 Key: HIVE-5185
 URL: https://issues.apache.org/jira/browse/HIVE-5185
 Project: Hive
  Issue Type: Bug
Reporter: Yin Huai
Assignee: Yin Huai
Priority: Trivial

 The file reduce_deduplicate_exclude_gby.q contains
 {code:sql}
 create table t1( key_int1 int, key_int2 int, key_string1 string, key_string2 
 string);
 set hive.optimize.reducededuplication=false;
 set hive.map.aggr=false;
 select Q1.key_int1, sum(Q1.key_int1) from (select * from t1 cluster by 
 key_int1) Q1 group by Q1.key_int1;
 drop table t1;
 {code}
 Since the table is not populated, no result will be in the .out file.
 The same thing happens in reducesink_dedup.q
 {code:sql}
 DROP TABLE part;
 -- data setup
 CREATE TABLE part( 
     p_partkey INT,
     p_name STRING,
     p_mfgr STRING,
     p_brand STRING,
     p_type STRING,
     p_size INT,
     p_container STRING,
     p_retailprice DOUBLE,
     p_comment STRING
 );
 select p_name 
 from (select p_name from part distribute by 1 sort by 1) p 
 distribute by 1 sort by 1
 ;
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Hive 0.12 release

2013-08-31 Thread Edward Capriolo
I do not think we should consider too many not-done items for 0.12. I do
not think releases should be a wish-list. If trunk is working with no
blockers we should build a release; anything not done and not committed
goes in the next release. I do not like cherry-picking issues and then
waiting for them. Historically, adding types is much more complicated than
people think. Generally there are three or more follow-on issues for things
people did not consider in the initial patch; it worked this way for binary,
decimal, and date, so I am not super eager to announce and release a large
feature that was just committed and not heavily battle-tested. The npath
thing is a blocker and we can not release without that.

Committers should review the other blockers as well, and either mark them
not as blockers, or work to get them committed, because if we have blockers,
we should be dealing with them.


On Fri, Aug 30, 2013 at 11:21 PM, Eugene Koifman
ekoif...@hortonworks.com wrote:

 Because this change includes moving/renaming 300 files and then adding
 about 200 more with the same name (but contents from 0.11 branch) as the
 file had before the move.  The first part is necessary to change the
 package name, the second to ensure backwards compatibility.  I described
 this in detail in the mail "RFC: Major HCatalog refactoring".

 Given the complexity of the changes I think creating and applying a patch
 could end up with a lot of conflicts.  So doing this after the branch adds
 complexity but does not add anything useful.

 Eugene



 On Fri, Aug 30, 2013 at 5:57 PM, Thejas Nair the...@hortonworks.com
 wrote:

  Hi Eugene,
  Can you please elaborate on why you would like to have this in before
  branching and not commit it after branching in trunk and the branch ?
  Thanks,
  Thejas
 
 
 
  On Thu, Aug 29, 2013 at 10:31 PM, Eugene Koifman
  ekoif...@hortonworks.comwrote:
 
   I think we should make sure that several items under HIVE-4869 get checked
   in before branching.
  
   Eugene
  
  
   On Thu, Aug 29, 2013 at 9:18 PM, Thejas Nair the...@hortonworks.com
   wrote:
  
It has been more than 3 months since 0.11 was released and we already have
294 jiras in resolved-fixed state for 0.12. This includes several new
features such as the date data type, optimizer improvements, ORC format
improvements and many bug fixes. There are also many features that look
ready to get committed soon, such as the varchar type.
I think it is time to start preparing for a 0.12 release by creating a
branch later next week and start stabilizing it. What do people think about
it?
   
As we get closer to the branching, we can start discussing any additional
features/bug fixes that we should add to the release and start monitoring
their progress.
   
Thanks,
Thejas
   
   
  
  
 
 


Re: RFC: Major HCatalog refactoring

2013-08-31 Thread Edward Capriolo
By coverage do you mean to say that:

 Thus, the published HCatalog JARs will contain both packages and the unit
 tests will cover both versions of the API.

We are going to double the time of unit tests for this module?


On Fri, Aug 30, 2013 at 8:41 PM, Eugene Koifman ekoif...@hortonworks.com wrote:

 This will change every file under hcatalog so it has to happen before the
 branching.  Most likely at the beginning of next week.

 Thanks


 On Wed, Aug 28, 2013 at 5:24 PM, Eugene Koifman ekoif...@hortonworks.com
 wrote:

  Hi,
 
 
  Here is the plan for refactoring HCatalog, as was agreed to when it was
  merged into Hive.  HIVE-4869 is the umbrella bug for this work.
  The
  changes are complex and touch every single file under hcatalog.  Please
  comment.
 
  When the HCatalog project was merged into Hive in 0.11, several integration
  items did not make the 0.11 deadline.  It was agreed to finish them in the
  0.12 release.  Specifically:
 
  1. HIVE-4895 - change package name from org.apache.hcatalog to
  org.apache.hive.hcatalog
 
  2. HIVE-4896 - create binary backwards compatibility layer for hcat users
  upgrading from 0.11 to 0.12
 
  For item 1, we’ll just move every file under org.apache.hcatalog to
  org.apache.hive.hcatalog and update all “package” and “import” statements
  as well as all hcat/webhcat scripts.  This will include all JUnit tests.
 
  Item 2 will ensure that if a user has an M/R program or Pig script, etc.
  that uses the HCatalog public API, their programs will continue to work w/o
  change with hive 0.12.
 
  The proposal is to make the changes with as little impact on the build
  system as possible, in part to make the upcoming ‘mavenization’ of hive
  easier, in part to make the changes more manageable.
 
 
 
  The list of public interfaces (and their transitive closure) for which
  backwards compat will be provided.
 
 1. HCatLoader
 2. HCatStorer
 3. HCatInputFormat
 4. HCatOutputFormat
 5. HCatReader
 6. HCatWriter
 7. HCatRecord
 8. HCatSchema
 
 
  To achieve this, the 0.11 versions of these classes will be added in the
  org.apache.hcatalog package (after item 1 is done).  Each of these classes,
  as well as its dependencies, will be deprecated to make it clear that any
  new development needs to happen in org.apache.hive.hcatalog.  The 0.11
  versions of the JUnit tests for hcat will also be brought to trunk and
  handled the same way as mainline code.  A sunset clause will be added to
  the deprecation message.
 
  Thus, the published HCatalog JARs will contain both packages and the unit
  tests will cover both versions of the API.
 
  Since these changes are unavoidably disruptive, we’ll need to lock down the
  hcatalog part of hive, check in all existing patches (which are ready, i.e.
  apply/test cleanly and don’t have review comments which need to be
  addressed) and then make the refactoring changes.
 
 
  Thanks,
 
  Eugene
 
 
 
 




[jira] [Updated] (HIVE-4891) Distinct includes duplicate records

2013-08-31 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-4891:
--

Priority: Blocker  (was: Major)

 Distinct includes duplicate records
 ---

 Key: HIVE-4891
 URL: https://issues.apache.org/jira/browse/HIVE-4891
 Project: Hive
  Issue Type: Bug
  Components: File Formats, HiveServer2, Query Processor
Affects Versions: 0.10.0
Reporter: Fengdong Yu
Priority: Blocker

 I have two partitions: one is a sequence file, the other is an RCFile, but 
 they contain the same data (only the file format differs).
 I have the following SQL:
 {code}
 select distinct uid from test where (dt ='20130718' or dt ='20130718_1') and 
 cur_url like '%cq.aa.com%';
 {code}
 dt='20130718' is a sequence file (the default input format, which was 
 specified when the table was created).
  
 dt='20130718_1' is an RCFile.
 {code}
 ALTER TABLE test ADD IF NOT EXISTS PARTITION (dt='20130718_1') LOCATION 
 '/user/test/test-data'
 ALTER TABLE test PARTITION(dt='20130718_1') SET FILEFORMAT RCFILE;
 {code}
 but there are duplicate records in the result.
 If the two partitions have the same input format, there are no duplicate 
 records.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-08-31 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755530#comment-13755530
 ] 

Edward Capriolo commented on HIVE-4002:
---

[~yhuai][~navis] Are you two discussing possible revisions or is this patch 
ready to be committed?

 Fetch task aggregation for simple group by query
 

 Key: HIVE-4002
 URL: https://issues.apache.org/jira/browse/HIVE-4002
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch, 
 HIVE-4002.D8739.3.patch, HIVE-4002.D8739.4.patch


 Aggregation queries with no group-by clause (for example, select count(*) 
 from src) execute the final aggregation in a single reduce task. But that 
 work is too small even for a single reducer, because most UDAFs generate 
 just a single row for map-side aggregation. If the final fetch task can 
 aggregate the outputs from the map tasks, the shuffle time can be removed.
 This optimization transforms an operator tree like
 TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK
 into 
 TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS)
 With the patch, the time taken for the auto_join_filters.q test dropped to 
 6 min (from 10 min).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns

2013-08-31 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-5149:
--

Priority: Blocker  (was: Major)

 ReduceSinkDeDuplication can pick the wrong partitioning columns
 ---

 Key: HIVE-5149
 URL: https://issues.apache.org/jira/browse/HIVE-5149
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0, 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
Priority: Blocker
 Attachments: HIVE-5149.1.patch, HIVE-5149.2.patch


 https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


VOTE: Yin Huai as a committer

2013-08-31 Thread Edward Capriolo
https://issues.apache.org/jira/issues/?jql=assignee%20%3D%20yhuai

Yin has done an amazing job with new features in Hive's planner and
optimizer. He has started reviewing others' issues. Additionally, he is
finding edge-case bugs in current features.

His multi-year effort on https://issues.apache.org/jira/browse/HIVE-2206 is
an impressive body of work and an example of great determination.


Re: VOTE: Yin Huai as a committer

2013-08-31 Thread Edward Capriolo
My mistake. Please do not vote here. The vote goes to the private list.


On Sat, Aug 31, 2013 at 11:06 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 https://issues.apache.org/jira/issues/?jql=assignee%20%3D%20yhuai

 Yin has done an amazing job with new features in Hive's planner and
 optimizer. He has started reviewing others' issues. Additionally, he is
 finding edge-case bugs in current features.

 His multi-year effort on https://issues.apache.org/jira/browse/HIVE-2206 is 
 an impressive body of work and an example of great determination.



Re: RFC: Major HCatalog refactoring

2013-08-31 Thread Eugene Koifman
not quite double, but close (on my Mac that means it will go up from 35
minutes to 55-60), so in the greater scheme of things it should be negligible



On Sat, Aug 31, 2013 at 7:35 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 By coverage do you mean to say that:

  Thus, the published HCatalog JARs will contain both packages and the unit
  tests will cover both versions of the API.

 We are going to double the time of unit tests for this module?


 On Fri, Aug 30, 2013 at 8:41 PM, Eugene Koifman ekoif...@hortonworks.com
 wrote:

  This will change every file under hcatalog so it has to happen before the
  branching.  Most likely at the beginning of next week.
 
  Thanks
 
 
  On Wed, Aug 28, 2013 at 5:24 PM, Eugene Koifman 
 ekoif...@hortonworks.com
  wrote:
 
   Hi,
  
  
   Here is the plan for refactoring HCatalog, as was agreed to when it was
   merged into Hive.  HIVE-4869 is the umbrella bug for this work.
   The
   changes are complex and touch every single file under hcatalog.  Please
   comment.
  
   When the HCatalog project was merged into Hive in 0.11, several integration
   items did not make the 0.11 deadline.  It was agreed to finish them in the
   0.12 release.  Specifically:
  
   1. HIVE-4895 - change package name from org.apache.hcatalog to
   org.apache.hive.hcatalog
  
   2. HIVE-4896 - create binary backwards compatibility layer for hcat
 users
   upgrading from 0.11 to 0.12
  
   For item 1, we’ll just move every file under org.apache.hcatalog to
   org.apache.hive.hcatalog and update all “package” and “import” statements
   as well as all hcat/webhcat scripts.  This will include all JUnit tests.
  
   Item 2 will ensure that if a user has an M/R program or Pig script, etc.
   that uses the HCatalog public API, their programs will continue to work w/o
   change with hive 0.12.
  
   The proposal is to make the changes with as little impact on the build
   system as possible, in part to make the upcoming ‘mavenization’ of hive
   easier, in part to make the changes more manageable.
  
  
  
   The list of public interfaces (and their transitive closure) for which
   backwards compat will be provided.
  
  1. HCatLoader
  2. HCatStorer
  3. HCatInputFormat
  4. HCatOutputFormat
  5. HCatReader
  6. HCatWriter
  7. HCatRecord
  8. HCatSchema
  
  
    To achieve this, the 0.11 versions of these classes will be added in the
    org.apache.hcatalog package (after item 1 is done).  Each of these
    classes, as well as its dependencies, will be deprecated to make it clear
    that any new development needs to happen in org.apache.hive.hcatalog.
    The 0.11 versions of the JUnit tests for hcat will also be brought to
    trunk and handled the same way as mainline code.  A sunset clause will be
    added to the deprecation message.
  
    Thus, the published HCatalog JARs will contain both packages and the unit
    tests will cover both versions of the API.
  
    Since these changes are unavoidably disruptive, we’ll need to lock down
    the hcatalog part of hive, check in all existing patches (which are
    ready, i.e. apply/test cleanly and don’t have review comments which need
    to be addressed) and then make the refactoring changes.
  
  
   Thanks,
  
   Eugene
  
  
  
  
 
 




Re: RFC: Major HCatalog refactoring

2013-08-31 Thread Brock Noland
Will these be new Java class files or new test methods to existing
classes?  I am just curious as to how this will play into the
distributed testing framework.

On Sat, Aug 31, 2013 at 10:19 AM, Eugene Koifman
ekoif...@hortonworks.com wrote:
 not quite double, but close (on my Mac that means it will go up from 35
 minutes to 55-60), so in the greater scheme of things it should be negligible



 On Sat, Aug 31, 2013 at 7:35 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 By coverage do you mean to say that:

  Thus, the published HCatalog JARs will contain both packages and the unit
  tests will cover both versions of the API.

 We are going to double the time of unit tests for this module?


 On Fri, Aug 30, 2013 at 8:41 PM, Eugene Koifman ekoif...@hortonworks.com
 wrote:

  This will change every file under hcatalog so it has to happen before the
  branching.  Most likely at the beginning of next week.
 
  Thanks
 
 
  On Wed, Aug 28, 2013 at 5:24 PM, Eugene Koifman 
 ekoif...@hortonworks.com
  wrote:
 
   Hi,
  
  
   Here is the plan for refactoring HCatalog, as was agreed to when it was
   merged into Hive.  HIVE-4869 is the umbrella bug for this work.
   The
   changes are complex and touch every single file under hcatalog.  Please
   comment.
  
   When the HCatalog project was merged into Hive in 0.11, several integration
   items did not make the 0.11 deadline.  It was agreed to finish them in the
   0.12 release.  Specifically:
  
   1. HIVE-4895 - change package name from org.apache.hcatalog to
   org.apache.hive.hcatalog
  
   2. HIVE-4896 - create binary backwards compatibility layer for hcat users
   upgrading from 0.11 to 0.12
  
   For item 1, we’ll just move every file under org.apache.hcatalog to
   org.apache.hive.hcatalog and update all “package” and “import” statements
   as well as all hcat/webhcat scripts.  This will include all JUnit tests.
  
   Item 2 will ensure that if a user has an M/R program or Pig script, etc.
   that uses the HCatalog public API, their programs will continue to work w/o
   change with hive 0.12.
  
   The proposal is to make the changes with as little impact on the build
   system as possible, in part to make the upcoming ‘mavenization’ of hive
   easier, in part to make the changes more manageable.
  
  
  
   The list of public interfaces (and their transitive closure) for which
   backwards compat will be provided.
  
   1. HCatLoader
   2. HCatStorer
   3. HCatInputFormat
   4. HCatOutputFormat
   5. HCatReader
   6. HCatWriter
   7. HCatRecord
   8. HCatSchema
  
  
   To achieve this, the 0.11 versions of these classes will be added in the
   org.apache.hcatalog package (after item 1 is done).  Each of these
   classes, as well as its dependencies, will be deprecated to make it clear
   that any new development needs to happen in org.apache.hive.hcatalog.
   The 0.11 versions of the JUnit tests for hcat will also be brought to
   trunk and handled the same way as mainline code.  A sunset clause will be
   added to the deprecation message.
  
   Thus, the published HCatalog JARs will contain both packages and the unit
   tests will cover both versions of the API.
  
   Since these changes are unavoidably disruptive, we’ll need to lock down
   the hcatalog part of hive, check in all existing patches (which are
   ready, i.e. apply/test cleanly and don’t have review comments which need
   to be addressed) and then make the refactoring changes.
  
  
   Thanks,
  
   Eugene
  
  
  
  
 
 





-- 
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org


Re: RFC: Major HCatalog refactoring

2013-08-31 Thread Edward Capriolo
 not quite double, but close (on my Mac that means it will go up from 35
 minutes to 55-60), so in the greater scheme of things it should be negligible

Can't we make the classes extend each other, and just test them once? Or
test them once before the patch and only include half the tests in the
final commit?
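
For illustration, a shim along those lines might look like the sketch below
(HCatLoader is one of the listed public classes; the extend-and-delegate
approach here is just a sketch of the suggestion, not the committed design):

    // Sketch of the "extend each other" idea: the old-package class
    // inherits everything from the relocated one, so a single test run
    // exercises the shared implementation.
    package org.apache.hcatalog.pig;

    /** @deprecated Use {@link org.apache.hive.hcatalog.pig.HCatLoader}. */
    @Deprecated
    public class HCatLoader extends org.apache.hive.hcatalog.pig.HCatLoader {
      // no body: all behavior comes from the new package
    }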

Using tests to guarantee backwards compatibility for end users at the
expense of making our test process longer is not a good option. I am not
sure anyone realizes the scope of this, but Apache's build servers are
constantly spinning trying to run our tests, for multiple branches of
hadoop. We also have a build farm volunteered to us just so we can commit
features in a reasonable time frame.
We have to run every test before we commit, so just sloshing on an extra 20
minutes of testing hurts our agility. I think we need to come up with a
better option.


On Sat, Aug 31, 2013 at 11:19 AM, Eugene Koifman
ekoif...@hortonworks.com wrote:

 not quite double, but close (on my Mac that means it will go up from 35
 minutes to 55-60), so in the greater scheme of things it should be negligible



 On Sat, Aug 31, 2013 at 7:35 AM, Edward Capriolo edlinuxg...@gmail.com
 wrote:

  By coverage do you mean to say that:
 
   Thus, the published HCatalog JARs will contain both packages and the unit
   tests will cover both versions of the API.
 
  We are going to double the time of unit tests for this module?
 
 
  On Fri, Aug 30, 2013 at 8:41 PM, Eugene Koifman 
 ekoif...@hortonworks.com
  wrote:
 
   This will change every file under hcatalog so it has to happen before the
   branching.  Most likely at the beginning of next week.
  
   Thanks
  
  
   On Wed, Aug 28, 2013 at 5:24 PM, Eugene Koifman 
  ekoif...@hortonworks.com
   wrote:
  
Hi,
   
   
Here is the plan for refactoring HCatalog, as was agreed to when it was
merged into Hive.  HIVE-4869 is the umbrella bug for this work.  The
changes are complex and touch every single file under hcatalog.  Please
comment.
   
When the HCatalog project was merged into Hive in 0.11, several
integration items did not make the 0.11 deadline.  It was agreed to
finish them in the 0.12 release.  Specifically:
   
1. HIVE-4895 - change package name from org.apache.hcatalog to
org.apache.hive.hcatalog
   
2. HIVE-4896 - create binary backwards compatibility layer for hcat users
upgrading from 0.11 to 0.12
   
For item 1, we’ll just move every file under org.apache.hcatalog to
org.apache.hive.hcatalog and update all “package” and “import” statements
as well as all hcat/webhcat scripts.  This will include all JUnit tests.
   
Item 2 will ensure that if a user has an M/R program or Pig script, etc.
that uses the HCatalog public API, their programs will continue to work
w/o change with hive 0.12.
    
The proposal is to make the changes with as little impact on the build
system as possible, in part to make the upcoming ‘mavenization’ of hive
easier, in part to make the changes more manageable.
   
   
   
The list of public interfaces (and their transitive closure) for which
backwards compat will be provided.
   
   1. HCatLoader
   2. HCatStorer
   3. HCatInputFormat
   4. HCatOutputFormat
   5. HCatReader
   6. HCatWriter
   7. HCatRecord
   8. HCatSchema
   
   
To achieve this, the 0.11 versions of these classes will be added in the
org.apache.hcatalog package (after item 1 is done).  Each of these
classes, as well as its dependencies, will be deprecated to make it clear
that any new development needs to happen in org.apache.hive.hcatalog.
The 0.11 versions of the JUnit tests for hcat will also be brought to
trunk and handled the same way as mainline code.  A sunset clause will be
added to the deprecation message.
    
Thus, the published HCatalog JARs will contain both packages and the
unit tests will cover both versions of the API.
    
Since these changes are unavoidably disruptive, we’ll need to lock down
the hcatalog part of hive, check in all existing patches (which are
ready, i.e. apply/test cleanly and don’t have review comments which need
to be addressed) and then make the refactoring changes.
   
   
Thanks,
   
Eugene
   
   
   
   
  

[jira] [Commented] (HIVE-5107) Change hive's build to maven

2013-08-31 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1374#comment-1374
 ] 

Edward Capriolo commented on HIVE-5107:
---

[~sershe] I think what you are saying is that the thrift part of hive-metastore 
should be a submodule. That seems easy to do. I can do that. In other words, the 
thrift-generated classes would go in their own sub-project.

[~roshan_naik] I will look at that. Currently though there is a lot of stuff 
in there I do not want. I generally like gutting things and stripping them 
down. I am afraid a tool like 'makpom' will just produce very ugly and 
complicated poms, but I am not saying I won't try it.

 Change hive's build to maven
 

 Key: HIVE-5107
 URL: https://issues.apache.org/jira/browse/HIVE-5107
 Project: Hive
  Issue Type: Task
Reporter: Edward Capriolo
Assignee: Edward Capriolo

 I can not cope with hive's build infrastructure any more. I have started 
 working on porting the project to maven. When I have some solid progress I 
 will github the entire thing for review. Then we can talk about switching the 
 project somehow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Hive 0.12 release

2013-08-31 Thread Edward Capriolo
The error reported on the list a few days ago, with streaming not working
properly when grouping on multiple columns, should also probably qualify as
a blocker, since streaming is a key feature of Hive.


On Sat, Aug 31, 2013 at 10:31 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 I do not think we should consider too many not-done items for 0.12. I do
 not think releases should be a wish-list. If trunk is working with no
 blockers we should build a release; anything not done and not committed
 goes in the next release. I do not like cherry-picking issues and then
 waiting for them. Historically, adding types is much more complicated than
 people think. Generally there are three or more follow-on issues for things
 people did not consider in the initial patch; it worked this way for
 binary, decimal, and date, so I am not super eager to announce and release
 a large feature that was just committed and not heavily battle-tested. The
 npath thing is a blocker and we can not release without that.

 Committers should review the other blockers as well, and either mark them
 not as blockers, or work to get them committed, because if we have
 blockers, we should be dealing with them.


 On Fri, Aug 30, 2013 at 11:21 PM, Eugene Koifman ekoif...@hortonworks.com
  wrote:

 Because this change includes moving/renaming 300 files and then adding
 about 200 more with the same name (but contents from 0.11 branch) as the
 file had before the move.  The first part is necessary to change the
 package name, the second to ensure backwards compatibility.  I described
 this in detail in the mail "RFC: Major HCatalog refactoring".

 Given the complexity of the changes I think creating and applying a patch
 could end up with a lot of conflicts.  So doing this after the branch adds
 complexity but does not add anything useful.

 Eugene



 On Fri, Aug 30, 2013 at 5:57 PM, Thejas Nair the...@hortonworks.com
 wrote:

  Hi Eugene,
  Can you please elaborate on why you would like to have this in before
  branching and not commit it after branching in trunk and the branch ?
  Thanks,
  Thejas
 
 
 
  On Thu, Aug 29, 2013 at 10:31 PM, Eugene Koifman
  ekoif...@hortonworks.comwrote:
 
   I think we should make sure that several items under HIVE-4869 get checked
   in before branching.
  
   Eugene
  
  
   On Thu, Aug 29, 2013 at 9:18 PM, Thejas Nair the...@hortonworks.com
   wrote:
  
It has been more than 3 months since 0.11 was released and we already have
294 jiras in resolved-fixed state for 0.12. This includes several new
features such as the date data type, optimizer improvements, ORC format
improvements and many bug fixes. There are also many features that look
ready to get committed soon, such as the varchar type.
I think it is time to start preparing for a 0.12 release by creating a
branch later next week and start stabilizing it. What do people think about
it?
   
As we get closer to the branching, we can start discussing any additional
features/bug fixes that we should add to the release and start monitoring
their progress.
   
Thanks,
Thejas
   
   
  
  
 
 

 

[jira] [Commented] (HIVE-5184) Load filesystem, ugi, metastore client at tez session startup

2013-08-31 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755571#comment-13755571
 ] 

Edward Capriolo commented on HIVE-5184:
---

It would be nice to do this for standard hive as well. Normally you do not 
see errors until you try a command, but we could detect those on startup.
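
A sketch of what such eager startup checks could look like (illustrative 
only, not the attached patch):
{code:java}
// Touch the FS, UGI and metastore once at session startup so that
// misconfiguration fails fast instead of on the first command.
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.security.UserGroupInformation;

public class EagerSessionInit {
  public static void warmUp(HiveConf conf) throws Exception {
    FileSystem.get(conf);                  // forces filesystem client setup
    UserGroupInformation.getCurrentUser(); // resolves the calling user/UGI
    HiveMetaStoreClient msc = new HiveMetaStoreClient(conf);
    msc.close();                           // metastore connection verified
  }
}
{code}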

 Load filesystem, ugi, metastore client at tez session startup
 -

 Key: HIVE-5184
 URL: https://issues.apache.org/jira/browse/HIVE-5184
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: tez-branch

 Attachments: HIVE-5184.1.patch


 Make sure the session is ready to go when we connect. That way once the 
 session/connection is open, we're ready to go.
 NO PRECOMMIT TESTS (this is wip for the tez branch)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4789) FetchOperator fails on partitioned Avro data

2013-08-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755586#comment-13755586
 ] 

Hudson commented on HIVE-4789:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2300 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2300/])
HIVE-4789 : FetchOperator fails on partitioned Avro data (Sean Busbey via 
Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1519132)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
* /hive/trunk/ql/src/test/queries/clientpositive/avro_partitioned.q
* /hive/trunk/ql/src/test/results/clientpositive/avro_partitioned.q.out


 FetchOperator fails on partitioned Avro data
 

 Key: HIVE-4789
 URL: https://issues.apache.org/jira/browse/HIVE-4789
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0, 0.12.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Blocker
 Fix For: 0.12.0

 Attachments: HIVE-4789.1.patch.txt, HIVE-4789.2.patch.txt, 
 HIVE-4789.3.patch.txt


 HIVE-3953 fixed using partitioned avro tables for anything that used the 
 MapOperator, but those that rely on FetchOperator still fail with the same 
 error.
 e.g.
 {code}
   SELECT * FROM partitioned_avro LIMIT 5;
   SELECT * FROM partitioned_avro WHERE partition_col=value;
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira