[jira] [Commented] (HIVE-5302) PartitionPruner fails on Avro non-partitioned data

2013-09-28 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780718#comment-13780718
 ] 

Sean Busbey commented on HIVE-5302:
---

Arg. Okay, tl;dr: I need to go back to the drawing board on finding a suitable 
test. Please lower priority or close as appropriate.

Long version:

In setting up my test case I was too quick to presume that an AvroSerdeException 
showing up in the logs was a hard failure. But there does appear to be a 
non-fatal problem when the partition pruner optimization works with a 
non-partitioned Avro table. It attempts to make a shadow partition to represent 
the whole table. Creating this partition relies on an initializer that goes 
through a code path for instantiating the SerDe based solely on information 
from MetaStoreUtils.

So the AvroSerDe fails during initialization (and logs a WARN about it with an 
AvroSerdeException), but since this instance of the SerDe is never actually 
used, it doesn't result in a failure.

You can see this even by running the basic sanity test:

{noformat}
  $ ant clean package
…
  $ ant -Dmodule=ql -Dtestcase=TestCliDriver -Dqfile=avro_sanity_test.q test
…
BUILD SUCCESSFUL
Total time: 1 minute 15 seconds
  $ less build/ql/tmp/hive.log
{noformat}

In the log, grep for AvroSerdeException (for me it's at line 3198).

So sad Sean will need to go back to finding a case where this explodes in a way 
that stops things.

On the matter of query plan bloat, we could isolate the related changes to the 
Avro SerDe so long as there's a way to get at table properties during SerDe 
initialization. That way it could check the partition-specific properties and 
then fall back to the table-level ones on its own. I'll worry about that once I 
find a test case.
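
For concreteness, a minimal sketch of that fallback, assuming (hypothetically) 
the SerDe could see both partition-level and table-level properties; the helper 
below is illustrative, not the actual AvroSerDe API:

{code}
// Hypothetical helper, not the current AvroSerDe API: today initialize()
// receives a single Properties object, so this assumes a way to see both.
private String resolveAvroSchemaLiteral(Properties partProps, Properties tblProps) {
  // Prefer the partition-specific value when one is present.
  String schema = (partProps == null) ? null : partProps.getProperty("avro.schema.literal");
  if (schema == null && tblProps != null) {
    // Otherwise fall back to the table-level value on its own.
    schema = tblProps.getProperty("avro.schema.literal");
  }
  return schema;
}
{code}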

 PartitionPruner fails on Avro non-partitioned data
 --

 Key: HIVE-5302
 URL: https://issues.apache.org/jira/browse/HIVE-5302
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Blocker
  Labels: avro
 Attachments: HIVE-5302.1-branch-0.12.patch.txt, 
 HIVE-5302.1.patch.txt, HIVE-5302.1.patch.txt


 While updating HIVE-3585 I found a test case that causes the failure in the 
 MetaStoreUtils partition retrieval from back in HIVE-4789.
 In this case, the failure is triggered when the partition pruner is handed a 
 non-partitioned table and has to construct a pseudo-partition.
 e.g.
 {code}
   INSERT OVERWRITE TABLE partitioned_table PARTITION(col) SELECT id, foo, col 
 FROM non_partitioned_table WHERE col = 9;
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5391) make ORC predicate pushdown work with vectorization

2013-09-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780724#comment-13780724
 ] 

Hive QA commented on HIVE-5391:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12605630/HIVE-5391.01-vectorization.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 4054 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.fileformats.TestOrcDynamicPartitioned.testHCatDynamicPartitionedTableMultipleTask
org.apache.hive.hcatalog.mapreduce.TestHCatExternalDynamicPartitioned.testHCatDynamicPartitionedTable
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/948/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/948/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

 make ORC predicate pushdown work with vectorization
 ---

 Key: HIVE-5391
 URL: https://issues.apache.org/jira/browse/HIVE-5391
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-5391.01-vectorization.patch, 
 HIVE-5391-vectorization.patch


 Vectorized execution doesn't utilize ORC predicate pushdown. It should.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4561) Column stats : LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the column values larger than 0.0 (or if all column values smaller than 0.0)

2013-09-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780727#comment-13780727
 ] 

Hive QA commented on HIVE-4561:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12586100/HIVE-4561.3.patch

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/951/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/951/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests failed with: NonZeroExitCodeException: Command 'bash 
/data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and 
output '+ [[ -n '' ]]
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-951/source-prep.txt
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1527155.

At revision 1527155.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0 to p2
+ exit 1
'
{noformat}

This message is automatically generated.

 Column stats :  LOW_VALUE (or HIGH_VALUE) will always be 0.0000 ,if all the 
 column values larger than 0.0 (or if all column values smaller than 0.0)
 

 Key: HIVE-4561
 URL: https://issues.apache.org/jira/browse/HIVE-4561
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.12.0
Reporter: caofangkun
Assignee: caofangkun
 Attachments: HIVE-4561.1.patch, HIVE-4561.2.patch, HIVE-4561.3.patch


 if all column values larger than 0.0, DOUBLE_LOW_VALUE will always be 0.0, 
 or if all column values less than 0.0, DOUBLE_HIGH_VALUE will always be 0.0 
 hive (default)> create table src_test (price double);
 hive (default)> load data local inpath './test.txt' into table src_test;
 hive (default)> select * from src_test;
 OK
 1.0
 2.0
 3.0
 Time taken: 0.313 seconds, Fetched: 3 row(s)
 hive (default)> analyze table src_test compute statistics for columns price;
 mysql> select * from TAB_COL_STATS \G;
  CS_ID: 16
 DB_NAME: default
  TABLE_NAME: src_test
 COLUMN_NAME: price
 COLUMN_TYPE: double
 TBL_ID: 2586
 LONG_LOW_VALUE: 0
 LONG_HIGH_VALUE: 0
   DOUBLE_LOW_VALUE: 0.0000   # Wrong Result! Expected is 1.0
  DOUBLE_HIGH_VALUE: 3.0000
  BIG_DECIMAL_LOW_VALUE: NULL
 BIG_DECIMAL_HIGH_VALUE: NULL
  NUM_NULLS: 0
  NUM_DISTINCTS: 1
 AVG_COL_LEN: 0.0
 MAX_COL_LEN: 0
  NUM_TRUES: 0
 NUM_FALSES: 0
  LAST_ANALYZED: 1368596151
 2 rows in set (0.00 sec)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5395) Various cleanup in ptf code

2013-09-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780725#comment-13780725
 ] 

Hive QA commented on HIVE-5395:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12605636/HIVE-5395.3.patch.txt

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/949/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/949/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests failed with: NonZeroExitCodeException: Command 'bash 
/data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and 
output '+ [[ -n '' ]]
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-949/source-prep.txt
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1527155.

At revision 1527155.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0 to p2
+ exit 1
'
{noformat}

This message is automatically generated.

 Various cleanup in ptf code
 ---

 Key: HIVE-5395
 URL: https://issues.apache.org/jira/browse/HIVE-5395
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-5395.1.patch.txt, HIVE-5395.2.patch.txt, 
 HIVE-5395.3.patch.txt


 Some minor issues:
 Implementing classes on left side of equals
 Stack used instead of ArrayDeque
 Classes defined statically inside other files (when they do not need to be)
 Checkstyle errors like indenting

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5395) Various cleanup in ptf code

2013-09-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780726#comment-13780726
 ] 

Hive QA commented on HIVE-5395:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12605636/HIVE-5395.3.patch.txt

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/950/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/950/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests failed with: NonZeroExitCodeException: Command 'bash 
/data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and 
output '+ [[ -n '' ]]
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-950/source-prep.txt
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1527155.

At revision 1527155.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0 to p2
+ exit 1
'
{noformat}

This message is automatically generated.

 Various cleanup in ptf code
 ---

 Key: HIVE-5395
 URL: https://issues.apache.org/jira/browse/HIVE-5395
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-5395.1.patch.txt, HIVE-5395.2.patch.txt, 
 HIVE-5395.3.patch.txt


 Some minor issues:
 Implementing classes on left side of equals
 Stack used instead of ArrayDeque
 Classes defined statically inside other files (when they do not need to be)
 Checkstyle errors like indenting

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5298) AvroSerde performance problem caused by HIVE-3833

2013-09-28 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780749#comment-13780749
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-5298:
-

[~xuefuz] and [~ashutoshc]: I looked at the exact piece of code and thought of 
doing a similar optimization to the one mentioned here while looking at one of 
my jiras, HIVE-5348. It seems like:
1. conf.getPathToAliases() gives the path-to-aliases mapping
2. conf.getPathToPartitionInfo() gives the path-to-partition-info mapping

It is clear that (1) and (2) return HashMaps of the same size, say numPaths.

In the change, [~xuefuz] added the line below:

{code:title=MapOperator.java|borderStyle=solid}
...
Set<PartitionDesc> partDescSet =
    new HashSet<PartitionDesc>(conf.getPathToPartitionInfo().values());
...
{code}

The size of partDescSet is the number of distinct partitions associated with 
the map operator. That size, say numParts, can be far less than numPaths if a 
partition comprises many files, hence the relatively smaller number of 
iterations. I would +1, since the idea behind this fix looks correct.
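
A standalone illustration with hypothetical figures (plain java.util types 
standing in for the real path-to-PartitionDesc map):

{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DedupIllustration {
  public static void main(String[] args) {
    // One partition spanning 1000 files: 1000 entries in the path-keyed map...
    Map<String, String> pathToPartInfo = new HashMap<String, String>();
    for (int i = 0; i < 1000; i++) {
      pathToPartInfo.put("/warehouse/t/part=1/file_" + i, "part=1");
    }
    // ...but only one element once the values are deduplicated into a set.
    Set<String> partDescSet = new HashSet<String>(pathToPartInfo.values());
    // Prints "numPaths=1000 numParts=1"; a loop over the set runs once.
    System.out.println("numPaths=" + pathToPartInfo.size()
        + " numParts=" + partDescSet.size());
  }
}
{code}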

NB: The contents of the for loop in the original code look kind of hairy, and I 
am rewriting them as part of HIVE-5348.

Thanks,
Hari

 AvroSerde performance problem caused by HIVE-3833
 -

 Key: HIVE-5298
 URL: https://issues.apache.org/jira/browse/HIVE-5298
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.13.0

 Attachments: HIVE-5298.1.patch, HIVE-5298.patch


 HIVE-3833 fixed the targeted problem and made Hive use partition-level 
 metadata to initialize object inspectors. In doing that, however, it goes 
 through every file under the table to access the partition metadata, which is 
 very inefficient, especially in the case of multiple files per partition. 
 This causes more problems for AvroSerde, because AvroSerde initialization 
 accesses the schema, which is located on the file system. As a result, before 
 Hive can process any data, it needs to access every file for the table, which 
 can take long enough to cause job failure because of a lack of job progress.
 The improvement can be made so that partition metadata is only accessed once 
 per partition.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5324) Extend record writer and ORC reader/writer interfaces to provide statistics

2013-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780753#comment-13780753
 ] 

Hudson commented on HIVE-5324:
--

SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #185 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/185/])
HIVE-5324 : Extend record writer and ORC reader/writer interfaces to provide 
statistics (Prasanth J via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1527149)
* 
/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/base64/Base64TextOutputFormat.java
* 
/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java
* 
/hive/trunk/hcatalog/core/src/test/java/org/apache/hcatalog/cli/DummyStorageHandler.java
* 
/hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseBaseOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/FSRecordWriter.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveBinaryOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveIgnoreKeyTextOutputFormat.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveNullValueSequenceFileOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveOutputFormat.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughOutputFormat.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughRecordWriter.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveSequenceFileOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFileOutputFormat.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroContainerOutputFormat.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordWriter.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* 
/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java
* 
/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/udf/Rot13OutputFormat.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeStats.java


 Extend record writer and ORC reader/writer interfaces to provide statistics
 ---

 Key: HIVE-5324
 URL: https://issues.apache.org/jira/browse/HIVE-5324
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile, statistics
 Fix For: 0.13.0

 Attachments: HIVE-5324.1.patch.txt, HIVE-5324.2.patch.txt, 
 HIVE-5324.3.patch.txt, HIVE-5324.4.patch.txt


 The current implementation computes statistics (number of rows and raw data 
 size) for every single row processed. The processOp() method in 
 FileSinkOperator gets the raw data size for each row from the serde and 
 accumulates the size in a hashmap while counting the number of rows. These 
 accumulated statistics are then published to the metastore. 
 In the case of ORC, the file already stores enough statistics internally, 
 which can be made use of when publishing the stats to the metastore. This 
 will avoid the duplication of work happening in processOp(). Also, getting 
 the statistics directly from ORC is very cheap (they can be read directly 
 from the file footer).
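
As an illustration of how cheap the footer read is, a minimal sketch (assuming 
the extended ORC Reader interface this issue describes, where getRawDataSize() 
joins the existing getNumberOfRows()):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Reader;

public class OrcFooterStats {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path orcFile = new Path(args[0]);
    // Creating the reader parses only the file footer/metadata; no row scan.
    Reader reader = OrcFile.createReader(FileSystem.get(conf), orcFile);
    System.out.println("rows = " + reader.getNumberOfRows());
    System.out.println("raw data size = " + reader.getRawDataSize());
  }
}
{code}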

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5324) Extend record writer and ORC reader/writer interfaces to provide statistics

2013-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780763#comment-13780763
 ] 

Hudson commented on HIVE-5324:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #119 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/119/])
HIVE-5324 : Extend record writer and ORC reader/writer interfaces to provide 
statistics (Prasanth J via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1527149)
* 
/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/base64/Base64TextOutputFormat.java
* 
/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java
* 
/hive/trunk/hcatalog/core/src/test/java/org/apache/hcatalog/cli/DummyStorageHandler.java
* 
/hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseBaseOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/FSRecordWriter.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveBinaryOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveIgnoreKeyTextOutputFormat.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveNullValueSequenceFileOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveOutputFormat.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughOutputFormat.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughRecordWriter.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveSequenceFileOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFileOutputFormat.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroContainerOutputFormat.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordWriter.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* 
/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java
* 
/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/udf/Rot13OutputFormat.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeStats.java


 Extend record writer and ORC reader/writer interfaces to provide statistics
 ---

 Key: HIVE-5324
 URL: https://issues.apache.org/jira/browse/HIVE-5324
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile, statistics
 Fix For: 0.13.0

 Attachments: HIVE-5324.1.patch.txt, HIVE-5324.2.patch.txt, 
 HIVE-5324.3.patch.txt, HIVE-5324.4.patch.txt


 The current implementation computes statistics (number of rows and raw data 
 size) for every single row processed. The processOp() method in 
 FileSinkOperator gets the raw data size for each row from the serde and 
 accumulates the size in a hashmap while counting the number of rows. These 
 accumulated statistics are then published to the metastore. 
 In the case of ORC, the file already stores enough statistics internally, 
 which can be made use of when publishing the stats to the metastore. This 
 will avoid the duplication of work happening in processOp(). Also, getting 
 the statistics directly from ORC is very cheap (they can be read directly 
 from the file footer).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5324) Extend record writer and ORC reader/writer interfaces to provide statistics

2013-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780764#comment-13780764
 ] 

Hudson commented on HIVE-5324:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2364 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2364/])
HIVE-5324 : Extend record writer and ORC reader/writer interfaces to provide 
statistics (Prasanth J via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1527149)
* 
/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/base64/Base64TextOutputFormat.java
* 
/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java
* 
/hive/trunk/hcatalog/core/src/test/java/org/apache/hcatalog/cli/DummyStorageHandler.java
* 
/hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseBaseOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/FSRecordWriter.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveBinaryOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveIgnoreKeyTextOutputFormat.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveNullValueSequenceFileOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveOutputFormat.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughOutputFormat.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughRecordWriter.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveSequenceFileOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFileOutputFormat.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroContainerOutputFormat.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordWriter.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* 
/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java
* 
/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/udf/Rot13OutputFormat.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeStats.java


 Extend record writer and ORC reader/writer interfaces to provide statistics
 ---

 Key: HIVE-5324
 URL: https://issues.apache.org/jira/browse/HIVE-5324
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile, statistics
 Fix For: 0.13.0

 Attachments: HIVE-5324.1.patch.txt, HIVE-5324.2.patch.txt, 
 HIVE-5324.3.patch.txt, HIVE-5324.4.patch.txt


 The current implementation computes statistics (number of rows and raw data 
 size) for every single row processed. The processOp() method in 
 FileSinkOperator gets the raw data size for each row from the serde and 
 accumulates the size in a hashmap while counting the number of rows. These 
 accumulated statistics are then published to the metastore. 
 In the case of ORC, the file already stores enough statistics internally, 
 which can be made use of when publishing the stats to the metastore. This 
 will avoid the duplication of work happening in processOp(). Also, getting 
 the statistics directly from ORC is very cheap (they can be read directly 
 from the file footer).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5358) ReduceSinkDeDuplication should ignore column orders when check overlapping part of keys between parent and child

2013-09-28 Thread Chun Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780765#comment-13780765
 ] 

Chun Chen commented on HIVE-5358:
-

Thanks, [~yhuai]. I got it. Seems like the first method, adjusting the first 
GBY to construct its key from both the key and value of the reduce input, is 
easier and doesn't have to waste extra resources sorting the rows.

 ReduceSinkDeDuplication should ignore column orders when check overlapping 
 part of keys between parent and child
 

 Key: HIVE-5358
 URL: https://issues.apache.org/jira/browse/HIVE-5358
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Chun Chen
Assignee: Chun Chen
 Attachments: D13113.1.patch, HIVE-5358.2.patch, HIVE-5358.patch


 {code}
 select key, value from (select key, value from src group by key, value) t 
 group by key, value;
 {code}
 This can be optimized by ReduceSinkDeDuplication
 {code}
 select key, value from (select key, value from src group by key, value) t 
 group by value, key;
 {code}
 However, the SQL above currently can't be optimized by ReduceSinkDeDuplication 
 due to the different column orders of the parent and child operators.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5297) Hive does not honor type for partition columns

2013-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780793#comment-13780793
 ] 

Hudson commented on HIVE-5297:
--

ABORTED: Integrated in Hive-trunk-hadoop2 #461 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/461/])
HIVE-5297 Hive does not honor type for partition columns (Vikram Dixit via 
Harish Butani) (rhbutani: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1527024)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
* /hive/trunk/ql/src/test/queries/clientnegative/illegal_partition_type.q
* /hive/trunk/ql/src/test/queries/clientnegative/illegal_partition_type2.q
* /hive/trunk/ql/src/test/queries/clientpositive/alter_partition_coltype.q
* /hive/trunk/ql/src/test/queries/clientpositive/partition_type_check.q
* /hive/trunk/ql/src/test/results/clientnegative/alter_table_add_partition.q.out
* /hive/trunk/ql/src/test/results/clientnegative/alter_view_failure5.q.out
* /hive/trunk/ql/src/test/results/clientnegative/illegal_partition_type.q.out
* /hive/trunk/ql/src/test/results/clientnegative/illegal_partition_type2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/alter_partition_coltype.q.out
* /hive/trunk/ql/src/test/results/clientpositive/partition_type_check.q.out


 Hive does not honor type for partition columns
 --

 Key: HIVE-5297
 URL: https://issues.apache.org/jira/browse/HIVE-5297
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.11.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: 0.12.0

 Attachments: HIVE-5297.1.patch, HIVE-5297.2.patch, HIVE-5297.3.patch, 
 HIVE-5297.4.patch, HIVE-5297.5.patch, HIVE-5297.6.patch, HIVE-5297.7.patch, 
 HIVE-5297.8.patch


 Hive does not consider the type of the partition column while writing 
 partitions. Consider for example the query:
 {noformat}
 create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
 row format delimited fields terminated by ',';
 alter table tab1 add partition (month='June', day='second');
 {noformat}
 Hive accepts this query. However, if you try to select from this table and 
 insert into another one expecting a schema match, it will insert nulls 
 instead. We should throw an exception on such a user error at the time the 
 partition addition/load happens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5272) Column statistics on a invalid column name results in IndexOutOfBoundsException

2013-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780794#comment-13780794
 ] 

Hudson commented on HIVE-5272:
--

ABORTED: Integrated in Hive-trunk-hadoop2 #461 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/461/])
HIVE-5272 : Column statistics on a invalid column name results in 
IndexOutOfBoundsException (Prasanth J via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1527078)
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java
* /hive/trunk/ql/src/test/results/clientnegative/columnstats_tbllvl.q.out
* 
/hive/trunk/ql/src/test/results/clientnegative/columnstats_tbllvl_incorrect_column.q.out


 Column statistics on a invalid column name results in 
 IndexOutOfBoundsException
 ---

 Key: HIVE-5272
 URL: https://issues.apache.org/jira/browse/HIVE-5272
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: statistics
 Fix For: 0.13.0

 Attachments: HIVE-5272.1.patch.txt, HIVE-5272.2.patch.txt, 
 HIVE-5272.3.patch.txt, junit-noframes.html


 When an invalid column name is specified for column statistics, an 
 IndexOutOfBoundsException is thrown. 
 {code}hive> analyze table customer_staging compute statistics for columns 
 c_first_name, invalid_name, c_customer_sk;
 FAILED: IndexOutOfBoundsException Index: 2, Size: 1{code}
 If the invalid column name appears first or last, then 
 INVALID_COLUMN_REFERENCE is thrown at the query planning stage. But if the 
 invalid column name appears somewhere in the middle of the column list, then 
 IndexOutOfBoundsException is thrown at the semantic analysis step. The 
 problem is with the getTableColumnType() and getPartitionColumnType() 
 methods. The following segment 
 {code}
 for (int i = 0; i < numCols; i++) {
   colName = colNames.get(i);
   for (FieldSchema col : cols) {
     if (colName.equalsIgnoreCase(col.getName())) {
       colTypes.add(i, new String(col.getType()));
     }
   }
 }
 {code}
 is the reason for it. If the invalid column name appears in the middle of the 
 column list, equalsIgnoreCase() skips the invalid name while i is still 
 incremented, and since that position of the list is never initialized, the 
 indexed add() results in the exception. 
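
A hedged sketch of one possible fix, reusing colNames/cols/colTypes from the 
snippet above: append instead of doing an indexed add, and fail fast on an 
unknown column. The ErrorMsg constant is an assumption based on the 
INVALID_COLUMN_REFERENCE behavior the description mentions.

{code}
for (int i = 0; i < numCols; i++) {
  String colName = colNames.get(i);
  String colType = null;
  for (FieldSchema col : cols) {
    if (colName.equalsIgnoreCase(col.getName())) {
      colType = col.getType();
      break;
    }
  }
  if (colType == null) {
    // Fail fast instead of silently skipping and leaving a hole at index i
    // (assumes an ErrorMsg entry for INVALID_COLUMN_REFERENCE exists).
    throw new SemanticException(ErrorMsg.INVALID_COLUMN_REFERENCE.getMsg(colName));
  }
  colTypes.add(colType); // append, so no out-of-bounds indexed add
}
{code}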

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5231) Remove TestSerDe.jar from data/files

2013-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780796#comment-13780796
 ] 

Hudson commented on HIVE-5231:
--

ABORTED: Integrated in Hive-trunk-hadoop2 #461 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/461/])
HIVE-5231 : Remove TestSerDe.jar from data/files (Hari Sankar via Ashutosh 
Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1527004)
* /hive/trunk/build-common.xml
* /hive/trunk/data/files/TestSerDe.jar
* /hive/trunk/ql/src/test/queries/clientnegative/deletejar.q
* /hive/trunk/ql/src/test/queries/clientnegative/invalid_columns.q
* /hive/trunk/ql/src/test/queries/clientpositive/alter1.q
* /hive/trunk/ql/src/test/queries/clientpositive/input16.q
* /hive/trunk/ql/src/test/queries/clientpositive/input16_cc.q


 Remove TestSerDe.jar from data/files
 

 Key: HIVE-5231
 URL: https://issues.apache.org/jira/browse/HIVE-5231
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Fix For: 0.13.0

 Attachments: HIVE-5231.1.patch.txt, HIVE-5231.2.patch.txt, 
 HIVE-5231.3.patch.txt, HIVE-5231.4.patch.txt


 TestSerDe.jar should be removed from data/files. Even though TestSerDe.java 
 is present in ql/src/test/org/apache/hadoop/hive/serde2/TestSerDe.java, it is 
 never compiled during the build process. The jar file should be created as 
 part of the build process for testing purposes rather than using a hard-coded 
 jar file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5379) NoClassDefFoundError is thrown when using lead/lag with kryo serialization

2013-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780792#comment-13780792
 ] 

Hudson commented on HIVE-5379:
--

ABORTED: Integrated in Hive-trunk-hadoop2 #461 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/461/])
HIVE-5379 - NoClassDefFoundError is thrown when using lead/lag with kryo 
serialization (Reviewed By Ashutosh, Contributed by Navis) (brock: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1526941)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/LeadLagInfo.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/WindowingExprNodeEvaluatorFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java


 NoClassDefFoundError is thrown when using lead/lag with kryo serialization
 --

 Key: HIVE-5379
 URL: https://issues.apache.org/jira/browse/HIVE-5379
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Reporter: Navis
Assignee: Navis
Priority: Minor
 Fix For: 0.13.0

 Attachments: D13155.1.patch


 {noformat}
 java.lang.RuntimeException: Error in configuring object
   at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
   at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
   at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:432)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
   at org.apache.hadoop.mapred.Child.main(Child.java:260)
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
   ... 9 more
 Caused by: java.lang.NoClassDefFoundError: 
 org/antlr/runtime/tree/TreeWizard$ContextVisitor
   at java.lang.ClassLoader.defineClass1(Native Method)
   at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
   at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
   at 
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
   at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
   at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
   at java.lang.Class.getDeclaringClass(Native Method)
   at java.lang.Class.getEnclosingClass(Class.java:1085)
   at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1054)
   at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1110)
   at 
 com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:526)
   at 
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:502)
   at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
   at 
 com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
   at 
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
   at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
   at 
 com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
   at 
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
   at 
 com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
   at 
 com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
   at 

[jira] [Commented] (HIVE-5374) hive-schema-0.13.0.postgres.sql doesn't work

2013-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780795#comment-13780795
 ] 

Hudson commented on HIVE-5374:
--

ABORTED: Integrated in Hive-trunk-hadoop2 #461 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/461/])
HIVE-5374 : hive-schema-0.13.0.postgres.sql doesn't work (Kousuke Saruta via 
Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1527007)
* /hive/trunk/metastore/scripts/upgrade/postgres/014-HIVE-3764.postgres.sql
* /hive/trunk/metastore/scripts/upgrade/postgres/hive-schema-0.12.0.postgres.sql
* /hive/trunk/metastore/scripts/upgrade/postgres/hive-schema-0.13.0.postgres.sql
* 
/hive/trunk/metastore/scripts/upgrade/postgres/upgrade-0.11.0-to-0.12.0.postgres.sql
* 
/hive/trunk/metastore/scripts/upgrade/postgres/upgrade-0.12.0-to-0.13.0.postgres.sql


 hive-schema-0.13.0.postgres.sql doesn't work
 

 Key: HIVE-5374
 URL: https://issues.apache.org/jira/browse/HIVE-5374
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, 0.13.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta
Priority: Blocker
 Fix For: 0.12.0

 Attachments: HIVE-5374.1.patch, HIVE-5374.patch.1, HIVE-5374.patch.2


 hive-schema-0.13.0.postgres.sql doesn't work. In PostgreSQL, if we double 
 quote a keyword (column name, table name, etc.), those names are treated 
 case-sensitively. But in the script, there is a non-double-quoted table name 
 and column name although those are double quoted at the definition.
 {code}
 CREATE TABLE "VERSION" (
   "VER_ID" bigint,
   "SCHEMA_VERSION" character varying(127) NOT NULL,
   "COMMENT" character varying(255) NOT NULL,
   PRIMARY KEY ("VER_ID")
 );
 {code}
 {code}
 INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT) VALUES (1, 
 '0.13.0', 'Hive release version 0.13.0');
 {code}
 Also, the definition above defines column COMMENT but I think it should be 
 named VERSION_COMMENT.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5361) PTest2 should allow a different JVM for compilation versus execution

2013-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780790#comment-13780790
 ] 

Hudson commented on HIVE-5361:
--

ABORTED: Integrated in Hive-trunk-hadoop2 #461 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/461/])
HIVE-5361 - PTest2 should allow a different JVM for compilation versus 
execution (brock: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1526925)
* 
/hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/CleanupPhase.java
* 
/hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/PTest.java
* 
/hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/conf/TestConfiguration.java
* 
/hive/trunk/testutils/ptest2/src/main/java/org/apache/hive/ptest/execution/context/CloudExecutionContextProvider.java
* /hive/trunk/testutils/ptest2/src/main/resources/batch-exec.vm
* 
/hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestCleanupPhase.java
* 
/hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestScripts.java
* 
/hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestScripts.testAlternativeTestJVM.approved.txt
* 
/hive/trunk/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestScripts.testBatch.approved.txt


 PTest2 should allow a different JVM for compilation versus execution
 

 Key: HIVE-5361
 URL: https://issues.apache.org/jira/browse/HIVE-5361
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Minor
 Fix For: 0.13.0

 Attachments: HIVE-5361.patch


 NO PRECOMMIT TESTS

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5357) ReduceSinkDeDuplication optimizer pick the wrong keys in pRS-cGBYm-cRS-cGBYr scenario when there are distinct keys in child GBY

2013-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780791#comment-13780791
 ] 

Hudson commented on HIVE-5357:
--

ABORTED: Integrated in Hive-trunk-hadoop2 #461 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/461/])
HIVE-5357 : ReduceSinkDeDuplication optimizer pick the wrong keys in 
pRS-cGBYm-cRS-cGBYr scenario when there are distinct keys in child GBY (Chun 
Chen via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1526990)
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationUtilities.java
* /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate_extended.q
* 
/hive/trunk/ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out


 ReduceSinkDeDuplication optimizer pick the wrong keys in pRS-cGBYm-cRS-cGBYr 
 scenario when there are distinct keys in child GBY
 ---

 Key: HIVE-5357
 URL: https://issues.apache.org/jira/browse/HIVE-5357
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Chun Chen
Assignee: Chun Chen
Priority: Blocker
 Fix For: 0.13.0

 Attachments: HIVE-5357.patch


 Example:
 {code}
 select key, count(distinct value) from (select key, value from src group by 
 key, value) t group by key;
 //result
 0 0 NULL
 10  10  NULL
 100 100 NULL
 103 103 NULL
 104 104 NULL
 {code}
 Obviously the result is wrong.
 When we have a simple group by query with a distinct column
 {code}
 explain select count(distinct value) from src group by key;
 {code}
 The plan is 
 {code}
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 is a root stage
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Alias - Map Operator Tree:
 src 
   TableScan
 alias: src
 Select Operator
   expressions:
 expr: key
 type: string
 expr: value
 type: string
   outputColumnNames: key, value
   Group By Operator
 aggregations:
   expr: count(DISTINCT value)
 bucketGroup: false
 keys:
   expr: key
   type: string
   expr: value
   type: string
 mode: hash
 outputColumnNames: _col0, _col1, _col2
 Reduce Output Operator
   key expressions:
 expr: _col0
 type: string
 expr: _col1
 type: string
   sort order: ++
   Map-reduce partition columns:
 expr: _col0
 type: string
   tag: -1
   value expressions:
 expr: _col2
 type: bigint
   Reduce Operator Tree:
 Group By Operator
   aggregations:
 expr: count(DISTINCT KEY._col1:0._col0)
   bucketGroup: false
   keys:
 expr: KEY._col0
 type: string
   mode: mergepartial
   outputColumnNames: _col0, _col1
   Select Operator
 expressions:
   expr: _col1
   type: bigint
 outputColumnNames: _col0
 File Output Operator
   compressed: false
   GlobalTableId: 0
   table:
   input format: org.apache.hadoop.mapred.TextInputFormat
   output format: 
 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
   Stage: Stage-0
 Fetch Operator
   limit: -1
 {code}
 The map side GBY also adds the distinct columns (value in this case) to its 
 key columns.
 When RSDedup optimizes a query involving a GBY with distinct keys, if 
 map-side aggregation is enabled, currently it assigns the map-side GBY's key 
 columns to the reduce-side GBY. So, for the example shown at the beginning, 
 after we generate a plan with a single MR job, the second GBY in the 
 reduce-side uses both key and value as its key columns. The correct key 
 column is key.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5324) Extend record writer and ORC reader/writer interfaces to provide statistics

2013-09-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780799#comment-13780799
 ] 

Hudson commented on HIVE-5324:
--

FAILURE: Integrated in Hive-trunk-hadoop2 #462 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/462/])
HIVE-5324 : Extend record writer and ORC reader/writer interfaces to provide 
statistics (Prasanth J via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1527149)
* 
/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/fileformat/base64/Base64TextOutputFormat.java
* 
/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java
* 
/hive/trunk/hcatalog/core/src/test/java/org/apache/hcatalog/cli/DummyStorageHandler.java
* 
/hive/trunk/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseBaseOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/FSRecordWriter.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveBinaryOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveIgnoreKeyTextOutputFormat.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveNullValueSequenceFileOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveOutputFormat.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughOutputFormat.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughRecordWriter.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/HiveSequenceFileOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFileOutputFormat.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroContainerOutputFormat.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordWriter.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* 
/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java
* 
/hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/udf/Rot13OutputFormat.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeStats.java


 Extend record writer and ORC reader/writer interfaces to provide statistics
 ---

 Key: HIVE-5324
 URL: https://issues.apache.org/jira/browse/HIVE-5324
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile, statistics
 Fix For: 0.13.0

 Attachments: HIVE-5324.1.patch.txt, HIVE-5324.2.patch.txt, 
 HIVE-5324.3.patch.txt, HIVE-5324.4.patch.txt


 The current implementation computes statistics (number of rows and raw data 
 size) for every single row processed. The processOp() method in 
 FileSinkOperator gets the raw data size for each row from the serde and 
 accumulates the size in a hashmap while counting the number of rows. These 
 accumulated statistics are then published to the metastore. 
 In the case of ORC, the file already stores enough statistics internally, 
 which can be made use of when publishing the stats to the metastore. This 
 will avoid the duplication of work happening in processOp(). Also, getting 
 the statistics directly from ORC is very cheap (they can be read directly 
 from the file footer).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5302) PartitionPruner fails on Avro non-partitioned data

2013-09-28 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780880#comment-13780880
 ] 

Edward Capriolo commented on HIVE-5302:
---

We do not necessarily need a documented testable case to justify the change; 
seeing a non-fatal error in the logs is reason enough to apply the patch.

{quote}
On the matter of query plan bloat, we could isolate the related changes to the 
Avro SerDe so long as there's a way to get at table properties during SerDe 
initialization. That way it could check the partition-specific properties and 
then fall back to the table-level ones on its own. I'll worry about that once I 
find a test case.
{quote}
I would focus less on finding a test case. We can treat this as an 
optimization and take your word that there are cases where the current system 
does not work. See if you can find this other way to solve it without 
affecting the plan; I think that is a big win for all parties. If it is not 
possible, there is nothing wrong with committing your original patch in my eyes.
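
For what it's worth, a minimal sketch of that fallback, assuming partition-level 
and table-level properties are both available at initialization time; the helper 
below is hypothetical, not an existing Hive API:

{code}
// Hypothetical helper: prefer partition-specific properties, falling
// back to the table-level definition for anything not overridden.
import java.util.Properties;

public class SerDePropertyFallback {
  public static Properties effectiveProperties(Properties tableProps,
                                               Properties partitionProps) {
    Properties effective = new Properties();
    effective.putAll(tableProps);          // table-level defaults
    if (partitionProps != null) {
      effective.putAll(partitionProps);    // partition overrides win
    }
    return effective;
  }
}
{code}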

 PartitionPruner fails on Avro non-partitioned data
 --

 Key: HIVE-5302
 URL: https://issues.apache.org/jira/browse/HIVE-5302
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Blocker
  Labels: avro
 Attachments: HIVE-5302.1-branch-0.12.patch.txt, 
 HIVE-5302.1.patch.txt, HIVE-5302.1.patch.txt


 While updating HIVE-3585 I found a test case that causes the failure in the 
 MetaStoreUtils partition retrieval from back in HIVE-4789.
 In this case, the failure is triggered when the partition pruner is handed a 
 non-partitioned table and has to construct a pseudo-partition.
 e.g.
 {code}
   INSERT OVERWRITE TABLE partitioned_table PARTITION(col) SELECT id, foo, col 
 FROM non_partitioned_table WHERE col = 9;
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5324) Extend record writer and ORC reader/writer interfaces to provide statistics

2013-09-28 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780883#comment-13780883
 ] 

Edward Capriolo commented on HIVE-5324:
---

Have we considered providing this interface as a property of the object rather 
than a ctor parameter? People have implemented record writers outside of Hive, 
and this could be a breaking change for them. Are there plans to produce stats 
in trunk for anything besides ORC? What type of load will publishing stats put 
on the metastore? Is this feature disabled via hive.stats.publish?
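
To make the alternative concrete, a rough sketch of the optional-interface 
approach; the names are illustrative, not the actual patch:

{code}
// Hedged sketch: an optional capability interface. Writers that can
// report stats implement it; callers probe with instanceof instead of
// requiring a new constructor signature, so existing third-party
// RecordWriter implementations keep compiling.
public interface StatsProvidingWriter {
  long getNumberOfRows();
  long getRawDataSize();
}

// A caller probes for the capability:
//   if (writer instanceof StatsProvidingWriter) {
//     long rows = ((StatsProvidingWriter) writer).getNumberOfRows();
//   }
{code}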

 Extend record writer and ORC reader/writer interfaces to provide statistics
 ---

 Key: HIVE-5324
 URL: https://issues.apache.org/jira/browse/HIVE-5324
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile, statistics
 Fix For: 0.13.0

 Attachments: HIVE-5324.1.patch.txt, HIVE-5324.2.patch.txt, 
 HIVE-5324.3.patch.txt, HIVE-5324.4.patch.txt


 The current implementation computes statistics (number of rows and raw 
 data size) for every single row processed. The processOp() method in 
 FileSinkOperator gets the raw data size for each row from the serde and 
 accumulates the size in a hashmap while counting the number of rows. These 
 accumulated statistics are then published to the metastore. 
 In the case of ORC, the file already stores enough statistics internally to be 
 reused when publishing stats to the metastore. This would avoid the 
 duplication of work happening in processOp(). Getting the statistics 
 directly from ORC is also very cheap (they can be read straight from the file 
 footer).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5395) Various cleanup in ptf code

2013-09-28 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-5395:
--

Attachment: HIVE-5395.4.patch.txt

 Various cleanup in ptf code
 ---

 Key: HIVE-5395
 URL: https://issues.apache.org/jira/browse/HIVE-5395
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-5395.1.patch.txt, HIVE-5395.2.patch.txt, 
 HIVE-5395.3.patch.txt, HIVE-5395.4.patch.txt


 Some minor issues (examples for the first two are sketched below):
 Implementing classes on the left side of equals
 Stack used instead of ArrayDeque
 Classes defined statically inside other files (when they do not need to be)
 Checkstyle errors like indenting
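
 For illustration, minimal sketches of the first two items; these reflect 
 assumed intent, not actual Hive code:

 {code}
 import java.util.ArrayDeque;
 import java.util.ArrayList;
 import java.util.Deque;
 import java.util.List;

 public class CleanupExamples {
   public static void main(String[] args) {
     // Declare against the interface, not the implementing class:
     List<String> names = new ArrayList<String>();
     names.add("a");

     // Prefer ArrayDeque over the legacy, synchronized Stack:
     Deque<Integer> stack = new ArrayDeque<Integer>();
     stack.push(1);
     int top = stack.pop();
     System.out.println(names.size() + " " + top);
   }
 }
 {code}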

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5253) Create component to compile and jar dynamic code

2013-09-28 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-5253:
--

Attachment: HIVE-5253.8.patch.txt

 Create component to compile and jar dynamic code
 

 Key: HIVE-5253
 URL: https://issues.apache.org/jira/browse/HIVE-5253
 Project: Hive
  Issue Type: Sub-task
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-5253.1.patch.txt, HIVE-5253.3.patch.txt, 
 HIVE-5253.3.patch.txt, HIVE-5253.3.patch.txt, HIVE-5253.8.patch.txt, 
 HIVE-5253.patch.txt




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3925) dependencies of fetch task are not shown by explain

2013-09-28 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780903#comment-13780903
 ] 

Ashutosh Chauhan commented on HIVE-3925:


[~navis] I think we should move forward with this. It is very useful for 
understanding the behavior of the query planner. If you refresh your 
source-only patch, I will take care of updating the .q files and committing it.

 dependencies of fetch task are not shown by explain
 ---

 Key: HIVE-3925
 URL: https://issues.apache.org/jira/browse/HIVE-3925
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Navis
 Attachments: HIVE-3925.D8577.1.patch, HIVE-3925.D8577.2.patch, 
 HIVE-3925.D8577.3.patch


 A simple query like:
 hive> explain select * from src order by key;
 OK
 ABSTRACT SYNTAX TREE:
   (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME src))) (TOK_INSERT 
 (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR 
 TOK_ALLCOLREF)) (TOK_ORDERBY (TOK_TABSORTCOLNAMEASC (TOK_TABLE_OR_COL key)
 STAGE DEPENDENCIES:
   Stage-1 is a root stage
   Stage-0 is a root stage
   Stage: Stage-0
 Fetch Operator
   limit: -1
 Stage-0 is in fact not a root stage; it depends on Stage-1, which explain fails to show.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5220) Add option for removing intermediate directory for partition, which is empty

2013-09-28 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780906#comment-13780906
 ] 

Ashutosh Chauhan commented on HIVE-5220:


I see the existing behavior as a bug, which your patch is fixing. I don't see a 
need for a config variable; this should be the default. Also, I could be 
mistaken, but I think {{FileSystem}} provides an rmr API. At least {{FsShell}} 
provides it. It would be better to reuse those APIs instead of writing our own 
recursive delete.
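
For reference, a minimal sketch of reusing the filesystem API rather than 
hand-rolling the recursion; the path is illustrative. FileSystem.delete(path, 
recursive) handles the recursion itself:

{code}
// Hedged sketch: Hadoop's FileSystem already supports recursive
// deletion, so no custom traversal is needed. Path is illustrative.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RecursiveDelete {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path dir = new Path("/deep_part/c=09");   // example partition directory
    FileSystem fs = dir.getFileSystem(conf);
    boolean deleted = fs.delete(dir, true /* recursive */);
    System.out.println("deleted: " + deleted);
  }
}
{code}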

 Add option for removing intermediate directory for partition, which is empty
 

 Key: HIVE-5220
 URL: https://issues.apache.org/jira/browse/HIVE-5220
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-5220.D12729.1.patch


 For a deeply nested partitioned table, intermediate directories are not 
 removed even when dropping partitions leaves them empty.
 {noformat}
 /deep_part/c=09/d=01
 /deep_part/c=09/d=01/e=01
 /deep_part/c=09/d=01/e=02
 /deep_part/c=09/d=02
 /deep_part/c=09/d=02/e=01
 /deep_part/c=09/d=02/e=02
 {noformat}
 After removing partition (c='09'), the directory tree remains like this:
 {noformat}
 /deep_part/c=09/d=01
 /deep_part/c=09/d=02
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5395) Various cleanup in ptf code

2013-09-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780907#comment-13780907
 ] 

Hive QA commented on HIVE-5395:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12605658/HIVE-5395.4.patch.txt

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 3179 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_mapreduce_stack_trace_hadoop20
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/952/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/952/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

 Various cleanup in ptf code
 ---

 Key: HIVE-5395
 URL: https://issues.apache.org/jira/browse/HIVE-5395
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-5395.1.patch.txt, HIVE-5395.2.patch.txt, 
 HIVE-5395.3.patch.txt, HIVE-5395.4.patch.txt


 Some minor issues:
 Implementing classes on the left side of equals
 Stack used instead of ArrayDeque
 Classes defined statically inside other files (when they do not need to be)
 Checkstyle errors like indenting

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5178) Wincompat : QTestUtil changes

2013-09-28 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780909#comment-13780909
 ] 

Ashutosh Chauhan commented on HIVE-5178:


+1


 Wincompat : QTestUtil changes
 -

 Key: HIVE-5178
 URL: https://issues.apache.org/jira/browse/HIVE-5178
 Project: Hive
  Issue Type: Sub-task
  Components: Windows
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-5178.2.patch, HIVE-5178.patch


 Miscellaneous QTestUtil changes are needed to make tests work under Windows:
 a) Aux jars need to be set up for minimr
 b) Ignore empty test lines on Windows

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4837) Union on void type fails with NPE

2013-09-28 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780910#comment-13780910
 ] 

Ashutosh Chauhan commented on HIVE-4837:


+1

 Union on void type fails with NPE
 -

 Key: HIVE-4837
 URL: https://issues.apache.org/jira/browse/HIVE-4837
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4837.D11649.1.patch


 From mailing list, 
 http://www.mail-archive.com/user@hive.apache.org/msg08683.html
 {noformat}
 java.lang.RuntimeException: Error in configuring object
 at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
 at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
 ... 9 more
 Caused by: java.lang.RuntimeException: Error in configuring object
 at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
 at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
 at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
 ... 14 more
 Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
 ... 17 more
 Caused by: java.lang.RuntimeException: Map operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:121)
 ... 22 more
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:64)
 at java.lang.String.valueOf(String.java:2826)
 at java.lang.StringBuilder.append(StringBuilder.java:115)
 at 
 org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:110)
 at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
 at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
 at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186)
 at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:563)
 at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
 at 
 org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:100)
 ... 22 more
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4501) HS2 memory leak - FileSystem objects in FileSystem.CACHE

2013-09-28 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HIVE-4501:
-

Status: Open  (was: Patch Available)

 HS2 memory leak - FileSystem objects in FileSystem.CACHE
 

 Key: HIVE-4501
 URL: https://issues.apache.org/jira/browse/HIVE-4501
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Priority: Critical
 Attachments: HIVE-4501.1.patch, HIVE-4501.1.patch, HIVE-4501.1.patch, 
 HIVE-4501.trunk.patch


 org.apache.hadoop.fs.FileSystem objects are getting accumulated in 
 FileSystem.CACHE when HS2 runs in unsecured mode.
 As a workaround, it is possible to set fs.hdfs.impl.disable.cache and 
 fs.file.impl.disable.cache to true, but users should not have to bother with 
 this extra configuration. 
 Alternatively, disable impersonation by setting hive.server2.enable.doAs to 
 false.
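
 To illustrate the caching behavior behind the leak, a small standalone sketch 
 (illustrative only, not the HS2 code path):

 {code}
 // Hedged sketch: FileSystem.get() returns a shared, cached instance
 // (keyed in part by the calling user), which is how per-user access in
 // HS2 keeps adding cache entries. FileSystem.newInstance() bypasses
 // the shared cache; the caller must close the instance itself.
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;

 public class FsCacheDemo {
   public static void main(String[] args) throws Exception {
     Configuration conf = new Configuration();
     FileSystem cached = FileSystem.get(conf);         // shared, cached
     FileSystem fresh = FileSystem.newInstance(conf);  // uncached
     fresh.close();                                    // we own this one
     System.out.println(cached == FileSystem.get(conf)); // true: same object
   }
 }
 {code}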

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4501) HS2 memory leak - FileSystem objects in FileSystem.CACHE

2013-09-28 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HIVE-4501:
-

Attachment: HIVE-4501.trunk.patch

 HS2 memory leak - FileSystem objects in FileSystem.CACHE
 

 Key: HIVE-4501
 URL: https://issues.apache.org/jira/browse/HIVE-4501
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Priority: Critical
 Attachments: HIVE-4501.1.patch, HIVE-4501.1.patch, HIVE-4501.1.patch, 
 HIVE-4501.trunk.patch


 org.apache.hadoop.fs.FileSystem objects are getting accumulated in 
 FileSystem.CACHE when HS2 runs in unsecured mode.
 As a workaround, it is possible to set fs.hdfs.impl.disable.cache and 
 fs.file.impl.disable.cache to true, but users should not have to bother with 
 this extra configuration. 
 Alternatively, disable impersonation by setting hive.server2.enable.doAs to 
 false.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4501) HS2 memory leak - FileSystem objects in FileSystem.CACHE

2013-09-28 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HIVE-4501:
-

Status: Patch Available  (was: Open)

Here's the trunk patch. The original one wasn't applicable to trunk and should 
have been clearly marked as such. Apologies.

 HS2 memory leak - FileSystem objects in FileSystem.CACHE
 

 Key: HIVE-4501
 URL: https://issues.apache.org/jira/browse/HIVE-4501
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.11.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Priority: Critical
 Attachments: HIVE-4501.1.patch, HIVE-4501.1.patch, HIVE-4501.1.patch, 
 HIVE-4501.trunk.patch


 org.apache.hadoop.fs.FileSystem objects are getting accumulated in 
 FileSystem.CACHE when HS2 runs in unsecured mode.
 As a workaround, it is possible to set fs.hdfs.impl.disable.cache and 
 fs.file.impl.disable.cache to true, but users should not have to bother with 
 this extra configuration. 
 Alternatively, disable impersonation by setting hive.server2.enable.doAs to 
 false.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results

2013-09-28 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780916#comment-13780916
 ] 

Ashutosh Chauhan commented on HIVE-3972:


[~navis] HIVE-3562 and HIVE-1402 are in now. In light of that, is this 
optimization still relevant? Are there any queries that may still see further 
benefits from this patch even after both of those optimizations are on?

 Support using multiple reducer for fetching order by results
 

 Key: HIVE-3972
 URL: https://issues.apache.org/jira/browse/HIVE-3972
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch, 
 HIVE-3972.D8349.3.patch, HIVE-3972.D8349.4.patch


 Queries that fetch results and end with an ORDER BY clause force the final 
 MR job to run with a single reducer, which can be too much. For example, 
 {code}
 select value, sum(key) as sum from src group by value order by sum;
 {code}
 If the number of reducers is reasonable, the multiple sorted result files could 
 be merged into a single sorted stream at the fetcher level, as sketched below.
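
 To make the fetcher-level merge concrete, a small sketch of a k-way merge of 
 sorted streams with a priority queue; in-memory iterators stand in for the 
 reducer output files:

 {code}
 // Hedged sketch: merge several already-sorted inputs into one sorted
 // stream. Purely illustrative.
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Comparator;
 import java.util.Iterator;
 import java.util.List;
 import java.util.PriorityQueue;

 public class SortedMerge {
   public static void main(String[] args) {
     List<Iterator<Integer>> inputs = new ArrayList<Iterator<Integer>>();
     inputs.add(Arrays.asList(1, 4, 7).iterator());  // reducer 0 output
     inputs.add(Arrays.asList(2, 5, 8).iterator());  // reducer 1 output
     inputs.add(Arrays.asList(3, 6, 9).iterator());  // reducer 2 output

     // Heap entries are {value, sourceIndex}, ordered by value.
     PriorityQueue<int[]> heap = new PriorityQueue<int[]>(inputs.size(),
         new Comparator<int[]>() {
           public int compare(int[] a, int[] b) {
             return Integer.compare(a[0], b[0]);
           }
         });
     for (int i = 0; i < inputs.size(); i++) {
       if (inputs.get(i).hasNext()) {
         heap.add(new int[] { inputs.get(i).next(), i });
       }
     }
     while (!heap.isEmpty()) {
       int[] top = heap.poll();             // smallest current head
       System.out.print(top[0] + " ");      // emits 1 2 3 ... 9
       if (inputs.get(top[1]).hasNext()) {  // refill from the same source
         heap.add(new int[] { inputs.get(top[1]).next(), top[1] });
       }
     }
   }
 }
 {code}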

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-09-28 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780918#comment-13780918
 ] 

Ashutosh Chauhan commented on HIVE-3959:


This is useful work. [~bmadhvani] / [~gangtimliu] Do you guys want to refresh 
this patch?

 Update Partition Statistics in Metastore Layer
 --

 Key: HIVE-3959
 URL: https://issues.apache.org/jira/browse/HIVE-3959
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Statistics
Reporter: Bhushan Mandhani
Assignee: Gang Tim Liu
Priority: Minor
 Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, 
 HIVE-3959.patch.12.txt, HIVE-3959.patch.2


 When partitions are created using queries (insert overwrite and insert 
 into), the StatsTask updates all stats. However, when partitions are 
 added directly through metadata-only operations (either the CLI or direct 
 calls to the Thrift Metastore), no stats are populated even if 
 hive.stats.reliable is set to true. This puts us in a situation where we 
 can't decide whether stats are truly reliable or not.
 We propose that the fast stats (numFiles and totalSize), which don't require 
 a scan of the data, should always be populated and be completely reliable. For 
 now we are still excluding rowCount and rawDataSize because that would make 
 these operations very expensive. Currently they are quick metadata-only ops.
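
 As a sketch of why the fast stats are cheap, both can be read from filesystem 
 metadata without scanning any data; the path below is illustrative:

 {code}
 // Hedged sketch: derive numFiles and totalSize from a ContentSummary,
 // a metadata-only call. Illustrative, not the metastore code.
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.ContentSummary;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;

 public class FastStats {
   public static void main(String[] args) throws Exception {
     Path partitionDir = new Path("/warehouse/t/part=1");  // example path
     FileSystem fs = partitionDir.getFileSystem(new Configuration());
     ContentSummary cs = fs.getContentSummary(partitionDir);
     System.out.println("numFiles=" + cs.getFileCount());
     System.out.println("totalSize=" + cs.getLength());
   }
 }
 {code}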

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3930) Generate and publish source jars

2013-09-28 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13781187#comment-13781187
 ] 

Konstantin Boudnik commented on HIVE-3930:
--

Any chance of getting this fix into 0.12 (or trunk)?

 Generate and publish source jars
 

 Key: HIVE-3930
 URL: https://issues.apache.org/jira/browse/HIVE-3930
 Project: Hive
  Issue Type: Improvement
Reporter: Mikhail Bautin

 Hive should generate and publish source jars to Maven.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-3694) Generate test jars and publish them to Maven

2013-09-28 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HIVE-3694:
-

Affects Version/s: 0.9.0

 Generate test jars and publish them to Maven
 

 Key: HIVE-3694
 URL: https://issues.apache.org/jira/browse/HIVE-3694
 Project: Hive
  Issue Type: Improvement
  Components: Build Infrastructure
Affects Versions: 0.9.0
Reporter: Mikhail Bautin
Priority: Minor
 Attachments: D6843.1.patch, D6843.2.patch, D6843.3.patch, 
 D6843.4.patch


 It should be possible to generate Hive test jars and publish them to Maven so 
 that other projects that rely on Hive or extend it could reuse its test 
 library.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: did you always have to log in to phabricator

2013-09-28 Thread Sean Busbey
Bump. Any update on this?


On Tue, Sep 17, 2013 at 12:41 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 I do not like this. It is an inconvenience when using a mobile device, but
 more importantly it does not seem very transparent to our end users. For
 example, a user browsing JIRA may want to review code that is so far only on
 Review Board (not yet attached to the issue); they should not be forced to
 sign up to help in the process.

 Would anyone from Facebook care to chime in here? I think we all like
 Phabricator for the most part. Our docs suggest that Phabricator is our
 de facto review system. As an ASF project, I do not think requiring a login
 on some external service even to review a JIRA is correct.


 On Tue, Sep 17, 2013 at 12:27 PM, Xuefu Zhang xzh...@cloudera.com wrote:

  Yeah. I used to be able to view without logging in, but now I cannot.
 
 
   On Tue, Sep 17, 2013 at 7:27 AM, Brock Noland br...@cloudera.com wrote:
 
   Personally I prefer Review Board.
  
    On Tue, Sep 17, 2013 at 8:31 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
I never remember having to log into Phabricator to view a patch. Has this
changed recently? I believe that having to create an external account to
view a patch in progress is not something we should be doing.
  
  
  
   --
   Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
  
 




-- 
Sean