[jira] [Updated] (HIVE-5690) Support subquery for single sourced multi query
[ https://issues.apache.org/jira/browse/HIVE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5690: Attachment: HIVE-5690.11.patch.txt Support subquery for single sourced multi query --- Key: HIVE-5690 URL: https://issues.apache.org/jira/browse/HIVE-5690 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: D13791.1.patch, HIVE-5690.10.patch.txt, HIVE-5690.11.patch.txt, HIVE-5690.2.patch.txt, HIVE-5690.3.patch.txt, HIVE-5690.4.patch.txt, HIVE-5690.5.patch.txt, HIVE-5690.6.patch.txt, HIVE-5690.7.patch.txt, HIVE-5690.8.patch.txt, HIVE-5690.9.patch.txt Single sourced multi (insert) query is very useful for various ETL processes but it does not allow subqueries included. For example, {noformat} explain from src insert overwrite table x1 select * from (select distinct key,value) b order by key insert overwrite table x2 select * from (select distinct key,value) c order by value; {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7849) Support more generic predicate pushdown for hbase handler
[ https://issues.apache.org/jira/browse/HIVE-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108795#comment-14108795 ] Navis commented on HIVE-7849: - vector_between_in needs an update. The others are not related to this. Support more generic predicate pushdown for hbase handler - Key: HIVE-7849 URL: https://issues.apache.org/jira/browse/HIVE-7849 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-7849.1.patch.txt, HIVE-7849.2.patch.txt Currently, the hbase handler supports only AND-conjugated filters. This is a first attempt to support OR, NOT, IN, and BETWEEN predicates for hbase. Mostly based on the work done by [~teddy.choi]. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24986: HIVE-7553: decouple the auxiliary jar loading from hive server2 starting phase
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24986/ --- (Updated Aug. 25, 2014, 6:45 a.m.) Review request for hive. Changes --- (1) clean code (2) change property description Bugs: HIVE-7553 https://issues.apache.org/jira/browse/HIVE-7553 Repository: hive-git Description --- HIVE-7553: decouple the auxiliary jar loading from hive server2 starting phase Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7f4afd9d64aff18329e7850342855aade42e21f5 hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java 93a03adeab7ba3c3c91344955d303e4252005239 hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatClient.java f25039dcf55b3b24bbf8dcba05855665a1c7f3b0 ql/src/java/org/apache/hadoop/hive/ql/exec/DefaultFetchFormatter.java 5924bcf1f55dc4c2dd06f312f929047b7df9de55 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 0c6a3d44ef1f796778768421dc02f8bf3ede6a8c ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionTask.java bd45df1a401d1adb009e953d08205c7d5c2d5de2 ql/src/java/org/apache/hadoop/hive/ql/exec/ListSinkOperator.java dcc19f70644c561e17df8c8660ca62805465f1d6 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 76fee612a583cdc2c632d27932623521b735e768 ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java eb2851b2c5fa52e0f555b3d8d1beea5d1ac3b225 ql/src/java/org/apache/hadoop/hive/ql/hooks/HookUtils.java 3f474f846c7af5f1f65f1c14f3ce51308f1279d4 ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughOutputFormat.java 0962cadce0d515e046371d0a816f4efd70b8eef7 ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java 9051ba6d80e619ddbb6c27bb161e1e7a5cdb08a5 ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java edec1b734fb2f015902fd5e1c8afd5acdf4cb3bf ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 2f13ac2e30195a25844a25e9ec8a7c42ed99b75c ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java b15aedc15d8cd0979aced6ff4c9e87606576f0a3 ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java d86df453cd7686627940ade62c0fd72f1636dd0b ql/src/java/org/apache/hadoop/hive/ql/parse/ParseUtils.java 0a1c660b4bbd46d8410e646270b23c99a4de8b7e ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b05d3b48ec014e4dc8026bb5f6615f62da0e2210 ql/src/java/org/apache/hadoop/hive/ql/plan/AggregationDesc.java 17eeae1a3435fceb4b57325675c58b599e0973ea ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java 930acbc98e81f8d421cee1170659d8b7a427fe7d ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java 39f1793aaa5bed8a494883cac516ad314be951f4 ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessorFactory.java 0d237f01a248a65b4092eb7202fe30eebf27be82 ql/src/java/org/apache/hadoop/hive/ql/processors/HiveCommand.java f5bc427a5834860441f21bfc72e175c6a1cf877f ql/src/java/org/apache/hadoop/hive/ql/processors/RefreshProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 9798cf3f537a27d1f828f8139790c62c5945c366 ql/src/java/org/apache/hadoop/hive/ql/stats/StatsFactory.java e247184b7d95c85fd3e12432e7eb75eb1e2a0b68 ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBridge.java 959007a54b335bb0bdef0256f60e6cbc65798dc7 ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java ef0052f5763922d50986f127c416af5eaa6ae30d ql/src/test/resources/SessionStateTest-V1.jar PRE-CREATION ql/src/test/resources/SessionStateTest-V2.jar PRE-CREATION 
service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java bc0a02c1df7f9fdb848d5f078e94a663a579e571 Diff: https://reviews.apache.org/r/24986/diff/ Testing --- Thanks, cheng xu
[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array&lt;string&gt; with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sathish updated HIVE-7850: -- Attachment: HIVE-7850.1.patch New patch file submitted by correcting indentations. Hive Query failed if the data type is arraystring with parquet files -- Key: HIVE-7850 URL: https://issues.apache.org/jira/browse/HIVE-7850 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.14.0, 0.13.1 Reporter: Sathish Assignee: Sathish Labels: parquet, serde Fix For: 0.14.0 Attachments: HIVE-7850.1.patch, HIVE-7850.patch * Created a parquet file from the Avro file which have 1 array data type and rest are primitive types. Avro Schema of the array data type. Eg: {code} { name : action, type : [ { type : array, items : string }, null ] } {code} * Created External Hive table with the Array type as below, {code} create external table paraArray (action Array) partitioned by (partitionid int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; alter table paraArray add partition(partitionid=1) location '/testPara'; {code} * Run the following query(select action from paraArray limit 10) and the Map reduce jobs are failing with the following exception. {code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 8 more {code} This issue has long back posted on Parquet issues list and Since this is related to Parquet Hive serde, I have created the Hive issue here, The details and history of this information are as shown in the link here https://github.com/Parquet/parquet-mr/issues/281. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108843#comment-14108843 ] Chengxiang Li commented on HIVE-7799: - It depends on the implementation of {{ResultIterator.hasNext()}}: it is designed as a lazy iterator, since it only tries to call {{processNextRecord()}} while the RowContainer is empty, but RowContainer does not support adding more rows after it has already been read, as mentioned in previous comments. Here is what happens when different kinds of queries are executed:
# For a map-only job, the map output is written to a file directly; no Collector is needed in this case.
# For a MapReduce job with a GroupByOperator, {{HiveBaseFunctionResultList.collect()}} is triggered by {{closeRecordProcessor()}}, which is outside the lazy-computing logic, so the ResultIterator does not do lazy computing in this case.
# For a MapReduce job without a GroupByOperator (like cluster-by queries), the ResultIterator does lazy computing, and it clears the RowContainer each time before calling {{processNextRecord()}}. When HiveBaseFunctionResultList is read and written in the same thread, the RowContainer access pattern is clear()-addRow()-first()-clear()-addRow()-first()..., so it does not violate RowContainer's access rule. But with multiple threads reading and writing HiveBaseFunctionResultList, as the ScriptOperator does (which Venki mentioned above), it would definitely hit this JIRA issue.
In my opinion, there are two solutions:
# Remove the ResultIterator lazy-computing feature, as patch 1 does.
# Implement a RowContainer-like class that supports the current RowContainer features; it would also need to be thread-safe and support adding rows after {{first()}} has already been called.
The second solution is quite complex, and it may introduce a performance degradation from supporting thread-safe access and write-after-read; compared with the performance gain of lazy computing, it is hard to say whether it is worth it now. So I suggest we take the first solution to fix this issue and leave the possible optimization to milestone 4.
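A minimal, hypothetical sketch of what the RowContainer-like buffer in the second solution might look like, built on a bounded {{java.util.concurrent}} queue. The class and method names are made up for illustration and this is not Hive code; unlike the real RowContainer it keeps everything in memory and never spills to disk.
{code}
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Hypothetical sketch only: a bounded, thread-safe row buffer that allows a
 * producer to keep adding rows after a consumer has started reading
 * (write-after-read), which plain RowContainer does not allow.
 */
public class ConcurrentRowBuffer<ROW> {
  private final LinkedBlockingQueue<ROW> rows;

  public ConcurrentRowBuffer(int capacity) {
    this.rows = new LinkedBlockingQueue<ROW>(capacity);
  }

  /** Producer side: blocks when the buffer is full (back-pressure on collect()). */
  public void addRow(ROW row) throws InterruptedException {
    rows.put(row);
  }

  /** Consumer side: blocks until a row is available. */
  public ROW nextRow() throws InterruptedException {
    return rows.take();
  }

  public boolean isEmpty() {
    return rows.isEmpty();
  }
}
{code}
A shutdown or poison-pill row would still be needed so the consumer knows when the producer is done, which is part of why this option is more complex than simply removing the lazy-computing feature.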
TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, HIVE-7799.3-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7799: Attachment: HIVE-7799.3-spark.patch reattach the first patch. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, HIVE-7799.3-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5690) Support subquery for single sourced multi query
[ https://issues.apache.org/jira/browse/HIVE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108860#comment-14108860 ] Hive QA commented on HIVE-5690: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12664102/HIVE-5690.11.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6119 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/484/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/484/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-484/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12664102 Support subquery for single sourced multi query --- Key: HIVE-5690 URL: https://issues.apache.org/jira/browse/HIVE-5690 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: D13791.1.patch, HIVE-5690.10.patch.txt, HIVE-5690.11.patch.txt, HIVE-5690.2.patch.txt, HIVE-5690.3.patch.txt, HIVE-5690.4.patch.txt, HIVE-5690.5.patch.txt, HIVE-5690.6.patch.txt, HIVE-5690.7.patch.txt, HIVE-5690.8.patch.txt, HIVE-5690.9.patch.txt Single sourced multi (insert) query is very useful for various ETL processes but it does not allow subqueries included. For example, {noformat} explain from src insert overwrite table x1 select * from (select distinct key,value) b order by key insert overwrite table x2 select * from (select distinct key,value) c order by value; {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7826) Dynamic partition pruning on Tez
[ https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7826: - Attachment: HIVE-7826.3.patch .3 has various fixes. Should be good to go now. Dynamic partition pruning on Tez Key: HIVE-7826 URL: https://issues.apache.org/jira/browse/HIVE-7826 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Labels: tez Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch It's natural in a star schema to map one or more dimensions to partition columns. Time or location are likely candidates. It can also be useful to compute the partitions one would like to scan via a subquery (where p in select ... from ...). The resulting joins in hive require a full table scan of the large table though, because partition pruning takes place before the corresponding values are known. On Tez it's relatively straightforward to send the values needed to prune to the application master - where splits are generated and tasks are submitted. Using these values we can strip out any unneeded partitions dynamically, while the query is running. The approach is straightforward:
- Insert a synthetic condition for each join representing x in (keys of other side of the join)
- These conditions will be pushed down as far as possible
- If the condition hits a table scan and the column involved is a partition column: set up an operator to send key events to the AM
- else: remove the synthetic predicate
-- This message was sent by Atlassian JIRA (v6.2#6252)
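A toy illustration of the application-master-side step described above: once the join-key values arrive as events, partitions whose value cannot match are stripped before splits are generated. This is not Hive or Tez code; all names and the data are made up.
{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class DynamicPartitionPruningSketch {
  /**
   * partitions maps a partition-column value (e.g. a date) to its location;
   * joinKeyValues are the values received as key events from the other side
   * of the join. Only matching partitions get splits generated for them.
   */
  static Map<String, String> prune(Map<String, String> partitions, Set<String> joinKeyValues) {
    Map<String, String> kept = new LinkedHashMap<String, String>();
    for (Map.Entry<String, String> p : partitions.entrySet()) {
      if (joinKeyValues.contains(p.getKey())) {
        kept.put(p.getKey(), p.getValue());
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    Map<String, String> parts = new LinkedHashMap<String, String>();
    parts.put("2014-08-01", "/warehouse/sales/ds=2014-08-01");
    parts.put("2014-08-02", "/warehouse/sales/ds=2014-08-02");
    // values received as key events from the dimension side of the join
    Set<String> keys = new HashSet<String>(Collections.singleton("2014-08-02"));
    System.out.println(prune(parts, keys).keySet()); // prints [2014-08-02]
  }
}
{code}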
[jira] [Commented] (HIVE-7826) Dynamic partition pruning on Tez
[ https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108872#comment-14108872 ] Gunther Hagleitner commented on HIVE-7826: -- Review board link: https://reviews.apache.org/r/25019/ Dynamic partition pruning on Tez Key: HIVE-7826 URL: https://issues.apache.org/jira/browse/HIVE-7826 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Labels: tez Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch It's natural in a star schema to map one or more dimensions to partition columns. Time or location are likely candidates. It can also useful to be to compute the partitions one would like to scan via a subquery (where p in select ... from ...). The resulting joins in hive require a full table scan of the large table though, because partition pruning takes place before the corresponding values are known. On Tez it's relatively straight forward to send the values needed to prune to the application master - where splits are generated and tasks are submitted. Using these values we can strip out any unneeded partitions dynamically, while the query is running. The approach is straight forward: - Insert synthetic conditions for each join representing x in (keys of other side in join) - This conditions will be pushed as far down as possible - If the condition hits a table scan and the column involved is a partition column: - Setup Operator to send key events to AM - else: - Remove synthetic predicate -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7733) Ambiguous column reference error on query
[ https://issues.apache.org/jira/browse/HIVE-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108871#comment-14108871 ] Navis commented on HIVE-7733: - Just a blind shot. I'll look into this. Ambiguous column reference error on query - Key: HIVE-7733 URL: https://issues.apache.org/jira/browse/HIVE-7733 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Jason Dere Attachments: HIVE-7733.1.patch.txt {noformat} CREATE TABLE agg1 ( col0 INT, col1 STRING, col2 DOUBLE ); explain SELECT single_use_subq11.a1 AS a1, single_use_subq11.a2 AS a2 FROM (SELECT Sum(agg1.col2) AS a1 FROM agg1 GROUP BY agg1.col0) single_use_subq12 JOIN (SELECT alias.a2 AS a0, alias.a1 AS a1, alias.a1 AS a2 FROM (SELECT agg1.col1 AS a0, '42' AS a1, agg1.col0 AS a2 FROM agg1 UNION ALL SELECT agg1.col1 AS a0, '41' AS a1, agg1.col0 AS a2 FROM agg1) alias GROUP BY alias.a2, alias.a1) single_use_subq11 ON ( single_use_subq11.a0 = single_use_subq11.a0 ); {noformat} Gets the following error: FAILED: SemanticException [Error 10007]: Ambiguous column reference a2 Looks like this query had been working in 0.12 but starting failing with this error in 0.13 -- This message was sent by Atlassian JIRA (v6.2#6252)
Hive unwanted directories creation issue
We are creating an external table in Hive, and if the location path is not present in HDFS, say /testdata (as shown below), Hive creates the '/testdata' dummy folder. Is there any option in Hive, or any other way, to stop creating dummy directories when the location folder does not exist? We end up creating many unwanted dummy directories when the data is not present on HDFS for the many partitions we add after creating the table.
CREATE EXTERNAL TABLE testTable ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ('avro.schema.literal'='{ schema json literal') STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION '/testdata/';
Regards Sathish Valluri
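Not from this thread and not a Hive setting, but one client-side guard, assuming the DDL is issued programmatically: check that the location already exists on HDFS before running CREATE EXTERNAL TABLE or ADD PARTITION, so Hive never gets a chance to create the empty directory.
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocationCheck {
  /** Returns true only if the HDFS path already exists. */
  public static boolean locationExists(String location) throws IOException {
    Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);
    return fs.exists(new Path(location));
  }

  public static void main(String[] args) throws IOException {
    if (!locationExists("/testdata")) {
      System.out.println("Skipping DDL: /testdata does not exist yet");
    }
  }
}
{code}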
[jira] [Commented] (HIVE-5799) session/operation timeout for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108910#comment-14108910 ] Hive QA commented on HIVE-5799: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663284/HIVE-5799.12.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6119 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/486/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/486/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-486/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663284 session/operation timeout for hiveserver2 - Key: HIVE-5799 URL: https://issues.apache.org/jira/browse/HIVE-5799 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-5799.1.patch.txt, HIVE-5799.10.patch.txt, HIVE-5799.11.patch.txt, HIVE-5799.12.patch.txt, HIVE-5799.12.patch.txt, HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt, HIVE-5799.5.patch.txt, HIVE-5799.6.patch.txt, HIVE-5799.7.patch.txt, HIVE-5799.8.patch.txt, HIVE-5799.9.patch.txt Need some timeout facility for preventing resource leakages from instable or bad clients. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7850) Hive Query failed if the data type is array&lt;string&gt; with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108912#comment-14108912 ] Hive QA commented on HIVE-7850: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12664109/HIVE-7850.1.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/487/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/487/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-487/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-487/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . 
Reverted 'common/src/java/org/apache/hadoop/hive/conf/HiveConf.java' Reverted 'common/src/java/org/apache/hadoop/hive/conf/Validator.java' Reverted 'service/src/java/org/apache/hive/service/cli/OperationState.java' Reverted 'service/src/java/org/apache/hive/service/cli/session/HiveSession.java' Reverted 'service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java' Reverted 'service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java' Reverted 'service/src/java/org/apache/hive/service/cli/session/SessionManager.java' Reverted 'service/src/java/org/apache/hive/service/cli/operation/Operation.java' Reverted 'service/src/java/org/apache/hive/service/cli/operation/OperationManager.java' ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2SessionTimeout.java itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target accumulo-handler/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1620279. At revision 1620279. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12664109 Hive Query failed if the data type is arraystring with parquet files -- Key: HIVE-7850 URL: https://issues.apache.org/jira/browse/HIVE-7850 Project: Hive
This looks like a Hive issue to me. Can anyone suggest other ways to overcome this?
We are creating external table in Hive and if the location path is not present in the HDFS say /testdata(as shown below), Hive is creating the ‘/testdata’ dummy folder. Is there any option in hive or any way to stop creating dummy directories if the location folder not exists. So we end up creating many unwanted dummy directories if the data not present on the HDFS for many partitions we add after creating table. CREATE EXTERNAL TABLE testTable ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ('avro.schema.literal'='{ schema json literal') STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION '/testdata/'; Regards Sathish Valluri
Re: Hive on Tez Counters
Hi Siddharth/Gunther,
Thanks for replying to my queries. I was particularly interested in the CPU counter since I was doing some benchmarking on queries. Can you please clarify: if I just blindly take a mod(CPU counter) for all tasks and add them up, would the numbers be fine, or should I take a patch from the fix and apply it on Tez 0.4 to get it working until 0.5 is released?
Thanks
Suma

On Fri, Aug 22, 2014 at 2:55 AM, Gunther Hagleitner ghagleit...@hortonworks.com wrote:
Hive logs the same counters regardless of whether you run with Tez or MR. We've removed some counters in hive 0.13 (HIVE-4518) - the specific one you're looking for might be in that list. Thanks, Gunther.

On Thu, Aug 21, 2014 at 11:13 AM, Siddharth Seth ss...@apache.org wrote:
I'll let Hive folks answer the questions about the Hive counters. In terms of the CPU counter - that was a bug in Tez 0.4.0, which has been fixed in 0.5.0. COMMITTED_HEAP_BYTES just represents the memory available to the JVM (Runtime.getRuntime().totalMemory()). This will only vary if the VM is started with different Xms and Xmx options. In terms of Tez, the application logs are currently the best place. Hive may expose these in a more accessible manner though.

On Wed, Aug 20, 2014 at 11:16 PM, Suma Shivaprasad sumasai.shivapra...@gmail.com wrote:
Hi, I need info on where I can get detailed job counters for Hive on Tez. I am running this on an HDP cluster with Hive 0.13 and see only the following job counters for Hive on Tez in the YARN application logs, which I got through (yarn logs -applicationId ...).
a. Cannot see any ReduceOperator counters, and DESERIALIZE_ERRORS is the only counter present in MapOperator.
b. The CPU_MILLISECONDS is negative in some cases. Is CPU_MILLISECONDS accurate?
c. What does COMMITTED_HEAP_BYTES indicate?
d. Is there any other place I should be checking the counters?
[[File System Counters FILE: BYTES_READ=512, FILE: BYTES_WRITTEN=3079881, FILE: READ_OPS=0, FILE: LARGE_READ_OPS=0, FILE: WRITE_OPS=0, HDFS: BYTES_READ=8215153, HDFS: BYTES_WRITTEN=0, HDFS: READ_OPS=3, HDFS: LARGE_READ_OPS=0, HDFS: WRITE_OPS=0] [org.apache.tez.common.counters.TaskCounter SPILLED_RECORDS=222543, GC_TIME_MILLIS=172, *CPU_MILLISECONDS=-19700*, PHYSICAL_MEMORY_BYTES=667566080, VIRTUAL_MEMORY_BYTES=1887797248, COMMITTED_HEAP_BYTES=1011023872, INPUT_RECORDS_PROCESSED=222543, OUTPUT_RECORDS=222543, OUTPUT_BYTES=23543896, OUTPUT_BYTES_WITH_OVERHEAD=23989024, OUTPUT_BYTES_PHYSICAL=3079369, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0] [*org.apache.hadoop.hive.ql.exec.MapOperator*$Counter DESERIALIZE_ERRORS=0]]
Thanks
Suma
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109246#comment-14109246 ] Venki Korukanti commented on HIVE-7799: --- [~chengxiang li] Your plan sounds good. Lets log a JIRA to enable lazy computing and we will revisit in milestone 4. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, HIVE-7799.3-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7870) Insert overwrite table query does not generate correct task plan
Na Yang created HIVE-7870: - Summary: Insert overwrite table query does not generate correct task plan Key: HIVE-7870 URL: https://issues.apache.org/jira/browse/HIVE-7870 Project: Hive Issue Type: Task Components: Spark Reporter: Na Yang Insert overwrite table query does not generate a correct task plan when the hive.optimize.union.remove and hive.merge.sparkfiles properties are ON. {noformat}
set hive.optimize.union.remove=true
set hive.merge.sparkfiles=true

insert overwrite table outputTbl1
SELECT * FROM (
  select key, 1 as values from inputTbl1
  union all
  select * FROM (
    SELECT key, count(1) as values from inputTbl1 group by key
    UNION ALL
    SELECT key, 2 as values from inputTbl1
  ) a
) b;

select * from outputTbl1 order by key, values;
{noformat} Query result: {noformat}
1 1
1 2
2 1
2 2
3 1
3 2
7 1
7 2
8 2
8 2
8 2
{noformat} Expected result: {noformat}
1 1
1 1
1 2
2 1
2 1
2 2
3 1
3 1
3 2
7 1
7 1
7 2
8 1
8 1
8 2
8 2
8 2
{noformat} The move work is not working properly, and some data is missing during the move. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7799: --- Status: Patch Available (was: Open) TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, HIVE-7799.3-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7869) Long running tests (1) [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suhas Satish reassigned HIVE-7869: -- Assignee: Suhas Satish Long running tests (1) [Spark Branch] - Key: HIVE-7869 URL: https://issues.apache.org/jira/browse/HIVE-7869 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Suhas Satish I have noticed when running the full test suite locally that the test JVM eventually crashes. We should do some testing (not part of the unit tests) which starts up a HS2 and runs queries on it continuously for 24 hours or so. In this JIRA let's create a stand alone java program which connects to a HS2 over JDBC, creates a bunch of tables (say 100) and then runs queries until the JDBC client is killed. This will allow us to run long running tests. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7810) Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7810: --- Issue Type: Sub-task (was: Task) Parent: HIVE-7292 Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch] - Key: HIVE-7810 URL: https://issues.apache.org/jira/browse/HIVE-7810 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Na Yang Assignee: Na Yang Attachments: HIVE-7810.1-spark.patch Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true set hive.mapred.supports.subdirectories=true; set hive.merge.mapfiles=true; set hive.merge.mapredfiles=true; We expect the following two sets of queries return the same set of data result, but they do not. 1) {noformat} insert overwrite table outputTbl1 SELECT * FROM ( select key, 1 as values from inputTbl1 union all select * FROM ( SELECT key, count(1) as values from inputTbl1 group by key UNION ALL SELECT key, 2 as values from inputTbl1 ) a )b; select * from outputTbl1 order by key, values; {noformat} Below is the query result: {noformat} 1 1 1 2 2 1 2 2 3 1 3 2 7 1 7 2 8 2 8 2 8 2 {noformat} 2) {noformat} SELECT * FROM ( select key, 1 as values from inputTbl1 union all select * FROM ( SELECT key, count(1) as values from inputTbl1 group by key UNION ALL SELECT key, 2 as values from inputTbl1 ) a )b order by key, values; {noformat} Below is the query result: {noformat} 1 1 1 1 1 2 2 1 2 1 2 2 3 1 3 1 3 2 7 1 7 1 7 2 8 1 8 1 8 2 8 2 8 2 {noformat} Some data is missing in the first set of query result. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7850) Hive Query failed if the data type is array&lt;string&gt; with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109315#comment-14109315 ] Ryan Blue commented on HIVE-7850: - Looking at just the changes to the schema conversion, I'm not sure why the change to the list structure was done. Previously, lists were converted to: {code}
// array<string> name
optional group name (LIST) {
  repeated group bag {
    optional string array_element;
  }
}
{code} This allowed the list itself to be null and allowed null elements. This patch changes the conversion to: {code}
// array<string> name
optional group name (LIST) {
  repeated string array_element;
}
{code} This requires that the elements are non-null. Was this on purpose? The first one looks more correct to me, but the second would be correct if nulls aren't allowed in Hive lists. In addition, the HiveSchemaConverter#listWrapper method and the ParquetHiveSerDe.ARRAY static field are no longer used but have not been removed. The other change to schema conversion tests the Repetition and calls {{Types.required}} or {{Types.optional}}. This should instead call {{Types.primitive(type, repetition)}} to pass the repetition to the {{Types}} API. That way, {{Repetition.REPEATED}} is supported as well; not handling it is a bug in the current patch. Hive Query failed if the data type is array<string> with parquet files -- Key: HIVE-7850 URL: https://issues.apache.org/jira/browse/HIVE-7850 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.14.0, 0.13.1 Reporter: Sathish Assignee: Sathish Labels: parquet, serde Fix For: 0.14.0 Attachments: HIVE-7850.1.patch, HIVE-7850.patch * Created a parquet file from an Avro file which has one array data type; the rest are primitive types. Avro schema of the array data type, e.g.: {code}
{ name : action, type : [ { type : array, items : string }, null ] }
{code} * Created an external Hive table with the array type as below: {code}
create external table paraArray (action Array<String>) partitioned by (partitionid int)
row format serde 'parquet.hive.serde.ParquetHiveSerDe'
stored as
  inputformat 'parquet.hive.MapredParquetInputFormat'
  outputformat 'parquet.hive.MapredParquetOutputFormat'
location '/testPara';
alter table paraArray add partition(partitionid=1) location '/testPara';
{code} * Run the following query (select action from paraArray limit 10) and the MapReduce jobs fail with the following exception.
{code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 8 more {code} This issue has long back posted on Parquet issues list and Since this is related to Parquet Hive serde, I have created the Hive issue here, The details and history of this information are as shown in the link here https://github.com/Parquet/parquet-mr/issues/281. -- This message was sent by Atlassian JIRA (v6.2#6252)
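A minimal sketch of the two points in Ryan's comment above, using the parquet-mr {{Types}} builder. The package names assume the pre-1.7 {{parquet.*}} namespace, and the snippet is illustrative only, not the actual HiveSchemaConverter code.
{code}
import parquet.schema.GroupType;
import parquet.schema.OriginalType;
import parquet.schema.PrimitiveType.PrimitiveTypeName;
import parquet.schema.Type;
import parquet.schema.Types;

public class SchemaSketch {
  /**
   * The pre-patch shape: an optional LIST group wrapping a repeated "bag" group,
   * which lets both the list itself and its elements be null.
   */
  static GroupType nullableStringList(String name) {
    return Types.optionalGroup().as(OriginalType.LIST)
        .repeatedGroup()
            .optional(PrimitiveTypeName.BINARY).as(OriginalType.UTF8).named("array_element")
            .named("bag")
        .named(name);
  }

  /**
   * Preserve the incoming repetition (REQUIRED, OPTIONAL, or REPEATED) instead
   * of branching on only required vs. optional.
   */
  static Type convertPrimitive(PrimitiveTypeName primitive, Type.Repetition repetition, String name) {
    return Types.primitive(primitive, repetition).named(name);
  }
}
{code}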
[jira] [Resolved] (HIVE-7724) CBO: support Subquery predicates
[ https://issues.apache.org/jira/browse/HIVE-7724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan resolved HIVE-7724. Resolution: Fixed Committed to cbo branch. Thanks, Harish! CBO: support Subquery predicates Key: HIVE-7724 URL: https://issues.apache.org/jira/browse/HIVE-7724 Project: Hive Issue Type: Sub-task Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7724.1.patch, HIVE-7724.rewriteInHive.prelim.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7810) Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109323#comment-14109323 ] Brock Noland commented on HIVE-7810: [~csun] can you review this patch since it appears you have some knowledge here? Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch] - Key: HIVE-7810 URL: https://issues.apache.org/jira/browse/HIVE-7810 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Na Yang Assignee: Na Yang Attachments: HIVE-7810.1-spark.patch Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true set hive.mapred.supports.subdirectories=true; set hive.merge.mapfiles=true; set hive.merge.mapredfiles=true; We expect the following two sets of queries return the same set of data result, but they do not. 1) {noformat} insert overwrite table outputTbl1 SELECT * FROM ( select key, 1 as values from inputTbl1 union all select * FROM ( SELECT key, count(1) as values from inputTbl1 group by key UNION ALL SELECT key, 2 as values from inputTbl1 ) a )b; select * from outputTbl1 order by key, values; {noformat} Below is the query result: {noformat} 1 1 1 2 2 1 2 2 3 1 3 2 7 1 7 2 8 2 8 2 8 2 {noformat} 2) {noformat} SELECT * FROM ( select key, 1 as values from inputTbl1 union all select * FROM ( SELECT key, count(1) as values from inputTbl1 group by key UNION ALL SELECT key, 2 as values from inputTbl1 ) a )b order by key, values; {noformat} Below is the query result: {noformat} 1 1 1 1 1 2 2 1 2 1 2 2 3 1 3 1 3 2 7 1 7 1 7 2 8 1 8 1 8 2 8 2 8 2 {noformat} Some data is missing in the first set of query result. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7724) CBO: support Subquery predicates
[ https://issues.apache.org/jira/browse/HIVE-7724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7724: --- Component/s: CBO CBO: support Subquery predicates Key: HIVE-7724 URL: https://issues.apache.org/jira/browse/HIVE-7724 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7724.1.patch, HIVE-7724.rewriteInHive.prelim.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7850) Hive Query failed if the data type is array&lt;string&gt; with parquet files
[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109347#comment-14109347 ] Ryan Blue commented on HIVE-7850: - It looks like {{ArrayWritableGroupConverter}} is only used for maps and arrays, but the array handling was added mostly in this patch. Given that most of the methods check {{isMap}} and have completely different implementations for map and array, it makes more sense to separate this into two classes, {{ArrayGroupConverter}} and {{MapGroupConverter}}. Then {{HiveSchemaConverter}} should choose the correct one based on the {{OriginalType}} annotation. If there is no original type annotation, but the type is repeated, it should use an {{ArrayGroupConverter}}. Hive Query failed if the data type is arraystring with parquet files -- Key: HIVE-7850 URL: https://issues.apache.org/jira/browse/HIVE-7850 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.14.0, 0.13.1 Reporter: Sathish Assignee: Sathish Labels: parquet, serde Fix For: 0.14.0 Attachments: HIVE-7850.1.patch, HIVE-7850.patch * Created a parquet file from the Avro file which have 1 array data type and rest are primitive types. Avro Schema of the array data type. Eg: {code} { name : action, type : [ { type : array, items : string }, null ] } {code} * Created External Hive table with the Array type as below, {code} create external table paraArray (action Array) partitioned by (partitionid int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; alter table paraArray add partition(partitionid=1) location '/testPara'; {code} * Run the following query(select action from paraArray limit 10) and the Map reduce jobs are failing with the following exception. {code} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315) at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236) at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126) at org.apache.hadoop.mapred.Child.main(Child.java:264) ] at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671) at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144) ... 
8 more {code} This issue has long back posted on Parquet issues list and Since this is related to Parquet Hive serde, I have created the Hive issue here, The details and history of this information are as shown in the link here https://github.com/Parquet/parquet-mr/issues/281. -- This message was sent by Atlassian JIRA (v6.2#6252)
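To make the suggested split concrete, here is a hypothetical sketch of the selection logic described in the comment above. MapGroupConverter and ArrayGroupConverter do not exist in the current code, so an enum stands in for constructing them; only the parquet schema classes are real API.
{code}
import parquet.schema.GroupType;
import parquet.schema.OriginalType;
import parquet.schema.Type;

public class ConverterChoiceSketch {
  /** Stand-ins for the proposed MapGroupConverter / ArrayGroupConverter classes. */
  enum ConverterKind { MAP, ARRAY, STRUCT }

  static ConverterKind choose(GroupType type) {
    OriginalType original = type.getOriginalType();
    if (original == OriginalType.MAP) {
      return ConverterKind.MAP;        // -> MapGroupConverter
    }
    if (original == OriginalType.LIST) {
      return ConverterKind.ARRAY;      // -> ArrayGroupConverter
    }
    // No original type annotation, but a repeated group still holds array elements.
    if (type.isRepetition(Type.Repetition.REPEATED)) {
      return ConverterKind.ARRAY;
    }
    return ConverterKind.STRUCT;
  }
}
{code}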
[jira] [Commented] (HIVE-7810) Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109360#comment-14109360 ] Chao commented on HIVE-7810: [~brocknoland] OK, I'll take a look. Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch] - Key: HIVE-7810 URL: https://issues.apache.org/jira/browse/HIVE-7810 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Na Yang Assignee: Na Yang Attachments: HIVE-7810.1-spark.patch Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true set hive.mapred.supports.subdirectories=true; set hive.merge.mapfiles=true; set hive.merge.mapredfiles=true; We expect the following two sets of queries return the same set of data result, but they do not. 1) {noformat} insert overwrite table outputTbl1 SELECT * FROM ( select key, 1 as values from inputTbl1 union all select * FROM ( SELECT key, count(1) as values from inputTbl1 group by key UNION ALL SELECT key, 2 as values from inputTbl1 ) a )b; select * from outputTbl1 order by key, values; {noformat} Below is the query result: {noformat} 1 1 1 2 2 1 2 2 3 1 3 2 7 1 7 2 8 2 8 2 8 2 {noformat} 2) {noformat} SELECT * FROM ( select key, 1 as values from inputTbl1 union all select * FROM ( SELECT key, count(1) as values from inputTbl1 group by key UNION ALL SELECT key, 2 as values from inputTbl1 ) a )b order by key, values; {noformat} Below is the query result: {noformat} 1 1 1 1 1 2 2 1 2 1 2 2 3 1 3 1 3 2 7 1 7 1 7 2 8 1 8 1 8 2 8 2 8 2 {noformat} Some data is missing in the first set of query result. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup
[ https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6847: --- Status: Open (was: Patch Available) Improve / fix bugs in Hive scratch dir setup Key: HIVE-6847 URL: https://issues.apache.org/jira/browse/HIVE-6847 Project: Hive Issue Type: Bug Components: CLI, HiveServer2 Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch, HIVE-6847.3.patch, HIVE-6847.4.patch Currently, the hive server creates scratch directory and changes permission to 777 however, this is not great with respect to security. We need to create user specific scratch directories instead. Also refer to HIVE-6782 1st iteration of the patch for approach. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup
[ https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6847: --- Attachment: HIVE-6847.5.patch Improve / fix bugs in Hive scratch dir setup Key: HIVE-6847 URL: https://issues.apache.org/jira/browse/HIVE-6847 Project: Hive Issue Type: Bug Components: CLI, HiveServer2 Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch, HIVE-6847.3.patch, HIVE-6847.4.patch, HIVE-6847.5.patch Currently, the hive server creates scratch directory and changes permission to 777 however, this is not great with respect to security. We need to create user specific scratch directories instead. Also refer to HIVE-6782 1st iteration of the patch for approach. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7871) WebHCat: Hive job with SQL server as MetastoreDB fails when Unicode characters are present in curl command
Hari Sankar Sivarama Subramaniyan created HIVE-7871: --- Summary: WebHCat: Hive job with SQL server as MetastoreDB fails when Unicode characters are present in curl command Key: HIVE-7871 URL: https://issues.apache.org/jira/browse/HIVE-7871 Project: Hive Issue Type: Bug Components: WebHCat Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Please follow the steps below to repro. 1. Create a SQL Server. Create Username, Password, DB with Unicode characters in their name. 2. Create a cluster and run the below command against its templeton endpoint curl -i -u username:password \ -d define=javax.jdo.option.ConnectionUserName=dbusername@SQLserver \ -d define=hive.metastore.uris= \ -d define=javax.jdo.option.ConnectionURL=jdbc:sqlserver://SQLserver.database.windows.net;database=dbname; encrypt=true;trustServerCertificate=true;create=false \ -d define=javax.jdo.option.ConnectionPassword=dbpassword \ -d statusdir=/hivestatus \ -d user.name=admin \ -d enablelog=false \ -d execute=show tables; \ -s https://localhost:30111/templeton/v1/hive; The following error message is received. javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:sqlserver://SQLserver.database.windows.net;database=dbname; encrypt=true;trustServerCertificate=true;create=false, username = dbusername@SQLserver. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: -- com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user 'dbusername'. -- This message was sent by Atlassian JIRA (v6.2#6252)
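One plausible direction for such a fix (not necessarily what the attached patch does) is to make sure the templeton endpoint decodes the user-supplied define=key=value parameters as UTF-8 before they are handed to the metastore JDBC settings, so Unicode user, password, and database names survive the curl round trip. A minimal, hypothetical sketch:
{code}
import java.net.URLDecoder;

public class DefineParamSketch {
  // Decode a raw "key=value" define parameter as UTF-8 and split it once.
  public static String[] splitDefine(String rawDefine) throws Exception {
    String decoded = URLDecoder.decode(rawDefine, "UTF-8");
    int eq = decoded.indexOf('=');
    if (eq < 0) {
      return new String[] { decoded, "" };
    }
    return new String[] { decoded.substring(0, eq), decoded.substring(eq + 1) };
  }
}
{code}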
[jira] [Updated] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup
[ https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6847: --- Status: Patch Available (was: Open) Improve / fix bugs in Hive scratch dir setup Key: HIVE-6847 URL: https://issues.apache.org/jira/browse/HIVE-6847 Project: Hive Issue Type: Bug Components: CLI, HiveServer2 Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch, HIVE-6847.3.patch, HIVE-6847.4.patch, HIVE-6847.5.patch Currently, the hive server creates scratch directory and changes permission to 777 however, this is not great with respect to security. We need to create user specific scratch directories instead. Also refer to HIVE-6782 1st iteration of the patch for approach. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109398#comment-14109398 ] Hive QA commented on HIVE-7799: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12664112/HIVE-7799.3-spark.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6253 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/92/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/92/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-92/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12664112 TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, HIVE-7799.3-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
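The suspected root cause above, a RowContainer being written to after a reader has already started consuming it, can be illustrated with a small guard on the write path. This is only a sketch of the invariant, with made-up names, not the HiveKVResultCache code:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Sketch of a one-pass result buffer that forbids writes once reading begins.
public class WriteOnceThenReadBuffer<T> {
  private final List<T> rows = new ArrayList<T>();
  private int cursor = -1; // -1 means no reader has started yet

  public void add(T row) {
    if (cursor >= 0) {
      throw new IllegalStateException("cannot write after a reader has started");
    }
    rows.add(row);
  }

  public boolean hasNext() {
    if (cursor < 0) {
      cursor = 0;
    }
    return cursor < rows.size();
  }

  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return rows.get(cursor++);
  }
}
{code}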
[jira] [Assigned] (HIVE-7721) CBO: support case statement translation to optiq
[ https://issues.apache.org/jira/browse/HIVE-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran reassigned HIVE-7721: Assignee: Laljo John Pullokkaran CBO: support case statement translation to optiq Key: HIVE-7721 URL: https://issues.apache.org/jira/browse/HIVE-7721 Project: Hive Issue Type: Sub-task Reporter: Harish Butani Assignee: Laljo John Pullokkaran Following query: {code} explain select case when key '104' then null else key end as key from src {code} fails with: {quote} java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: Unsupported Expression at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$OptiqBasedPlanner.getOptimizedAST(SemanticAnalyzer.java:11808) aused by: java.lang.RuntimeException: Unsupported Expression at org.apache.hadoop.hive.ql.optimizer.optiq.translator.RexNodeConverter.convert(RexNodeConverter.java:91) at org.apache.hadoop.hive.ql.optimizer.optiq.translator.RexNodeConverter.convert(RexNodeConverter.java:124) {quote} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7720) CBO: rank translation to Optiq RelNode tree failing
[ https://issues.apache.org/jira/browse/HIVE-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-7720: - Assignee: Laljo John Pullokkaran CBO: rank translation to Optiq RelNode tree failing --- Key: HIVE-7720 URL: https://issues.apache.org/jira/browse/HIVE-7720 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Harish Butani Assignee: Laljo John Pullokkaran Following query: {code} explain select p_name from (select p_mfgr, p_name, p_size, rank() over(partition by p_mfgr order by p_size) as r from part) a where r = 2; {code} fails with {quote} org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more arguments are expected. at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRank.getEvaluator(GenericUDAFRank.java:61) at org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver.getEvaluator(AbstractGenericUDAFResolver.java:47) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getGenericUDAFEvaluator(FunctionRegistry.java:1110) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFEvaluator(SemanticAnalyzer.java:3506) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$OptiqBasedPlanner.getHiveAggInfo(SemanticAnalyzer.java:12496) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$OptiqBasedPlanner.genWindowingProj(SemanticAnalyzer.java:12858) {quote} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7721) CBO: support case statement translation to optiq
[ https://issues.apache.org/jira/browse/HIVE-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109405#comment-14109405 ] Laljo John Pullokkaran commented on HIVE-7721: -- Fixed by HIVE-7841 CBO: support case statement translation to optiq Key: HIVE-7721 URL: https://issues.apache.org/jira/browse/HIVE-7721 Project: Hive Issue Type: Sub-task Reporter: Harish Butani Assignee: Laljo John Pullokkaran Following query: {code} explain select case when key '104' then null else key end as key from src {code} fails with: {quote} java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: Unsupported Expression at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$OptiqBasedPlanner.getOptimizedAST(SemanticAnalyzer.java:11808) aused by: java.lang.RuntimeException: Unsupported Expression at org.apache.hadoop.hive.ql.optimizer.optiq.translator.RexNodeConverter.convert(RexNodeConverter.java:91) at org.apache.hadoop.hive.ql.optimizer.optiq.translator.RexNodeConverter.convert(RexNodeConverter.java:124) {quote} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-7721) CBO: support case statement translation to optiq
[ https://issues.apache.org/jira/browse/HIVE-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran resolved HIVE-7721. -- Resolution: Fixed CBO: support case statement translation to optiq Key: HIVE-7721 URL: https://issues.apache.org/jira/browse/HIVE-7721 Project: Hive Issue Type: Sub-task Reporter: Harish Butani Assignee: Laljo John Pullokkaran Following query: {code} explain select case when key '104' then null else key end as key from src {code} fails with: {quote} java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: Unsupported Expression at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$OptiqBasedPlanner.getOptimizedAST(SemanticAnalyzer.java:11808) aused by: java.lang.RuntimeException: Unsupported Expression at org.apache.hadoop.hive.ql.optimizer.optiq.translator.RexNodeConverter.convert(RexNodeConverter.java:91) at org.apache.hadoop.hive.ql.optimizer.optiq.translator.RexNodeConverter.convert(RexNodeConverter.java:124) {quote} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7860) [CBO] Query on partitioned table which filter out all partitions fails
[ https://issues.apache.org/jira/browse/HIVE-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7860: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to branch [CBO] Query on partitioned table which filter out all partitions fails -- Key: HIVE-7860 URL: https://issues.apache.org/jira/browse/HIVE-7860 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: h-7860.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7720) CBO: rank translation to Optiq RelNode tree failing
[ https://issues.apache.org/jira/browse/HIVE-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109411#comment-14109411 ] Laljo John Pullokkaran commented on HIVE-7720: -- Support all Windowing UDAF. row_number, rank, dense_rank, percent_rank, cume_dist, first_value, last_value, lead, lag. CBO: rank translation to Optiq RelNode tree failing --- Key: HIVE-7720 URL: https://issues.apache.org/jira/browse/HIVE-7720 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Harish Butani Assignee: Laljo John Pullokkaran Following query: {code} explain select p_name from (select p_mfgr, p_name, p_size, rank() over(partition by p_mfgr order by p_size) as r from part) a where r = 2; {code} fails with {quote} org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more arguments are expected. at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRank.getEvaluator(GenericUDAFRank.java:61) at org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver.getEvaluator(AbstractGenericUDAFResolver.java:47) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getGenericUDAFEvaluator(FunctionRegistry.java:1110) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFEvaluator(SemanticAnalyzer.java:3506) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$OptiqBasedPlanner.getHiveAggInfo(SemanticAnalyzer.java:12496) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$OptiqBasedPlanner.genWindowingProj(SemanticAnalyzer.java:12858) {quote} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7871) WebHCat: Hive job with SQL server as MetastoreDB fails when Unicode characters are present in curl command
[ https://issues.apache.org/jira/browse/HIVE-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7871: Attachment: HIVE-7871.1.patch Similar changes need to be made for the remaining end points in Server.java WebHCat: Hive job with SQL server as MetastoreDB fails when Unicode characters are present in curl command --- Key: HIVE-7871 URL: https://issues.apache.org/jira/browse/HIVE-7871 Project: Hive Issue Type: Bug Components: WebHCat Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7871.1.patch Please follow the steps below to repro. 1. Create a SQL Server. Create Username, Password, DB with Unicode characters in their name. 2. Create a cluster and run the below command against its templeton endpoint curl -i -u username:password \ -d define=javax.jdo.option.ConnectionUserName=dbusername@SQLserver \ -d define=hive.metastore.uris= \ -d define=javax.jdo.option.ConnectionURL=jdbc:sqlserver://SQLserver.database.windows.net;database=dbname; encrypt=true;trustServerCertificate=true;create=false \ -d define=javax.jdo.option.ConnectionPassword=dbpassword \ -d statusdir=/hivestatus \ -d user.name=admin \ -d enablelog=false \ -d execute=show tables; \ -s https://localhost:30111/templeton/v1/hive; The following error message is received. javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:sqlserver://SQLserver.database.windows.net;database=dbname; encrypt=true;trustServerCertificate=true;create=false, username = dbusername@SQLserver. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: -- com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user 'dbusername'. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7871) WebHCat: Hive job with SQL server as MetastoreDB fails when Unicode characters are present in curl command
[ https://issues.apache.org/jira/browse/HIVE-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7871: Status: Patch Available (was: Open) WebHCat: Hive job with SQL server as MetastoreDB fails when Unicode characters are present in curl command --- Key: HIVE-7871 URL: https://issues.apache.org/jira/browse/HIVE-7871 Project: Hive Issue Type: Bug Components: WebHCat Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7871.1.patch Please follow the steps below to repro. 1. Create a SQL Server. Create Username, Password, DB with Unicode characters in their name. 2. Create a cluster and run the below command against its templeton endpoint curl -i -u username:password \ -d define=javax.jdo.option.ConnectionUserName=dbusername@SQLserver \ -d define=hive.metastore.uris= \ -d define=javax.jdo.option.ConnectionURL=jdbc:sqlserver://SQLserver.database.windows.net;database=dbname; encrypt=true;trustServerCertificate=true;create=false \ -d define=javax.jdo.option.ConnectionPassword=dbpassword \ -d statusdir=/hivestatus \ -d user.name=admin \ -d enablelog=false \ -d execute=show tables; \ -s https://localhost:30111/templeton/v1/hive; The following error message is received. javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:sqlserver://SQLserver.database.windows.net;database=dbname; encrypt=true;trustServerCertificate=true;create=false, username = dbusername@SQLserver. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: -- com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user 'dbusername'. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24830: HIVE-7548: Precondition checks should not fail the merge task in case of automatic trigger
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24830/#review51415 --- ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java https://reviews.apache.org/r/24830/#comment89723 Better to throw an AssertionException here isn't it? Otherwise you will blindly delete it? ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java https://reviews.apache.org/r/24830/#comment89724 ? - Gunther Hagleitner On Aug. 19, 2014, 12:29 a.m., Prasanth_J wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24830/ --- (Updated Aug. 19, 2014, 12:29 a.m.) Review request for hive and Gunther Hagleitner. Repository: hive-git Description --- ORC fast merge (HIVE-7509) will fail the merge task in case if any of the precondition checks fail. Precondition check fail is good for ALTER TABLE .. CONCATENATE but not for automatic trigger of merge task from conditional resolver. In case if a partition has non-compatible ORC files for merging then the merge task should ignore it and not fail the task. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1d6a93a ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeMapper.java beb4f7d ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java b36152a ql/src/test/queries/clientnegative/orc_merge1.q b2d42cd ql/src/test/queries/clientnegative/orc_merge2.q 2f62ee7 ql/src/test/queries/clientnegative/orc_merge3.q 5158e2e ql/src/test/queries/clientnegative/orc_merge4.q ad48572 ql/src/test/queries/clientnegative/orc_merge5.q e94a8cc ql/src/test/queries/clientpositive/orc_merge_incompat1.q PRE-CREATION ql/src/test/queries/clientpositive/orc_merge_incompat2.q PRE-CREATION ql/src/test/results/clientpositive/orc_merge_incompat1.q.out PRE-CREATION ql/src/test/results/clientpositive/orc_merge_incompat2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/24830/diff/ Testing --- Thanks, Prasanth_J
[jira] [Commented] (HIVE-7548) Precondition checks should not fail the merge task in case of automatic trigger
[ https://issues.apache.org/jira/browse/HIVE-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109433#comment-14109433 ] Gunther Hagleitner commented on HIVE-7548: -- Comments on rb. Otherwise +1. [~gopalv]/[~ashutoshc] could you take a look at this (esp the regex)? Precondition checks should not fail the merge task in case of automatic trigger --- Key: HIVE-7548 URL: https://issues.apache.org/jira/browse/HIVE-7548 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7548.1.patch ORC fast merge (HIVE-7509) will fail the merge task in case if any of the precondition checks fail. Precondition check fail is good for ALTER TABLE .. CONCATENATE but not for automatic trigger of merge task from conditional resolver. In case if a partition has non-compatible ORC files for merging then the merge task should ignore it and not fail the task. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Hive on Tez Counters
0.5 is almost out, the vote should be closed within a day. You'' have to use a build of Hive from the Tez branch though. TEZ-1118 is the patch you'll need to pick up to fix the CPU counters. This should apply directly on the 0.4 branch, if that's the approach you want to take. On Mon, Aug 25, 2014 at 5:38 AM, Suma Shivaprasad sumasai.shivapra...@gmail.com wrote: Hi Siddharth/Gunther, Thanks for replying to my queries. I was particularly interested in the CPU counter since I was doing some benchmarking on queries. Can you please clarify if I just blindly take a mod(CPU counter) for all tasks and add them up..would they be fine..or should I take a patch from the fix and apply it on Tez 0.4 to get it working until 0.5 is released? Thanks Suma On Fri, Aug 22, 2014 at 2:55 AM, Gunther Hagleitner ghagleit...@hortonworks.com wrote: Hive logs the same counters regardless of whether you run with Tez or MR. We've removed some counters in hive 0.13 (HIVE-4518) - the specific one you're looking for might be in that list. Thanks, Gunther. On Thu, Aug 21, 2014 at 11:13 AM, Siddharth Seth ss...@apache.org wrote: I'll let Hive folks answer the questions about the Hive counters. In terms of the CPU counter - that was a bug in Tez-0.4.0, which has been fixed in 0.5.0. COMMITTED_HEAP_BYTES just represents the memory available to the JVM (Runtime.getRuntime().totalMemory()). This will only vary if the VM is started with a different Xms and Xmx option. In terms of Tez, the application logs are currently the best place. Hive may expose these in a more accessible manner though. On Wed, Aug 20, 2014 at 11:16 PM, Suma Shivaprasad sumasai.shivapra...@gmail.com wrote: Hi, Needed info on where I can get detailed job counters for Hive on Tez. Am running this on a HDP cluster with Hive 0.13 and see only the following job counters through Hive Tez in Yarn application logs which I got through( yarn logs -applicationId ...) . a. Cannot see any ReduceOperator counters and also only DESERIALIZE_ERRORS is the only counter present in MapOperator b. The CPU_MILLISECONDS in some cases in -ve. Is CPU_MILLISECONDS accurate c. What does COMMITTED_HEAP_BYTES indicate? d. Is there any other place I should be checking the counters? [[File System Counters FILE: BYTES_READ=512, FILE: BYTES_WRITTEN=3079881, FILE: READ_OPS=0, FILE: LARGE_READ_OPS=0, FILE: WRITE_OPS=0, HDFS: BYTES_READ=8215153, HDFS: BYTES_WRITTEN=0, HDFS: READ_OPS=3, HDFS: LARGE_READ_OPS=0, HDFS: WRITE_OPS=0] [org.apache.tez.common.counters.TaskCounter SPILLED_RECORDS=222543, GC_TIME_MILLIS=172, *CPU_MILLISECONDS=-19700*, PHYSICAL_MEMORY_BYTES=667566080, VIRTUAL_MEMORY_BYTES=1887797248, COMMITTED_HEAP_BYTES=1011023872, INPUT_RECORDS_PROCESSED=222543, OUTPUT_RECORDS=222543, OUTPUT_BYTES=23543896, OUTPUT_BYTES_WITH_OVERHEAD=23989024, OUTPUT_BYTES_PHYSICAL=3079369, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0] [*org.apache.hadoop.hive.ql.exec.MapOperator*$Counter DESERIALIZE_ERRORS=0]] Thanks Suma -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. 
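As noted in the thread, COMMITTED_HEAP_BYTES is simply the JVM's committed heap (Runtime.getRuntime().totalMemory()). A quick way to see how it relates to the -Xmx ceiling in any JVM:
{code}
public class HeapCounters {
  public static void main(String[] args) {
    long committed = Runtime.getRuntime().totalMemory(); // what COMMITTED_HEAP_BYTES reports
    long max = Runtime.getRuntime().maxMemory();         // the -Xmx ceiling
    System.out.println("committed=" + committed + " max=" + max);
  }
}
{code}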
[jira] [Commented] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
[ https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109455#comment-14109455 ] Szehon Ho commented on HIVE-7353: - Hi [~vgumashta] I dont see the TestSessionGlobalInitFile failures in the other runs, can you take a look? HiveServer2 using embedded MetaStore leaks JDOPersistanceManager Key: HIVE-7353 URL: https://issues.apache.org/jira/browse/HIVE-7353 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, HIVE-7353.4.patch, HIVE-7353.5.patch While using embedded metastore, while creating background threads to run async operations, HiveServer2 ends up creating new instances of JDOPersistanceManager which are cached in JDOPersistanceManagerFactory. Even when the background thread is killed by the thread pool manager, the JDOPersistanceManager are never GCed because they are cached by JDOPersistanceManagerFactory. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24962: HIVE-7730: Extend ReadEntity to add accessed columns from query
On Aug. 22, 2014, 6:14 a.m., Szehon Ho wrote: ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java, line 54 https://reviews.apache.org/r/24962/diff/1/?file=666753#file666753line54 Can we make this final, and not have a setter? The caller can just add to the list. It'll make the code a bit simpler. Also should it be set? Xiaomeng Huang wrote: Thanks, I think it better to be list. I get accessed columns from tableToColumnAccessMap, which is a MapString, ListString. Hive's native authorization is use this list too. I get the column list via a table name, then set it to readEntity directly, don't need to add every one with a loop. so it is necessary to have a setter. BTW, I can also to add a API addAccessedColumn(String column) to add one column to this column list. Szehon Ho wrote: OK its fine if you think it should be list. For the other part, I was thinking to have just one method getAccessedColumn() which returns list. Then caller (SemanticAnalyzer) can call: entity.getAccessedColumns().addAll(...). The benefit to me is 1) the list can be made final, and 2) make the calling code cleaner (no need to construct lists and set them). Also its more consistent with the other collections in this class. Hope that makes sense, thanks! Xiaomeng Huang wrote: Yes, it fines to me. Fixed it, thanks! Thanks Xiaomeng, can you please upload the latest to the JIRA for testing and commit? And I had just one minor comment for your consideration, as it's still not uploaded. With this, we dont need to construct a new linkedList for cols outside the switches (its kind of a waste), and we can just directly call entity.getAccessedColumns().addAll(tableToColumnAccessMap.get...), or you can make a local variable cols = tableToColumnAccessMap.get(...) and then entity.getAccessedColumns().addAll(cols). It's not a huge deal, but I was thinking we can fix and upload the patch together. - Szehon --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24962/#review51257 --- On Aug. 25, 2014, 3:17 a.m., Xiaomeng Huang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24962/ --- (Updated Aug. 25, 2014, 3:17 a.m.) Review request for hive, Prasad Mujumdar and Szehon Ho. Repository: hive-git Description --- External authorization model can not get accessed columns from query. Hive should store accessed columns to ReadEntity Diffs - ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java 7ed50b4 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b05d3b4 Diff: https://reviews.apache.org/r/24962/diff/ Testing --- Thanks, Xiaomeng Huang
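The pattern being asked for in the review, roughly: keep the collection final with only a getter, and let SemanticAnalyzer add to it directly. A minimal sketch with illustrative names, not the actual ReadEntity code:
{code}
import java.util.ArrayList;
import java.util.List;

public class ReadEntitySketch {
  // final collection with only a getter; no setter needed
  private final List<String> accessedColumns = new ArrayList<String>();

  public List<String> getAccessedColumns() {
    return accessedColumns;
  }
}

// caller side, e.g. while walking tableToColumnAccessMap:
//   entity.getAccessedColumns().addAll(tableToColumnAccessMap.get(tableName));
{code}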
Re: Review Request 24830: HIVE-7548: Precondition checks should not fail the merge task in case of automatic trigger
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24830/#review51419 --- ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java https://reviews.apache.org/r/24830/#comment89734 There are other utility functions that extracts taskID/attemptID from file names. None of these methods throw exception if it could not find matches for the regex pattern. Example: getIdFromFilename() returns filename as Id if it cannot match pattern. I was also following the same convention. In this case, if there are no matches for copy file pattern it will return false and will fallback to old code path. The regex will still work if files are loaded using LOAD DATA LOCAL INPATH statement. With this statement, the file names will be like 1) filename.txt 2) filename_copy_1.txt 3) filename_copy_2.txt For this file pattern, there will be no match for taskId/attemptId extraction. Hence no files will be marked duplicate. We really don't have to worry about copy file names in this case as there will not be any duplicate elimination. ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java https://reviews.apache.org/r/24830/#comment89735 Fixed it. - Prasanth_J On Aug. 19, 2014, 12:29 a.m., Prasanth_J wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24830/ --- (Updated Aug. 19, 2014, 12:29 a.m.) Review request for hive and Gunther Hagleitner. Repository: hive-git Description --- ORC fast merge (HIVE-7509) will fail the merge task in case if any of the precondition checks fail. Precondition check fail is good for ALTER TABLE .. CONCATENATE but not for automatic trigger of merge task from conditional resolver. In case if a partition has non-compatible ORC files for merging then the merge task should ignore it and not fail the task. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1d6a93a ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeMapper.java beb4f7d ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java b36152a ql/src/test/queries/clientnegative/orc_merge1.q b2d42cd ql/src/test/queries/clientnegative/orc_merge2.q 2f62ee7 ql/src/test/queries/clientnegative/orc_merge3.q 5158e2e ql/src/test/queries/clientnegative/orc_merge4.q ad48572 ql/src/test/queries/clientnegative/orc_merge5.q e94a8cc ql/src/test/queries/clientpositive/orc_merge_incompat1.q PRE-CREATION ql/src/test/queries/clientpositive/orc_merge_incompat2.q PRE-CREATION ql/src/test/results/clientpositive/orc_merge_incompat1.q.out PRE-CREATION ql/src/test/results/clientpositive/orc_merge_incompat2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/24830/diff/ Testing --- Thanks, Prasanth_J
[jira] [Updated] (HIVE-7681) qualified tablenames usage does not work with several alter-table commands
[ https://issues.apache.org/jira/browse/HIVE-7681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7681: Attachment: HIVE-7681.4.patch.txt HIVE-7681.4.patch.txt - Uploading the patch to kick off tests and make sure that no new tests need update. qualified tablenames usage does not work with several alter-table commands -- Key: HIVE-7681 URL: https://issues.apache.org/jira/browse/HIVE-7681 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Assignee: Navis Attachments: HIVE-7681.1.patch.txt, HIVE-7681.2.patch.txt, HIVE-7681.3.patch.txt, HIVE-7681.4.patch.txt, HIVE-7681.4.patch.txt Changes were made in HIVE-4064 for use of qualified table names in more types of queries. But several alter table commands don't work with qualified - alter table default.tmpfoo set tblproperties (bar = bar value) - ALTER TABLE default.kv_rename_test CHANGE a a STRING - add,drop partition - alter index rebuild -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24627: HIVE-7704: Create tez task for fast file merging
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24627/#review51416 --- ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java https://reviews.apache.org/r/24627/#comment89725 why do you need a map operator at all then? Can't you just write a net new processor that doesn't init map op? ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java https://reviews.apache.org/r/24627/#comment89727 conf setup should happen in initVertexConf. ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java https://reviews.apache.org/r/24627/#comment89728 if i read this right the only diff is the process or. can you use a var for this and keep a single call to new Vertex? String procClassName; if ... { procClassName = ... } ... new Vertext(...procClassName) if you move all the conf setup into the initVertexConf method this should be more clear. ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java https://reviews.apache.org/r/24627/#comment89726 indentation seems broken ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java https://reviews.apache.org/r/24627/#comment89729 that means you're setting a path as the alias? ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MergeFileMapRecordProcessor.java https://reviews.apache.org/r/24627/#comment89730 I'm assuming that Merge* and ORCMerge* contain a lot of copied code? (from the MR path). If that's the case can you factor that out? ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezProcessor.java https://reviews.apache.org/r/24627/#comment89731 this should be a different class. not every processor will need these things. ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java https://reviews.apache.org/r/24627/#comment89733 don't call it jobClose if it only applies to merge work. ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java https://reviews.apache.org/r/24627/#comment89736 you seem to be fighting that merge work is only partly a map work. why not create a dummy op? that way everything is the same. you could even create a real op and move your merge logic into it. - Gunther Hagleitner On Aug. 15, 2014, 5:27 a.m., Prasanth_J wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24627/ --- (Updated Aug. 15, 2014, 5:27 a.m.) Review request for hive and Gunther Hagleitner. Bugs: HIVE-7704 https://issues.apache.org/jira/browse/HIVE-7704 Repository: hive-git Description --- Currently tez falls back to MR task for merge file task. It will beneficial to convert the merge file tasks to tez task to make use of the performance gains from tez. 
Diffs - itests/src/test/resources/testconfiguration.properties b801678 ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java cd017d8 ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java d5de58e ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java a2975cb ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 3d74459 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1d6a93a ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java 4e0fd79 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java e116426 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MapRecordProcessor.java 8513e33 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MapTezProcessor.java 31f3bcd ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MergeFileMapRecordProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MergeFileTezProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/OrcMergeFileMapRecordProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/OrcMergeFileTezProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/RCFileMergeFileMapRecordProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/RecordProcessor.java 1577827 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezProcessor.java c2ba782 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java 951e918 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/RCFileMergeFileTezProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java bf44548 ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileInputFormat.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileMapper.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileOutputFormat.java PRE-CREATION
[jira] [Commented] (HIVE-7704) Create tez task for fast file merging
[ https://issues.apache.org/jira/browse/HIVE-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109480#comment-14109480 ] Gunther Hagleitner commented on HIVE-7704: -- comments on rb Create tez task for fast file merging - Key: HIVE-7704 URL: https://issues.apache.org/jira/browse/HIVE-7704 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7704.1.patch, HIVE-7704.2.patch, HIVE-7704.3.patch, HIVE-7704.4.patch, HIVE-7704.4.patch Currently tez falls back to MR task for merge file task. It will beneficial to convert the merge file tasks to tez task to make use of the performance gains from tez. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7548) Precondition checks should not fail the merge task in case of automatic trigger
[ https://issues.apache.org/jira/browse/HIVE-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7548: - Attachment: HIVE-7548.2.patch Precondition checks should not fail the merge task in case of automatic trigger --- Key: HIVE-7548 URL: https://issues.apache.org/jira/browse/HIVE-7548 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7548.1.patch, HIVE-7548.2.patch ORC fast merge (HIVE-7509) will fail the merge task in case if any of the precondition checks fail. Precondition check fail is good for ALTER TABLE .. CONCATENATE but not for automatic trigger of merge task from conditional resolver. In case if a partition has non-compatible ORC files for merging then the merge task should ignore it and not fail the task. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7548) Precondition checks should not fail the merge task in case of automatic trigger
[ https://issues.apache.org/jira/browse/HIVE-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109482#comment-14109482 ] Prasanth J commented on HIVE-7548: -- Addressed Gunther's review comments. Precondition checks should not fail the merge task in case of automatic trigger --- Key: HIVE-7548 URL: https://issues.apache.org/jira/browse/HIVE-7548 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7548.1.patch, HIVE-7548.2.patch ORC fast merge (HIVE-7509) will fail the merge task in case if any of the precondition checks fail. Precondition check fail is good for ALTER TABLE .. CONCATENATE but not for automatic trigger of merge task from conditional resolver. In case if a partition has non-compatible ORC files for merging then the merge task should ignore it and not fail the task. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24830: HIVE-7548: Precondition checks should not fail the merge task in case of automatic trigger
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24830/#review51424 --- ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java https://reviews.apache.org/r/24830/#comment89740 Use named capture in java as much as possible. (?taskId[0-9]+) etc. ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java https://reviews.apache.org/r/24830/#comment89741 What about LOAD DATA INPATH? - Gopal V On Aug. 19, 2014, 12:29 a.m., Prasanth_J wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24830/ --- (Updated Aug. 19, 2014, 12:29 a.m.) Review request for hive and Gunther Hagleitner. Repository: hive-git Description --- ORC fast merge (HIVE-7509) will fail the merge task in case if any of the precondition checks fail. Precondition check fail is good for ALTER TABLE .. CONCATENATE but not for automatic trigger of merge task from conditional resolver. In case if a partition has non-compatible ORC files for merging then the merge task should ignore it and not fail the task. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1d6a93a ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeMapper.java beb4f7d ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java b36152a ql/src/test/queries/clientnegative/orc_merge1.q b2d42cd ql/src/test/queries/clientnegative/orc_merge2.q 2f62ee7 ql/src/test/queries/clientnegative/orc_merge3.q 5158e2e ql/src/test/queries/clientnegative/orc_merge4.q ad48572 ql/src/test/queries/clientnegative/orc_merge5.q e94a8cc ql/src/test/queries/clientpositive/orc_merge_incompat1.q PRE-CREATION ql/src/test/queries/clientpositive/orc_merge_incompat2.q PRE-CREATION ql/src/test/results/clientpositive/orc_merge_incompat1.q.out PRE-CREATION ql/src/test/results/clientpositive/orc_merge_incompat2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/24830/diff/ Testing --- Thanks, Prasanth_J
[jira] [Commented] (HIVE-5799) session/operation timeout for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109489#comment-14109489 ] Brock Noland commented on HIVE-5799: # It appears the constant SessionManager. SESSION_CHECK_INTERVAL is unused. # HiveSessionImpl. opHandleSet should be converted to a synchronized set or a concurrent set since it's now modified by the client thread and the background thread # I think we should call OperationManager._getOperation getOperationInternal as that seems to be more standard for this use case in the Hive code base. # Perhaps we should change OperationManager.closeExpiredOperations to not use closeOperation but use similar code since it appears there is a harmless race condition there which will log exceptions when the operation is closed during closeExpiredOperations Does anyone else have any feedback? Otherwise once these are fixed I think we can commit. session/operation timeout for hiveserver2 - Key: HIVE-5799 URL: https://issues.apache.org/jira/browse/HIVE-5799 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-5799.1.patch.txt, HIVE-5799.10.patch.txt, HIVE-5799.11.patch.txt, HIVE-5799.12.patch.txt, HIVE-5799.12.patch.txt, HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt, HIVE-5799.5.patch.txt, HIVE-5799.6.patch.txt, HIVE-5799.7.patch.txt, HIVE-5799.8.patch.txt, HIVE-5799.9.patch.txt Need some timeout facility for preventing resource leakages from instable or bad clients. -- This message was sent by Atlassian JIRA (v6.2#6252)
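Brock's second point, making opHandleSet safe for access from both the client thread and the background expiry thread, comes down to a synchronized or concurrent set. A sketch of the two options; the field name is taken from the comment, everything else is illustrative:
{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class OpHandleSetSketch {
  static final class OperationHandle { } // stand-in for the real handle type

  // option 1: synchronized wrapper around the existing HashSet
  private final Set<OperationHandle> opHandleSet =
      Collections.synchronizedSet(new HashSet<OperationHandle>());

  // option 2: a concurrent set backed by ConcurrentHashMap
  private final Set<OperationHandle> opHandleSetAlt =
      Collections.newSetFromMap(new ConcurrentHashMap<OperationHandle, Boolean>());
}
{code}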
[jira] [Commented] (HIVE-7865) Extend TestFileDump test case to printout ORC row index information
[ https://issues.apache.org/jira/browse/HIVE-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109492#comment-14109492 ] Sergey Shelukhin commented on HIVE-7865: +1 Extend TestFileDump test case to printout ORC row index information --- Key: HIVE-7865 URL: https://issues.apache.org/jira/browse/HIVE-7865 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7865.1.patch It will be good to have test case to printout ORC row index entries. Some changes to ORC format like HIVE-7832 uses different codepath to write row index entries. To make sure it is not doing anything wrong a test case that prints row index entries will be helpful. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5799) session/operation timeout for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109500#comment-14109500 ] Lars Francke commented on HIVE-5799: The patch looks mostly good, I have a couple of minor comments regarding style/checkstyle. If you're interested in them could you please update RB? session/operation timeout for hiveserver2 - Key: HIVE-5799 URL: https://issues.apache.org/jira/browse/HIVE-5799 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-5799.1.patch.txt, HIVE-5799.10.patch.txt, HIVE-5799.11.patch.txt, HIVE-5799.12.patch.txt, HIVE-5799.12.patch.txt, HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt, HIVE-5799.5.patch.txt, HIVE-5799.6.patch.txt, HIVE-5799.7.patch.txt, HIVE-5799.8.patch.txt, HIVE-5799.9.patch.txt Need some timeout facility for preventing resource leakages from instable or bad clients. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24688: parallel order by clause on a string column fails with IOException: Split points are out of order
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24688/#review51426 --- Looks like an important bug to fix, but I dont know too much about this code, can you explain what is the bug in the getPartitionKey algorithm, and what is the fix? Like why we need to alter the stepSize as we iterate. Is there a test we can add for this as well to illustrate and validate the fix? Also my confusion is if the other fixes on the patch are related? 1. Adding setConf on the HiveTotalOrderPartitioner is related to the bug? 2. What is the use of the new HiveConf ..min.reducer? My guess is you found the algorithm not generating enough partitionKey sometimes, can you explain? common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/24688/#comment89744 If this needs to be exposed, should be worded better. Something like: name = hive.optimize.sampling.orderby.min.reducer.ratio If sampling is enabled, this is the minimum ratio allowed of reducers calculated by sampling to expected number of reducers. Its might be confusing to user in my opinion, as the user has little control of what the expected reducer is, right? ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java https://reviews.apache.org/r/24688/#comment89742 Please add some more context to this debug statement. ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java https://reviews.apache.org/r/24688/#comment89743 If needs to be exposed, message can be Sampling generated x number of reducers, but it was expected to be y - Szehon Ho On Aug. 14, 2014, 2:29 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24688/ --- (Updated Aug. 14, 2014, 2:29 a.m.) Review request for hive. Bugs: HIVE-7669 https://issues.apache.org/jira/browse/HIVE-7669 Repository: hive-git Description --- The source table has 600 Million rows and it has a String column l_shipinstruct which has 4 unique values. (Ie. these 4 values are repeated across the 600 million rows) We are sorting it based on this string column l_shipinstruct as shown in the below HiveQL with the following parameters. 
{code:sql} set hive.optimize.sampling.orderby=true; set hive.optimize.sampling.orderby.number=1000; set hive.optimize.sampling.orderby.percent=0.1f; insert overwrite table lineitem_temp_report select l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment from lineitem order by l_shipinstruct; {code} Stack Trace Diagnostic Messages for this Task: {noformat} Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 10 more Caused by: java.lang.IllegalArgumentException: Can't read partitions file at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116) at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42) at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37) ... 15 more Caused by: java.io.IOException: Split points are
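The "Split points are out of order" failure happens because TotalOrderPartitioner requires strictly ascending partition keys, and sampling a column with only 4 distinct values easily produces duplicates. Whatever the actual patch does, the invariant the sampler has to restore looks roughly like this (illustrative code, not the PartitionKeySampler implementation):
{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SplitPointSketch {
  // From sorted samples, pick at most (numReducers - 1) strictly increasing split keys.
  public static <K> List<K> pickSplitPoints(List<K> sortedSamples, int numReducers,
      Comparator<K> cmp) {
    List<K> splits = new ArrayList<K>();
    if (sortedSamples.isEmpty() || numReducers <= 1) {
      return splits;
    }
    double step = (double) sortedSamples.size() / numReducers;
    for (int i = 1; i < numReducers; i++) {
      int idx = Math.min(sortedSamples.size() - 1,
          Math.max(0, (int) Math.round(step * i) - 1));
      K candidate = sortedSamples.get(idx);
      // skip duplicates: TotalOrderPartitioner rejects non-ascending split points
      if (splits.isEmpty() || cmp.compare(splits.get(splits.size() - 1), candidate) < 0) {
        splits.add(candidate);
      }
    }
    return splits;
  }
}
{code}
With only 4 distinct values this yields at most 3 split points, i.e. at most 4 reducers regardless of what was requested, which is presumably why the patch also introduces a configuration for the minimum acceptable ratio of sampled reducers to expected reducers.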
Re: Review Request 23320: HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23320/ --- (Updated Aug. 25, 2014, 7:15 p.m.) Review request for hive, Navis Ryu, Sushanth Sowmyan, Szehon Ho, and Thejas Nair. Bugs: HIVE-7353 https://issues.apache.org/jira/browse/HIVE-7353 Repository: hive-git Description --- https://issues.apache.org/jira/browse/HIVE-7353 Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7f4afd9 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 06d7595 metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 0693039 ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java e387b8f service/src/java/org/apache/hive/service/cli/CLIService.java d2cdfc1 service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java de54ca1 service/src/java/org/apache/hive/service/cli/session/HiveSession.java 9785e95 service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java eee1cc6 service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java bc0a02c service/src/java/org/apache/hive/service/cli/session/SessionManager.java d573592 service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java 37b05fc service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java be2eb01 service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java c380b69 service/src/java/org/apache/hive/service/server/ThreadFactoryWithGarbageCleanup.java PRE-CREATION service/src/java/org/apache/hive/service/server/ThreadWithGarbageCleanup.java PRE-CREATION Diff: https://reviews.apache.org/r/23320/diff/ Testing --- Manual testing using Yourkit. Thanks, Vaibhav Gumashta
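The new ThreadFactoryWithGarbageCleanup / ThreadWithGarbageCleanup files in the diff suggest the general shape of the fix: background threads that run a cleanup hook when their task finishes, so thread-bound JDO state can be released instead of lingering in the factory cache. A rough, hypothetical sketch of that idea, not the actual classes:
{code}
import java.util.concurrent.ThreadFactory;

public class CleanupThreadFactorySketch implements ThreadFactory {
  private final Runnable cleanup; // e.g. close the thread-local RawStore / JDO manager

  public CleanupThreadFactorySketch(Runnable cleanup) {
    this.cleanup = cleanup;
  }

  @Override
  public Thread newThread(final Runnable task) {
    return new Thread(new Runnable() {
      @Override
      public void run() {
        try {
          task.run();
        } finally {
          cleanup.run(); // release per-thread resources before the thread dies
        }
      }
    });
  }
}
{code}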
[jira] [Updated] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
[ https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-7353: --- Attachment: HIVE-7353.6.patch [~szehon] This should fix it. HiveServer2 using embedded MetaStore leaks JDOPersistanceManager Key: HIVE-7353 URL: https://issues.apache.org/jira/browse/HIVE-7353 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, HIVE-7353.4.patch, HIVE-7353.5.patch, HIVE-7353.6.patch While using embedded metastore, while creating background threads to run async operations, HiveServer2 ends up creating new instances of JDOPersistanceManager which are cached in JDOPersistanceManagerFactory. Even when the background thread is killed by the thread pool manager, the JDOPersistanceManager are never GCed because they are cached by JDOPersistanceManagerFactory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
[ https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-7353: --- Status: Open (was: Patch Available) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager Key: HIVE-7353 URL: https://issues.apache.org/jira/browse/HIVE-7353 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, HIVE-7353.4.patch, HIVE-7353.5.patch, HIVE-7353.6.patch While using embedded metastore, while creating background threads to run async operations, HiveServer2 ends up creating new instances of JDOPersistanceManager which are cached in JDOPersistanceManagerFactory. Even when the background thread is killed by the thread pool manager, the JDOPersistanceManager are never GCed because they are cached by JDOPersistanceManagerFactory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7872) StorageBasedAuthorizationProvider should check access perms of parent directory for DROP actions
Jason Dere created HIVE-7872: Summary: StorageBasedAuthorizationProvider should check access perms of parent directory for DROP actions Key: HIVE-7872 URL: https://issues.apache.org/jira/browse/HIVE-7872 Project: Hive Issue Type: Bug Components: Authorization Reporter: Jason Dere When dropping a table partition, StorageBasedAuthorizationProvider is checking for write permission on the partition directory itself to check if the user is allowed to drop the partition. However to delete the partition directory, you really need write perms on the parent directory of the file you are going to delete. So SBA will authorize the user to drop the partition but actually deleting the partition directory will fail if the user does not have the correct access on the table (parent) directory. SBA should also check the parent directory for DROP actions during its auth check. -- This message was sent by Atlassian JIRA (v6.2#6252)
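The check being asked for amounts to: when authorizing a DROP, derive the parent of the path being deleted and require write access on it as well. A sketch of that derivation with the Hadoop FileSystem API; the permission check here is deliberately simplified and is not the StorageBasedAuthorizationProvider method:
{code}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;

public class DropAuthSketch {
  // For DROP, deleting `target` really needs WRITE on its parent directory too.
  public static void authorizeDrop(FileSystem fs, Path target) throws Exception {
    Path parent = target.getParent();
    if (parent != null && fs.exists(parent)) {
      requireAccess(fs, parent, FsAction.WRITE);
    }
    requireAccess(fs, target, FsAction.WRITE);
  }

  private static void requireAccess(FileSystem fs, Path p, FsAction action) throws Exception {
    // deliberately simplified: only looks at the "other" bits; a real check also
    // considers the owner, group membership, and superuser status
    FsAction other = fs.getFileStatus(p).getPermission().getOtherAction();
    if (!other.implies(action)) {
      throw new SecurityException(action + " denied on " + p);
    }
  }
}
{code}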
[jira] [Commented] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup
[ https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109598#comment-14109598 ] Hive QA commented on HIVE-6847: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12664179/HIVE-6847.5.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6114 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8_noskew org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap_auto org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/488/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/488/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-488/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12664179 Improve / fix bugs in Hive scratch dir setup Key: HIVE-6847 URL: https://issues.apache.org/jira/browse/HIVE-6847 Project: Hive Issue Type: Bug Components: CLI, HiveServer2 Affects Versions: 0.14.0 Reporter: Vikram Dixit K Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch, HIVE-6847.3.patch, HIVE-6847.4.patch, HIVE-6847.5.patch Currently, the hive server creates scratch directory and changes permission to 777 however, this is not great with respect to security. We need to create user specific scratch directories instead. Also refer to HIVE-6782 1st iteration of the patch for approach. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24830: HIVE-7548: Precondition checks should not fail the merge task in case of automatic trigger
On Aug. 25, 2014, 6:48 p.m., Gopal V wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java, line 1549 https://reviews.apache.org/r/24830/diff/1/?file=663983#file663983line1549 Use named capture in Java as much as possible, e.g. (?<taskId>[0-9]+). Named capture is supported only in JDK7 and above. Using comments in the next patch to maintain compat. On Aug. 25, 2014, 6:48 p.m., Gopal V wrote: ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java, line 1877 https://reviews.apache.org/r/24830/diff/1/?file=663983#file663983line1877 What about LOAD DATA INPATH? By LOAD .. INTO in the comment I meant LOAD DATA INPATH .. INTO TABLE.. Please look at my previous comment on Gunther's question in the review board for a LOAD DATA example. - Prasanth_J --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24830/#review51424 --- On Aug. 19, 2014, 12:29 a.m., Prasanth_J wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24830/ --- (Updated Aug. 19, 2014, 12:29 a.m.) Review request for hive and Gunther Hagleitner. Repository: hive-git Description --- ORC fast merge (HIVE-7509) will fail the merge task if any of the precondition checks fail. Failing on a precondition check is fine for ALTER TABLE .. CONCATENATE, but not when the merge task is triggered automatically by the conditional resolver. If a partition has ORC files that are not compatible for merging, the merge task should ignore them instead of failing. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1d6a93a ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeMapper.java beb4f7d ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java b36152a ql/src/test/queries/clientnegative/orc_merge1.q b2d42cd ql/src/test/queries/clientnegative/orc_merge2.q 2f62ee7 ql/src/test/queries/clientnegative/orc_merge3.q 5158e2e ql/src/test/queries/clientnegative/orc_merge4.q ad48572 ql/src/test/queries/clientnegative/orc_merge5.q e94a8cc ql/src/test/queries/clientpositive/orc_merge_incompat1.q PRE-CREATION ql/src/test/queries/clientpositive/orc_merge_incompat2.q PRE-CREATION ql/src/test/results/clientpositive/orc_merge_incompat1.q.out PRE-CREATION ql/src/test/results/clientpositive/orc_merge_incompat2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/24830/diff/ Testing --- Thanks, Prasanth_J
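To make the JDK compatibility point above concrete, here is a small, self-contained sketch contrasting a named capture group (JDK 7+) with comment-documented numbered groups that remain JDK 6 compatible. The task_... pattern and group names are invented for the example and are not the actual pattern in Utilities.java.
{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedCaptureSketch {
  // JDK 7 and later: the group name documents itself in the pattern.
  private static final Pattern NAMED =
      Pattern.compile("task_(?<taskId>[0-9]+)_(?<attemptId>[0-9]+)");

  // JDK 6 compatible alternative: numbered groups, documented in a comment.
  // group(1) = task id, group(2) = attempt id
  private static final Pattern NUMBERED =
      Pattern.compile("task_([0-9]+)_([0-9]+)");

  public static void main(String[] args) {
    Matcher named = NAMED.matcher("task_000123_0");
    if (named.matches()) {
      System.out.println(named.group("taskId"));   // lookup by name, JDK 7+ only
    }
    Matcher numbered = NUMBERED.matcher("task_000123_0");
    if (numbered.matches()) {
      System.out.println(numbered.group(1));       // positional lookup, works on JDK 6
    }
  }
}
{code}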
Re: Review Request 24830: HIVE-7548: Precondition checks should not fail the merge task in case of automatic trigger
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24830/ --- (Updated Aug. 25, 2014, 7:54 p.m.) Review request for hive and Gunther Hagleitner. Bugs: HIVE-7548 https://issues.apache.org/jira/browse/HIVE-7548 Repository: hive-git Description --- ORC fast merge (HIVE-7509) will fail the merge task if any of the precondition checks fail. Failing on a precondition check is fine for ALTER TABLE .. CONCATENATE, but not when the merge task is triggered automatically by the conditional resolver. If a partition has ORC files that are not compatible for merging, the merge task should ignore them instead of failing. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1d6a93a ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeMapper.java beb4f7d ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java b36152a ql/src/test/queries/clientnegative/orc_merge1.q b2d42cd ql/src/test/queries/clientnegative/orc_merge2.q 2f62ee7 ql/src/test/queries/clientnegative/orc_merge3.q 5158e2e ql/src/test/queries/clientnegative/orc_merge4.q ad48572 ql/src/test/queries/clientnegative/orc_merge5.q e94a8cc ql/src/test/queries/clientpositive/orc_merge_incompat1.q PRE-CREATION ql/src/test/queries/clientpositive/orc_merge_incompat2.q PRE-CREATION ql/src/test/results/clientpositive/orc_merge_incompat1.q.out PRE-CREATION ql/src/test/results/clientpositive/orc_merge_incompat2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/24830/diff/ Testing --- Thanks, Prasanth_J
[jira] [Updated] (HIVE-7548) Precondition checks should not fail the merge task in case of automatic trigger
[ https://issues.apache.org/jira/browse/HIVE-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7548: - Attachment: HIVE-7548.3.patch Addressed Gopal's review comments Precondition checks should not fail the merge task in case of automatic trigger --- Key: HIVE-7548 URL: https://issues.apache.org/jira/browse/HIVE-7548 Project: Hive Issue Type: Sub-task Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7548.1.patch, HIVE-7548.2.patch, HIVE-7548.3.patch ORC fast merge (HIVE-7509) will fail the merge task if any of the precondition checks fail. Failing on a precondition check is fine for ALTER TABLE .. CONCATENATE, but not when the merge task is triggered automatically by the conditional resolver. If a partition has ORC files that are not compatible for merging, the merge task should ignore them instead of failing. -- This message was sent by Atlassian JIRA (v6.2#6252)
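The behavior change discussed in this issue boils down to one decision point: when a precondition check on a file fails, an explicit ALTER TABLE .. CONCATENATE should still error out, while an automatically triggered merge should just skip that file. The sketch below captures that contract in isolation; it is not the actual OrcFileMergeMapper code, and the field and method names are invented.
{code}
import java.io.IOException;

public class MergePreconditionSketch {
  private final boolean autoTriggered; // true when the conditional resolver scheduled the merge

  MergePreconditionSketch(boolean autoTriggered) {
    this.autoTriggered = autoTriggered;
  }

  // Returns true if the file should be merged. An incompatible file fails an
  // explicit concatenate, but is silently skipped for an automatic merge.
  boolean shouldMerge(String fileName, boolean compatible) throws IOException {
    if (compatible) {
      return true;
    }
    if (autoTriggered) {
      System.err.println("Skipping incompatible file during automatic merge: " + fileName);
      return false;
    }
    throw new IOException("Incompatible ORC file found: " + fileName);
  }
}
{code}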
[jira] [Commented] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
[ https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109642#comment-14109642 ] Szehon Ho commented on HIVE-7353: - Thanks, do you mind updating the RB as well? HiveServer2 using embedded MetaStore leaks JDOPersistanceManager Key: HIVE-7353 URL: https://issues.apache.org/jira/browse/HIVE-7353 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, HIVE-7353.4.patch, HIVE-7353.5.patch, HIVE-7353.6.patch When using the embedded metastore, HiveServer2 ends up creating a new instance of JDOPersistanceManager for each background thread it spawns to run async operations, and these instances are cached in JDOPersistanceManagerFactory. Even when the background thread is killed by the thread pool manager, the JDOPersistanceManager instances are never GCed because they are still cached by JDOPersistanceManagerFactory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7865) Extend TestFileDump test case to printout ORC row index information
[ https://issues.apache.org/jira/browse/HIVE-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7865: - Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Extend TestFileDump test case to printout ORC row index information --- Key: HIVE-7865 URL: https://issues.apache.org/jira/browse/HIVE-7865 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.14.0 Attachments: HIVE-7865.1.patch It will be good to have a test case that prints out ORC row index entries. Some changes to the ORC format, like HIVE-7832, use a different codepath to write row index entries. To make sure such changes are not doing anything wrong, a test case that prints row index entries will be helpful. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7613) Research optimization of auto convert join to map join [Spark branch]
[ https://issues.apache.org/jira/browse/HIVE-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-7613: Assignee: Szehon Ho I'll take a look at this this week, but I might not be able to finish (I'll be out next week). I can hand it off to somebody else at that point. Research optimization of auto convert join to map join [Spark branch] - Key: HIVE-7613 URL: https://issues.apache.org/jira/browse/HIVE-7613 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Szehon Ho Priority: Minor ConvertJoinMapJoin is an optimization that replaces a common join (aka shuffle join) with a map join (aka broadcast or fragment replicate join) when possible. We need to research how to make it workable with Hive on Spark. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24986: HIVE-7553: decouple the auxiliary jar loading from hive server2 starting phase
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24986/#review51436 --- Hi, Thank you very much for your work!! This looks great! I have a few comments below. common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/24986/#comment89754 I've been trying to think of a good name. I think we should call this Reloadable jars since we are re-loading them. Thoughts? ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionTask.java https://reviews.apache.org/r/24986/#comment89764 Do we need to do this? getSessionSpecifiedClassLoader won't return null ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java https://reviews.apache.org/r/24986/#comment89756 Let's put some trace logging in here as to which classloader we are returning. I don't see the classloader on Conf actually getting set anywhere. Is it set by someone for us? ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java https://reviews.apache.org/r/24986/#comment89757 I don't think we want HiveAuthorization and HiveAuthentication providers to be reloadable? ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java https://reviews.apache.org/r/24986/#comment89758 Same as above? ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java https://reviews.apache.org/r/24986/#comment89760 This should be moved to the top of the class and be made final ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java https://reviews.apache.org/r/24986/#comment89759 Let's use camelCaps not under_scores for variable names since that is more standard. ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java https://reviews.apache.org/r/24986/#comment89761 I think that HIVEREFRESHJARS should be a list of directories like HIVEAUXJARS ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java https://reviews.apache.org/r/24986/#comment89762 Can you put this in java.io.tmpdir? ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java https://reviews.apache.org/r/24986/#comment89768 Let's not print to std error. Let's print to log. I think we should also call Assert.fail(msg) with a good message. ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java https://reviews.apache.org/r/24986/#comment89767 We should fail if an exception is thrown ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java https://reviews.apache.org/r/24986/#comment89765 We should fail if an exception is thrown ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java https://reviews.apache.org/r/24986/#comment89766 We should fail if an exception is thrown service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java https://reviews.apache.org/r/24986/#comment89763 We should do something with this exception Hi, - Brock Noland On Aug. 25, 2014, 6:45 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24986/ --- (Updated Aug. 25, 2014, 6:45 a.m.) Review request for hive. 
Bugs: HIVE-7553 https://issues.apache.org/jira/browse/HIVE-7553 Repository: hive-git Description --- HIVE-7553: decouple the auxiliary jar loading from hive server2 starting phase Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7f4afd9d64aff18329e7850342855aade42e21f5 hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java 93a03adeab7ba3c3c91344955d303e4252005239 hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatClient.java f25039dcf55b3b24bbf8dcba05855665a1c7f3b0 ql/src/java/org/apache/hadoop/hive/ql/exec/DefaultFetchFormatter.java 5924bcf1f55dc4c2dd06f312f929047b7df9de55 ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 0c6a3d44ef1f796778768421dc02f8bf3ede6a8c ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionTask.java bd45df1a401d1adb009e953d08205c7d5c2d5de2 ql/src/java/org/apache/hadoop/hive/ql/exec/ListSinkOperator.java dcc19f70644c561e17df8c8660ca62805465f1d6 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 76fee612a583cdc2c632d27932623521b735e768 ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java eb2851b2c5fa52e0f555b3d8d1beea5d1ac3b225 ql/src/java/org/apache/hadoop/hive/ql/hooks/HookUtils.java 3f474f846c7af5f1f65f1c14f3ce51308f1279d4 ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughOutputFormat.java 0962cadce0d515e046371d0a816f4efd70b8eef7 ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java
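The review discussion above revolves around building a classloader over a configurable set of jar directories and installing it for the session, so that auxiliary jars can be picked up without restarting HiveServer2. Purely as an illustration of that mechanism (and not the patch's actual code — the method and parameter names here are invented), a reload might look like this:
{code}
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class ReloadableJarsSketch {
  // Build a fresh classloader over every jar found in the configured directories
  // and make it the current thread's context classloader.
  static ClassLoader reloadJars(List<String> jarDirs, ClassLoader parent) throws Exception {
    List<URL> urls = new ArrayList<URL>();
    for (String dir : jarDirs) {
      File[] entries = new File(dir).listFiles();
      if (entries == null) {
        continue;                                   // not a directory or unreadable
      }
      for (File entry : entries) {
        if (entry.getName().endsWith(".jar")) {
          urls.add(entry.toURI().toURL());
        }
      }
    }
    ClassLoader loader =
        URLClassLoader.newInstance(urls.toArray(new URL[urls.size()]), parent);
    Thread.currentThread().setContextClassLoader(loader);
    return loader;
  }
}
{code}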
[jira] [Commented] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
[ https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109662#comment-14109662 ] Vaibhav Gumashta commented on HIVE-7353: [~szehon] Done that already. Actually hold off on reviewing this one for a few mins - I see one issue with the current patch - I'll put up an updated one in a few mins. HiveServer2 using embedded MetaStore leaks JDOPersistanceManager Key: HIVE-7353 URL: https://issues.apache.org/jira/browse/HIVE-7353 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Fix For: 0.14.0 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, HIVE-7353.4.patch, HIVE-7353.5.patch, HIVE-7353.6.patch When using the embedded metastore, HiveServer2 ends up creating a new instance of JDOPersistanceManager for each background thread it spawns to run async operations, and these instances are cached in JDOPersistanceManagerFactory. Even when the background thread is killed by the thread pool manager, the JDOPersistanceManager instances are never GCed because they are still cached by JDOPersistanceManagerFactory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7873) Re-enable lazy HiveBaseFunctionResultList
[ https://issues.apache.org/jira/browse/HIVE-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7873: --- Issue Type: Sub-task (was: Bug) Parent: HIVE-7292 Re-enable lazy HiveBaseFunctionResultList - Key: HIVE-7873 URL: https://issues.apache.org/jira/browse/HIVE-7873 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Labels: spark We removed this optimization in HIVE-7799. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7873) Re-enable lazy HiveBaseFunctionResultList
Brock Noland created HIVE-7873: -- Summary: Re-enable lazy HiveBaseFunctionResultList Key: HIVE-7873 URL: https://issues.apache.org/jira/browse/HIVE-7873 Project: Hive Issue Type: Bug Reporter: Brock Noland We removed this optimization in HIVE-7799. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7846) authorization api should support group, not assume case insensitive role names
[ https://issues.apache.org/jira/browse/HIVE-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7846: Summary: authorization api should support group, not assume case insensitive role names (was: authorization api should not assume case insensitive role names) authorization api should support group, not assume case insensitive role names -- Key: HIVE-7846 URL: https://issues.apache.org/jira/browse/HIVE-7846 Project: Hive Issue Type: Bug Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair The case insensitive behavior of roles should be specific to sql standard authorization. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7846) authorization api should support group, not assume case insensitive role names
[ https://issues.apache.org/jira/browse/HIVE-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7846: Description: The case insensitive behavior of roles should be specific to sql standard authorization. Group type for principal also should be disabled at the sql std authorization layer, instead of disallowing it at the API level. was: The case insensitive behavior of roles should be specific to sql standard authorization. authorization api should support group, not assume case insensitive role names -- Key: HIVE-7846 URL: https://issues.apache.org/jira/browse/HIVE-7846 Project: Hive Issue Type: Bug Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair The case insensitive behavior of roles should be specific to sql standard authorization. Group type for principal also should be disabled at the sql std authorization layer, instead of disallowing it at the API level. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109679#comment-14109679 ] Brock Noland commented on HIVE-7799: Hey guys, I created HIVE-7873 to track the improvement in M4. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, HIVE-7799.3-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7799: --- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Thank you guys! I have committed this to spark! TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Fix For: spark-branch Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, HIVE-7799.3-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
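The root cause described in this issue is a usage contract rather than a Spark-specific bug: the row cache cannot accept writes once a reader has started consuming it. As a generic, stand-alone sketch of that contract (this is not the Hive RowContainer or HiveKVResultCache implementation; the class and method names are invented), such a cache might guard itself like this:
{code}
import java.util.ArrayList;
import java.util.List;

public class SingleCycleCacheSketch<T> {
  private final List<T> rows = new ArrayList<T>();
  private int readPos = 0;
  private boolean reading = false;

  // Writes are only legal before the first read of the current cycle.
  public void add(T row) {
    if (reading) {
      throw new IllegalStateException("write after read is not supported; call clear() first");
    }
    rows.add(row);
  }

  // Returns the next cached row, or null when the cache is exhausted.
  public T next() {
    reading = true;
    return readPos < rows.size() ? rows.get(readPos++) : null;
  }

  // Resets the cache so a new write/read cycle can begin.
  public void clear() {
    rows.clear();
    readPos = 0;
    reading = false;
  }
}
{code}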
[jira] [Updated] (HIVE-7873) Re-enable lazy HiveBaseFunctionResultList
[ https://issues.apache.org/jira/browse/HIVE-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7873: --- Labels: Spark-M5 spark (was: spark) Re-enable lazy HiveBaseFunctionResultList - Key: HIVE-7873 URL: https://issues.apache.org/jira/browse/HIVE-7873 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Labels: Spark-M4, spark We removed this optimization in HIVE-7799. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7873) Re-enable lazy HiveBaseFunctionResultList
[ https://issues.apache.org/jira/browse/HIVE-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7873: --- Labels: Spark-M4 spark (was: Spark-M5 spark) Re-enable lazy HiveBaseFunctionResultList - Key: HIVE-7873 URL: https://issues.apache.org/jira/browse/HIVE-7873 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Labels: Spark-M4, spark We removed this optimization in HIVE-7799. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7871) WebHCat: Hive job with SQL server as MetastoreDB fails when Unicode characters are present in curl command
[ https://issues.apache.org/jira/browse/HIVE-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109684#comment-14109684 ] Hive QA commented on HIVE-7871: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12664180/HIVE-7871.1.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6118 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/489/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/489/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-489/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12664180 WebHCat: Hive job with SQL server as MetastoreDB fails when Unicode characters are present in curl command --- Key: HIVE-7871 URL: https://issues.apache.org/jira/browse/HIVE-7871 Project: Hive Issue Type: Bug Components: WebHCat Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7871.1.patch Please follow the steps below to repro. 1. Create a SQL Server. Create Username, Password, DB with Unicode characters in their name. 2. Create a cluster and run the below command against its templeton endpoint curl -i -u username:password \ -d define=javax.jdo.option.ConnectionUserName=dbusername@SQLserver \ -d define=hive.metastore.uris= \ -d define=javax.jdo.option.ConnectionURL=jdbc:sqlserver://SQLserver.database.windows.net;database=dbname; encrypt=true;trustServerCertificate=true;create=false \ -d define=javax.jdo.option.ConnectionPassword=dbpassword \ -d statusdir=/hivestatus \ -d user.name=admin \ -d enablelog=false \ -d execute=show tables; \ -s https://localhost:30111/templeton/v1/hive; The following error message is received. javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:sqlserver://SQLserver.database.windows.net;database=dbname; encrypt=true;trustServerCertificate=true;create=false, username = dbusername@SQLserver. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: -- com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user 'dbusername'. -- This message was sent by Atlassian JIRA (v6.2#6252)
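One common source of the kind of login failure shown above is that non-ASCII characters in the posted form fields are not percent-encoded as UTF-8 before the request is built. The snippet below only illustrates that encoding step with java.net.URLEncoder; the property and the unicode value are made up, and it is not presented as the actual fix for this issue.
{code}
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class TempletonParamEncodingSketch {
  public static void main(String[] args) throws UnsupportedEncodingException {
    String key = "javax.jdo.option.ConnectionUserName";
    String value = "db\u00FCser@SQLserver";          // illustrative unicode user name
    // Percent-encode the value as UTF-8 so it survives the HTTP form post.
    String field = "define=" + key + "=" + URLEncoder.encode(value, "UTF-8");
    System.out.println(field);                        // ...=db%C3%BCser%40SQLserver
  }
}
{code}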
[jira] [Commented] (HIVE-7842) load_dyn_part1.q fails with an assertion [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109705#comment-14109705 ] Brock Noland commented on HIVE-7842: IIRC assertions are not enabled for MR when tasks are run so this might fail with MR as well. load_dyn_part1.q fails with an assertion [Spark Branch] --- Key: HIVE-7842 URL: https://issues.apache.org/jira/browse/HIVE-7842 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Labels: Spark-M1 Fix For: spark-branch On spark branch, load_dyn_part1.q fails with following assertion. Looks like SerDe is receiving invalid ByteWritable buffer. {code} java.lang.AssertionError org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:205) org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:187) org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:186) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27) org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) scala.collection.Iterator$class.foreach(Iterator.scala:727) scala.collection.AbstractIterator.foreach(Iterator.scala:1157) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) org.apache.spark.scheduler.Task.run(Task.scala:54) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7842) load_dyn_part1.q fails with an assertion [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7842: --- Issue Type: Sub-task (was: Bug) Parent: HIVE-7292 load_dyn_part1.q fails with an assertion [Spark Branch] --- Key: HIVE-7842 URL: https://issues.apache.org/jira/browse/HIVE-7842 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Labels: Spark-M1 Fix For: spark-branch On spark branch, load_dyn_part1.q fails with following assertion. Looks like SerDe is receiving invalid ByteWritable buffer. {code} java.lang.AssertionError org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:205) org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:187) org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:186) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27) org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) scala.collection.Iterator$class.foreach(Iterator.scala:727) scala.collection.AbstractIterator.foreach(Iterator.scala:1157) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) org.apache.spark.scheduler.Task.run(Task.scala:54) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7842) load_dyn_part1.q fails with an assertion [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109704#comment-14109704 ] Brock Noland commented on HIVE-7842: Linking to HIVE-7580. load_dyn_part1.q fails with an assertion [Spark Branch] --- Key: HIVE-7842 URL: https://issues.apache.org/jira/browse/HIVE-7842 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Labels: Spark-M1 Fix For: spark-branch On spark branch, load_dyn_part1.q fails with following assertion. Looks like SerDe is receiving invalid ByteWritable buffer. {code} java.lang.AssertionError org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:205) org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:187) org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:186) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27) org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) scala.collection.Iterator$class.foreach(Iterator.scala:727) scala.collection.AbstractIterator.foreach(Iterator.scala:1157) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) org.apache.spark.scheduler.Task.run(Task.scala:54) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7842) load_dyn_part1.q fails with an assertion [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7842: --- Labels: Spark-M1 (was: ) load_dyn_part1.q fails with an assertion [Spark Branch] --- Key: HIVE-7842 URL: https://issues.apache.org/jira/browse/HIVE-7842 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Labels: Spark-M1 Fix For: spark-branch On spark branch, load_dyn_part1.q fails with following assertion. Looks like SerDe is receiving invalid ByteWritable buffer. {code} java.lang.AssertionError org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:205) org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:187) org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:186) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27) org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) scala.collection.Iterator$class.foreach(Iterator.scala:727) scala.collection.AbstractIterator.foreach(Iterator.scala:1157) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) org.apache.spark.scheduler.Task.run(Task.scala:54) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7843) orc_analyze.q fails with an assertion in FileSinkOperator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7843: --- Labels: Spark-M1 (was: ) orc_analyze.q fails with an assertion in FileSinkOperator [Spark Branch] Key: HIVE-7843 URL: https://issues.apache.org/jira/browse/HIVE-7843 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Labels: Spark-M1 Fix For: spark-branch {code} java.lang.AssertionError: data length is different from num of DP columns org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynPartDirectory(FileSinkOperator.java:809) org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:730) org.apache.hadoop.hive.ql.exec.FileSinkOperator.startGroup(FileSinkOperator.java:829) org.apache.hadoop.hive.ql.exec.Operator.defaultStartGroup(Operator.java:502) org.apache.hadoop.hive.ql.exec.Operator.startGroup(Operator.java:525) org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:198) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27) org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) scala.collection.Iterator$class.foreach(Iterator.scala:727) scala.collection.AbstractIterator.foreach(Iterator.scala:1157) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) org.apache.spark.scheduler.Task.run(Task.scala:54) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7844) optimize_nullscan.q fails due to differences in explain plan [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7844: --- Issue Type: Sub-task (was: Bug) Parent: HIVE-7292 optimize_nullscan.q fails due to differences in explain plan [Spark Branch] --- Key: HIVE-7844 URL: https://issues.apache.org/jira/browse/HIVE-7844 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Labels: Spark-M1 Fix For: spark-branch Looks like on spark branch, we are not optimizing query plans for limit 0 cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7843) orc_analyze.q fails with an assertion in FileSinkOperator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7843: --- Issue Type: Sub-task (was: Bug) Parent: HIVE-7292 orc_analyze.q fails with an assertion in FileSinkOperator [Spark Branch] Key: HIVE-7843 URL: https://issues.apache.org/jira/browse/HIVE-7843 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Labels: Spark-M1 Fix For: spark-branch {code} java.lang.AssertionError: data length is different from num of DP columns org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynPartDirectory(FileSinkOperator.java:809) org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:730) org.apache.hadoop.hive.ql.exec.FileSinkOperator.startGroup(FileSinkOperator.java:829) org.apache.hadoop.hive.ql.exec.Operator.defaultStartGroup(Operator.java:502) org.apache.hadoop.hive.ql.exec.Operator.startGroup(Operator.java:525) org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:198) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27) org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) scala.collection.Iterator$class.foreach(Iterator.scala:727) scala.collection.AbstractIterator.foreach(Iterator.scala:1157) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) org.apache.spark.scheduler.Task.run(Task.scala:54) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7438) Counters, statistics, and metrics [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7438: --- Labels: Spark-M2 (was: ) Counters, statistics, and metrics [Spark Branch] Key: HIVE-7438 URL: https://issues.apache.org/jira/browse/HIVE-7438 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Labels: Spark-M2 Attachments: hive on spark job statistic design.docx Hive makes use of MapReduce counters for statistics and possibly for other purposes. For Hive on Spark, we should achieve the same functionality using Spark's accumulators. Hive also collects metrics from MapReduce jobs traditionally. Spark job very likely publishes a different set of metrics, which, if made available, would help user to get insights into their spark jobs. Thus, we should obtain the metrics and make them available as we do for MapReduce. This task therefore includes: # identify Hive's existing functionality w.r.t. counters, statistics, and metrics; # design and implement the same functionality in Spark. Please refer to the design document for more information. https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark#HiveonSpark-CountersandMetrics -- This message was sent by Atlassian JIRA (v6.2#6252)
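For readers new to the accumulator approach mentioned in this issue, here is a minimal, self-contained sketch of using a Spark accumulator as a counter from the Java API; the toy RDD and counter usage are illustrative only and are not the design proposed in the attached document.
{code}
import java.util.Arrays;
import org.apache.spark.Accumulator;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

public class AccumulatorCounterSketch {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("counter-sketch").setMaster("local");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // The accumulator plays the role of a MapReduce counter: tasks add to it,
    // and the driver reads the aggregated value after the job finishes.
    final Accumulator<Integer> recordsProcessed = sc.accumulator(0);

    JavaRDD<String> rows = sc.parallelize(Arrays.asList("a", "b", "c"));
    rows.foreach(new VoidFunction<String>() {
      @Override
      public void call(String row) throws Exception {
        recordsProcessed.add(1);
      }
    });

    System.out.println("records processed: " + recordsProcessed.value());
    sc.stop();
  }
}
{code}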
[jira] [Updated] (HIVE-7844) optimize_nullscan.q fails due to differences in explain plan [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7844: --- Labels: Spark-M1 (was: ) optimize_nullscan.q fails due to differences in explain plan [Spark Branch] --- Key: HIVE-7844 URL: https://issues.apache.org/jira/browse/HIVE-7844 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Labels: Spark-M1 Fix For: spark-branch Looks like on spark branch, we are not optimizing query plans for limit 0 cases. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7438) Counters, statistics, and metrics [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7438: --- Labels: Spark-M3 (was: Spark-M2) Counters, statistics, and metrics [Spark Branch] Key: HIVE-7438 URL: https://issues.apache.org/jira/browse/HIVE-7438 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Labels: Spark-M3 Attachments: hive on spark job statistic design.docx Hive makes use of MapReduce counters for statistics and possibly for other purposes. For Hive on Spark, we should achieve the same functionality using Spark's accumulators. Hive also collects metrics from MapReduce jobs traditionally. Spark job very likely publishes a different set of metrics, which, if made available, would help user to get insights into their spark jobs. Thus, we should obtain the metrics and make them available as we do for MapReduce. This task therefore includes: # identify Hive's existing functionality w.r.t. counters, statistics, and metrics; # design and implement the same functionality in Spark. Please refer to the design document for more information. https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark#HiveonSpark-CountersandMetrics -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7439) Spark job monitoring and error reporting [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7439: --- Labels: Spark-M3 (was: ) Spark job monitoring and error reporting [Spark Branch] --- Key: HIVE-7439 URL: https://issues.apache.org/jira/browse/HIVE-7439 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Labels: Spark-M3 After Hive submits a job to the Spark cluster, we need to report the job progress, such as the percentage done, to the user. This is especially important for long running queries. Moreover, if there is an error during job submission or execution, it's also crucial for Hive to fetch the error log and/or stacktrace and feed it back to the user. Please refer to the design doc on the wiki for more information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7439) Spark job monitoring and error reporting [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109721#comment-14109721 ] Brock Noland commented on HIVE-7439: I think that we'll need the API from HIVE-7874 to do this work. Spark job monitoring and error reporting [Spark Branch] --- Key: HIVE-7439 URL: https://issues.apache.org/jira/browse/HIVE-7439 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chengxiang Li Labels: Spark-M3 After Hive submits a job to the Spark cluster, we need to report the job progress, such as the percentage done, to the user. This is especially important for long running queries. Moreover, if there is an error during job submission or execution, it's also crucial for Hive to fetch the error log and/or stacktrace and feed it back to the user. Please refer to the design doc on the wiki for more information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7874) Support multiple concurrent users
Brock Noland created HIVE-7874: -- Summary: Support multiple concurrent users Key: HIVE-7874 URL: https://issues.apache.org/jira/browse/HIVE-7874 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Priority: Blocker This JIRA is to track on the Hive side the solution to handling multiple user sessions. At first we thought this would be SPARK-2243 but there have been discussions on the Spark side which suggest the solution will be different. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7874) Support multiple concurrent users
[ https://issues.apache.org/jira/browse/HIVE-7874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7874: --- Description: This JIRA is to track on the Hive side the solution to handling multiple user sessions. We thought this would be SPARK-2243 but there have been discussions on the Spark side which suggest the solution will be different. (was: This JIRA is to track on the Hive side the solution to handling multiple user sessions. At first we thought this would be SPARK-2243 but there have been discussions on the Spark side which suggest the solution will be different.) Support multiple concurrent users - Key: HIVE-7874 URL: https://issues.apache.org/jira/browse/HIVE-7874 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Priority: Blocker This JIRA is to track on the Hive side the solution to handling multiple user sessions. We thought this would be SPARK-2243 but there have been discussions on the Spark side which suggest the solution will be different. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7846) authorization api should support group, not assume case insensitive role names
[ https://issues.apache.org/jira/browse/HIVE-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7846: Attachment: HIVE-7846.1.patch authorization api should support group, not assume case insensitive role names -- Key: HIVE-7846 URL: https://issues.apache.org/jira/browse/HIVE-7846 Project: Hive Issue Type: Bug Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7846.1.patch The case insensitive behavior of roles should be specific to sql standard authorization. Group type for principal also should be disabled at the sql std authorization layer, instead of disallowing it at the API level. -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 25037: HIVE-7846 - authorization api should support group, not assume case insensitive role names
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25037/ --- Review request for hive and Jason Dere. Bugs: HIVE-7846 https://issues.apache.org/jira/browse/HIVE-7846 Repository: hive-git Description --- See https://issues.apache.org/jira/browse/HIVE-7846 Diffs - itests/util/src/main/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAccessControllerForTest.java 89429b6 itests/util/src/main/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizationValidatorForTest.java 1d039ad itests/util/src/main/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizerFactoryForTest.java 0f41a8f ql/src/java/org/apache/hadoop/hive/ql/parse/authorization/HiveAuthorizationTaskFactoryImpl.java f92ecf2 ql/src/java/org/apache/hadoop/hive/ql/plan/RoleDDLDesc.java 8413fb7 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationUtils.java 2113f45 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HivePrincipal.java 30a4496 ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLAuthorizationUtils.java a6b008a ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAccessControllerWrapper.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizationValidator.java 9ceac0c ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizerFactory.java 9db3d74 ql/src/test/queries/clientnegative/authorization_grant_group.q PRE-CREATION ql/src/test/queries/clientnegative/authorization_public_create.q 002389f ql/src/test/queries/clientnegative/authorization_public_drop.q 69c5a8d ql/src/test/queries/clientnegative/authorization_role_case.q PRE-CREATION ql/src/test/queries/clientnegative/authorize_grant_public.q bfd3165 ql/src/test/queries/clientnegative/authorize_revoke_public.q 2b29822 ql/src/test/queries/clientpositive/authorization_1.q 25c9918 ql/src/test/queries/clientpositive/authorization_5.q 8869edc ql/src/test/queries/clientpositive/authorization_grant_public_role.q fe177ac ql/src/test/queries/clientpositive/authorization_role_grant2.q 95fa4e6 ql/src/test/results/clientnegative/authorization_grant_group.q.out PRE-CREATION ql/src/test/results/clientnegative/authorization_public_create.q.out 4c9a2ad ql/src/test/results/clientnegative/authorization_public_drop.q.out 520b56e ql/src/test/results/clientnegative/authorization_role_case.q.out PRE-CREATION ql/src/test/results/clientnegative/authorize_grant_public.q.out ef4a1b1 ql/src/test/results/clientnegative/authorize_revoke_public.q.out 618fedd ql/src/test/results/clientpositive/authorization_1.q.out dac0820 ql/src/test/results/clientpositive/authorization_5.q.out 6e5187e ql/src/test/results/clientpositive/authorization_grant_public_role.q.out 17b6c8a ql/src/test/results/clientpositive/authorization_role_grant2.q.out 56e7667 Diff: https://reviews.apache.org/r/25037/diff/ Testing --- Tests of old default authorization mode modified to verify that roles with mixed case now work. Added -ve test for group in grant with sql std auth. Thanks, Thejas Nair
[jira] [Updated] (HIVE-7392) Support Columns Stats for Partition Columns
[ https://issues.apache.org/jira/browse/HIVE-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7392: --- Attachment: h-7392.patch Support Columns Stats for Partition Columns --- Key: HIVE-7392 URL: https://issues.apache.org/jira/browse/HIVE-7392 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Ashutosh Chauhan Attachments: h-7392.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7392) Support Columns Stats for Partition Columns
[ https://issues.apache.org/jira/browse/HIVE-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7392: --- Status: Patch Available (was: Open) Support Columns Stats for Partition Columns --- Key: HIVE-7392 URL: https://issues.apache.org/jira/browse/HIVE-7392 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Ashutosh Chauhan Attachments: h-7392.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 25038: implement NDV for partition column
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25038/ --- Review request for hive and John Pullokkaran. Bugs: HIVE-7392 https://issues.apache.org/jira/browse/HIVE-7392 Repository: hive Description --- implement NDV for partition column Diffs - branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/RelOptHiveTable.java 1620394 Diff: https://reviews.apache.org/r/25038/diff/ Testing --- Thanks, Ashutosh Chauhan
[jira] [Commented] (HIVE-7392) Support Columns Stats for Partition Columns
[ https://issues.apache.org/jira/browse/HIVE-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109786#comment-14109786 ] Sergey Shelukhin commented on HIVE-7392: Is handling column values as strings correct for databases created with all versions of Hive? In olden days it was possible to create partitions like a=2 and a=02 for integer a. There may still be similar cases now, though more restricted. What will a query return in such cases in Hive? It may be a different value than the one these stats give. Support Columns Stats for Partition Columns --- Key: HIVE-7392 URL: https://issues.apache.org/jira/browse/HIVE-7392 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Ashutosh Chauhan Attachments: h-7392.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
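The concern raised here is easiest to see with a tiny worked example: partition values such as 2 and 02 are distinct as the strings stored in the metastore, but collapse to one value under integer semantics, so an NDV computed over strings can disagree with what an integer-typed query would see. The snippet below only demonstrates that arithmetic; it is not Hive code.
{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PartitionNdvSketch {
  public static void main(String[] args) {
    // Partition values as the metastore stores them: strings.
    List<String> partitionValues = Arrays.asList("2", "02");

    Set<String> distinctAsStrings = new HashSet<String>(partitionValues);
    Set<Integer> distinctAsInts = new HashSet<Integer>();
    for (String v : partitionValues) {
      distinctAsInts.add(Integer.parseInt(v));
    }

    System.out.println(distinctAsStrings.size()); // 2 distinct values as strings
    System.out.println(distinctAsInts.size());    // 1 distinct value as integers
  }
}
{code}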
[jira] [Created] (HIVE-7875) Hive cannot load data into partitioned table with Unicode key
Xiaobing Zhou created HIVE-7875: --- Summary: Hive cannot load data into partitioned table with Unicode key Key: HIVE-7875 URL: https://issues.apache.org/jira/browse/HIVE-7875 Project: Hive Issue Type: Bug Environment: Windows Server 2008 Reporter: Xiaobing Zhou Assignee: Xiaobing Zhou Steps to reproduce: 1) Copy the file partitioned.txt to the HDFS root folder. Copy the two hql files to your local directory. 2) Open the Hive CLI. 3) Run: hive> source <path to CreatePartitionedTable.hql>; 4) Run: hive> source <path to LoadIntoPartitionedTable.hql>; The following error will be shown: hive> source C:\Scripts\partition\LoadIntoPartitionedTable.hql; Loading data to table default.mypartitioned partition (tag=䶵) Failed with exception null FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7875) Hive cannot load data into partitioned table with Unicode key
[ https://issues.apache.org/jira/browse/HIVE-7875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HIVE-7875: Fix Version/s: 0.14.0 Hive cannot load data into partitioned table with Unicode key - Key: HIVE-7875 URL: https://issues.apache.org/jira/browse/HIVE-7875 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Environment: Windows Server 2008 Reporter: Xiaobing Zhou Assignee: Xiaobing Zhou Fix For: 0.14.0 Attachments: CreatePartitionedTable.hql, LoadIntoPartitionedTable.hql, partitioned.txt Steps to reproduce: 1) Copy the file partitioned.txt to the HDFS root folder. Copy the two hql files to your local directory. 2) Open the Hive CLI. 3) Run: hive> source <path to CreatePartitionedTable.hql>; 4) Run: hive> source <path to LoadIntoPartitionedTable.hql>; The following error will be shown: hive> source C:\Scripts\partition\LoadIntoPartitionedTable.hql; Loading data to table default.mypartitioned partition (tag=䶵) Failed with exception null FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7875) Hive cannot load data into partitioned table with Unicode key
[ https://issues.apache.org/jira/browse/HIVE-7875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HIVE-7875: Affects Version/s: 0.13.0 Hive cannot load data into partitioned table with Unicode key - Key: HIVE-7875 URL: https://issues.apache.org/jira/browse/HIVE-7875 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Environment: Windows Server 2008 Reporter: Xiaobing Zhou Assignee: Xiaobing Zhou Fix For: 0.14.0 Attachments: CreatePartitionedTable.hql, LoadIntoPartitionedTable.hql, partitioned.txt Steps to reproduce: 1) Copy the file partitioned.txt to the HDFS root folder. Copy the two hql files to your local directory. 2) Open the Hive CLI. 3) Run: hive> source <path to CreatePartitionedTable.hql>; 4) Run: hive> source <path to LoadIntoPartitionedTable.hql>; The following error will be shown: hive> source C:\Scripts\partition\LoadIntoPartitionedTable.hql; Loading data to table default.mypartitioned partition (tag=䶵) Failed with exception null FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask -- This message was sent by Atlassian JIRA (v6.2#6252)