[jira] [Updated] (HIVE-5690) Support subquery for single sourced multi query

2014-08-25 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5690:


Attachment: HIVE-5690.11.patch.txt

 Support subquery for single sourced multi query
 ---

 Key: HIVE-5690
 URL: https://issues.apache.org/jira/browse/HIVE-5690
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: D13791.1.patch, HIVE-5690.10.patch.txt, 
 HIVE-5690.11.patch.txt, HIVE-5690.2.patch.txt, HIVE-5690.3.patch.txt, 
 HIVE-5690.4.patch.txt, HIVE-5690.5.patch.txt, HIVE-5690.6.patch.txt, 
 HIVE-5690.7.patch.txt, HIVE-5690.8.patch.txt, HIVE-5690.9.patch.txt


 A single-sourced multi (insert) query is very useful for various ETL processes, 
 but it does not allow subqueries to be included. For example, 
 {noformat}
 explain from src 
 insert overwrite table x1 select * from (select distinct key,value) b order 
 by key
 insert overwrite table x2 select * from (select distinct key,value) c order 
 by value;
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7849) Support more generic predicate pushdown for hbase handler

2014-08-25 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108795#comment-14108795
 ] 

Navis commented on HIVE-7849:
-

vector_between_in needs an update. The others are not related to this.

 Support more generic predicate pushdown for hbase handler
 -

 Key: HIVE-7849
 URL: https://issues.apache.org/jira/browse/HIVE-7849
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-7849.1.patch.txt, HIVE-7849.2.patch.txt


 Currently, the HBase handler supports only AND-conjoined filters. This is a 
 first attempt to support OR, NOT, IN, and BETWEEN predicates for HBase.
 Mostly based on the work done by [~teddy.choi].



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24986: HIVE-7553: decouple the auxiliary jar loading from hive server2 starting phase

2014-08-25 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24986/
---

(Updated Aug. 25, 2014, 6:45 a.m.)


Review request for hive.


Changes
---

(1) clean code (2) change property description


Bugs: HIVE-7553
https://issues.apache.org/jira/browse/HIVE-7553


Repository: hive-git


Description
---

HIVE-7553: decouple the auxiliary jar loading from hive server2 starting phase


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
7f4afd9d64aff18329e7850342855aade42e21f5 
  hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java 
93a03adeab7ba3c3c91344955d303e4252005239 
  
hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatClient.java
 f25039dcf55b3b24bbf8dcba05855665a1c7f3b0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DefaultFetchFormatter.java 
5924bcf1f55dc4c2dd06f312f929047b7df9de55 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 
0c6a3d44ef1f796778768421dc02f8bf3ede6a8c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionTask.java 
bd45df1a401d1adb009e953d08205c7d5c2d5de2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ListSinkOperator.java 
dcc19f70644c561e17df8c8660ca62805465f1d6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 
76fee612a583cdc2c632d27932623521b735e768 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java 
eb2851b2c5fa52e0f555b3d8d1beea5d1ac3b225 
  ql/src/java/org/apache/hadoop/hive/ql/hooks/HookUtils.java 
3f474f846c7af5f1f65f1c14f3ce51308f1279d4 
  ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughOutputFormat.java 
0962cadce0d515e046371d0a816f4efd70b8eef7 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java 
9051ba6d80e619ddbb6c27bb161e1e7a5cdb08a5 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 
edec1b734fb2f015902fd5e1c8afd5acdf4cb3bf 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 
2f13ac2e30195a25844a25e9ec8a7c42ed99b75c 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java
 b15aedc15d8cd0979aced6ff4c9e87606576f0a3 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java 
d86df453cd7686627940ade62c0fd72f1636dd0b 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseUtils.java 
0a1c660b4bbd46d8410e646270b23c99a4de8b7e 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
b05d3b48ec014e4dc8026bb5f6615f62da0e2210 
  ql/src/java/org/apache/hadoop/hive/ql/plan/AggregationDesc.java 
17eeae1a3435fceb4b57325675c58b599e0973ea 
  ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java 
930acbc98e81f8d421cee1170659d8b7a427fe7d 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableDesc.java 
39f1793aaa5bed8a494883cac516ad314be951f4 
  ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessorFactory.java 
0d237f01a248a65b4092eb7202fe30eebf27be82 
  ql/src/java/org/apache/hadoop/hive/ql/processors/HiveCommand.java 
f5bc427a5834860441f21bfc72e175c6a1cf877f 
  ql/src/java/org/apache/hadoop/hive/ql/processors/RefreshProcessor.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 
9798cf3f537a27d1f828f8139790c62c5945c366 
  ql/src/java/org/apache/hadoop/hive/ql/stats/StatsFactory.java 
e247184b7d95c85fd3e12432e7eb75eb1e2a0b68 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBridge.java 
959007a54b335bb0bdef0256f60e6cbc65798dc7 
  ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java 
ef0052f5763922d50986f127c416af5eaa6ae30d 
  ql/src/test/resources/SessionStateTest-V1.jar PRE-CREATION 
  ql/src/test/resources/SessionStateTest-V2.jar PRE-CREATION 
  service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
bc0a02c1df7f9fdb848d5f078e94a663a579e571 

Diff: https://reviews.apache.org/r/24986/diff/


Testing
---


Thanks,

cheng xu



[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files

2014-08-25 Thread Sathish (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sathish updated HIVE-7850:
--

Attachment: HIVE-7850.1.patch

Submitted a new patch file correcting the indentation.

 Hive Query failed if the data type is array<string> with parquet files
 --

 Key: HIVE-7850
 URL: https://issues.apache.org/jira/browse/HIVE-7850
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.14.0, 0.13.1
Reporter: Sathish
Assignee: Sathish
  Labels: parquet, serde
 Fix For: 0.14.0

 Attachments: HIVE-7850.1.patch, HIVE-7850.patch


 * Created a parquet file from the Avro file, which has one array data type and 
 the rest primitive types. Avro schema of the array data type, e.g.: 
 {code}
 { "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, 
 null ] }
 {code}
 * Created External Hive table with the Array type as below, 
 {code}
 create external table paraArray (action array<string>) partitioned by (partitionid 
 int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as 
 inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 
 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; 
 alter table paraArray add partition(partitionid=1) location '/testPara';
 {code}
 * Run the following query (select action from paraArray limit 10) and the 
 MapReduce jobs fail with the following exception.
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row [Error getting row data with exception 
 java.lang.ClassCastException: 
 parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
 org.apache.hadoop.io.ArrayWritable
 at 
 parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
 at org.apache.hadoop.mapred.Child.main(Child.java:264)
 ]
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 ... 8 more
 {code}
 This issue was posted long back on the Parquet issues list. Since it is 
 related to the Parquet Hive SerDe, I have created this Hive issue here. The 
 details and history of this issue are available at 
 https://github.com/Parquet/parquet-mr/issues/281.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-25 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108843#comment-14108843
 ] 

Chengxiang Li commented on HIVE-7799:
-

This depends on the implementation of {{ResultIterator.hasNext()}}: it is designed 
as a lazy iterator, in that it only calls {{processNextRecord()}} while the 
RowContainer is empty, but RowContainer does not support adding more rows once it 
has been read, as mentioned in previous comments. Here is what happens when 
different queries are executed:
# For a map-only job, map output is written to a file directly; no Collector is 
needed in this case.
# For a MapReduce job with a GroupByOperator, 
{{HiveBaseFunctionResultList.collect()}} is triggered by 
{{closeRecordProcessor()}}, which is outside the lazy-computing logic, so the 
ResultIterator does not compute lazily in this case.
# For a MapReduce job without a GroupByOperator (such as cluster-by queries), the 
ResultIterator does compute lazily, and it clears the RowContainer each time 
before calling {{processNextRecord()}}. When HiveBaseFunctionResultList is read 
and written in the same thread, the access pattern on the RowContainer is 
clear()-addRow()-first()-clear()-addRow()-first()..., so it does not 
violate RowContainer's access rule. But with multiple threads reading and writing 
HiveBaseFunctionResultList, as the ScriptOperator does (which Venki mentioned 
above), it will definitely hit this JIRA issue.

In my opinion, there are 2 solutions:
# Remove the ResultIterator's lazy-computing feature, as patch 1 does.
# Implement a RowContainer-like class which supports the current RowContainer 
features; it also needs to be thread-safe and to support adding rows after 
{{first()}} has already been called. 

The second solution is quite complex, and it may introduce a performance 
degradation from supporting thread-safe access and write-after-read; compared 
with the performance gain from lazy computing, it is hard to say right now 
whether it would be worth it. So I suggest we take the first solution to fix 
this issue and leave the possible optimization to milestone 4.
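The single-threaded access pattern in point 3, and why concurrent read/write breaks it, can be sketched with a toy model (an illustrative simplification under hypothetical names, not the actual org.apache.hadoop.hive.ql.exec.persistence.RowContainer):

```java
import java.util.*;

// Toy stand-in for RowContainer's access rule: reads may begin only after
// writes are finished, and a clear() is required before writing again.
class RowContainerSketch {
    private final List<String> rows = new ArrayList<>();
    private boolean readStarted = false;

    void addRow(String row) {
        if (readStarted) {
            // Models the rule that writes are forbidden once reading has begun.
            throw new IllegalStateException("write after read is not allowed");
        }
        rows.add(row);
    }

    String first() {
        readStarted = true;
        return rows.isEmpty() ? null : rows.get(0);
    }

    void clear() {
        rows.clear();
        readStarted = false;
    }

    // Single-threaded lazy pattern: clear() -> addRow() -> first(), repeated.
    // Each write is preceded by a clear(), so the rule is never violated.
    static boolean sequentialPatternOk() {
        RowContainerSketch c = new RowContainerSketch();
        for (int i = 0; i < 3; i++) {
            c.clear();
            c.addRow("r" + i);
            c.first();
        }
        return true;
    }

    // Interleaved write-after-read without an intervening clear(), as happens
    // when a second thread (e.g. a ScriptOperator feeder) writes while a
    // reader is already iterating.
    static boolean writeAfterReadFails() {
        RowContainerSketch c = new RowContainerSketch();
        c.addRow("r0");
        c.first();
        try {
            c.addRow("r1");
            return false; // would mean the rule was not enforced
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sequentialPatternOk());
        System.out.println(writeAfterReadFails());
    }
}
```

Both checks pass in the single-writer case and trip in the write-after-read case, which is the behavior the stack traces in this thread boil down to.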

 TRANSFORM failed in transform_ppr1.q[Spark Branch]
 --

 Key: HIVE-7799
 URL: https://issues.apache.org/jira/browse/HIVE-7799
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
  Labels: Spark-M1
 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, 
 HIVE-7799.3-spark.patch


 Here is the exception:
 {noformat}
 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) 
 - Exception in task 0.0 in stage 1.0 (TID 0)
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82)
 at 
 scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at 
 org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}
 Basically, the cause is that RowContainer is misused (it is not allowed to 
 write to it once someone has read a row from it); I'm trying to figure out 
 whether it's a general Hive issue or specific to Hive on Spark mode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-25 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7799:


Attachment: HIVE-7799.3-spark.patch

Reattached the first patch.

 TRANSFORM failed in transform_ppr1.q[Spark Branch]
 --

 Key: HIVE-7799
 URL: https://issues.apache.org/jira/browse/HIVE-7799
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
  Labels: Spark-M1
 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, 
 HIVE-7799.3-spark.patch


 Here is the exception:
 {noformat}
 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) 
 - Exception in task 0.0 in stage 1.0 (TID 0)
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82)
 at 
 scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at 
 org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}
 Basically, the cause is that RowContainer is misused (it is not allowed to 
 write to it once someone has read a row from it); I'm trying to figure out 
 whether it's a general Hive issue or specific to Hive on Spark mode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5690) Support subquery for single sourced multi query

2014-08-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108860#comment-14108860
 ] 

Hive QA commented on HIVE-5690:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12664102/HIVE-5690.11.patch.txt

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6119 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/484/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/484/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-484/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12664102

 Support subquery for single sourced multi query
 ---

 Key: HIVE-5690
 URL: https://issues.apache.org/jira/browse/HIVE-5690
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: D13791.1.patch, HIVE-5690.10.patch.txt, 
 HIVE-5690.11.patch.txt, HIVE-5690.2.patch.txt, HIVE-5690.3.patch.txt, 
 HIVE-5690.4.patch.txt, HIVE-5690.5.patch.txt, HIVE-5690.6.patch.txt, 
 HIVE-5690.7.patch.txt, HIVE-5690.8.patch.txt, HIVE-5690.9.patch.txt


 A single-sourced multi (insert) query is very useful for various ETL processes, 
 but it does not allow subqueries to be included. For example, 
 {noformat}
 explain from src 
 insert overwrite table x1 select * from (select distinct key,value) b order 
 by key
 insert overwrite table x2 select * from (select distinct key,value) c order 
 by value;
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7826) Dynamic partition pruning on Tez

2014-08-25 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7826:
-

Attachment: HIVE-7826.3.patch

.3 has various fixes. Should be good to go now.

 Dynamic partition pruning on Tez
 

 Key: HIVE-7826
 URL: https://issues.apache.org/jira/browse/HIVE-7826
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
  Labels: tez
 Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch


 It's natural in a star schema to map one or more dimensions to partition 
 columns. Time or location are likely candidates. 
 It can also be useful to compute the partitions one would like to scan via 
 a subquery (where p in (select ... from ...)).
 The resulting joins in Hive require a full table scan of the large table, 
 though, because partition pruning takes place before the corresponding values 
 are known.
 On Tez it's relatively straightforward to send the values needed to prune to 
 the application master, where splits are generated and tasks are submitted. 
 Using these values we can strip out any unneeded partitions dynamically 
 while the query is running.
 The approach is straightforward:
 - Insert a synthetic condition for each join representing x in (keys of the 
 other side of the join)
 - These conditions will be pushed down as far as possible
 - If the condition reaches a table scan and the column involved is a partition 
 column:
- Set up an operator to send key events to the AM
 - else:
- Remove the synthetic predicate
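The final branch of the approach above can be sketched as follows (a toy model under my own naming, not Hive's actual optimizer code; SyntheticPredicate, atTableScan, and the returned action strings are all hypothetical):

```java
import java.util.*;

// Hypothetical stand-in for a synthetic join condition pushed down the plan.
class SyntheticPredicate {
    final String column;
    SyntheticPredicate(String column) { this.column = column; }
}

class DynamicPruningSketch {
    // Decide what happens when a pushed-down synthetic predicate reaches a
    // table scan, per the description above.
    static String atTableScan(SyntheticPredicate pred, Set<String> partitionColumns) {
        if (partitionColumns.contains(pred.column)) {
            // Partition column: set up an operator that sends the observed join
            // keys to the application master, which can then drop splits for
            // partitions that cannot match.
            return "SEND_KEYS_TO_AM";
        }
        // Not a partition column: the synthetic predicate cannot prune anything
        // at split-generation time, so it is simply removed.
        return "REMOVE_PREDICATE";
    }

    public static void main(String[] args) {
        Set<String> parts = new HashSet<>(Arrays.asList("ds", "region"));
        System.out.println(atTableScan(new SyntheticPredicate("ds"), parts));
        System.out.println(atTableScan(new SyntheticPredicate("price"), parts));
    }
}
```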



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7826) Dynamic partition pruning on Tez

2014-08-25 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108872#comment-14108872
 ] 

Gunther Hagleitner commented on HIVE-7826:
--

Review board link: https://reviews.apache.org/r/25019/

 Dynamic partition pruning on Tez
 

 Key: HIVE-7826
 URL: https://issues.apache.org/jira/browse/HIVE-7826
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
  Labels: tez
 Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch


 It's natural in a star schema to map one or more dimensions to partition 
 columns. Time or location are likely candidates. 
 It can also be useful to compute the partitions one would like to scan via 
 a subquery (where p in (select ... from ...)).
 The resulting joins in Hive require a full table scan of the large table, 
 though, because partition pruning takes place before the corresponding values 
 are known.
 On Tez it's relatively straightforward to send the values needed to prune to 
 the application master, where splits are generated and tasks are submitted. 
 Using these values we can strip out any unneeded partitions dynamically 
 while the query is running.
 The approach is straightforward:
 - Insert a synthetic condition for each join representing x in (keys of the 
 other side of the join)
 - These conditions will be pushed down as far as possible
 - If the condition reaches a table scan and the column involved is a partition 
 column:
- Set up an operator to send key events to the AM
 - else:
- Remove the synthetic predicate



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7733) Ambiguous column reference error on query

2014-08-25 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108871#comment-14108871
 ] 

Navis commented on HIVE-7733:
-

Just a blind shot. I'll look into this.

 Ambiguous column reference error on query
 -

 Key: HIVE-7733
 URL: https://issues.apache.org/jira/browse/HIVE-7733
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Jason Dere
 Attachments: HIVE-7733.1.patch.txt


 {noformat}
 CREATE TABLE agg1 
   ( 
  col0 INT, 
  col1 STRING, 
  col2 DOUBLE 
   ); 
 explain SELECT single_use_subq11.a1 AS a1, 
single_use_subq11.a2 AS a2 
 FROM   (SELECT Sum(agg1.col2) AS a1 
 FROM   agg1 
 GROUP  BY agg1.col0) single_use_subq12 
JOIN (SELECT alias.a2 AS a0, 
 alias.a1 AS a1, 
 alias.a1 AS a2 
  FROM   (SELECT agg1.col1 AS a0, 
 '42'  AS a1, 
 agg1.col0 AS a2 
  FROM   agg1 
  UNION ALL 
  SELECT agg1.col1 AS a0, 
 '41'  AS a1, 
 agg1.col0 AS a2 
  FROM   agg1) alias 
  GROUP  BY alias.a2, 
alias.a1) single_use_subq11 
  ON ( single_use_subq11.a0 = single_use_subq11.a0 );
 {noformat}
 Gets the following error:
 FAILED: SemanticException [Error 10007]: Ambiguous column reference a2
 It looks like this query worked in 0.12 but started failing with this 
 error in 0.13.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Hive unwanted directories creation issue

2014-08-25 Thread Valluri, Sathish
We are creating an external table in Hive, and if the location path is not 
present in HDFS, say /testdata (as shown below), Hive creates a '/testdata' 
dummy folder.

Is there any option in Hive, or any other way, to stop it from creating dummy 
directories when the location folder does not exist?

We end up creating many unwanted dummy directories when the data is not 
present on HDFS for the many partitions we add after creating the table.



CREATE EXTERNAL TABLE testTable ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES 
('avro.schema.literal'='{ schema json literal') STORED AS INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 
'/testdata/';



Regards

Sathish Valluri





[jira] [Commented] (HIVE-5799) session/operation timeout for hiveserver2

2014-08-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108910#comment-14108910
 ] 

Hive QA commented on HIVE-5799:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663284/HIVE-5799.12.patch.txt

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6119 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/486/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/486/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-486/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663284

 session/operation timeout for hiveserver2
 -

 Key: HIVE-5799
 URL: https://issues.apache.org/jira/browse/HIVE-5799
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-5799.1.patch.txt, HIVE-5799.10.patch.txt, 
 HIVE-5799.11.patch.txt, HIVE-5799.12.patch.txt, HIVE-5799.12.patch.txt, 
 HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt, 
 HIVE-5799.5.patch.txt, HIVE-5799.6.patch.txt, HIVE-5799.7.patch.txt, 
 HIVE-5799.8.patch.txt, HIVE-5799.9.patch.txt


 Need some timeout facility to prevent resource leakage from unstable or 
 bad clients.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files

2014-08-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108912#comment-14108912
 ] 

Hive QA commented on HIVE-7850:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12664109/HIVE-7850.1.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/487/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/487/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-487/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-487/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'common/src/java/org/apache/hadoop/hive/conf/HiveConf.java'
Reverted 'common/src/java/org/apache/hadoop/hive/conf/Validator.java'
Reverted 'service/src/java/org/apache/hive/service/cli/OperationState.java'
Reverted 'service/src/java/org/apache/hive/service/cli/session/HiveSession.java'
Reverted 
'service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java'
Reverted 
'service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java'
Reverted 
'service/src/java/org/apache/hive/service/cli/session/SessionManager.java'
Reverted 'service/src/java/org/apache/hive/service/cli/operation/Operation.java'
Reverted 
'service/src/java/org/apache/hive/service/cli/operation/OperationManager.java'
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target 
shims/0.20S/target shims/0.23/target shims/aggregator/target 
shims/common/target shims/common-secure/target packaging/target 
hbase-handler/target testutils/target jdbc/target metastore/target 
itests/target itests/hcatalog-unit/target itests/test-serde/target 
itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target 
itests/hive-unit/target 
itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2SessionTimeout.java
 itests/custom-serde/target itests/util/target hcatalog/target 
hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/webhcat/svr/target 
hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target 
accumulo-handler/target hwi/target common/target common/src/gen service/target 
contrib/target serde/target beeline/target odbc/target cli/target 
ql/dependency-reduced-pom.xml ql/target
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1620279.

At revision 1620279.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12664109

 Hive Query failed if the data type is array<string> with parquet files
 --

 Key: HIVE-7850
 URL: https://issues.apache.org/jira/browse/HIVE-7850
 Project: Hive
 

2014-08-25 Thread valluri sathish
We are creating an external table in Hive, and if the location path (say
/testdata, as shown below) is not present in HDFS, Hive creates a dummy
'/testdata' folder. Is there any option in Hive, or any other way, to stop
it from creating dummy directories when the location folder does not exist?
We end up creating many unwanted dummy directories when the data is not
present on HDFS for the many partitions we add after creating the table.

This looks like a Hive issue to me. Can anyone suggest other ways to
overcome this?

CREATE EXTERNAL TABLE
testTable ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH
SERDEPROPERTIES ('avro.schema.literal'='{ schema json literal }') STORED
AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' 
LOCATION '/testdata/';

Regards
Sathish Valluri
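There is no Hive option that skips creating the LOCATION directory, so one workaround is to test the path before issuing the DDL. A minimal sketch in Python (the helper names and the `runner` injection point are hypothetical; by default it shells out to `hdfs dfs -test -d`):

```python
import subprocess

def hdfs_dir_exists(path, runner=None):
    """Return True if `path` exists as a directory on HDFS.

    `runner` is injectable for testing; by default it shells out to
    `hdfs dfs -test -d`, which exits 0 when the directory exists.
    """
    if runner is None:
        runner = lambda p: subprocess.run(
            ["hdfs", "dfs", "-test", "-d", p]).returncode
    return runner(path) == 0

def external_table_ddl_if_present(path, ddl, runner=None):
    # Only emit the CREATE EXTERNAL TABLE / ADD PARTITION statement when
    # the data exists, so Hive never auto-creates an empty dummy directory.
    return ddl if hdfs_dir_exists(path, runner) else None
```

Each partition path can be fed through the same check before `ALTER TABLE ... ADD PARTITION`.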

Re: Hive on Tez Counters

2014-08-25 Thread Suma Shivaprasad
Hi Siddharth/Gunther,

Thanks for replying to my queries. I was particularly interested in the CPU
counter, since I was doing some benchmarking on queries. Can you please
clarify: if I just blindly take mod(CPU counter) for all tasks and add them
up, would that be fine, or should I take a patch from the fix and apply it
on Tez 0.4 to get it working until 0.5 is released?

Thanks
Suma
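Given the answer below (the negative readings are a Tez 0.4 counter bug fixed in 0.5), a safer aggregation than taking mod() is to exclude unreliable readings and report them separately. A hedged sketch; the function name is illustrative:

```python
def total_cpu_millis(task_counters):
    """Sum per-task CPU_MILLISECONDS, skipping negative readings.

    Negative values come from a counter bug in Tez 0.4 (fixed in 0.5);
    taking abs() would fabricate a measurement, so unreliable readings
    are dropped, and their count is returned alongside the total.
    """
    valid = [c for c in task_counters if c >= 0]
    skipped = len(task_counters) - len(valid)
    return sum(valid), skipped
```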


On Fri, Aug 22, 2014 at 2:55 AM, Gunther Hagleitner 
ghagleit...@hortonworks.com wrote:

 Hive logs the same counters regardless of whether you run with Tez or MR.
 We've removed some counters in hive 0.13 (HIVE-4518) - the specific one
 you're looking for might be in that list.

 Thanks,
 Gunther.


 On Thu, Aug 21, 2014 at 11:13 AM, Siddharth Seth ss...@apache.org wrote:

  I'll let Hive folks answer the questions about the Hive counters.
 
  In terms of the CPU counter - that was a bug in Tez-0.4.0, which has been
  fixed in 0.5.0.
 
  COMMITTED_HEAP_BYTES just represents the memory available to the JVM
  (Runtime.getRuntime().totalMemory()). This will only vary if the VM is
  started with a different Xms and Xmx option.
 
  In terms of Tez, the application logs are currently the best place. Hive
  may expose these in a more accessible manner though.
 
 
  On Wed, Aug 20, 2014 at 11:16 PM, Suma Shivaprasad 
  sumasai.shivapra...@gmail.com wrote:
 
   Hi,
  
   Needed info on where I can get detailed job counters for Hive on Tez.
 Am
   running this on a HDP cluster with Hive 0.13 and see only the following
  job
   counters through Hive Tez in Yarn application logs which I got through(
   yarn logs -applicationId ...) .
  
    a. Cannot see any ReduceOperator counters; DESERIALIZE_ERRORS is the
   only counter present in MapOperator.
    b. The CPU_MILLISECONDS is negative (-ve) in some cases. Is CPU_MILLISECONDS
   accurate?
    c. What does COMMITTED_HEAP_BYTES indicate?
    d. Is there any other place I should be checking the counters?
  
   [[File System Counters
   FILE: BYTES_READ=512,
   FILE: BYTES_WRITTEN=3079881,
   FILE: READ_OPS=0, FILE: LARGE_READ_OPS=0, FILE: WRITE_OPS=0, HDFS:
   BYTES_READ=8215153, HDFS: BYTES_WRITTEN=0, HDFS: READ_OPS=3, HDFS:
   LARGE_READ_OPS=0, HDFS: WRITE_OPS=0]
  
   [org.apache.tez.common.counters.TaskCounter SPILLED_RECORDS=222543,
   GC_TIME_MILLIS=172, *CPU_MILLISECONDS=-19700*,
   PHYSICAL_MEMORY_BYTES=667566080, VIRTUAL_MEMORY_BYTES=1887797248,
   COMMITTED_HEAP_BYTES=1011023872, INPUT_RECORDS_PROCESSED=222543,
   OUTPUT_RECORDS=222543,
   OUTPUT_BYTES=23543896,
   OUTPUT_BYTES_WITH_OVERHEAD=23989024, OUTPUT_BYTES_PHYSICAL=3079369,
   ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0,
   ADDITIONAL_SPILL_COUNT=0]
  
  
   [*org.apache.hadoop.hive.ql.exec.MapOperator*$Counter
   DESERIALIZE_ERRORS=0]]
  
   Thanks
   Suma
  
 




[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-25 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109246#comment-14109246
 ] 

Venki Korukanti commented on HIVE-7799:
---

[~chengxiang li] Your plan sounds good. Let's log a JIRA to enable lazy 
computing, and we will revisit it in milestone 4.

 TRANSFORM failed in transform_ppr1.q[Spark Branch]
 --

 Key: HIVE-7799
 URL: https://issues.apache.org/jira/browse/HIVE-7799
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
  Labels: Spark-M1
 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, 
 HIVE-7799.3-spark.patch


 Here is the exception:
 {noformat}
 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) 
 - Exception in task 0.0 in stage 1.0 (TID 0)
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82)
 at 
 scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at 
 org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}
 Basically, the cause is that RowContainer is misused (it's not allowed to 
 write once someone has read a row from it); I'm trying to figure out whether 
 it's a Hive issue or specific to Hive on Spark mode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
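The write-after-read contract described in HIVE-7799 above can be modeled with a toy container. This is an illustrative sketch, not Hive's actual RowContainer API; guarding the contract explicitly turns a latent NullPointerException into a clear error at the misuse site:

```python
class AppendOnceContainer:
    """Toy model of the RowContainer contract: once a consumer starts
    reading, further writes are illegal."""

    def __init__(self):
        self._rows = []
        self._reading = False

    def add(self, row):
        # Fail fast at the misuse site instead of corrupting iteration state.
        if self._reading:
            raise RuntimeError("write after read is not allowed")
        self._rows.append(row)

    def __iter__(self):
        self._reading = True
        return iter(self._rows)
```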


[jira] [Created] (HIVE-7870) Insert overwrite table query does not generate correct task plan

2014-08-25 Thread Na Yang (JIRA)
Na Yang created HIVE-7870:
-

 Summary: Insert overwrite table query does not generate correct 
task plan
 Key: HIVE-7870
 URL: https://issues.apache.org/jira/browse/HIVE-7870
 Project: Hive
  Issue Type: Task
  Components: Spark
Reporter: Na Yang


Insert overwrite table query does not generate correct task plan when 
hive.optimize.union.remove and hive.merge.sparkfiles properties are ON. 
{noformat}
set hive.optimize.union.remove=true;
set hive.merge.sparkfiles=true;

insert overwrite table outputTbl1
SELECT * FROM
(
select key, 1 as values from inputTbl1
union all
select * FROM (
  SELECT key, count(1) as values from inputTbl1 group by key
  UNION ALL
  SELECT key, 2 as values from inputTbl1
) a
)b;
select * from outputTbl1 order by key, values;
{noformat}

query result
{noformat}
1   1
1   2
2   1
2   2
3   1
3   2
7   1
7   2
8   2
8   2
8   2
{noformat}

expected result:
{noformat}
1   1
1   1
1   2
2   1
2   1
2   2
3   1
3   1
3   2
7   1
7   1
7   2
8   1
8   1
8   2
8   2
8   2
{noformat}

The move work is not functioning properly and some data is lost during the move.
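The UNION ALL semantics behind the expected result can be sketched in Python. The contents of inputTbl1 are an assumption (keys 1, 2, 3, and 7 once, and 8 twice), chosen to be consistent with the expected output above:

```python
from collections import Counter

def union_all_rows(keys):
    """Reference semantics of the UNION ALL query in the report:
    branch 1 emits (key, 1) per input row; the inner union emits the
    grouped counts (key, count) plus (key, 2) per input row."""
    rows = [(k, 1) for k in keys]                       # select key, 1
    rows += [(k, c) for k, c in Counter(keys).items()]  # count(1) group by key
    rows += [(k, 2) for k in keys]                      # select key, 2
    return sorted(rows)                                 # order by key, values

# Assumed input: keys 1, 2, 3, 7 once and 8 twice.
keys = [1, 2, 3, 7, 8, 8]
```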





[jira] [Updated] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-25 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7799:
---

Status: Patch Available  (was: Open)






[jira] [Assigned] (HIVE-7869) Long running tests (1) [Spark Branch]

2014-08-25 Thread Suhas Satish (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suhas Satish reassigned HIVE-7869:
--

Assignee: Suhas Satish

 Long running tests (1) [Spark Branch]
 -

 Key: HIVE-7869
 URL: https://issues.apache.org/jira/browse/HIVE-7869
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Suhas Satish

 I have noticed when running the full test suite locally that the test JVM 
 eventually crashes. We should do some testing (not part of the unit tests) 
 which starts up an HS2 and runs queries on it continuously for 24 hours or so.
 In this JIRA let's create a stand-alone Java program which connects to an HS2 
 over JDBC, creates a bunch of tables (say 100), and then runs queries until 
 the JDBC client is killed. This will allow us to run long-running tests.
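A minimal sketch of such a soak-test loop; the connection wiring is hypothetical, and `execute` stands in for the execute method of a JDBC/ODBC cursor:

```python
import time

def soak_test(execute, num_tables=100, max_iterations=None, delay=0.0):
    """Skeleton of the long-running HS2 test described above: create a
    bunch of tables, then query them in a loop until the client is killed
    (or, for unit-testing this skeleton, until max_iterations is reached).

    `execute` is any callable taking a SQL string; the real wiring to an
    HS2 connection is not shown here.
    """
    for i in range(num_tables):
        execute("CREATE TABLE IF NOT EXISTS soak_%d (key INT, value STRING)" % i)
    n = 0
    while max_iterations is None or n < max_iterations:
        execute("SELECT COUNT(*) FROM soak_%d" % (n % num_tables))
        time.sleep(delay)
        n += 1
    return n
```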





[jira] [Updated] (HIVE-7810) Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch]

2014-08-25 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7810:
---

Issue Type: Sub-task  (was: Task)
Parent: HIVE-7292

 Insert overwrite table query has strange behavior when set 
 hive.optimize.union.remove=true [Spark Branch]
 -

 Key: HIVE-7810
 URL: https://issues.apache.org/jira/browse/HIVE-7810
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Na Yang
Assignee: Na Yang
 Attachments: HIVE-7810.1-spark.patch


 Insert overwrite table query has strange behavior when 
 set hive.optimize.union.remove=true
 set hive.mapred.supports.subdirectories=true;
 set hive.merge.mapfiles=true;
 set hive.merge.mapredfiles=true;
 We expect the following two sets of queries return the same set of data 
 result, but they do not. 
 1)
 {noformat}
 insert overwrite table outputTbl1
 SELECT * FROM
 (
 select key, 1 as values from inputTbl1
 union all
 select * FROM (
   SELECT key, count(1) as values from inputTbl1 group by key
   UNION ALL
   SELECT key, 2 as values from inputTbl1
 ) a
 )b;
 select * from outputTbl1 order by key, values;
 {noformat}
 Below is the query result:
 {noformat}
 1 1
 1 2
 2 1
 2 2
 3 1
 3 2
 7 1
 7 2
 8 2
 8 2
 8 2
 {noformat}
 2) 
 {noformat}
 SELECT * FROM
 (
 select key, 1 as values from inputTbl1
 union all
 select * FROM (
   SELECT key, count(1) as values from inputTbl1 group by key
   UNION ALL
   SELECT key, 2 as values from inputTbl1
 ) a
 )b order by key, values;
 {noformat}
 Below is the query result:
 {noformat}
 1 1
 1 1
 1 2
 2 1
 2 1
 2 2
 3 1
 3 1
 3 2
 7 1
 7 1
 7 2
 8 1
 8 1
 8 2
 8 2
 8 2
 {noformat}
 Some data is missing in the first set of query result. 





[jira] [Commented] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files

2014-08-25 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109315#comment-14109315
 ] 

Ryan Blue commented on HIVE-7850:
-

Looking at just the changes to the schema conversion, I'm not sure why the 
change to the list structure was done. Previously, lists were converted to:

{code}
// array<string> name
optional group name (LIST) {
  repeated group bag {
optional string array_element;
  }
}
{code}

This allowed the list itself to be null and allowed null elements. This patch 
changes the conversion to:

{code}
// array<string> name
optional group name (LIST) {
  repeated string array_element;
}
{code}

This requires that the elements are non-null. Was this on purpose? The first 
one looks more correct to me, but the second would be correct if nulls aren't 
allowed in Hive lists. In addition, the HiveSchemaConverter#listWrapper method 
and ParquetHiveSerDe.ARRAY static field are no longer used but have not been removed.

The other change to schema conversion tests the Repetition and calls 
{{Types.required}} or {{Types.optional}}. This should instead call 
{{Types.primitive(type, repetition)}} to pass the repetition to the {{Types}} 
API. That way, {{Repetition.REPEATED}} is also supported; not handling it is a 
bug in the current patch.

 Hive Query failed if the data type is array<string> with parquet files
 --

 Key: HIVE-7850
 URL: https://issues.apache.org/jira/browse/HIVE-7850
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.14.0, 0.13.1
Reporter: Sathish
Assignee: Sathish
  Labels: parquet, serde
 Fix For: 0.14.0

 Attachments: HIVE-7850.1.patch, HIVE-7850.patch


 * Created a parquet file from an Avro file which has 1 array data type and 
 the rest are primitive types. Avro schema of the array data type, e.g.: 
 {code}
 { "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, 
 "null" ] }
 {code}
 * Created External Hive table with the Array type as below, 
 {code}
 create external table paraArray (action Array<String>) partitioned by (partitionid 
 int) row format serde 'parquet.hive.serde.ParquetHiveSerDe' stored as 
 inputformat 'parquet.hive.MapredParquetInputFormat' outputformat 
 'parquet.hive.MapredParquetOutputFormat' location '/testPara'; 
 alter table paraArray add partition(partitionid=1) location '/testPara';
 {code}
 * Run the following query (select action from paraArray limit 10) and the Map 
 reduce jobs fail with the following exception.
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row [Error getting row data with exception 
 java.lang.ClassCastException: 
 parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
 org.apache.hadoop.io.ArrayWritable
 at 
 parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
 at 
 org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
 at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
 at org.apache.hadoop.mapred.Child.main(Child.java:264)
 ]
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
 at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
 ... 8 more
 {code}
 This issue has long back posted on Parquet issues list and Since this is 
 related to Parquet Hive serde, I have created the Hive issue here, The 
 details and history of this information are as shown in the link here 
 https://github.com/Parquet/parquet-mr/issues/281.





[jira] [Resolved] (HIVE-7724) CBO: support Subquery predicates

2014-08-25 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-7724.


Resolution: Fixed

Committed to cbo branch. Thanks, Harish!

 CBO: support Subquery predicates
 

 Key: HIVE-7724
 URL: https://issues.apache.org/jira/browse/HIVE-7724
 Project: Hive
  Issue Type: Sub-task
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7724.1.patch, HIVE-7724.rewriteInHive.prelim.patch








[jira] [Commented] (HIVE-7810) Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch]

2014-08-25 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109323#comment-14109323
 ] 

Brock Noland commented on HIVE-7810:


[~csun] can you review this patch since it appears you have some knowledge here?






[jira] [Updated] (HIVE-7724) CBO: support Subquery predicates

2014-08-25 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7724:
---

Component/s: CBO

 CBO: support Subquery predicates
 

 Key: HIVE-7724
 URL: https://issues.apache.org/jira/browse/HIVE-7724
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7724.1.patch, HIVE-7724.rewriteInHive.prelim.patch








[jira] [Commented] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files

2014-08-25 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109347#comment-14109347
 ] 

Ryan Blue commented on HIVE-7850:
-

It looks like {{ArrayWritableGroupConverter}} is only used for maps and arrays, 
but the array handling was added mostly in this patch. Given that most of the 
methods check {{isMap}} and have completely different implementations for map 
and array, it makes more sense to separate this into two classes, 
{{ArrayGroupConverter}} and {{MapGroupConverter}}. Then {{HiveSchemaConverter}} 
should choose the correct one based on the {{OriginalType}} annotation. If 
there is no original type annotation, but the type is repeated, it should use 
an {{ArrayGroupConverter}}.






[jira] [Commented] (HIVE-7810) Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch]

2014-08-25 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109360#comment-14109360
 ] 

Chao commented on HIVE-7810:


[~brocknoland] OK, I'll take a look.






[jira] [Updated] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup

2014-08-25 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6847:
---

Status: Open  (was: Patch Available)

 Improve / fix bugs in Hive scratch dir setup
 

 Key: HIVE-6847
 URL: https://issues.apache.org/jira/browse/HIVE-6847
 Project: Hive
  Issue Type: Bug
  Components: CLI, HiveServer2
Affects Versions: 0.14.0
Reporter: Vikram Dixit K
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch, HIVE-6847.3.patch, 
 HIVE-6847.4.patch


 Currently, the Hive server creates the scratch directory and changes its 
 permissions to 777; however, this is not great with respect to security. We 
 need to create user-specific scratch directories instead. Also refer to the 
 1st iteration of the HIVE-6782 patch for the approach.
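The direction described above can be sketched as follows; the path layout and helper name are illustrative, not what the actual patch does:

```python
import os

def make_user_scratch_dir(root, user):
    """Create a per-user scratch directory with 700 permissions instead
    of one shared 777 directory, so each user's intermediate data is
    readable only by that user."""
    path = os.path.join(root, user)
    os.makedirs(path, mode=0o700, exist_ok=True)
    # makedirs' mode is masked by the process umask, so enforce 700 explicitly.
    os.chmod(path, 0o700)
    return path
```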





[jira] [Updated] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup

2014-08-25 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6847:
---

Attachment: HIVE-6847.5.patch

 Improve / fix bugs in Hive scratch dir setup
 

 Key: HIVE-6847
 URL: https://issues.apache.org/jira/browse/HIVE-6847
 Project: Hive
  Issue Type: Bug
  Components: CLI, HiveServer2
Affects Versions: 0.14.0
Reporter: Vikram Dixit K
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch, HIVE-6847.3.patch, 
 HIVE-6847.4.patch, HIVE-6847.5.patch


 Currently, the Hive server creates the scratch directory and changes its 
 permissions to 777; however, this is not great with respect to security. We 
 need to create user-specific scratch directories instead. Also refer to the 
 1st iteration of the HIVE-6782 patch for the approach.





[jira] [Created] (HIVE-7871) WebHCat: Hive job with SQL server as MetastoreDB fails when Unicode characters are present in curl command

2014-08-25 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-7871:
---

 Summary: WebHCat: Hive job with SQL server as MetastoreDB fails  
when Unicode characters are present in curl command
 Key: HIVE-7871
 URL: https://issues.apache.org/jira/browse/HIVE-7871
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan


Please follow the steps below to repro.
1. Create a SQL Server. Create a username, password, and DB with Unicode 
characters in their names.
2. Create a cluster and run the below command against its templeton endpoint
curl -i -u username:password \
-d define=javax.jdo.option.ConnectionUserName=dbusername@SQLserver \
-d define=hive.metastore.uris= \
-d 
define=javax.jdo.option.ConnectionURL=jdbc:sqlserver://SQLserver.database.windows.net;database=dbname;
 encrypt=true;trustServerCertificate=true;create=false \
-d define=javax.jdo.option.ConnectionPassword=dbpassword \
-d statusdir=/hivestatus \
-d user.name=admin \
-d enablelog=false \
-d execute="show tables;" \
-s "https://localhost:30111/templeton/v1/hive"
The following error message is received.
javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the 
given database. JDBC url = 
jdbc:sqlserver://SQLserver.database.windows.net;database=dbname; 
encrypt=true;trustServerCertificate=true;create=false, username = 
dbusername@SQLserver. Terminating connection pool (set lazyInit to true if 
you expect to start your database after your app). Original Exception: --
com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user 
'dbusername'.
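One way to avoid mangling Unicode values in such a request is to percent-encode the form body instead of passing raw bytes on the curl command line. A sketch; the field values are placeholders, not real credentials:

```python
from urllib.parse import urlencode

def templeton_form_body(params):
    """Build the POST body curl would send, percent-encoding every value
    so non-ASCII (Unicode) usernames/passwords survive transport intact.
    Field names mirror the curl command above."""
    return urlencode(params, encoding="utf-8")

body = templeton_form_body({
    "define": "javax.jdo.option.ConnectionUserName=db\u00fcser@SQLserver",
    "statusdir": "/hivestatus",
})
```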





[jira] [Updated] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup

2014-08-25 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-6847:
---

Status: Patch Available  (was: Open)

 Improve / fix bugs in Hive scratch dir setup
 

 Key: HIVE-6847
 URL: https://issues.apache.org/jira/browse/HIVE-6847
 Project: Hive
  Issue Type: Bug
  Components: CLI, HiveServer2
Affects Versions: 0.14.0
Reporter: Vikram Dixit K
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch, HIVE-6847.3.patch, 
 HIVE-6847.4.patch, HIVE-6847.5.patch


 Currently, the hive server creates scratch directory and changes permission 
 to 777 however, this is not great with respect to security. We need to create 
 user specific scratch directories instead. Also refer to HIVE-6782 1st 
 iteration of the patch for approach.





[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109398#comment-14109398
 ] 

Hive QA commented on HIVE-7799:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12664112/HIVE-7799.3-spark.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6253 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/92/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/92/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-92/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12664112

 TRANSFORM failed in transform_ppr1.q[Spark Branch]
 --

 Key: HIVE-7799
 URL: https://issues.apache.org/jira/browse/HIVE-7799
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
  Labels: Spark-M1
 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, 
 HIVE-7799.3-spark.patch


 Here is the exception:
 {noformat}
 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) 
 - Exception in task 0.0 in stage 1.0 (TID 0)
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82)
 at 
 scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at 
 org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}
 Basically, the cause is that RowContainer is misused (it's not allowed to 
 write once someone has read a row from it). I'm trying to figure out whether 
 it's a Hive issue or specific to Hive on Spark mode.
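For illustration, the write-after-read contract described in that comment can be sketched as follows. This is a hypothetical stand-in, not the real RowContainer API; every name below is invented for the sketch:

```java
import java.util.ArrayList;
import java.util.List;

// A container that rejects writes once a reader has started consuming rows,
// mirroring the contract whose violation is suspected in HIVE-7799.
class ReadOnceContainer<T> {
    private final List<T> rows = new ArrayList<>();
    private boolean readStarted = false;
    private int cursor = 0;

    void addRow(T row) {
        if (readStarted) {
            // Writing after a read breaks the contract.
            throw new IllegalStateException("write after read is not allowed");
        }
        rows.add(row);
    }

    T next() {
        readStarted = true;
        return cursor < rows.size() ? rows.get(cursor++) : null;
    }
}

public class Demo {
    public static void main(String[] args) {
        ReadOnceContainer<String> c = new ReadOnceContainer<>();
        c.addRow("a");
        System.out.println(c.next());   // a
        boolean rejected = false;
        try {
            c.addRow("b");              // violates the contract
        } catch (IllegalStateException e) {
            rejected = true;
        }
        System.out.println(rejected);   // true
    }
}
```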



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HIVE-7721) CBO: support case statement translation to optiq

2014-08-25 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran reassigned HIVE-7721:


Assignee: Laljo John Pullokkaran

 CBO: support case statement translation to optiq
 

 Key: HIVE-7721
 URL: https://issues.apache.org/jira/browse/HIVE-7721
 Project: Hive
  Issue Type: Sub-task
Reporter: Harish Butani
Assignee: Laljo John Pullokkaran

 Following query:
 {code}
 explain select case when key < '104' then null else key end as key from src
 {code}
 fails with:
 {quote}
 java.lang.RuntimeException: java.lang.RuntimeException: 
 java.lang.RuntimeException: Unsupported Expression
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$OptiqBasedPlanner.getOptimizedAST(SemanticAnalyzer.java:11808)
 
 Caused by: java.lang.RuntimeException: Unsupported Expression
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.translator.RexNodeConverter.convert(RexNodeConverter.java:91)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.translator.RexNodeConverter.convert(RexNodeConverter.java:124)
 {quote}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7720) CBO: rank translation to Optiq RelNode tree failing

2014-08-25 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7720:
-

Assignee: Laljo John Pullokkaran

 CBO: rank translation to Optiq RelNode tree failing
 ---

 Key: HIVE-7720
 URL: https://issues.apache.org/jira/browse/HIVE-7720
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Harish Butani
Assignee: Laljo John Pullokkaran

 Following query:
 {code}
 explain select p_name
 from (select p_mfgr, p_name, p_size, rank() over(partition by p_mfgr order by 
 p_size) as r from part) a
 where r <= 2;
 {code}
 fails with 
 {quote}
 org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more 
 arguments are expected.
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRank.getEvaluator(GenericUDAFRank.java:61)
   at 
 org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver.getEvaluator(AbstractGenericUDAFResolver.java:47)
   at 
 org.apache.hadoop.hive.ql.exec.FunctionRegistry.getGenericUDAFEvaluator(FunctionRegistry.java:1110)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFEvaluator(SemanticAnalyzer.java:3506)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$OptiqBasedPlanner.getHiveAggInfo(SemanticAnalyzer.java:12496)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$OptiqBasedPlanner.genWindowingProj(SemanticAnalyzer.java:12858)
 {quote}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7721) CBO: support case statement translation to optiq

2014-08-25 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109405#comment-14109405
 ] 

Laljo John Pullokkaran commented on HIVE-7721:
--

Fixed by HIVE-7841

 CBO: support case statement translation to optiq
 

 Key: HIVE-7721
 URL: https://issues.apache.org/jira/browse/HIVE-7721
 Project: Hive
  Issue Type: Sub-task
Reporter: Harish Butani
Assignee: Laljo John Pullokkaran

 Following query:
 {code}
 explain select case when key < '104' then null else key end as key from src
 {code}
 fails with:
 {quote}
 java.lang.RuntimeException: java.lang.RuntimeException: 
 java.lang.RuntimeException: Unsupported Expression
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$OptiqBasedPlanner.getOptimizedAST(SemanticAnalyzer.java:11808)
 
 Caused by: java.lang.RuntimeException: Unsupported Expression
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.translator.RexNodeConverter.convert(RexNodeConverter.java:91)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.translator.RexNodeConverter.convert(RexNodeConverter.java:124)
 {quote}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-7721) CBO: support case statement translation to optiq

2014-08-25 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran resolved HIVE-7721.
--

Resolution: Fixed

 CBO: support case statement translation to optiq
 

 Key: HIVE-7721
 URL: https://issues.apache.org/jira/browse/HIVE-7721
 Project: Hive
  Issue Type: Sub-task
Reporter: Harish Butani
Assignee: Laljo John Pullokkaran

 Following query:
 {code}
 explain select case when key < '104' then null else key end as key from src
 {code}
 fails with:
 {quote}
 java.lang.RuntimeException: java.lang.RuntimeException: 
 java.lang.RuntimeException: Unsupported Expression
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$OptiqBasedPlanner.getOptimizedAST(SemanticAnalyzer.java:11808)
 
 Caused by: java.lang.RuntimeException: Unsupported Expression
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.translator.RexNodeConverter.convert(RexNodeConverter.java:91)
   at 
 org.apache.hadoop.hive.ql.optimizer.optiq.translator.RexNodeConverter.convert(RexNodeConverter.java:124)
 {quote}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7860) [CBO] Query on partitioned table which filter out all partitions fails

2014-08-25 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7860:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to branch

 [CBO] Query on partitioned table which filter out all partitions fails
 --

 Key: HIVE-7860
 URL: https://issues.apache.org/jira/browse/HIVE-7860
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: h-7860.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7720) CBO: rank translation to Optiq RelNode tree failing

2014-08-25 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109411#comment-14109411
 ] 

Laljo John Pullokkaran commented on HIVE-7720:
--

Support all windowing UDAFs:
row_number, rank, dense_rank, percent_rank, cume_dist, first_value, last_value, 
lead, lag.

 CBO: rank translation to Optiq RelNode tree failing
 ---

 Key: HIVE-7720
 URL: https://issues.apache.org/jira/browse/HIVE-7720
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Harish Butani
Assignee: Laljo John Pullokkaran

 Following query:
 {code}
 explain select p_name
 from (select p_mfgr, p_name, p_size, rank() over(partition by p_mfgr order by 
 p_size) as r from part) a
 where r <= 2;
 {code}
 fails with 
 {quote}
 org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more 
 arguments are expected.
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRank.getEvaluator(GenericUDAFRank.java:61)
   at 
 org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver.getEvaluator(AbstractGenericUDAFResolver.java:47)
   at 
 org.apache.hadoop.hive.ql.exec.FunctionRegistry.getGenericUDAFEvaluator(FunctionRegistry.java:1110)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFEvaluator(SemanticAnalyzer.java:3506)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$OptiqBasedPlanner.getHiveAggInfo(SemanticAnalyzer.java:12496)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$OptiqBasedPlanner.genWindowingProj(SemanticAnalyzer.java:12858)
 {quote}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7871) WebHCat: Hive job with SQL server as MetastoreDB fails when Unicode characters are present in curl command

2014-08-25 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-7871:


Attachment: HIVE-7871.1.patch

Similar changes need to be made for the remaining endpoints in Server.java.

 WebHCat: Hive job with SQL server as MetastoreDB fails  when Unicode 
 characters are present in curl command
 ---

 Key: HIVE-7871
 URL: https://issues.apache.org/jira/browse/HIVE-7871
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-7871.1.patch


 Please follow the steps below to repro.
 1. Create a SQL Server. Create a username, password, and DB with Unicode 
 characters in their names.
 2. Create a cluster and run the command below against its templeton endpoint:
 curl -i -u username:password \
 -d define=javax.jdo.option.ConnectionUserName=dbusername@SQLserver \
 -d define=hive.metastore.uris= \
 -d define="javax.jdo.option.ConnectionURL=jdbc:sqlserver://SQLserver.database.windows.net;database=dbname;encrypt=true;trustServerCertificate=true;create=false" \
 -d define=javax.jdo.option.ConnectionPassword=dbpassword \
 -d statusdir=/hivestatus \
 -d user.name=admin \
 -d enablelog=false \
 -d execute="show tables;" \
 -s "https://localhost:30111/templeton/v1/hive"
 The following error message is received.
 javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the 
 given database. JDBC url = 
 jdbc:sqlserver://SQLserver.database.windows.net;database=dbname; 
 encrypt=true;trustServerCertificate=true;create=false, username = 
 dbusername@SQLserver. Terminating connection pool (set lazyInit to true 
 if you expect to start your database after your app). Original Exception: 
 --
 com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user 
 'dbusername'.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7871) WebHCat: Hive job with SQL server as MetastoreDB fails when Unicode characters are present in curl command

2014-08-25 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-7871:


Status: Patch Available  (was: Open)

 WebHCat: Hive job with SQL server as MetastoreDB fails  when Unicode 
 characters are present in curl command
 ---

 Key: HIVE-7871
 URL: https://issues.apache.org/jira/browse/HIVE-7871
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-7871.1.patch


 Please follow the steps below to repro.
 1. Create a SQL Server. Create a username, password, and DB with Unicode 
 characters in their names.
 2. Create a cluster and run the command below against its templeton endpoint:
 curl -i -u username:password \
 -d define=javax.jdo.option.ConnectionUserName=dbusername@SQLserver \
 -d define=hive.metastore.uris= \
 -d define="javax.jdo.option.ConnectionURL=jdbc:sqlserver://SQLserver.database.windows.net;database=dbname;encrypt=true;trustServerCertificate=true;create=false" \
 -d define=javax.jdo.option.ConnectionPassword=dbpassword \
 -d statusdir=/hivestatus \
 -d user.name=admin \
 -d enablelog=false \
 -d execute="show tables;" \
 -s "https://localhost:30111/templeton/v1/hive"
 The following error message is received.
 javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the 
 given database. JDBC url = 
 jdbc:sqlserver://SQLserver.database.windows.net;database=dbname; 
 encrypt=true;trustServerCertificate=true;create=false, username = 
 dbusername@SQLserver. Terminating connection pool (set lazyInit to true 
 if you expect to start your database after your app). Original Exception: 
 --
 com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user 
 'dbusername'.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24830: HIVE-7548: Precondition checks should not fail the merge task in case of automatic trigger

2014-08-25 Thread Gunther Hagleitner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24830/#review51415
---



ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/24830/#comment89723

Better to throw an AssertionException here, isn't it? Otherwise you will 
blindly delete it.



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java
https://reviews.apache.org/r/24830/#comment89724

?


- Gunther Hagleitner


On Aug. 19, 2014, 12:29 a.m., Prasanth_J wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24830/
 ---
 
 (Updated Aug. 19, 2014, 12:29 a.m.)
 
 
 Review request for hive and Gunther Hagleitner.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
  ORC fast merge (HIVE-7509) fails the merge task if any of the precondition 
  checks fail. Failing on precondition checks is fine for ALTER TABLE .. 
  CONCATENATE but not for an automatic trigger of the merge task from the 
  conditional resolver. If a partition has ORC files that are incompatible 
  for merging, the merge task should ignore them and not fail.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1d6a93a 
   ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeMapper.java beb4f7d 
   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java 
 b36152a 
   ql/src/test/queries/clientnegative/orc_merge1.q b2d42cd 
   ql/src/test/queries/clientnegative/orc_merge2.q 2f62ee7 
   ql/src/test/queries/clientnegative/orc_merge3.q 5158e2e 
   ql/src/test/queries/clientnegative/orc_merge4.q ad48572 
   ql/src/test/queries/clientnegative/orc_merge5.q e94a8cc 
   ql/src/test/queries/clientpositive/orc_merge_incompat1.q PRE-CREATION 
   ql/src/test/queries/clientpositive/orc_merge_incompat2.q PRE-CREATION 
   ql/src/test/results/clientpositive/orc_merge_incompat1.q.out PRE-CREATION 
   ql/src/test/results/clientpositive/orc_merge_incompat2.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/24830/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Prasanth_J
 




[jira] [Commented] (HIVE-7548) Precondition checks should not fail the merge task in case of automatic trigger

2014-08-25 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109433#comment-14109433
 ] 

Gunther Hagleitner commented on HIVE-7548:
--

Comments on rb. Otherwise +1. [~gopalv]/[~ashutoshc] could you take a look at 
this (esp the regex)?

 Precondition checks should not fail the merge task in case of automatic 
 trigger
 ---

 Key: HIVE-7548
 URL: https://issues.apache.org/jira/browse/HIVE-7548
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-7548.1.patch


 ORC fast merge (HIVE-7509) fails the merge task if any of the precondition 
 checks fail. Failing on precondition checks is fine for ALTER TABLE .. 
 CONCATENATE but not for an automatic trigger of the merge task from the 
 conditional resolver. If a partition has ORC files that are incompatible 
 for merging, the merge task should ignore them and not fail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Hive on Tez Counters

2014-08-25 Thread Siddharth Seth
0.5 is almost out, the vote should be closed within a day. You'll have to
use a build of Hive from the Tez branch though.

TEZ-1118 is the patch you'll need to pick up to fix the CPU counters. This
should apply directly on the 0.4 branch, if that's the approach you want to
take.


On Mon, Aug 25, 2014 at 5:38 AM, Suma Shivaprasad 
sumasai.shivapra...@gmail.com wrote:

 Hi Siddharth/Gunther,

 Thanks for replying to my queries. I was particularly interested in the CPU
 counter since I was doing some benchmarking on queries. Can you please
 clarify: if I just blindly take a mod(CPU counter) for all tasks and add
 them up, would they be fine, or should I take a patch from the fix and apply
 it on Tez 0.4 to get it working until 0.5 is released?

 Thanks
 Suma


 On Fri, Aug 22, 2014 at 2:55 AM, Gunther Hagleitner 
 ghagleit...@hortonworks.com wrote:

  Hive logs the same counters regardless of whether you run with Tez or MR.
  We've removed some counters in hive 0.13 (HIVE-4518) - the specific one
  you're looking for might be in that list.
 
  Thanks,
  Gunther.
 
 
  On Thu, Aug 21, 2014 at 11:13 AM, Siddharth Seth ss...@apache.org
 wrote:
 
   I'll let Hive folks answer the questions about the Hive counters.
  
   In terms of the CPU counter - that was a bug in Tez-0.4.0, which has
 been
   fixed in 0.5.0.
  
   COMMITTED_HEAP_BYTES just represents the memory available to the JVM
   (Runtime.getRuntime().totalMemory()). This will only vary if the VM is
   started with a different Xms and Xmx option.
  
   In terms of Tez, the application logs are currently the best place.
 Hive
   may expose these in a more accessible manner though.
  
  
   On Wed, Aug 20, 2014 at 11:16 PM, Suma Shivaprasad 
   sumasai.shivapra...@gmail.com wrote:
  
Hi,
   
Needed info on where I can get detailed job counters for Hive on Tez.
  Am
running this on a HDP cluster with Hive 0.13 and see only the
 following
   job
counters through Hive Tez in Yarn application logs which I got
 through(
yarn logs -applicationId ...) .
   
 a. Cannot see any ReduceOperator counters; DESERIALIZE_ERRORS is the only
    counter present in MapOperator.
 b. The CPU_MILLISECONDS is negative (-ve) in some cases. Is CPU_MILLISECONDS
    accurate?
 c. What does COMMITTED_HEAP_BYTES indicate?
 d. Is there any other place I should be checking the counters?
   
[[File System Counters
FILE: BYTES_READ=512,
FILE: BYTES_WRITTEN=3079881,
FILE: READ_OPS=0, FILE: LARGE_READ_OPS=0, FILE: WRITE_OPS=0, HDFS:
BYTES_READ=8215153, HDFS: BYTES_WRITTEN=0, HDFS: READ_OPS=3, HDFS:
LARGE_READ_OPS=0, HDFS: WRITE_OPS=0]
   
[org.apache.tez.common.counters.TaskCounter SPILLED_RECORDS=222543,
GC_TIME_MILLIS=172, *CPU_MILLISECONDS=-19700*,
PHYSICAL_MEMORY_BYTES=667566080, VIRTUAL_MEMORY_BYTES=1887797248,
COMMITTED_HEAP_BYTES=1011023872, INPUT_RECORDS_PROCESSED=222543,
OUTPUT_RECORDS=222543,
OUTPUT_BYTES=23543896,
OUTPUT_BYTES_WITH_OVERHEAD=23989024, OUTPUT_BYTES_PHYSICAL=3079369,
ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0,
ADDITIONAL_SPILL_COUNT=0]
   
   
[*org.apache.hadoop.hive.ql.exec.MapOperator*$Counter
DESERIALIZE_ERRORS=0]]
   
Thanks
Suma
   
  
 
  --
  CONFIDENTIALITY NOTICE
  NOTICE: This message is intended for the use of the individual or entity
 to
  which it is addressed and may contain information that is confidential,
  privileged and exempt from disclosure under applicable law. If the reader
  of this message is not the intended recipient, you are hereby notified
 that
  any printing, copying, dissemination, distribution, disclosure or
  forwarding of this communication is strictly prohibited. If you have
  received this communication in error, please contact the sender
 immediately
  and delete it from your system. Thank You.
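To make the COMMITTED_HEAP_BYTES explanation above concrete, here is a minimal standalone Java sketch of the underlying call. This is the plain JVM API, not Tez code; the variable names are invented:

```java
public class HeapCounterSketch {
    public static void main(String[] args) {
        // Runtime.totalMemory() is the value the counter reflects: memory
        // currently committed to the JVM, which only varies when the VM is
        // started with different -Xms/-Xmx options.
        long committedHeapBytes = Runtime.getRuntime().totalMemory();
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        System.out.println(committedHeapBytes > 0);              // true
        System.out.println(committedHeapBytes <= maxHeapBytes);  // true
    }
}
```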
 



[jira] [Commented] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager

2014-08-25 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109455#comment-14109455
 ] 

Szehon Ho commented on HIVE-7353:
-

Hi [~vgumashta], I don't see the TestSessionGlobalInitFile failures in the 
other runs; can you take a look?

 HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
 

 Key: HIVE-7353
 URL: https://issues.apache.org/jira/browse/HIVE-7353
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, 
 HIVE-7353.4.patch, HIVE-7353.5.patch


 While using an embedded metastore, HiveServer2 ends up creating new 
 instances of JDOPersistanceManager when creating background threads to run 
 async operations; these are cached in JDOPersistanceManagerFactory. Even 
 when the background thread is killed by the thread pool manager, the 
 JDOPersistanceManager instances are never GCed because they are cached by 
 JDOPersistanceManagerFactory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24962: HIVE-7730: Extend ReadEntity to add accessed columns from query

2014-08-25 Thread Szehon Ho


 On Aug. 22, 2014, 6:14 a.m., Szehon Ho wrote:
  ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java, line 54
  https://reviews.apache.org/r/24962/diff/1/?file=666753#file666753line54
 
  Can we make this final, and not have a setter?  The caller can just add 
  to the list.  It'll make the code a bit simpler.
  
  Also, should it be a Set?
 
 Xiaomeng Huang wrote:
  Thanks, I think it is better to be a list. I get accessed columns from 
  tableToColumnAccessMap, which is a Map<String, List<String>>. Hive's native 
  authorization uses this list too.
  I get the column list via a table name, then set it to the ReadEntity 
  directly, without needing to add every column in a loop, so it is necessary 
  to have a setter.
  BTW, I can also add an API addAccessedColumn(String column) to add one 
  column to this column list.
 
 Szehon Ho wrote:
  OK, it's fine if you think it should be a list.
  
  For the other part, I was thinking to have just one method 
  getAccessedColumns() which returns the list.
  
  Then the caller (SemanticAnalyzer) can call 
  entity.getAccessedColumns().addAll(...). The benefit to me is 1) the list 
  can be made final, and 2) it makes the calling code cleaner (no need to 
  construct lists and set them). Also it's more consistent with the other 
  collections in this class. Hope that makes sense, thanks!
 
 Xiaomeng Huang wrote:
  Yes, it's fine to me. Fixed it, thanks!

Thanks Xiaomeng, can you please upload the latest to the JIRA for testing and 
commit?

And I had just one minor comment for your consideration, as it's still not 
uploaded.  With this, we don't need to construct a new LinkedList for cols 
outside the switches (it's kind of a waste), and we can just directly call 
entity.getAccessedColumns().addAll(tableToColumnAccessMap.get...), or you can 
make a local variable cols = tableToColumnAccessMap.get(...) and then 
entity.getAccessedColumns().addAll(cols).  It's not a huge deal, but I was 
thinking we can fix and upload the patch together.
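A minimal sketch of the final-collection pattern being discussed in this thread. The class and field names below are simplified stand-ins for the real ReadEntity/SemanticAnalyzer code, not the actual Hive classes:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReadEntitySketch {
    // Final list with no setter: callers mutate it through the getter.
    private final List<String> accessedColumns = new ArrayList<>();

    List<String> getAccessedColumns() {
        return accessedColumns;
    }
}

public class Caller {
    public static void main(String[] args) {
        // Stand-in for SemanticAnalyzer's tableToColumnAccessMap.
        Map<String, List<String>> tableToColumnAccessMap = new HashMap<>();
        tableToColumnAccessMap.put("t1", List.of("key", "value"));

        ReadEntitySketch entity = new ReadEntitySketch();
        // No intermediate list construction, as suggested in the review:
        entity.getAccessedColumns().addAll(tableToColumnAccessMap.get("t1"));
        System.out.println(entity.getAccessedColumns()); // prints [key, value]
    }
}
```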


- Szehon


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24962/#review51257
---


On Aug. 25, 2014, 3:17 a.m., Xiaomeng Huang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24962/
 ---
 
 (Updated Aug. 25, 2014, 3:17 a.m.)
 
 
 Review request for hive, Prasad Mujumdar and Szehon Ho.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 External authorization model can not get accessed columns from query. Hive 
 should store accessed columns to ReadEntity 
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/hooks/ReadEntity.java 7ed50b4 
   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b05d3b4 
 
 Diff: https://reviews.apache.org/r/24962/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Xiaomeng Huang
 




Re: Review Request 24830: HIVE-7548: Precondition checks should not fail the merge task in case of automatic trigger

2014-08-25 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24830/#review51419
---



ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/24830/#comment89734

There are other utility functions that extract taskID/attemptID from file 
names. None of these methods throws an exception if it cannot find a match 
for the regex pattern. Example: getIdFromFilename() returns the filename as 
the Id if it cannot match the pattern. I was also following the same 
convention. In this case, if there are no matches for the copy file pattern 
it will return false and fall back to the old code path.

The regex will still work if files are loaded using a LOAD DATA LOCAL 
INPATH statement. With this statement, the file names will be like
1) filename.txt
2) filename_copy_1.txt
3) filename_copy_2.txt

For this file pattern, there will be no match for taskId/attemptId 
extraction, hence no files will be marked duplicate. We really don't have to 
worry about copy file names in this case as there will not be any duplicate 
elimination.
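An illustrative sketch of the copy-file naming check described above. The pattern below is hypothetical; the actual regex in Utilities.java may differ:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CopyFilePatternSketch {
    // Hypothetical pattern, for illustration only: matches names like
    // "filename_copy_1.txt" produced by LOAD DATA LOCAL INPATH. Such names
    // carry no taskId/attemptId, so they never enter duplicate elimination.
    private static final Pattern COPY_FILE =
        Pattern.compile("(.*)_copy_(\\d+)(\\..*)?");

    static boolean isCopyFile(String name) {
        Matcher m = COPY_FILE.matcher(name);
        return m.matches();
    }

    public static void main(String[] args) {
        System.out.println(isCopyFile("filename_copy_1.txt")); // true
        System.out.println(isCopyFile("filename.txt"));        // false
        System.out.println(isCopyFile("000000_0"));            // false
    }
}
```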



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java
https://reviews.apache.org/r/24830/#comment89735

Fixed it.


- Prasanth_J


On Aug. 19, 2014, 12:29 a.m., Prasanth_J wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24830/
 ---
 
 (Updated Aug. 19, 2014, 12:29 a.m.)
 
 
 Review request for hive and Gunther Hagleitner.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
  ORC fast merge (HIVE-7509) fails the merge task if any of the precondition 
  checks fail. Failing on precondition checks is fine for ALTER TABLE .. 
  CONCATENATE but not for an automatic trigger of the merge task from the 
  conditional resolver. If a partition has ORC files that are incompatible 
  for merging, the merge task should ignore them and not fail.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1d6a93a 
   ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeMapper.java beb4f7d 
   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java 
 b36152a 
   ql/src/test/queries/clientnegative/orc_merge1.q b2d42cd 
   ql/src/test/queries/clientnegative/orc_merge2.q 2f62ee7 
   ql/src/test/queries/clientnegative/orc_merge3.q 5158e2e 
   ql/src/test/queries/clientnegative/orc_merge4.q ad48572 
   ql/src/test/queries/clientnegative/orc_merge5.q e94a8cc 
   ql/src/test/queries/clientpositive/orc_merge_incompat1.q PRE-CREATION 
   ql/src/test/queries/clientpositive/orc_merge_incompat2.q PRE-CREATION 
   ql/src/test/results/clientpositive/orc_merge_incompat1.q.out PRE-CREATION 
   ql/src/test/results/clientpositive/orc_merge_incompat2.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/24830/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Prasanth_J
 




[jira] [Updated] (HIVE-7681) qualified tablenames usage does not work with several alter-table commands

2014-08-25 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-7681:


Attachment: HIVE-7681.4.patch.txt

HIVE-7681.4.patch.txt - Uploading the patch to kick off tests and make sure 
that no new tests need update.


 qualified tablenames usage does not work with several alter-table commands
 --

 Key: HIVE-7681
 URL: https://issues.apache.org/jira/browse/HIVE-7681
 Project: Hive
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Navis
 Attachments: HIVE-7681.1.patch.txt, HIVE-7681.2.patch.txt, 
 HIVE-7681.3.patch.txt, HIVE-7681.4.patch.txt, HIVE-7681.4.patch.txt


 Changes were made in HIVE-4064 for use of qualified table names in more types 
 of queries. But several alter table commands don't work with qualified table 
 names:
 - alter table default.tmpfoo set tblproperties ("bar" = "bar value")
 - ALTER TABLE default.kv_rename_test CHANGE a a STRING
 - add, drop partition
 - alter index rebuild



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24627: HIVE-7704: Create tez task for fast file merging

2014-08-25 Thread Gunther Hagleitner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24627/#review51416
---



ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java
https://reviews.apache.org/r/24627/#comment89725

why do you need a map operator at all then? Can't you just write a net new 
processor that doesn't init map op?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java
https://reviews.apache.org/r/24627/#comment89727

conf setup should happen in initVertexConf.



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java
https://reviews.apache.org/r/24627/#comment89728

If I read this right, the only diff is the processor. Can you use a var for 
this and keep a single call to new Vertex?

String procClassName;

if ... {
  procClassName = ...
}
...
new Vertex(...procClassName)

if you move all the conf setup into the initVertexConf method this should 
be more clear.



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java
https://reviews.apache.org/r/24627/#comment89726

indentation seems broken



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java
https://reviews.apache.org/r/24627/#comment89729

that means you're setting a path as the alias?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MergeFileMapRecordProcessor.java
https://reviews.apache.org/r/24627/#comment89730

I'm assuming that Merge* and ORCMerge* contain a lot of code copied from the 
MR path? If that's the case, can you factor that out?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezProcessor.java
https://reviews.apache.org/r/24627/#comment89731

this should be a different class. not every processor will need these 
things.



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java
https://reviews.apache.org/r/24627/#comment89733

don't call it jobClose if it only applies to merge work.



ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
https://reviews.apache.org/r/24627/#comment89736

you seem to be fighting the fact that merge work is only partly a map work. 
why not create a dummy op? that way everything is the same. you could even 
create a real op and move your merge logic into it.


- Gunther Hagleitner


On Aug. 15, 2014, 5:27 a.m., Prasanth_J wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24627/
 ---
 
 (Updated Aug. 15, 2014, 5:27 a.m.)
 
 
 Review request for hive and Gunther Hagleitner.
 
 
 Bugs: HIVE-7704
 https://issues.apache.org/jira/browse/HIVE-7704
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Currently tez falls back to an MR task for the merge file task. It would be 
 beneficial to convert the merge file tasks to tez tasks to make use of the 
 performance gains from tez.
 
 
 Diffs
 -
 
   itests/src/test/resources/testconfiguration.properties b801678 
   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java cd017d8 
   ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java d5de58e 
   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java a2975cb 
   ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 3d74459 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1d6a93a 
   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java 4e0fd79 
   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java e116426 
   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MapRecordProcessor.java 
 8513e33 
   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MapTezProcessor.java 31f3bcd 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MergeFileMapRecordProcessor.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MergeFileTezProcessor.java 
 PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/OrcMergeFileMapRecordProcessor.java
  PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/OrcMergeFileTezProcessor.java 
 PRE-CREATION 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/RCFileMergeFileMapRecordProcessor.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/RecordProcessor.java 1577827 
   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezProcessor.java c2ba782 
   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java 951e918 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/RCFileMergeFileTezProcessor.java
  PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java 
 bf44548 
   ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileInputFormat.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileMapper.java 
 PRE-CREATION 
   ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileOutputFormat.java 
 PRE-CREATION 
   

[jira] [Commented] (HIVE-7704) Create tez task for fast file merging

2014-08-25 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109480#comment-14109480
 ] 

Gunther Hagleitner commented on HIVE-7704:
--

comments on rb

 Create tez task for fast file merging
 -

 Key: HIVE-7704
 URL: https://issues.apache.org/jira/browse/HIVE-7704
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-7704.1.patch, HIVE-7704.2.patch, HIVE-7704.3.patch, 
 HIVE-7704.4.patch, HIVE-7704.4.patch


 Currently tez falls back to an MR task for the merge file task. It would be 
 beneficial to convert the merge file tasks to tez tasks to make use of the 
 performance gains from tez.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7548) Precondition checks should not fail the merge task in case of automatic trigger

2014-08-25 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7548:
-

Attachment: HIVE-7548.2.patch

 Precondition checks should not fail the merge task in case of automatic 
 trigger
 ---

 Key: HIVE-7548
 URL: https://issues.apache.org/jira/browse/HIVE-7548
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-7548.1.patch, HIVE-7548.2.patch


 ORC fast merge (HIVE-7509) fails the merge task if any of the precondition 
 checks fail. Failing on a precondition check is appropriate for ALTER TABLE .. 
 CONCATENATE, but not for an automatic trigger of the merge task from the 
 conditional resolver. If a partition has ORC files that are incompatible for 
 merging, the merge task should skip them rather than fail the task.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7548) Precondition checks should not fail the merge task in case of automatic trigger

2014-08-25 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109482#comment-14109482
 ] 

Prasanth J commented on HIVE-7548:
--

Addressed Gunther's review comments.

 Precondition checks should not fail the merge task in case of automatic 
 trigger
 ---

 Key: HIVE-7548
 URL: https://issues.apache.org/jira/browse/HIVE-7548
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-7548.1.patch, HIVE-7548.2.patch


 ORC fast merge (HIVE-7509) fails the merge task if any of the precondition 
 checks fail. Failing on a precondition check is appropriate for ALTER TABLE .. 
 CONCATENATE, but not for an automatic trigger of the merge task from the 
 conditional resolver. If a partition has ORC files that are incompatible for 
 merging, the merge task should skip them rather than fail the task.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24830: HIVE-7548: Precondition checks should not fail the merge task in case of automatic trigger

2014-08-25 Thread Gopal V

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24830/#review51424
---



ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/24830/#comment89740

Use named capture groups in Java as much as possible:

(?<taskId>[0-9]+) etc.
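A minimal sketch of the named-capture suggestion. The pattern and the task-name format are illustrative, not Hive's actual attempt-id scheme:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TaskIdExtractor {
    // Named groups (JDK 7+): callers reference the group by name instead of a
    // positional index, so inserting new groups later does not break them.
    private static final Pattern TASK_NAME =
        Pattern.compile("task_(?<taskId>[0-9]+)_(?<attempt>[0-9]+)");

    static String taskIdOf(String name) {
        Matcher m = TASK_NAME.matcher(name);
        return m.matches() ? m.group("taskId") : null;
    }
}
```

For example, `TaskIdExtractor.taskIdOf("task_000123_0")` returns `"000123"`, and a non-matching string returns null.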



ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/24830/#comment89741

What about LOAD DATA INPATH?


- Gopal V


On Aug. 19, 2014, 12:29 a.m., Prasanth_J wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24830/
 ---
 
 (Updated Aug. 19, 2014, 12:29 a.m.)
 
 
 Review request for hive and Gunther Hagleitner.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
  ORC fast merge (HIVE-7509) fails the merge task if any of the precondition 
  checks fail. Failing on a precondition check is appropriate for ALTER TABLE .. 
  CONCATENATE, but not for an automatic trigger of the merge task from the 
  conditional resolver. If a partition has ORC files that are incompatible for 
  merging, the merge task should skip them rather than fail the task.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1d6a93a 
   ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeMapper.java beb4f7d 
   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java 
 b36152a 
   ql/src/test/queries/clientnegative/orc_merge1.q b2d42cd 
   ql/src/test/queries/clientnegative/orc_merge2.q 2f62ee7 
   ql/src/test/queries/clientnegative/orc_merge3.q 5158e2e 
   ql/src/test/queries/clientnegative/orc_merge4.q ad48572 
   ql/src/test/queries/clientnegative/orc_merge5.q e94a8cc 
   ql/src/test/queries/clientpositive/orc_merge_incompat1.q PRE-CREATION 
   ql/src/test/queries/clientpositive/orc_merge_incompat2.q PRE-CREATION 
   ql/src/test/results/clientpositive/orc_merge_incompat1.q.out PRE-CREATION 
   ql/src/test/results/clientpositive/orc_merge_incompat2.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/24830/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Prasanth_J
 




[jira] [Commented] (HIVE-5799) session/operation timeout for hiveserver2

2014-08-25 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109489#comment-14109489
 ] 

Brock Noland commented on HIVE-5799:


# It appears the constant SessionManager.SESSION_CHECK_INTERVAL is unused.
# HiveSessionImpl.opHandleSet should be converted to a synchronized set or a 
concurrent set, since it's now modified by both the client thread and the 
background thread.
# I think we should name OperationManager's internal lookup method 
getOperationInternal, as that seems to be more standard for this use case in 
the Hive code base.
# Perhaps we should change OperationManager.closeExpiredOperations to not use 
closeOperation but use similar code, since there appears to be a harmless race 
condition there which will log exceptions when an operation is closed during 
closeExpiredOperations.

Does anyone else have any feedback? Otherwise once these are fixed I think we 
can commit.
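The second point about opHandleSet can be sketched like this (the class and method names here are illustrative; the real field lives in HiveSessionImpl):

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class SessionSketch {
    // Wrapping the set makes add/remove/contains atomic, so the client
    // thread and the background expiry thread can both mutate it safely.
    final Set<String> opHandleSet =
        Collections.synchronizedSet(new HashSet<String>());

    void open(String handle)  { opHandleSet.add(handle); }
    void close(String handle) { opHandleSet.remove(handle); }
}
```

Note that iterating over a synchronized set still requires an explicit synchronized block on the wrapper; a concurrent set avoids that at the cost of weaker iteration semantics.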

 session/operation timeout for hiveserver2
 -

 Key: HIVE-5799
 URL: https://issues.apache.org/jira/browse/HIVE-5799
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-5799.1.patch.txt, HIVE-5799.10.patch.txt, 
 HIVE-5799.11.patch.txt, HIVE-5799.12.patch.txt, HIVE-5799.12.patch.txt, 
 HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt, 
 HIVE-5799.5.patch.txt, HIVE-5799.6.patch.txt, HIVE-5799.7.patch.txt, 
 HIVE-5799.8.patch.txt, HIVE-5799.9.patch.txt


 Need a timeout facility to prevent resource leaks from unstable or bad 
 clients.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7865) Extend TestFileDump test case to printout ORC row index information

2014-08-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109492#comment-14109492
 ] 

Sergey Shelukhin commented on HIVE-7865:


+1

 Extend TestFileDump test case to printout ORC row index information
 ---

 Key: HIVE-7865
 URL: https://issues.apache.org/jira/browse/HIVE-7865
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-7865.1.patch


 It would be good to have a test case that prints out ORC row index entries. 
 Some changes to the ORC format, like HIVE-7832, use a different codepath to 
 write row index entries. A test case that prints row index entries will help 
 verify that nothing is being written incorrectly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5799) session/operation timeout for hiveserver2

2014-08-25 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109500#comment-14109500
 ] 

Lars Francke commented on HIVE-5799:


 The patch looks mostly good; I have a couple of minor comments regarding 
 style/checkstyle. If you're interested in them, could you please update RB?

 session/operation timeout for hiveserver2
 -

 Key: HIVE-5799
 URL: https://issues.apache.org/jira/browse/HIVE-5799
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-5799.1.patch.txt, HIVE-5799.10.patch.txt, 
 HIVE-5799.11.patch.txt, HIVE-5799.12.patch.txt, HIVE-5799.12.patch.txt, 
 HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt, 
 HIVE-5799.5.patch.txt, HIVE-5799.6.patch.txt, HIVE-5799.7.patch.txt, 
 HIVE-5799.8.patch.txt, HIVE-5799.9.patch.txt


 Need a timeout facility to prevent resource leaks from unstable or bad 
 clients.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24688: parallel order by clause on a string column fails with IOException: Split points are out of order

2014-08-25 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24688/#review51426
---


Looks like an important bug to fix, but I don't know too much about this code. 
Can you explain what the bug in the getPartitionKey algorithm is, and what the 
fix is? For example, why do we need to alter the stepSize as we iterate? Is 
there a test we can add for this as well, to illustrate and validate the fix?

Also, I am confused about whether the other fixes in the patch are related:

1. Is adding setConf on the HiveTotalOrderPartitioner related to the bug?
2. What is the use of the new HiveConf ..min.reducer setting? My guess is you 
found the algorithm not generating enough partition keys sometimes; can you 
explain?


common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
https://reviews.apache.org/r/24688/#comment89744

If this needs to be exposed, it should be worded better. Something like:

name = hive.optimize.sampling.orderby.min.reducer.ratio

If sampling is enabled, this is the minimum allowed ratio of the number of 
reducers calculated by sampling to the expected number of reducers.

It might be confusing to the user in my opinion, as the user has little 
control over what the expected reducer count is, right?



ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
https://reviews.apache.org/r/24688/#comment89742

Please add some more context to this debug statement.



ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
https://reviews.apache.org/r/24688/#comment89743

If this needs to be exposed, the message could be: Sampling generated x 
reducers, but y were expected.


- Szehon Ho


On Aug. 14, 2014, 2:29 a.m., Navis Ryu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24688/
 ---
 
 (Updated Aug. 14, 2014, 2:29 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-7669
 https://issues.apache.org/jira/browse/HIVE-7669
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 The source table has 600 million rows and has a String column 
 l_shipinstruct which has 4 unique values (i.e. these 4 values are repeated 
 across the 600 million rows).
 
 We are sorting it based on this string column l_shipinstruct as shown in 
 the below HiveQL with the following parameters. 
 {code:sql}
 set hive.optimize.sampling.orderby=true;
 set hive.optimize.sampling.orderby.number=1000;
 set hive.optimize.sampling.orderby.percent=0.1f;
 
 insert overwrite table lineitem_temp_report 
 select 
   l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, 
 l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, 
 l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment
 from 
   lineitem
 order by l_shipinstruct;
 {code}
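To illustrate the failure mode: with only 4 unique keys, equi-spaced split points drawn from the sorted sample repeat, and TotalOrderPartitioner rejects split points that are not strictly increasing ("Split points are out of order"). This is a simplified sketch, not Hive's actual PartitionKeySampler:

```java
class SplitPointDemo {
    // Naive equi-spaced selection of numReducers-1 split points from a
    // sorted sample, as total-order partitioning requires.
    static String[] splitPoints(String[] sortedSample, int numReducers) {
        String[] splits = new String[numReducers - 1];
        double step = (double) sortedSample.length / numReducers;
        for (int i = 1; i < numReducers; i++) {
            splits[i - 1] = sortedSample[(int) Math.round(i * step) - 1];
        }
        return splits;
    }

    // A sorted sample with only 4 distinct values, like l_shipinstruct.
    static String[] lowCardinalitySample(int n) {
        String[] values = {"COLLECT COD", "DELIVER IN PERSON", "NONE", "TAKE BACK RETURN"};
        String[] sample = new String[n];
        for (int i = 0; i < n; i++) sample[i] = values[(i * 4) / n];
        return sample;
    }
}
```

With 1000 samples and 10 reducers this produces duplicate adjacent split points, which is exactly what the partitioner rejects; any fix has to deduplicate, e.g. by varying the step size while iterating.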
 Stack Trace
 Diagnostic Messages for this Task:
 {noformat}
 Error: java.lang.RuntimeException: Error in configuring object
 at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
 at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
 at 
 org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)
 at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
 ... 10 more
 Caused by: java.lang.IllegalArgumentException: Can't read partitions file
 at 
 org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
 at 
 org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42)
 at 
 org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37)
 ... 15 more
 Caused by: java.io.IOException: Split points are 

Re: Review Request 23320: HiveServer2 using embedded MetaStore leaks JDOPersistanceManager

2014-08-25 Thread Vaibhav Gumashta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23320/
---

(Updated Aug. 25, 2014, 7:15 p.m.)


Review request for hive, Navis Ryu, Sushanth Sowmyan, Szehon Ho, and Thejas 
Nair.


Bugs: HIVE-7353
https://issues.apache.org/jira/browse/HIVE-7353


Repository: hive-git


Description
---

https://issues.apache.org/jira/browse/HIVE-7353


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7f4afd9 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
06d7595 
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 0693039 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java e387b8f 
  service/src/java/org/apache/hive/service/cli/CLIService.java d2cdfc1 
  service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
de54ca1 
  service/src/java/org/apache/hive/service/cli/session/HiveSession.java 9785e95 
  service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 
eee1cc6 
  service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
bc0a02c 
  service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
d573592 
  
service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java 
37b05fc 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
be2eb01 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java 
c380b69 
  
service/src/java/org/apache/hive/service/server/ThreadFactoryWithGarbageCleanup.java
 PRE-CREATION 
  service/src/java/org/apache/hive/service/server/ThreadWithGarbageCleanup.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/23320/diff/


Testing
---

Manual testing using Yourkit.


Thanks,

Vaibhav Gumashta



[jira] [Updated] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager

2014-08-25 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-7353:
---

Attachment: HIVE-7353.6.patch

[~szehon] This should fix it.

 HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
 

 Key: HIVE-7353
 URL: https://issues.apache.org/jira/browse/HIVE-7353
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, 
 HIVE-7353.4.patch, HIVE-7353.5.patch, HIVE-7353.6.patch


 While using an embedded metastore, HiveServer2 creates new instances of 
 JDOPersistanceManager for the background threads that run async operations; 
 these instances are cached in JDOPersistanceManagerFactory. Even when a 
 background thread is killed by the thread pool manager, the 
 JDOPersistanceManager instances are never GCed, because they are cached by 
 JDOPersistanceManagerFactory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager

2014-08-25 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-7353:
---

Status: Open  (was: Patch Available)

 HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
 

 Key: HIVE-7353
 URL: https://issues.apache.org/jira/browse/HIVE-7353
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, 
 HIVE-7353.4.patch, HIVE-7353.5.patch, HIVE-7353.6.patch


 While using an embedded metastore, HiveServer2 creates new instances of 
 JDOPersistanceManager for the background threads that run async operations; 
 these instances are cached in JDOPersistanceManagerFactory. Even when a 
 background thread is killed by the thread pool manager, the 
 JDOPersistanceManager instances are never GCed, because they are cached by 
 JDOPersistanceManagerFactory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7872) StorageBasedAuthorizationProvider should check access perms of parent directory for DROP actions

2014-08-25 Thread Jason Dere (JIRA)
Jason Dere created HIVE-7872:


 Summary: StorageBasedAuthorizationProvider should check access 
perms of parent directory for DROP actions
 Key: HIVE-7872
 URL: https://issues.apache.org/jira/browse/HIVE-7872
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Reporter: Jason Dere


When dropping a table partition, StorageBasedAuthorizationProvider checks for 
write permission on the partition directory itself to decide whether the user 
is allowed to drop the partition. However, to delete the partition directory 
you really need write permission on the parent directory of the file you are 
going to delete. So SBA will authorize the user to drop the partition, but 
actually deleting the partition directory will fail if the user does not have 
the correct access on the table (parent) directory.

SBA should also check the parent directory for DROP actions during its auth 
check.
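The check being described can be sketched with java.nio as follows. This is illustrative only: SBA actually evaluates HDFS permissions, not the local filesystem, and the method name is hypothetical:

```java
import java.nio.file.Files;
import java.nio.file.Path;

class DropAuthSketch {
    // To delete partitionDir itself, the WRITE check belongs on its parent
    // (the table directory), mirroring POSIX delete semantics.
    static boolean canDrop(Path partitionDir) {
        Path parent = partitionDir.getParent();
        return parent != null && Files.isWritable(parent);
    }
}
```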



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6847) Improve / fix bugs in Hive scratch dir setup

2014-08-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109598#comment-14109598
 ] 

Hive QA commented on HIVE-6847:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12664179/HIVE-6847.5.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6114 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby8_noskew
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap_auto
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/488/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/488/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-488/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12664179

 Improve / fix bugs in Hive scratch dir setup
 

 Key: HIVE-6847
 URL: https://issues.apache.org/jira/browse/HIVE-6847
 Project: Hive
  Issue Type: Bug
  Components: CLI, HiveServer2
Affects Versions: 0.14.0
Reporter: Vikram Dixit K
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-6847.1.patch, HIVE-6847.2.patch, HIVE-6847.3.patch, 
 HIVE-6847.4.patch, HIVE-6847.5.patch


 Currently, the hive server creates the scratch directory and changes its 
 permissions to 777; however, this is not great with respect to security. We 
 need to create user-specific scratch directories instead. Also refer to the 
 1st iteration of the HIVE-6782 patch for the approach.
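A sketch of the per-user direction (the base path and the 700 mode are illustrative assumptions, not the patch's exact behavior, which works against HDFS):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

class ScratchDirs {
    // One directory per user, created with mode 700 instead of a shared
    // world-writable 777 root.
    static Path userScratchDir(Path base, String user) throws IOException {
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString("rwx------");
        return Files.createDirectories(base.resolve(user),
                PosixFilePermissions.asFileAttribute(perms));
    }
}
```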



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24830: HIVE-7548: Precondition checks should not fail the merge task in case of automatic trigger

2014-08-25 Thread j . prasanth . j


 On Aug. 25, 2014, 6:48 p.m., Gopal V wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java, line 1549
  https://reviews.apache.org/r/24830/diff/1/?file=663983#file663983line1549
 
  Use named capture groups in Java as much as possible.
  
  (?<taskId>[0-9]+) etc.

Named capture is supported only in JDK 7 and above. I'll use comments in the 
next patch to maintain compatibility.


 On Aug. 25, 2014, 6:48 p.m., Gopal V wrote:
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java, line 1877
  https://reviews.apache.org/r/24830/diff/1/?file=663983#file663983line1877
 
  What about LOAD DATA INPATH?

By LOAD .. INTO in the comment I meant LOAD DATA INPATH .. INTO TABLE. 
Please look at my previous comment answering Gunther's question on the review 
board for a LOAD DATA example.


- Prasanth_J


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24830/#review51424
---


On Aug. 19, 2014, 12:29 a.m., Prasanth_J wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24830/
 ---
 
 (Updated Aug. 19, 2014, 12:29 a.m.)
 
 
 Review request for hive and Gunther Hagleitner.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
  ORC fast merge (HIVE-7509) fails the merge task if any of the precondition 
  checks fail. Failing on a precondition check is appropriate for ALTER TABLE .. 
  CONCATENATE, but not for an automatic trigger of the merge task from the 
  conditional resolver. If a partition has ORC files that are incompatible for 
  merging, the merge task should skip them rather than fail the task.
 
 
 Diffs
 -
 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1d6a93a 
   ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeMapper.java beb4f7d 
   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java 
 b36152a 
   ql/src/test/queries/clientnegative/orc_merge1.q b2d42cd 
   ql/src/test/queries/clientnegative/orc_merge2.q 2f62ee7 
   ql/src/test/queries/clientnegative/orc_merge3.q 5158e2e 
   ql/src/test/queries/clientnegative/orc_merge4.q ad48572 
   ql/src/test/queries/clientnegative/orc_merge5.q e94a8cc 
   ql/src/test/queries/clientpositive/orc_merge_incompat1.q PRE-CREATION 
   ql/src/test/queries/clientpositive/orc_merge_incompat2.q PRE-CREATION 
   ql/src/test/results/clientpositive/orc_merge_incompat1.q.out PRE-CREATION 
   ql/src/test/results/clientpositive/orc_merge_incompat2.q.out PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/24830/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Prasanth_J
 




Re: Review Request 24830: HIVE-7548: Precondition checks should not fail the merge task in case of automatic trigger

2014-08-25 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24830/
---

(Updated Aug. 25, 2014, 7:54 p.m.)


Review request for hive and Gunther Hagleitner.


Bugs: HIVE-7548
https://issues.apache.org/jira/browse/HIVE-7548


Repository: hive-git


Description
---

ORC fast merge (HIVE-7509) fails the merge task if any of the precondition 
checks fail. Failing on a precondition check is appropriate for ALTER TABLE .. 
CONCATENATE, but not for an automatic trigger of the merge task from the 
conditional resolver. If a partition has ORC files that are incompatible for 
merging, the merge task should skip them rather than fail the task.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 1d6a93a 
  ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeMapper.java beb4f7d 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java b36152a 
  ql/src/test/queries/clientnegative/orc_merge1.q b2d42cd 
  ql/src/test/queries/clientnegative/orc_merge2.q 2f62ee7 
  ql/src/test/queries/clientnegative/orc_merge3.q 5158e2e 
  ql/src/test/queries/clientnegative/orc_merge4.q ad48572 
  ql/src/test/queries/clientnegative/orc_merge5.q e94a8cc 
  ql/src/test/queries/clientpositive/orc_merge_incompat1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/orc_merge_incompat2.q PRE-CREATION 
  ql/src/test/results/clientpositive/orc_merge_incompat1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/orc_merge_incompat2.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/24830/diff/


Testing
---


Thanks,

Prasanth_J



[jira] [Updated] (HIVE-7548) Precondition checks should not fail the merge task in case of automatic trigger

2014-08-25 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7548:
-

Attachment: HIVE-7548.3.patch

Addressed Gopal's review comments

 Precondition checks should not fail the merge task in case of automatic 
 trigger
 ---

 Key: HIVE-7548
 URL: https://issues.apache.org/jira/browse/HIVE-7548
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-7548.1.patch, HIVE-7548.2.patch, HIVE-7548.3.patch


 ORC fast merge (HIVE-7509) fails the merge task if any of the precondition 
 checks fail. Failing on a precondition check is appropriate for ALTER TABLE .. 
 CONCATENATE, but not for an automatic trigger of the merge task from the 
 conditional resolver. If a partition has ORC files that are incompatible for 
 merging, the merge task should skip them rather than fail the task.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager

2014-08-25 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109642#comment-14109642
 ] 

Szehon Ho commented on HIVE-7353:
-

Thanks, do you mind updating the RB as well?

 HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
 

 Key: HIVE-7353
 URL: https://issues.apache.org/jira/browse/HIVE-7353
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, 
 HIVE-7353.4.patch, HIVE-7353.5.patch, HIVE-7353.6.patch


 When using an embedded metastore, HiveServer2 ends up creating new instances of 
 JDOPersistanceManager while creating background threads to run 
 async operations; these instances are cached in JDOPersistanceManagerFactory. Even 
 when a background thread is killed by the thread pool manager, its 
 JDOPersistanceManager is never GCed because it remains cached by 
 JDOPersistanceManagerFactory.
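The leak pattern in the description can be modeled with a small sketch. This is purely illustrative, not Hive's or DataNucleus's actual code: a factory-level cache keyed by thread keeps strong references even after the worker thread dies.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model of the leak: a factory caches one manager per thread,
// so a dead worker thread's manager is never eligible for GC.
class PmCacheLeakSketch {
    static class Manager { }

    // Stands in for the factory-level cache of per-thread managers.
    static final Map<Thread, Manager> CACHE = new ConcurrentHashMap<>();

    static Manager getForCurrentThread() {
        return CACHE.computeIfAbsent(Thread.currentThread(), t -> new Manager());
    }

    // Run a short-lived "background operation" thread, wait for it to die,
    // then report how many cached entries survive it.
    static int entriesAfterWorkerDies() {
        Thread worker = new Thread(PmCacheLeakSketch::getForCurrentThread);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return CACHE.size();  // still 1: the dead thread's manager is retained
    }
}
```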



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7865) Extend TestFileDump test case to printout ORC row index information

2014-08-25 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7865:
-

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.

 Extend TestFileDump test case to printout ORC row index information
 ---

 Key: HIVE-7865
 URL: https://issues.apache.org/jira/browse/HIVE-7865
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.14.0

 Attachments: HIVE-7865.1.patch


 It will be good to have a test case that prints out ORC row index entries. Some 
 changes to the ORC format, like HIVE-7832, use a different codepath to write row 
 index entries. To make sure they are not doing anything wrong, a test case that 
 prints row index entries will be helpful.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7613) Research optimization of auto convert join to map join [Spark branch]

2014-08-25 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-7613:


Assignee: Szehon Ho

I'll take a look at this this week, but I might not be able to finish (I will be out next 
week).  I can hand it off to somebody else at that point.

 Research optimization of auto convert join to map join [Spark branch]
 -

 Key: HIVE-7613
 URL: https://issues.apache.org/jira/browse/HIVE-7613
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Szehon Ho
Priority: Minor

 ConvertJoinMapJoin is an optimization that replaces a common join (aka shuffle 
 join) with a map join (aka broadcast or fragment replicate join) when 
 possible. We need to research how to make it work with Hive on Spark.
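For reference, the core idea of a map join can be sketched in a few lines. This is a toy sketch, not Hive's implementation: build a hash table from the small table on one side, then stream the big table and probe it, avoiding the shuffle entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy broadcast (map-side) hash join on String keys.
class MapJoinSketch {
    // small: (key, value) rows of the broadcast side
    // big:   (key, value) rows of the streamed side
    static List<String> join(List<String[]> small, List<String[]> big) {
        Map<String, String> buildSide = new HashMap<>();
        for (String[] row : small) {
            buildSide.put(row[0], row[1]);          // build phase
        }
        List<String> out = new ArrayList<>();
        for (String[] row : big) {                  // probe phase, no shuffle
            String match = buildSide.get(row[0]);
            if (match != null) {
                out.add(row[0] + "," + match + "," + row[1]);
            }
        }
        return out;
    }
}
```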



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24986: HIVE-7553: decouple the auxiliary jar loading from hive server2 starting phase

2014-08-25 Thread Brock Noland

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24986/#review51436
---


Hi,

Thank you very much for your work!! This looks great! I have a few comments 
below.


common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
https://reviews.apache.org/r/24986/#comment89754

I've been trying to think of a good name. I think we should call this 
"reloadable jars" since we are re-loading them. Thoughts?



ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionTask.java
https://reviews.apache.org/r/24986/#comment89764

Do we need to do this? getSessionSpecifiedClassLoader won't return null.



ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
https://reviews.apache.org/r/24986/#comment89756

Let's put some trace logging in here showing which classloader we are returning.

I don't see the classloader on Conf actually getting set anywhere. Is it 
set by someone for us?



ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java
https://reviews.apache.org/r/24986/#comment89757

I don't think we want HiveAuthorization and HiveAuthentication providers to 
be reloadable?



ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java
https://reviews.apache.org/r/24986/#comment89758

Same as above?



ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
https://reviews.apache.org/r/24986/#comment89760

This should be moved to the top of the class and be made final.



ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
https://reviews.apache.org/r/24986/#comment89759

Let's use camelCaps not under_scores for variable names since that is more 
standard.



ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
https://reviews.apache.org/r/24986/#comment89761

I think that HIVEREFRESHJARS  should be a list of directories like 
HIVEAUXJARS



ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java
https://reviews.apache.org/r/24986/#comment89762

Can you put this in java.io.tmpdir?



ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java
https://reviews.apache.org/r/24986/#comment89768

Let's not print to std error. Let's print to log. I think we should also 
call Assert.fail(msg) with a good message.



ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java
https://reviews.apache.org/r/24986/#comment89767

We should fail if an exception is thrown.



ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java
https://reviews.apache.org/r/24986/#comment89765

We should fail if an exception is thrown.



ql/src/test/org/apache/hadoop/hive/ql/session/TestSessionState.java
https://reviews.apache.org/r/24986/#comment89766

We should fail if an exception is thrown.



service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java
https://reviews.apache.org/r/24986/#comment89763

We should do something with this exception.



- Brock Noland


On Aug. 25, 2014, 6:45 a.m., cheng xu wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24986/
 ---
 
 (Updated Aug. 25, 2014, 6:45 a.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-7553
 https://issues.apache.org/jira/browse/HIVE-7553
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-7553: decouple the auxiliary jar loading from hive server2 starting phase
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
 7f4afd9d64aff18329e7850342855aade42e21f5 
   hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java 
 93a03adeab7ba3c3c91344955d303e4252005239 
   
 hcatalog/webhcat/java-client/src/main/java/org/apache/hive/hcatalog/api/HCatClient.java
  f25039dcf55b3b24bbf8dcba05855665a1c7f3b0 
   ql/src/java/org/apache/hadoop/hive/ql/exec/DefaultFetchFormatter.java 
 5924bcf1f55dc4c2dd06f312f929047b7df9de55 
   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 
 0c6a3d44ef1f796778768421dc02f8bf3ede6a8c 
   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionTask.java 
 bd45df1a401d1adb009e953d08205c7d5c2d5de2 
   ql/src/java/org/apache/hadoop/hive/ql/exec/ListSinkOperator.java 
 dcc19f70644c561e17df8c8660ca62805465f1d6 
   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 
 76fee612a583cdc2c632d27932623521b735e768 
   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java 
 eb2851b2c5fa52e0f555b3d8d1beea5d1ac3b225 
   ql/src/java/org/apache/hadoop/hive/ql/hooks/HookUtils.java 
 3f474f846c7af5f1f65f1c14f3ce51308f1279d4 
   ql/src/java/org/apache/hadoop/hive/ql/io/HivePassThroughOutputFormat.java 
 0962cadce0d515e046371d0a816f4efd70b8eef7 
   ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java 
 

[jira] [Commented] (HIVE-7353) HiveServer2 using embedded MetaStore leaks JDOPersistanceManager

2014-08-25 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109662#comment-14109662
 ] 

Vaibhav Gumashta commented on HIVE-7353:


[~szehon] Done that already. Actually, hold off on reviewing this one for a few 
mins - I see one issue with the current patch - I'll put up an updated one in a 
few mins. 



 HiveServer2 using embedded MetaStore leaks JDOPersistanceManager
 

 Key: HIVE-7353
 URL: https://issues.apache.org/jira/browse/HIVE-7353
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.13.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 0.14.0

 Attachments: HIVE-7353.1.patch, HIVE-7353.2.patch, HIVE-7353.3.patch, 
 HIVE-7353.4.patch, HIVE-7353.5.patch, HIVE-7353.6.patch


 When using an embedded metastore, HiveServer2 ends up creating new instances of 
 JDOPersistanceManager while creating background threads to run 
 async operations; these instances are cached in JDOPersistanceManagerFactory. Even 
 when a background thread is killed by the thread pool manager, its 
 JDOPersistanceManager is never GCed because it remains cached by 
 JDOPersistanceManagerFactory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7873) Re-enable lazy HiveBaseFunctionResultList

2014-08-25 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7873:
---

Issue Type: Sub-task  (was: Bug)
Parent: HIVE-7292

 Re-enable lazy HiveBaseFunctionResultList
 -

 Key: HIVE-7873
 URL: https://issues.apache.org/jira/browse/HIVE-7873
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
  Labels: spark

 We removed this optimization in HIVE-7799.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7873) Re-enable lazy HiveBaseFunctionResultList

2014-08-25 Thread Brock Noland (JIRA)
Brock Noland created HIVE-7873:
--

 Summary: Re-enable lazy HiveBaseFunctionResultList
 Key: HIVE-7873
 URL: https://issues.apache.org/jira/browse/HIVE-7873
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland


We removed this optimization in HIVE-7799.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7846) authorization api should support group, not assume case insensitive role names

2014-08-25 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-7846:


Summary: authorization api should support group, not assume case 
insensitive role names  (was: authorization api should not assume case 
insensitive role names)

 authorization api should support group, not assume case insensitive role names
 --

 Key: HIVE-7846
 URL: https://issues.apache.org/jira/browse/HIVE-7846
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair

 The case insensitive behavior of roles should be specific to sql standard 
 authorization.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7846) authorization api should support group, not assume case insensitive role names

2014-08-25 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-7846:


Description: 
The case insensitive behavior of roles should be specific to sql standard 
authorization.
Group type for principal also should be disabled at the sql std authorization 
layer, instead of disallowing it at the API level.

  was:
The case insensitive behavior of roles should be specific to sql standard 
authorization.



 authorization api should support group, not assume case insensitive role names
 --

 Key: HIVE-7846
 URL: https://issues.apache.org/jira/browse/HIVE-7846
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair

 The case insensitive behavior of roles should be specific to sql standard 
 authorization.
 Group type for principal also should be disabled at the sql std authorization 
 layer, instead of disallowing it at the API level.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-25 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109679#comment-14109679
 ] 

Brock Noland commented on HIVE-7799:


Hey guys, I created HIVE-7873 to track the improvement in M4.

 TRANSFORM failed in transform_ppr1.q[Spark Branch]
 --

 Key: HIVE-7799
 URL: https://issues.apache.org/jira/browse/HIVE-7799
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
  Labels: Spark-M1
 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, 
 HIVE-7799.3-spark.patch


 Here is the exception:
 {noformat}
 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) 
 - Exception in task 0.0 in stage 1.0 (TID 0)
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82)
 at 
 scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at 
 org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}
 Basically, the cause is that RowContainer is misused (it's not allowed to 
 write once someone has read a row from it). I'm trying to figure out whether it's a 
 Hive issue or specific to Hive on Spark mode.
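The write-once-reading-starts restriction mentioned above can be illustrated with a tiny container. This is a hypothetical sketch of the contract, not Hive's actual RowContainer:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a container whose contract forbids writes once reading has begun.
class RowContainerSketch<T> {
    private final List<T> rows = new ArrayList<>();
    private boolean reading = false;
    private int cursor = 0;

    void addRow(T row) {
        if (reading) {
            throw new IllegalStateException("write after read is not allowed");
        }
        rows.add(row);
    }

    T next() {
        reading = true;  // from now on, addRow() must fail
        return cursor < rows.size() ? rows.get(cursor++) : null;
    }
}
```

Misusing such a container from two code paths (one still writing while another has started reading) produces exactly the kind of state error described in this issue.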



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-25 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7799:
---

   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Thank you guys! I have committed this to spark!

 TRANSFORM failed in transform_ppr1.q[Spark Branch]
 --

 Key: HIVE-7799
 URL: https://issues.apache.org/jira/browse/HIVE-7799
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
  Labels: Spark-M1
 Fix For: spark-branch

 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, 
 HIVE-7799.3-spark.patch


 Here is the exception:
 {noformat}
 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) 
 - Exception in task 0.0 in stage 1.0 (TID 0)
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82)
 at 
 scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at 
 org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}
 Basically, the cause is that RowContainer is misused (it's not allowed to 
 write once someone has read a row from it). I'm trying to figure out whether it's a 
 Hive issue or specific to Hive on Spark mode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7873) Re-enable lazy HiveBaseFunctionResultList

2014-08-25 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7873:
---

Labels: Spark-M5 spark  (was: spark)

 Re-enable lazy HiveBaseFunctionResultList
 -

 Key: HIVE-7873
 URL: https://issues.apache.org/jira/browse/HIVE-7873
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
  Labels: Spark-M4, spark

 We removed this optimization in HIVE-7799.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7873) Re-enable lazy HiveBaseFunctionResultList

2014-08-25 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7873:
---

Labels: Spark-M4 spark  (was: Spark-M5 spark)

 Re-enable lazy HiveBaseFunctionResultList
 -

 Key: HIVE-7873
 URL: https://issues.apache.org/jira/browse/HIVE-7873
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
  Labels: Spark-M4, spark

 We removed this optimization in HIVE-7799.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7871) WebHCat: Hive job with SQL server as MetastoreDB fails when Unicode characters are present in curl command

2014-08-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109684#comment-14109684
 ] 

Hive QA commented on HIVE-7871:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12664180/HIVE-7871.1.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6118 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/489/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/489/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-489/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12664180

 WebHCat: Hive job with SQL server as MetastoreDB fails  when Unicode 
 characters are present in curl command
 ---

 Key: HIVE-7871
 URL: https://issues.apache.org/jira/browse/HIVE-7871
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan
 Attachments: HIVE-7871.1.patch


 Please follow the steps below to repro.
 1. Create a SQL Server. Create a username, password, and DB with Unicode 
 characters in their names.
 2. Create a cluster and run the command below against its templeton endpoint:
 curl -i -u username:password \
 -d define=javax.jdo.option.ConnectionUserName=dbusername@SQLserver \
 -d define=hive.metastore.uris= \
 -d "define=javax.jdo.option.ConnectionURL=jdbc:sqlserver://SQLserver.database.windows.net;database=dbname;encrypt=true;trustServerCertificate=true;create=false" \
 -d define=javax.jdo.option.ConnectionPassword=dbpassword \
 -d statusdir=/hivestatus \
 -d user.name=admin \
 -d enablelog=false \
 -d "execute=show tables;" \
 -s "https://localhost:30111/templeton/v1/hive"
 The following error message is received.
 javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the 
 given database. JDBC url = 
 jdbc:sqlserver://SQLserver.database.windows.net;database=dbname; 
 encrypt=true;trustServerCertificate=true;create=false, username = 
 dbusername@SQLserver. Terminating connection pool (set lazyInit to true 
 if you expect to start your database after your app). Original Exception: 
 --
 com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user 
 'dbusername'.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7842) load_dyn_part1.q fails with an assertion [Spark Branch]

2014-08-25 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109705#comment-14109705
 ] 

Brock Noland commented on HIVE-7842:


IIRC, assertions are not enabled when MR tasks are run, so this might fail 
with MR as well.

 load_dyn_part1.q fails with an assertion [Spark Branch]
 ---

 Key: HIVE-7842
 URL: https://issues.apache.org/jira/browse/HIVE-7842
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: spark-branch
Reporter: Venki Korukanti
Assignee: Venki Korukanti
  Labels: Spark-M1
 Fix For: spark-branch


 On spark branch, load_dyn_part1.q fails with the following assertion. Looks like 
 the SerDe is receiving an invalid ByteWritable buffer.
 {code}
 java.lang.AssertionError
 org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:205)
 org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:187)
 org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:186)
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47)
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27)
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
 scala.collection.Iterator$class.foreach(Iterator.scala:727)
 scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
 org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
 org.apache.spark.scheduler.Task.run(Task.scala:54)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7842) load_dyn_part1.q fails with an assertion [Spark Branch]

2014-08-25 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7842:
---

Issue Type: Sub-task  (was: Bug)
Parent: HIVE-7292

 load_dyn_part1.q fails with an assertion [Spark Branch]
 ---

 Key: HIVE-7842
 URL: https://issues.apache.org/jira/browse/HIVE-7842
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Venki Korukanti
Assignee: Venki Korukanti
  Labels: Spark-M1
 Fix For: spark-branch


 On spark branch, load_dyn_part1.q fails with the following assertion. Looks like 
 the SerDe is receiving an invalid ByteWritable buffer.
 {code}
 java.lang.AssertionError
 org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:205)
 org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:187)
 org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:186)
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47)
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27)
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
 scala.collection.Iterator$class.foreach(Iterator.scala:727)
 scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
 org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
 org.apache.spark.scheduler.Task.run(Task.scala:54)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7842) load_dyn_part1.q fails with an assertion [Spark Branch]

2014-08-25 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109704#comment-14109704
 ] 

Brock Noland commented on HIVE-7842:


Linking to HIVE-7580. 

 load_dyn_part1.q fails with an assertion [Spark Branch]
 ---

 Key: HIVE-7842
 URL: https://issues.apache.org/jira/browse/HIVE-7842
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: spark-branch
Reporter: Venki Korukanti
Assignee: Venki Korukanti
  Labels: Spark-M1
 Fix For: spark-branch


 On spark branch, load_dyn_part1.q fails with the following assertion. Looks like 
 the SerDe is receiving an invalid ByteWritable buffer.
 {code}
 java.lang.AssertionError
 org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:205)
 org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:187)
 org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:186)
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47)
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27)
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
 scala.collection.Iterator$class.foreach(Iterator.scala:727)
 scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
 org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
 org.apache.spark.scheduler.Task.run(Task.scala:54)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7842) load_dyn_part1.q fails with an assertion [Spark Branch]

2014-08-25 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7842:
---

Labels: Spark-M1  (was: )

 load_dyn_part1.q fails with an assertion [Spark Branch]
 ---

 Key: HIVE-7842
 URL: https://issues.apache.org/jira/browse/HIVE-7842
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: spark-branch
Reporter: Venki Korukanti
Assignee: Venki Korukanti
  Labels: Spark-M1
 Fix For: spark-branch


 On spark branch, load_dyn_part1.q fails with the following assertion. Looks like 
 the SerDe is receiving an invalid ByteWritable buffer.
 {code}
 java.lang.AssertionError
 org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:205)
 org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:187)
 org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:186)
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47)
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27)
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
 scala.collection.Iterator$class.foreach(Iterator.scala:727)
 scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
 org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
 org.apache.spark.scheduler.Task.run(Task.scala:54)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7843) orc_analyze.q fails with an assertion in FileSinkOperator [Spark Branch]

2014-08-25 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7843:
---

Labels: Spark-M1  (was: )

 orc_analyze.q fails with an assertion in FileSinkOperator [Spark Branch]
 

 Key: HIVE-7843
 URL: https://issues.apache.org/jira/browse/HIVE-7843
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Venki Korukanti
Assignee: Venki Korukanti
  Labels: Spark-M1
 Fix For: spark-branch


 {code}
 java.lang.AssertionError: data length is different from num of DP columns
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynPartDirectory(FileSinkOperator.java:809)
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:730)
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.startGroup(FileSinkOperator.java:829)
 org.apache.hadoop.hive.ql.exec.Operator.defaultStartGroup(Operator.java:502)
 org.apache.hadoop.hive.ql.exec.Operator.startGroup(Operator.java:525)
 org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:198)
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47)
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27)
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
 scala.collection.Iterator$class.foreach(Iterator.scala:727)
 scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
 org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
 org.apache.spark.scheduler.Task.run(Task.scala:54)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)
 {code}





[jira] [Updated] (HIVE-7844) optimize_nullscan.q fails due to differences in explain plan [Spark Branch]

2014-08-25 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7844:
---

Issue Type: Sub-task  (was: Bug)
Parent: HIVE-7292

 optimize_nullscan.q fails due to differences in explain plan [Spark Branch]
 ---

 Key: HIVE-7844
 URL: https://issues.apache.org/jira/browse/HIVE-7844
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Venki Korukanti
Assignee: Venki Korukanti
  Labels: Spark-M1
 Fix For: spark-branch


 It looks like, on the Spark branch, we are not optimizing query plans for 
 limit 0 cases.





[jira] [Updated] (HIVE-7843) orc_analyze.q fails with an assertion in FileSinkOperator [Spark Branch]

2014-08-25 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7843:
---

Issue Type: Sub-task  (was: Bug)
Parent: HIVE-7292

 orc_analyze.q fails with an assertion in FileSinkOperator [Spark Branch]
 

 Key: HIVE-7843
 URL: https://issues.apache.org/jira/browse/HIVE-7843
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Venki Korukanti
Assignee: Venki Korukanti
  Labels: Spark-M1
 Fix For: spark-branch


 {code}
 java.lang.AssertionError: data length is different from num of DP columns
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynPartDirectory(FileSinkOperator.java:809)
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:730)
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.startGroup(FileSinkOperator.java:829)
 org.apache.hadoop.hive.ql.exec.Operator.defaultStartGroup(Operator.java:502)
 org.apache.hadoop.hive.ql.exec.Operator.startGroup(Operator.java:525)
 org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:198)
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47)
 org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27)
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
 scala.collection.Iterator$class.foreach(Iterator.scala:727)
 scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
 org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
 org.apache.spark.scheduler.Task.run(Task.scala:54)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)
 {code}





[jira] [Updated] (HIVE-7438) Counters, statistics, and metrics [Spark Branch]

2014-08-25 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7438:
---

Labels: Spark-M2  (was: )

 Counters, statistics, and metrics [Spark Branch]
 

 Key: HIVE-7438
 URL: https://issues.apache.org/jira/browse/HIVE-7438
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chengxiang Li
  Labels: Spark-M2
 Attachments: hive on spark job statistic design.docx


 Hive makes use of MapReduce counters for statistics and possibly for other 
 purposes. For Hive on Spark, we should achieve the same functionality using 
 Spark's accumulators.
 Hive has also traditionally collected metrics from MapReduce jobs. Spark jobs 
 very likely publish a different set of metrics, which, if made available, would 
 help users gain insight into their Spark jobs. Thus, we should obtain those 
 metrics and make them available as we do for MapReduce.
 This task therefore includes:
 # identify Hive's existing functionality w.r.t. counters, statistics, and 
 metrics;
 # design and implement the same functionality in Spark.
 Please refer to the design document for more information. 
 https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark#HiveonSpark-CountersandMetrics
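The counter approach described above (replacing MapReduce counters with Spark accumulators) can be sketched with a plain-Java stand-in. This is an illustration of the accumulator semantics only: tasks add to a shared counter and the driver reads the merged total. The class and method names here are hypothetical, not Spark's actual API.

```java
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a Spark accumulator: each "task" adds its local
// count, and the "driver" observes the merged total afterwards.
public class AccumulatorSketch {
    static long countRecords(List<int[]> partitions) {
        LongAdder recordsRead = new LongAdder();            // the accumulator
        partitions.parallelStream()                         // per-partition "tasks"
                  .forEach(p -> recordsRead.add(p.length)); // task-side increment
        return recordsRead.sum();                           // driver-side merged value
    }

    public static void main(String[] args) {
        List<int[]> parts = List.of(new int[]{1, 2, 3}, new int[]{4, 5}, new int[]{});
        System.out.println(countRecords(parts)); // prints 5
    }
}
```

In real Spark the increments happen on executors and are shipped back to the driver, but the merge-on-read shape is the same.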





[jira] [Updated] (HIVE-7844) optimize_nullscan.q fails due to differences in explain plan [Spark Branch]

2014-08-25 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7844:
---

Labels: Spark-M1  (was: )

 optimize_nullscan.q fails due to differences in explain plan [Spark Branch]
 ---

 Key: HIVE-7844
 URL: https://issues.apache.org/jira/browse/HIVE-7844
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Venki Korukanti
Assignee: Venki Korukanti
  Labels: Spark-M1
 Fix For: spark-branch


 It looks like, on the Spark branch, we are not optimizing query plans for 
 limit 0 cases.





[jira] [Updated] (HIVE-7438) Counters, statistics, and metrics [Spark Branch]

2014-08-25 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7438:
---

Labels: Spark-M3  (was: Spark-M2)

 Counters, statistics, and metrics [Spark Branch]
 

 Key: HIVE-7438
 URL: https://issues.apache.org/jira/browse/HIVE-7438
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chengxiang Li
  Labels: Spark-M3
 Attachments: hive on spark job statistic design.docx


 Hive makes use of MapReduce counters for statistics and possibly for other 
 purposes. For Hive on Spark, we should achieve the same functionality using 
 Spark's accumulators.
 Hive has also traditionally collected metrics from MapReduce jobs. Spark jobs 
 very likely publish a different set of metrics, which, if made available, would 
 help users gain insight into their Spark jobs. Thus, we should obtain those 
 metrics and make them available as we do for MapReduce.
 This task therefore includes:
 # identify Hive's existing functionality w.r.t. counters, statistics, and 
 metrics;
 # design and implement the same functionality in Spark.
 Please refer to the design document for more information. 
 https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark#HiveonSpark-CountersandMetrics





[jira] [Updated] (HIVE-7439) Spark job monitoring and error reporting [Spark Branch]

2014-08-25 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7439:
---

Labels: Spark-M3  (was: )

 Spark job monitoring and error reporting [Spark Branch]
 ---

 Key: HIVE-7439
 URL: https://issues.apache.org/jira/browse/HIVE-7439
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chengxiang Li
  Labels: Spark-M3

 After Hive submits a job to a Spark cluster, we need to report job progress, 
 such as the percentage done, to the user. This is especially important for 
 long-running queries. Moreover, if there is an error during job submission or 
 execution, it is also crucial for Hive to fetch the error log and/or stack 
 trace and feed it back to the user.
 Please refer to the design doc on the wiki for more information.





[jira] [Commented] (HIVE-7439) Spark job monitoring and error reporting [Spark Branch]

2014-08-25 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109721#comment-14109721
 ] 

Brock Noland commented on HIVE-7439:


I think that we'll need the API from HIVE-7874 to do this work.

 Spark job monitoring and error reporting [Spark Branch]
 ---

 Key: HIVE-7439
 URL: https://issues.apache.org/jira/browse/HIVE-7439
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chengxiang Li
  Labels: Spark-M3

 After Hive submits a job to a Spark cluster, we need to report job progress, 
 such as the percentage done, to the user. This is especially important for 
 long-running queries. Moreover, if there is an error during job submission or 
 execution, it is also crucial for Hive to fetch the error log and/or stack 
 trace and feed it back to the user.
 Please refer to the design doc on the wiki for more information.





[jira] [Created] (HIVE-7874) Support multiple concurrent users

2014-08-25 Thread Brock Noland (JIRA)
Brock Noland created HIVE-7874:
--

 Summary: Support multiple concurrent users
 Key: HIVE-7874
 URL: https://issues.apache.org/jira/browse/HIVE-7874
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Priority: Blocker


This JIRA is to track on the Hive side the solution to handling multiple user 
sessions.

At first we thought this would be SPARK-2243 but there have been discussions on 
the Spark side which suggest the solution will be different.





[jira] [Updated] (HIVE-7874) Support multiple concurrent users

2014-08-25 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7874:
---

Description: This JIRA is to track on the Hive side the solution to 
handling multiple user sessions. We thought this would be SPARK-2243 but there 
have been discussions on the Spark side which suggest the solution will be 
different.  (was: This JIRA is to track on the Hive side the solution to 
handling multiple user sessions.

At first we thought this would be SPARK-2243 but there have been discussions on 
the Spark side which suggest the solution will be different.)

 Support multiple concurrent users
 -

 Key: HIVE-7874
 URL: https://issues.apache.org/jira/browse/HIVE-7874
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Priority: Blocker

 This JIRA is to track on the Hive side the solution to handling multiple user 
 sessions. We thought this would be SPARK-2243 but there have been discussions 
 on the Spark side which suggest the solution will be different.





[jira] [Updated] (HIVE-7846) authorization api should support group, not assume case insensitive role names

2014-08-25 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-7846:


Attachment: HIVE-7846.1.patch

 authorization api should support group, not assume case insensitive role names
 --

 Key: HIVE-7846
 URL: https://issues.apache.org/jira/browse/HIVE-7846
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-7846.1.patch


 The case-insensitive behavior of role names should be specific to SQL standard 
 authorization.
 The group principal type should also be disabled at the SQL standard 
 authorization layer, instead of being disallowed at the API level.





Review Request 25037: HIVE-7846 - authorization api should support group, not assume case insensitive role names

2014-08-25 Thread Thejas Nair

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25037/
---

Review request for hive and Jason Dere.


Bugs: HIVE-7846
https://issues.apache.org/jira/browse/HIVE-7846


Repository: hive-git


Description
---

See https://issues.apache.org/jira/browse/HIVE-7846


Diffs
-

  
itests/util/src/main/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAccessControllerForTest.java
 89429b6 
  
itests/util/src/main/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizationValidatorForTest.java
 1d039ad 
  
itests/util/src/main/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizerFactoryForTest.java
 0f41a8f 
  
ql/src/java/org/apache/hadoop/hive/ql/parse/authorization/HiveAuthorizationTaskFactoryImpl.java
 f92ecf2 
  ql/src/java/org/apache/hadoop/hive/ql/plan/RoleDDLDesc.java 8413fb7 
  
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/AuthorizationUtils.java
 2113f45 
  
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HivePrincipal.java
 30a4496 
  
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLAuthorizationUtils.java
 a6b008a 
  
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAccessControllerWrapper.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizationValidator.java
 9ceac0c 
  
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAuthorizerFactory.java
 9db3d74 
  ql/src/test/queries/clientnegative/authorization_grant_group.q PRE-CREATION 
  ql/src/test/queries/clientnegative/authorization_public_create.q 002389f 
  ql/src/test/queries/clientnegative/authorization_public_drop.q 69c5a8d 
  ql/src/test/queries/clientnegative/authorization_role_case.q PRE-CREATION 
  ql/src/test/queries/clientnegative/authorize_grant_public.q bfd3165 
  ql/src/test/queries/clientnegative/authorize_revoke_public.q 2b29822 
  ql/src/test/queries/clientpositive/authorization_1.q 25c9918 
  ql/src/test/queries/clientpositive/authorization_5.q 8869edc 
  ql/src/test/queries/clientpositive/authorization_grant_public_role.q fe177ac 
  ql/src/test/queries/clientpositive/authorization_role_grant2.q 95fa4e6 
  ql/src/test/results/clientnegative/authorization_grant_group.q.out 
PRE-CREATION 
  ql/src/test/results/clientnegative/authorization_public_create.q.out 4c9a2ad 
  ql/src/test/results/clientnegative/authorization_public_drop.q.out 520b56e 
  ql/src/test/results/clientnegative/authorization_role_case.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/authorize_grant_public.q.out ef4a1b1 
  ql/src/test/results/clientnegative/authorize_revoke_public.q.out 618fedd 
  ql/src/test/results/clientpositive/authorization_1.q.out dac0820 
  ql/src/test/results/clientpositive/authorization_5.q.out 6e5187e 
  ql/src/test/results/clientpositive/authorization_grant_public_role.q.out 
17b6c8a 
  ql/src/test/results/clientpositive/authorization_role_grant2.q.out 56e7667 

Diff: https://reviews.apache.org/r/25037/diff/


Testing
---

Modified tests of the old default authorization mode to verify that roles with 
mixed case now work.
Added a negative test for granting to a group principal with SQL standard 
authorization.

Thanks,

Thejas Nair



[jira] [Updated] (HIVE-7392) Support Columns Stats for Partition Columns

2014-08-25 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7392:
---

Attachment: h-7392.patch

 Support Columns Stats for Partition Columns
 ---

 Key: HIVE-7392
 URL: https://issues.apache.org/jira/browse/HIVE-7392
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Ashutosh Chauhan
 Attachments: h-7392.patch








[jira] [Updated] (HIVE-7392) Support Columns Stats for Partition Columns

2014-08-25 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7392:
---

Status: Patch Available  (was: Open)

 Support Columns Stats for Partition Columns
 ---

 Key: HIVE-7392
 URL: https://issues.apache.org/jira/browse/HIVE-7392
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Ashutosh Chauhan
 Attachments: h-7392.patch








Review Request 25038: implement NDV for partition column

2014-08-25 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25038/
---

Review request for hive and John Pullokkaran.


Bugs: HIVE-7392
https://issues.apache.org/jira/browse/HIVE-7392


Repository: hive


Description
---

implement NDV for partition column


Diffs
-

  
branches/cbo/ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/RelOptHiveTable.java
 1620394 

Diff: https://reviews.apache.org/r/25038/diff/


Testing
---


Thanks,

Ashutosh Chauhan



[jira] [Commented] (HIVE-7392) Support Columns Stats for Partition Columns

2014-08-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109786#comment-14109786
 ] 

Sergey Shelukhin commented on HIVE-7392:


Is handling column values as strings correct for databases created with all 
versions of Hive?
In the old days it was possible to create partitions like a=2 and a=02 for an 
integer column a. There may still be similar cases now, though more restricted.
What will a query return in such cases in Hive? It may be a different value 
than these stats suggest.
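To make the ambiguity concrete, here is a minimal illustration (not Hive code) of why a=2 and a=02 are distinct when partition keys are compared as strings, yet equal when the partition column's declared integer type is applied:

```java
// Partition directory names compare as strings, but an integer-typed
// partition column would compare the same two values as equal.
public class PartitionKeySketch {
    public static void main(String[] args) {
        String p1 = "2", p2 = "02"; // two spellings of the same integer key
        System.out.println(p1.equals(p2));                                // false: distinct partitions
        System.out.println(Integer.parseInt(p1) == Integer.parseInt(p2)); // true: same integer value
    }
}
```

This is exactly the case where string-based column stats (e.g. NDV counted over distinct directory names) could disagree with what a typed query returns.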

 Support Columns Stats for Partition Columns
 ---

 Key: HIVE-7392
 URL: https://issues.apache.org/jira/browse/HIVE-7392
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Ashutosh Chauhan
 Attachments: h-7392.patch








[jira] [Created] (HIVE-7875) Hive cannot load data into partitioned table with Unicode key

2014-08-25 Thread Xiaobing Zhou (JIRA)
Xiaobing Zhou created HIVE-7875:
---

 Summary: Hive cannot load data into partitioned table with Unicode 
key
 Key: HIVE-7875
 URL: https://issues.apache.org/jira/browse/HIVE-7875
 Project: Hive
  Issue Type: Bug
 Environment: Windows Server 2008
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou


Steps to reproduce:
1) Copy the file partitioned.txt to the HDFS root folder. Copy the two hql 
files to your local directory.
2) Open the Hive CLI.
3) Run:
hive> source <path to CreatePartitionedTable.hql>;
4) Run:
hive> source <path to LoadIntoPartitionedTable.hql>;
The following error will be shown:
hive> source C:\Scripts\partition\LoadIntoPartitionedTable.hql;
Loading data to table default.mypartitioned partition (tag=䶵)
Failed with exception null
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.MoveTask





[jira] [Updated] (HIVE-7875) Hive cannot load data into partitioned table with Unicode key

2014-08-25 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HIVE-7875:


Fix Version/s: 0.14.0

 Hive cannot load data into partitioned table with Unicode key
 -

 Key: HIVE-7875
 URL: https://issues.apache.org/jira/browse/HIVE-7875
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
 Environment: Windows Server 2008
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou
 Fix For: 0.14.0

 Attachments: CreatePartitionedTable.hql, 
 LoadIntoPartitionedTable.hql, partitioned.txt


 Steps to reproduce:
 1) Copy the file partitioned.txt to the HDFS root folder. Copy the two hql 
 files to your local directory.
 2) Open the Hive CLI.
 3) Run:
 hive> source <path to CreatePartitionedTable.hql>;
 4) Run:
 hive> source <path to LoadIntoPartitionedTable.hql>;
 The following error will be shown:
 hive> source C:\Scripts\partition\LoadIntoPartitionedTable.hql;
 Loading data to table default.mypartitioned partition (tag=䶵)
 Failed with exception null
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.MoveTask





[jira] [Updated] (HIVE-7875) Hive cannot load data into partitioned table with Unicode key

2014-08-25 Thread Xiaobing Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaobing Zhou updated HIVE-7875:


Affects Version/s: 0.13.0

 Hive cannot load data into partitioned table with Unicode key
 -

 Key: HIVE-7875
 URL: https://issues.apache.org/jira/browse/HIVE-7875
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
 Environment: Windows Server 2008
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou
 Fix For: 0.14.0

 Attachments: CreatePartitionedTable.hql, 
 LoadIntoPartitionedTable.hql, partitioned.txt


 Steps to reproduce:
 1) Copy the file partitioned.txt to the HDFS root folder. Copy the two hql 
 files to your local directory.
 2) Open the Hive CLI.
 3) Run:
 hive> source <path to CreatePartitionedTable.hql>;
 4) Run:
 hive> source <path to LoadIntoPartitionedTable.hql>;
 The following error will be shown:
 hive> source C:\Scripts\partition\LoadIntoPartitionedTable.hql;
 Loading data to table default.mypartitioned partition (tag=䶵)
 Failed with exception null
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.MoveTask




