[jira] [Commented] (HIVE-12283) Fix test failures after HIVE-11844 [Spark Branch]

2015-10-28 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978273#comment-14978273
 ] 

Xuefu Zhang commented on HIVE-12283:


+1

> Fix test failures after HIVE-11844 [Spark Branch]
> -
>
> Key: HIVE-12283
> URL: https://issues.apache.org/jira/browse/HIVE-12283
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-12283.1-spark.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12284) Merge master to Spark branch 10/28/2015 [Spark Branch]

2015-10-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12284:
---
Summary: Merge master to Spark branch 10/28/2015 [Spark Branch]  (was: 
CLONE - Merge master to Spark branch 10/26/2015 [Spark Branch])

> Merge master to Spark branch 10/28/2015 [Spark Branch]
> --
>
> Key: HIVE-12284
> URL: https://issues.apache.org/jira/browse/HIVE-12284
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: spark-branch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12279) Testcase to verify session temporary files are removed after HIVE-11768

2015-10-28 Thread Chinna Rao Lalam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978446#comment-14978446
 ] 

Chinna Rao Lalam commented on HIVE-12279:
-

Hi [~daijy],

Thanks for the patch. I have a minor suggestion: as part of this test, can we also
verify {{ShutdownHookManager.isRegisteredToDeleteOnExit(file)}} for these files, as
{{org.apache.hive.common.util.TestShutdownHookManager.deleteOnExit()}} does? Any
thoughts?
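A rough sketch of the suggested assertion, purely for illustration
({{sessionTempFiles}} is a placeholder for whatever file list the test already
collects):
{code}
// Illustrative sketch only; sessionTempFiles is a hypothetical placeholder
// for the session temp files the test already tracks.
for (File file : sessionTempFiles) {
  Assert.assertTrue(ShutdownHookManager.isRegisteredToDeleteOnExit(file));
}
{code}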

> Testcase to verify session temporary files are removed after HIVE-11768
> ---
>
> Key: HIVE-12279
> URL: https://issues.apache.org/jira/browse/HIVE-12279
> Project: Hive
>  Issue Type: Test
>  Components: HiveServer2, Test
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 2.0.0
>
> Attachments: HIVE-12279.1.patch
>
>
> We need to make sure HS2 session temporary files are removed after session 
> ends.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7575) GetTables thrift call is very slow

2015-10-28 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978428#comment-14978428
 ] 

Aihua Xu commented on HIVE-7575:


Thanks [~navis] for adding the coverage. Sorry, I didn't see that the tests were
already there.

Would it be better to create a class like TableMeta and return a list of TableMeta
instead of a list of strings? Roughly as in the sketch below.

To Yongzhi's question: when we have many databases, the performance of the original
getTables could be bad, since we make at least one metastore round trip per
database. Is that right?
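A purely illustrative sketch of the suggested TableMeta holder (field names are
guesses, not from any patch):
{code}
// Hypothetical TableMeta holder; the fields are illustrative.
public class TableMeta {
  private final String dbName;
  private final String tableName;
  private final String tableType;

  public TableMeta(String dbName, String tableName, String tableType) {
    this.dbName = dbName;
    this.tableName = tableName;
    this.tableType = tableType;
  }

  public String getDbName() { return dbName; }
  public String getTableName() { return tableName; }
  public String getTableType() { return tableType; }
}
{code}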

> GetTables thrift call is very slow
> --
>
> Key: HIVE-7575
> URL: https://issues.apache.org/jira/browse/HIVE-7575
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Ashu Pachauri
>Assignee: Navis
> Attachments: HIVE-7575.1.patch.txt, HIVE-7575.2.patch.txt, 
> HIVE-7575.3.patch.txt, HIVE-7575.4.patch.txt, HIVE-7575.5.patch.txt
>
>
> The GetTables thrift call takes a long time when the number of tables is large.
> With around 5000 tables, the call takes around 80 seconds compared to a "Show 
> Tables" query on the same HiveServer2 instance which takes 3-7 seconds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12284) Merge master to Spark branch 10/28/2015 [Spark Branch]

2015-10-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12284:
---
Attachment: HIVE-12284.1-spark.patch

It's a clean merge. Merged. Attaching a dummy patch to verify the tests.

> Merge master to Spark branch 10/28/2015 [Spark Branch]
> --
>
> Key: HIVE-12284
> URL: https://issues.apache.org/jira/browse/HIVE-12284
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: spark-branch
>
> Attachments: HIVE-12284.1-spark.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12256) Move LLAP registry into llap-client module

2015-10-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978383#comment-14978383
 ] 

Hive QA commented on HIVE-12256:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12769063/HIVE-12256.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9711 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5826/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5826/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5826/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12769063 - PreCommit-HIVE-TRUNK-Build

> Move LLAP registry into llap-client module
> --
>
> Key: HIVE-12256
> URL: https://issues.apache.org/jira/browse/HIVE-12256
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 2.0.0
>
> Attachments: HIVE-12256.1.patch, HIVE-12256.2.patch, HIVE-12256.2.txt
>
>
> The registry may need to be accessed by the client to figure out the 
> available nodes. (ql module needs access)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-10-28 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978296#comment-14978296
 ] 

Xuefu Zhang commented on HIVE-12229:


Hi [~lirui], thanks for fixing this. Two minor questions:

1. When detecting Spark local mode, instead of the equals() check in
{{sparkConf.get("spark.master").equals("local")}}, should we use startsWith(), to
cover cases such as local[2] as well as local-cluster? See the sketch below.

2. If a user adds a file that already exists remotely, should we overwrite it
instead of throwing an exception? Maybe the user just wants to replace the
previously added file.

What are your thoughts?
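A minimal sketch of the startsWith() idea, assuming {{sparkConf}} is the job's
SparkConf:
{code}
// Sketch of the suggested local-mode check (illustrative only):
String master = sparkConf.get("spark.master");
// startsWith covers "local", "local[2]", "local[*]", and "local-cluster[...]"
boolean localMode = master.startsWith("local");
{code}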

> Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
> --
>
> Key: HIVE-12229
> URL: https://issues.apache.org/jira/browse/HIVE-12229
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Lifeng Wang
>Assignee: Rui Li
> Attachments: HIVE-12229.1-spark.patch, HIVE-12229.2-spark.patch, 
> HIVE-12229.2-spark.patch
>
>
> Added one python script in the query and the python script cannot be found 
> during execution in yarn-cluster mode.
> {noformat}
> 15/10/21 21:10:55 INFO exec.ScriptOperator: Executing [/usr/bin/python, 
> q2-sessionize.py, 3600]
> 15/10/21 21:10:55 INFO exec.ScriptOperator: tablename=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: partname=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: alias=null
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 10 rows: used 
> memory = 324896224
> 15/10/21 21:10:55 INFO exec.ScriptOperator: ErrorStreamProcessor calling 
> reporter.progress()
> /usr/bin/python: can't open file 'q2-sessionize.py': [Errno 2] No such file 
> or directory
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread OutputProcessor done
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread ErrorProcessor done
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 100 rows: used 
> memory = 325619920
> 15/10/21 21:10:55 ERROR exec.ScriptOperator: Error in writing to script: 
> Stream closed
> 15/10/21 21:10:55 INFO exec.ScriptOperator: The script did not consume all 
> input data. This is considered as an error.
> 15/10/21 21:10:55 INFO exec.ScriptOperator: set 
> hive.exec.script.allow.partial.consumption=true; to ignore it.
> 15/10/21 21:10:55 ERROR spark.SparkReduceRecordHandler: Fatal error: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
> (tag=0) 
> {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
> (tag=0) 
> {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:340)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:289)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
> at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:99)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20001]: 
> An error occurred while reading or writing to your custom script. It may have 
> crashed with an error.
> at 
> org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:453)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at 
> 

[jira] [Commented] (HIVE-11092) First delta of an ORC ACID table contains non-descriptive schema

2015-10-28 Thread Elliot West (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978476#comment-14978476
 ] 

Elliot West commented on HIVE-11092:


Fixed by HIVE-4243 apparently. I'll confirm and close.

> First delta of an ORC ACID table contains non-descriptive schema
> 
>
> Key: HIVE-11092
> URL: https://issues.apache.org/jira/browse/HIVE-11092
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Elliot West
>Assignee: Elliot West
>Priority: Minor
>  Labels: orc, orcfile, transaction, transactions
>
> I've been reading ORC ACID data that backs transactional tables from a 
> process external to Hive. Initially I tried to use 'schema on read' but found 
> some inconsistencies in the schema returned from the initial delta file and 
> subsequent delta and base files. To reproduce the issue by example:
> {code}
> CREATE TABLE base_table ( id int, message string )
>   PARTITIONED BY ( continent string, country string )
>   CLUSTERED BY (id) INTO 1 BUCKETS
>   STORED AS ORC
>   TBLPROPERTIES ('transactional' = 'true');
>   
> INSERT INTO TABLE base_table PARTITION (continent = 'Asia', country = 'India')
> VALUES (1, 'x'), (2, 'y'), (3, 'z');
> UPDATE base_table SET message = 'updated' WHERE id = 1;
> {code}
> Now examining the raw data with the {{orcfiledump}} utility (edited for 
> brevity):
> {code}
> cd hive/warehouse/base_table/continent=Asia/country=India/
> hive --orcfiledump delta_001_001/bucket_0
> Type:
> struct<operation:int,originalTransaction:bigint,bucket:int,rowId:bigint,currentTransaction:bigint,row:struct<_col0:int,_col1:string>>
> 
> 
> hive --orcfiledump delta_002_002/bucket_0
> Type:
> struct<operation:int,originalTransaction:bigint,bucket:int,rowId:bigint,currentTransaction:bigint,row:struct<id:int,message:string>>
> 
> {code}
> The row schema for the first delta that resulted from the inserts has its 
> field names erased: {{row:struct<_col0:int,_col1:string>}}, whereas the delta 
> for the update reports the correct schema: 
> {{row:struct<id:int,message:string>}}. I have also checked this with my own 
> reader code so am confident that {{FileDump}} is not at fault.
> I believe that the row field names, and hence schema, should be consistent 
> across all ORC files in the ACID data set. This will enable schema on read 
> with field access by name (not index), which is currently not possible. 
> Therefore I'd like to get this issue resolved.
> I'm happy to work on this, however after working through {{OrcRecordUpdater}} 
> and {{FileSinkOperator}} and related tests I've failed to reproduce or 
> isolate the issue at a smaller scale. I'd be grateful for some suggestions on 
> where to look next.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12272) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : columnPruner prunes everything when union is the last operator before FS

2015-10-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978487#comment-14978487
 ] 

Hive QA commented on HIVE-12272:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12769069/HIVE-12272.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1612 failed/errored test(s), 9698 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_project
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_exist
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alias_casted_column
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_char1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_index
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_update_status
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_update_status
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_varchar1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_view_as_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguitycheck
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguous_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_analyze_table_null_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_analyze_tbl_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_deep_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join_pkfk
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ansi_sql_arithmetic
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_excludeHadoop20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_multi
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_create_temp_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_explain
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_parts
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join17
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join19
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join21
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join25
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join26
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join28
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join29

[jira] [Commented] (HIVE-12160) Hbase table query execution fails in secured cluster when hive.exec.mode.local.auto is set to true

2015-10-28 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978498#comment-14978498
 ] 

Aihua Xu commented on HIVE-12160:
-

[~prasadm], could you please help review the patch?

> Hbase table query execution fails in secured cluster when 
> hive.exec.mode.local.auto is set to true
> --
>
> Key: HIVE-12160
> URL: https://issues.apache.org/jira/browse/HIVE-12160
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 1.1.0, 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-12160.patch, HIVE-12160_trace.txt
>
>
> In a secured cluster with kerberos, a simple query like {{select count(*) 
> from hbase_table;}} will fail with the following exception when 
> hive.exec.mode.local.auto is set to true.
> {noformat}
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 134 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=134)
> {noformat}
> There is another scenario that may have the same root cause.
> With hive.auto.convert.join set to true, the join query {{select * from hbase_t1 
> join hbase_t2 on hbase_t1.id = hbase_t2.id;}} also fails with the following 
> exception:
> {noformat}
> Error while processing statement: FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask (state=08S01,code=2)
> {noformat}
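For reference, a minimal reproduction sketch following the description above
(table names are illustrative, assuming HBase-backed tables in a Kerberized
cluster):
{code}
-- Reproduction sketch; the failure appears only with auto local mode on:
set hive.exec.mode.local.auto=true;
select count(*) from hbase_table;
-- Since the failure is tied to auto local mode, disabling it is the obvious
-- interim workaround:
set hive.exec.mode.local.auto=false;
{code}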



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11985) don't store type names in metastore when metastore type names are not used

2015-10-28 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978538#comment-14978538
 ] 

Xuefu Zhang commented on HIVE-11985:


Shouldn't we wait for HIVE-12274, which would solve the root cause of this problem 
as well?

> don't store type names in metastore when metastore type names are not used
> --
>
> Key: HIVE-11985
> URL: https://issues.apache.org/jira/browse/HIVE-11985
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11985.01.patch, HIVE-11985.02.patch, 
> HIVE-11985.03.patch, HIVE-11985.05.patch, HIVE-11985.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12284) Merge master to Spark branch 10/28/2015 [Spark Branch]

2015-10-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978460#comment-14978460
 ] 

Hive QA commented on HIVE-12284:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12769267/HIVE-12284.1-spark.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 9687 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_inner_join
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join0
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_vector_outer_join5
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/984/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/984/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-984/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12769267 - PreCommit-HIVE-SPARK-Build

> Merge master to Spark branch 10/28/2015 [Spark Branch]
> --
>
> Key: HIVE-12284
> URL: https://issues.apache.org/jira/browse/HIVE-12284
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: spark-branch
>
> Attachments: HIVE-12284.1-spark.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9600) add missing classes to hive-jdbc-standalone.jar

2015-10-28 Thread Chen Xin Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977922#comment-14977922
 ] 

Chen Xin Yu commented on HIVE-9600:
---

Thanks, Ashutosh Chauhan!
I looked into HIVE-9599, did more testing based on Vaibhav Gumashta's suggestion,
and added comments there.
My patch here is part of the patch in HIVE-9599; with the fix in HIVE-9599, the
issue described in this JIRA would be solved.
Could you please also help review HIVE-9599? Thanks!

> add missing classes to hive-jdbc-standalone.jar
> ---
>
> Key: HIVE-9600
> URL: https://issues.apache.org/jira/browse/HIVE-9600
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 1.2.1
>Reporter: Alexander Pivovarov
>Assignee: Chen Xin Yu
> Attachments: HIVE-9600.1.patch
>
>
> hive-jdbc-standalone.jar does not include the Hadoop Configuration class, and 
> possibly other hadoop-common classes, required to open a JDBC connection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12061) add file type support to file metadata by expr call

2015-10-28 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977848#comment-14977848
 ] 

Lefty Leverenz commented on HIVE-12061:
---

Does this need any documentation in the wiki?  I'm guessing not, but want to be 
sure.

> add file type support to file metadata by expr call
> ---
>
> Key: HIVE-12061
> URL: https://issues.apache.org/jira/browse/HIVE-12061
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.0.0
>
> Attachments: HIVE-12061.01.nogen.patch, HIVE-12061.01.patch, 
> HIVE-12061.02.patch, HIVE-12061.03.nogen.patch, HIVE-12061.03.patch, 
> HIVE-12061.04.patch, HIVE-12061.nogen.patch, HIVE-12061.patch
>
>
> Expr filtering, automatic caching, etc. should be aware of file types for 
> advanced features. For now only ORC is supported, but I want to add a 
> boundary between ORC-specific and general metastore code, that could later be 
> used for other formats if needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11702) GetSchemas thrift call is slow on scale of 1000+ databases

2015-10-28 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977861#comment-14977861
 ] 

Navis commented on HIVE-11702:
--

[~erickt] Added a short path for getSchemas(null) in the recent patch of HIVE-7575. 
Haven't confirmed the effect yet.

> GetSchemas thrift call is slow on scale of 1000+ databases
> --
>
> Key: HIVE-11702
> URL: https://issues.apache.org/jira/browse/HIVE-11702
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.1.1
>Reporter: Jenny Kim
> Attachments: HIVE-11702.1.patch.txt
>
>
> Similar to https://issues.apache.org/jira/browse/HIVE-7575, GetSchemas also 
> starts to degrade in latency at the order of 1,000+ databases, returning in 
> about 30 seconds.
> However, SHOW DATABASES on the same Hive instance returns within a few 
> seconds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7575) GetTables thrift call is very slow

2015-10-28 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7575:

Attachment: HIVE-7575.5.patch.txt

Added a short path for getSchemas(null); see HIVE-11702.

> GetTables thrift call is very slow
> --
>
> Key: HIVE-7575
> URL: https://issues.apache.org/jira/browse/HIVE-7575
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Ashu Pachauri
>Assignee: Navis
> Attachments: HIVE-7575.1.patch.txt, HIVE-7575.2.patch.txt, 
> HIVE-7575.3.patch.txt, HIVE-7575.4.patch.txt, HIVE-7575.5.patch.txt
>
>
> The GetTables thrift call takes a long time when the number of tables is large.
> With around 5000 tables, the call takes around 80 seconds compared to a "Show 
> Tables" query on the same HiveServer2 instance which takes 3-7 seconds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11756) Avoid redundant key serialization in RS for distinct query

2015-10-28 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977866#comment-14977866
 ] 

Navis commented on HIVE-11756:
--

Cannot reproduce the failure of index_bitmap_auto; the others seem unrelated.

> Avoid redundant key serialization in RS for distinct query
> --
>
> Key: HIVE-11756
> URL: https://issues.apache.org/jira/browse/HIVE-11756
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-11756.1.patch.txt, HIVE-11756.2.patch.txt, 
> HIVE-11756.3.patch.txt, HIVE-11756.4.patch.txt
>
>
> Currently Hive serializes the key twice to determine the length of the 
> distribution key for distinct queries. This patch introduces IndexedSerializer 
> to avoid that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9013) Hive set command exposes metastore db password

2015-10-28 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977869#comment-14977869
 ] 

Lefty Leverenz commented on HIVE-9013:
--

Here are the new doc links:

* [Configuration Properties -- Restricted/Hidden List and Whitelist | 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27842758#ConfigurationProperties-Restricted/HiddenListandWhitelist]
* [Configuration Properties -- hive.conf.hidden.list | 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27842758#ConfigurationProperties-hive.conf.hidden.list]

> Hive set command exposes metastore db password
> --
>
> Key: HIVE-9013
> URL: https://issues.apache.org/jira/browse/HIVE-9013
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Binglin Chang
>Assignee: Binglin Chang
>  Labels: TODOC1.2, TODOC1.3
> Fix For: 1.3.0, 2.0.0, 1.2.2
>
> Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch, HIVE-9013.3.patch, 
> HIVE-9013.4.patch, HIVE-9013.5.patch, HIVE-9013.5.patch, 
> HIVE-9013.5.patch-branch1, HIVE-9013.5.patch-branch1.2
>
>
> When auth is enabled, we still need the set command to set some variables (e.g. 
> mapreduce.job.queuename), but the set command alone also lists all 
> information (including vars in the restricted list), which exposes values like 
> "javax.jdo.option.ConnectionPassword".
> I think conf vars in the restricted list should also be excluded from the 
> dump-vars command.
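For illustration, a sketch of hiding such a value via the
{{hive.conf.hidden.list}} property documented in the links above (hive-site.xml
snippet; the exact value list shown here is illustrative):
{code}
<!-- hive-site.xml sketch: values named here are masked by the set command -->
<property>
  <name>hive.conf.hidden.list</name>
  <value>javax.jdo.option.ConnectionPassword,hive.server2.keystore.password</value>
</property>
{code}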



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11881) Supporting HPL/SQL Packages

2015-10-28 Thread Dmitry Tolpeko (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977887#comment-14977887
 ] 

Dmitry Tolpeko commented on HIVE-11881:
---

Failed tests are not related to changes I introduced.

> Supporting HPL/SQL Packages
> ---
>
> Key: HIVE-11881
> URL: https://issues.apache.org/jira/browse/HIVE-11881
> Project: Hive
>  Issue Type: Improvement
>  Components: hpl/sql
>Reporter: Dmitry Tolpeko
>Assignee: Dmitry Tolpeko
> Attachments: HIVE-11881.1.patch
>
>
> HPL/SQL should support packages similar to Oracle PL/SQL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12282) beeline - update command printing in verbose mode

2015-10-28 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977840#comment-14977840
 ] 

Lefty Leverenz commented on HIVE-12282:
---

Quite right, [~thejas], two instances of "passwd striped" should be "passwd 
stripped" -- if you ever get tired of coding, there's a bright future for you 
in tech writing.  (wink)

> beeline - update command printing in verbose mode
> -
>
> Key: HIVE-12282
> URL: https://issues.apache.org/jira/browse/HIVE-12282
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 2.0.0
>
> Attachments: HIVE-12282.1.patch
>
>
> In verbose mode, beeline prints the password used in commandline to STDERR. 
> This is not a good security practice. 
> Issue is in BeeLine.java code -
> {code}
> if (url != null) {
>   String com = "!connect "
>   + url + " "
>   + (user == null || user.length() == 0 ? "''" : user) + " "
>   + (pass == null || pass.length() == 0 ? "''" : pass) + " "
>   + (driver == null ? "" : driver);
>   debug("issuing: " + com);
>   dispatch(com);
> }
> {code}
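A minimal sketch, not the actual patch, of masking the password before it
reaches the debug log:
{code}
// Sketch only: build the logged command with the password masked.
String comForDebug = "!connect "
    + url + " "
    + (user == null || user.length() == 0 ? "''" : user) + " "
    + (pass == null || pass.length() == 0 ? "''" : "******") + " "
    + (driver == null ? "" : driver);
debug("issuing: " + comForDebug);
{code}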



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7575) GetTables thrift call is very slow

2015-10-28 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7575:

Attachment: HIVE-7575.4.patch.txt

> GetTables thrift call is very slow
> --
>
> Key: HIVE-7575
> URL: https://issues.apache.org/jira/browse/HIVE-7575
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Ashu Pachauri
>Assignee: Navis
> Attachments: HIVE-7575.1.patch.txt, HIVE-7575.2.patch.txt, 
> HIVE-7575.3.patch.txt, HIVE-7575.4.patch.txt
>
>
> The GetTables thrift call takes a long time when the number of tables is large.
> With around 5000 tables, the call takes around 80 seconds compared to a "Show 
> Tables" query on the same HiveServer2 instance which takes 3-7 seconds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7575) GetTables thrift call is very slow

2015-10-28 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7575:

Attachment: (was: HIVE-7575.4.patch.txt)

> GetTables thrift call is very slow
> --
>
> Key: HIVE-7575
> URL: https://issues.apache.org/jira/browse/HIVE-7575
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Ashu Pachauri
>Assignee: Navis
> Attachments: HIVE-7575.1.patch.txt, HIVE-7575.2.patch.txt, 
> HIVE-7575.3.patch.txt, HIVE-7575.4.patch.txt
>
>
> The GetTables thrift call takes a long time when the number of tables is large.
> With around 5000 tables, the call takes around 80 seconds compared to a "Show 
> Tables" query on the same HiveServer2 instance which takes 3-7 seconds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11881) Supporting HPL/SQL Packages

2015-10-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977838#comment-14977838
 ] 

Hive QA commented on HIVE-11881:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12768942/HIVE-11881.1.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9720 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5822/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5822/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5822/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12768942 - PreCommit-HIVE-TRUNK-Build

> Supporting HPL/SQL Packages
> ---
>
> Key: HIVE-11881
> URL: https://issues.apache.org/jira/browse/HIVE-11881
> Project: Hive
>  Issue Type: Improvement
>  Components: hpl/sql
>Reporter: Dmitry Tolpeko
>Assignee: Dmitry Tolpeko
> Attachments: HIVE-11881.1.patch
>
>
> HPL/SQL should support packages similar to Oracle PL/SQL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9013) Hive set command exposes metastore db password

2015-10-28 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977865#comment-14977865
 ] 

Lefty Leverenz commented on HIVE-9013:
--

The doc looks good, thanks [~decster]!

> Hive set command exposes metastore db password
> --
>
> Key: HIVE-9013
> URL: https://issues.apache.org/jira/browse/HIVE-9013
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Binglin Chang
>Assignee: Binglin Chang
>  Labels: TODOC1.2, TODOC1.3
> Fix For: 1.3.0, 2.0.0, 1.2.2
>
> Attachments: HIVE-9013.1.patch, HIVE-9013.2.patch, HIVE-9013.3.patch, 
> HIVE-9013.4.patch, HIVE-9013.5.patch, HIVE-9013.5.patch, 
> HIVE-9013.5.patch-branch1, HIVE-9013.5.patch-branch1.2
>
>
> When auth is enabled, we still need the set command to set some variables (e.g. 
> mapreduce.job.queuename), but the set command alone also lists all 
> information (including vars in the restricted list), which exposes values like 
> "javax.jdo.option.ConnectionPassword".
> I think conf vars in the restricted list should also be excluded from the 
> dump-vars command.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-12283) Fix test failures after HIVE-11844 [Spark Branch]

2015-10-28 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li reassigned HIVE-12283:
-

Assignee: Rui Li

> Fix test failures after HIVE-11844 [Spark Branch]
> -
>
> Key: HIVE-12283
> URL: https://issues.apache.org/jira/browse/HIVE-12283
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12283) Fix test failures after HIVE-11844 [Spark Branch]

2015-10-28 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-12283:
--
Attachment: HIVE-12283.1-spark.patch

Fix {{vector_inner_join}} and {{vector_outer_join2}}.

> Fix test failures after HIVE-11844 [Spark Branch]
> -
>
> Key: HIVE-12283
> URL: https://issues.apache.org/jira/browse/HIVE-12283
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-12283.1-spark.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12256) Move LLAP registry into llap-client module

2015-10-28 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977906#comment-14977906
 ] 

Gopal V commented on HIVE-12256:


The change LGTM, +1.

This brings up an interesting question: is it time to rename llap-daemon-site.xml 
to llap-site.xml, since it is now llap-client code?

> Move LLAP registry into llap-client module
> --
>
> Key: HIVE-12256
> URL: https://issues.apache.org/jira/browse/HIVE-12256
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 2.0.0
>
> Attachments: HIVE-12256.1.patch, HIVE-12256.2.patch, HIVE-12256.2.txt
>
>
> The registry may need to be accessed by the client to figure out the 
> available nodes. (ql module needs access)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12281) Vectorized MapJoin - use Operator::isLogDebugEnabled

2015-10-28 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-12281:
---
Attachment: HIVE-12281.1.patch

> Vectorized MapJoin - use Operator::isLogDebugEnabled
> 
>
> Key: HIVE-12281
> URL: https://issues.apache.org/jira/browse/HIVE-12281
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-12281.1.patch, vector-map-logging.png
>
>
> !vector-map-logging.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9599) remove derby, datanucleus and other not related to jdbc client classes from hive-jdbc-standalone.jar

2015-10-28 Thread Chen Xin Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977916#comment-14977916
 ] 

Chen Xin Yu commented on HIVE-9599:
---

Hi Vaibhav Gumashta,
I tested this patch running HS2 with HTTP transport, and it works well.
JDBC connection string:
jdbc:hive2://{host}:{http_port}/default;user={user};password={pwd};ssl=true;sslTrustStore={keystorePath};trustStorePassword={keyStorePassword}?hive.server2.transport.mode=http;hive.server2.thrift.http.path=cliservice
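For completeness, a hypothetical Java usage of a connection string of this shape
(host, port, user, password, and keystore values are all placeholders):
{code}
// Hypothetical JDBC usage sketch; all values are placeholders.
Class.forName("org.apache.hive.jdbc.HiveDriver");
Connection conn = DriverManager.getConnection(
    "jdbc:hive2://host:10001/default;ssl=true;"
        + "sslTrustStore=/path/to/truststore.jks;trustStorePassword=secret"
        + "?hive.server2.transport.mode=http;hive.server2.thrift.http.path=cliservice",
    "user", "pwd");
{code}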


> remove derby, datanucleus and other not related to jdbc client classes from 
> hive-jdbc-standalone.jar
> 
>
> Key: HIVE-9599
> URL: https://issues.apache.org/jira/browse/HIVE-9599
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>Priority: Minor
> Attachments: HIVE-9599.1.patch, HIVE-9599.2.patch
>
>
> Looks like the following packages (included in hive-jdbc-standalone.jar) are 
> not used when a JDBC client opens a connection and runs queries:
> {code}
> antlr/
> antlr/actions/cpp/
> antlr/actions/csharp/
> antlr/actions/java/
> antlr/actions/python/
> antlr/ASdebug/
> antlr/build/
> antlr/collections/
> antlr/collections/impl/
> antlr/debug/
> antlr/debug/misc/
> antlr/preprocessor/
> com/google/gson/
> com/google/gson/annotations/
> com/google/gson/internal/
> com/google/gson/internal/bind/
> com/google/gson/reflect/
> com/google/gson/stream/
> com/google/inject/
> com/google/inject/binder/
> com/google/inject/internal/
> com/google/inject/internal/asm/
> com/google/inject/internal/cglib/core/
> com/google/inject/internal/cglib/proxy/
> com/google/inject/internal/cglib/reflect/
> com/google/inject/internal/util/
> com/google/inject/matcher/
> com/google/inject/name/
> com/google/inject/servlet/
> com/google/inject/spi/
> com/google/inject/util/
> com/jamesmurty/utils/
> com/jcraft/jsch/
> com/jcraft/jsch/jce/
> com/jcraft/jsch/jcraft/
> com/jcraft/jsch/jgss/
> com/jolbox/bonecp/
> com/jolbox/bonecp/hooks/
> com/jolbox/bonecp/proxy/
> com/sun/activation/registries/
> com/sun/activation/viewers/
> com/sun/istack/
> com/sun/istack/localization/
> com/sun/istack/logging/
> com/sun/mail/handlers/
> com/sun/mail/iap/
> com/sun/mail/imap/
> com/sun/mail/imap/protocol/
> com/sun/mail/mbox/
> com/sun/mail/pop3/
> com/sun/mail/smtp/
> com/sun/mail/util/
> com/sun/xml/bind/
> com/sun/xml/bind/annotation/
> com/sun/xml/bind/api/
> com/sun/xml/bind/api/impl/
> com/sun/xml/bind/marshaller/
> com/sun/xml/bind/unmarshaller/
> com/sun/xml/bind/util/
> com/sun/xml/bind/v2/
> com/sun/xml/bind/v2/bytecode/
> com/sun/xml/bind/v2/model/annotation/
> com/sun/xml/bind/v2/model/core/
> com/sun/xml/bind/v2/model/impl/
> com/sun/xml/bind/v2/model/nav/
> com/sun/xml/bind/v2/model/runtime/
> com/sun/xml/bind/v2/runtime/
> com/sun/xml/bind/v2/runtime/output/
> com/sun/xml/bind/v2/runtime/property/
> com/sun/xml/bind/v2/runtime/reflect/
> com/sun/xml/bind/v2/runtime/reflect/opt/
> com/sun/xml/bind/v2/runtime/unmarshaller/
> com/sun/xml/bind/v2/schemagen/
> com/sun/xml/bind/v2/schemagen/episode/
> com/sun/xml/bind/v2/schemagen/xmlschema/
> com/sun/xml/bind/v2/util/
> com/sun/xml/txw2/
> com/sun/xml/txw2/annotation/
> com/sun/xml/txw2/output/
> com/thoughtworks/paranamer/
> contribs/mx/
> javax/activation/
> javax/annotation/
> javax/annotation/concurrent/
> javax/annotation/meta/
> javax/annotation/security/
> javax/el/
> javax/inject/
> javax/jdo/
> javax/jdo/annotations/
> javax/jdo/datastore/
> javax/jdo/identity/
> javax/jdo/listener/
> javax/jdo/metadata/
> javax/jdo/spi/
> javax/mail/
> javax/mail/event/
> javax/mail/internet/
> javax/mail/search/
> javax/mail/util/
> javax/security/auth/message/
> javax/security/auth/message/callback/
> javax/security/auth/message/config/
> javax/security/auth/message/module/
> javax/servlet/
> javax/servlet/http/
> javax/servlet/jsp/
> javax/servlet/jsp/el/
> javax/servlet/jsp/tagext/
> javax/transaction/
> javax/transaction/xa/
> javax/xml/bind/
> javax/xml/bind/annotation/
> javax/xml/bind/annotation/adapters/
> javax/xml/bind/attachment/
> javax/xml/bind/helpers/
> javax/xml/bind/util/
> javax/xml/stream/
> javax/xml/stream/events/
> javax/xml/stream/util/
> jline/
> jline/console/
> jline/console/completer/
> jline/console/history/
> jline/console/internal/
> jline/internal/
> net/iharder/base64/
> org/aopalliance/aop/
> org/aopalliance/intercept/
> org/apache/commons/beanutils/
> org/apache/commons/beanutils/converters/
> org/apache/commons/beanutils/expression/
> org/apache/commons/beanutils/locale/
> org/apache/commons/beanutils/locale/converters/
> org/apache/commons/cli/
> org/apache/commons/codec/
> org/apache/commons/codec/binary/
> 

[jira] [Commented] (HIVE-12281) Vectorized MapJoin - use Operator::isLogDebugEnabled

2015-10-28 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977912#comment-14977912
 ] 

Matt McCline commented on HIVE-12281:
-

+1 LGTM

> Vectorized MapJoin - use Operator::isLogDebugEnabled
> 
>
> Key: HIVE-12281
> URL: https://issues.apache.org/jira/browse/HIVE-12281
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-12281.1.patch, vector-map-logging.png
>
>
> !vector-map-logging.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9600) add missing classes to hive-jdbc-standalone.jar

2015-10-28 Thread Dan Marshall (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977928#comment-14977928
 ] 

Dan Marshall commented on HIVE-9600:


Thank you for your message. I am not working today and will review your message 
when I return.



> add missing classes to hive-jdbc-standalone.jar
> ---
>
> Key: HIVE-9600
> URL: https://issues.apache.org/jira/browse/HIVE-9600
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 1.2.1
>Reporter: Alexander Pivovarov
>Assignee: Chen Xin Yu
> Attachments: HIVE-9600.1.patch
>
>
> hive-jdbc-standalone.jar does not include the Hadoop Configuration class, and 
> possibly other hadoop-common classes, required to open a JDBC connection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12158) Add methods to HCatClient for partition synchronization

2015-10-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977930#comment-14977930
 ] 

Hive QA commented on HIVE-12158:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12768980/HIVE-12158.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9716 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5823/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5823/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5823/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12768980 - PreCommit-HIVE-TRUNK-Build

> Add methods to HCatClient for partition synchronization
> ---
>
> Key: HIVE-12158
> URL: https://issues.apache.org/jira/browse/HIVE-12158
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 2.0.0
>Reporter: David Maughan
>Assignee: David Maughan
>Priority: Minor
>  Labels: hcatalog
> Attachments: HIVE-12158.1.patch
>
>
> We have a use case where we have a list of partitions that are created as a 
> result of a batch job (new or updated) outside of Hive and would like to 
> synchronize them with the Hive MetaStore. We would like to use the HCatalog 
> {{HCatClient}} but it currently does not seem to support this. However it is 
> possible with the {{HiveMetaStoreClient}} directly. I am proposing to add the 
> following method to {{HCatClient}} and {{HCatClientHMSImpl}}:
> A method for altering partitions. The implementation would delegate to 
> {{HiveMetaStoreClient#alter_partitions}}. I've used "update" instead of 
> "alter" in the name so it's consistent with the 
> {{HCatClient#updateTableSchema}} method.
> {code}
> public void updatePartitions(List<HCatPartition> partitions) throws 
> HCatException
> {code}
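A hypothetical usage sketch of the proposed method, assuming it delegates to
{{HiveMetaStoreClient#alter_partitions}} as described (database and table names
are placeholders):
{code}
// Hypothetical usage of the proposed updatePartitions method.
HCatClient client = HCatClient.create(new Configuration());
List<HCatPartition> parts = client.getPartitions("mydb", "mytable");
// ... apply the changes produced by the external batch job ...
client.updatePartitions(parts);
client.close();
{code}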



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12259) Command containing semicolon is broken in Beeline

2015-10-28 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977937#comment-14977937
 ] 

Lefty Leverenz commented on HIVE-12259:
---

Does this bug fix need to be documented in the wiki?

* [HiveServer2 Clients -- Beeline Commands | 
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-BeelineCommands]

> Command containing semicolon is broken in Beeline
> -
>
> Key: HIVE-12259
> URL: https://issues.apache.org/jira/browse/HIVE-12259
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-12259.patch
>
>
> A Beeline command (!cmd) containing a semicolon is broken.
> For example:
> !connect jdbc:hive2://localhost:10001/default;principal=hive/xyz@realm.com
> is broken because the included ";" prevents it from being run with 
> execCommandWithPrefix as a whole command.
> {code}
>   if (line.startsWith(COMMAND_PREFIX) && !line.contains(";")) {
> // handle the case "!cmd" for beeline
> return execCommandWithPrefix(line);
>   } else {
> return commands.sql(line, getOpts().getEntireLineAsCommand());
>   }
> {code}
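One possible direction, sketched purely for illustration (not necessarily the
committed fix): treat any line starting with the command prefix as a whole
beeline command, semicolons included.
{code}
// Illustrative sketch only.
if (line.startsWith(COMMAND_PREFIX)) {
  // handle "!cmd" as a whole line, even when it contains ";"
  return execCommandWithPrefix(line);
} else {
  return commands.sql(line, getOpts().getEntireLineAsCommand());
}
{code}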



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12285) Add locking to HCatClient

2015-10-28 Thread Elliot West (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978698#comment-14978698
 ] 

Elliot West commented on HIVE-12285:


Removed the {{showLocks}} requirement, as it is not an essential component for 
implementing a client that can participate in Hive's locking.

> Add locking to HCatClient
> -
>
> Key: HIVE-12285
> URL: https://issues.apache.org/jira/browse/HIVE-12285
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 2.0.0
>Reporter: Elliot West
>Assignee: Elliot West
>  Labels: concurrency, hcatalog, lock, locking, locks
>
> With the introduction of a concurrency model (HIVE-1293) Hive uses locks to 
> coordinate  access and updates to both table data and metadata. Within the 
> Hive CLI such lock management is seamless. However, Hive provides additional 
> APIs that permit interaction with data repositories, namely the HCatalog 
> APIs. Currently, operations implemented by this API do not participate with 
> Hive's locking scheme. Furthermore, access to the locking mechanisms is not 
> exposed by the APIs (as is the case with the Metastore Thrift API) and so 
> users are not able to explicitly interact with locks either. This has created 
> a less than ideal situation where users of the APIs have no choice but to 
> manipulate these data repositories outside of the command of Hive's lock 
> management, potentially resulting in situations where data inconsistencies 
> can occur both for external processes using the API and for queries executing 
> within Hive.
> h3. Scope of work
> This ticket is concerned with sections of the HCatalog API that deal with DDL 
> type operations using the metastore, not with those whose purpose is to 
> read/write table data. A separate issue already exists for adding locking to 
> HCat readers and writers (HIVE-6207).
> h3. Proposed work
> The following work items would serve as a minimum deliverable that would allow 
> API users to work effectively with locks:
> * Comprehensively document on the wiki the locks required for various Hive 
> operations. At a minimum this should cover all operations exposed by 
> {{HCatClient}}. The [Locking design 
> document|https://cwiki.apache.org/confluence/display/Hive/Locking] can be 
> used as a starting point or perhaps updated.
> * Implement methods and types in the {{HCatClient}} API that allow users to 
> manipulate Hive locks. For the most part I'd expect these to delegate to the 
> metastore API implementations:
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.lock(LockRequest)}}
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.checkLock(long)}}
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.unlock(long)}}
> ** -{{org.apache.hadoop.hive.metastore.IMetaStoreClient.showLocks()}}-
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.heartbeat(long, long)}}
> ** {{org.apache.hadoop.hive.metastore.api.LockComponent}}
> ** {{org.apache.hadoop.hive.metastore.api.LockRequest}}
> ** {{org.apache.hadoop.hive.metastore.api.LockResponse}}
> ** {{org.apache.hadoop.hive.metastore.api.LockLevel}}
> ** {{org.apache.hadoop.hive.metastore.api.LockType}}
> ** {{org.apache.hadoop.hive.metastore.api.LockState}}
> ** -{{org.apache.hadoop.hive.metastore.api.ShowLocksResponse}}-
> h3. Additional proposals
> Explicit lock management should be fairly simple to add to {{HCatClient}}, 
> however it puts the onus on the API user to correctly understand and 
> implement code that uses lock in an appropriate manner. Failure to do so may 
> have undesirable consequences. With a simpler user model the operations 
> exposed on the API would automatically acquire and release the locks that 
> they need. This might work well for small numbers of operations, but not 
> perhaps for large sequences of invocations. (Do we need to worry about this 
> though as the API methods usually accept batches?).  Additionally tasks such 
> as heartbeat management could also be handled implicitly for long running 
> sets of operations. With these concerns in mind it may also be beneficial to 
> deliver some of the following:
> * A means to automatically acquire/release appropriate locks for 
> {{HCatClient}} operations.
> * A component that maintains a lock heartbeat from the client.
> * A strategy for switching between manual/automatic lock management, 
> analogous to SQL's {{autocommit}} for transactions.
> An API for lock and heartbeat management already exists in the HCatalog 
> Mutation API (see: 
> {{org.apache.hive.hcatalog.streaming.mutate.client.lock}}). It will likely 
> make sense to refactor either this code and/or code that uses it.
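A rough sketch of the lock lifecycle such an API would expose, using only the
existing {{IMetaStoreClient}} calls listed above ({{msClient}} is an assumed
{{IMetaStoreClient}} instance; names and lock levels are illustrative):
{code}
// Illustrative lock lifecycle sketch against IMetaStoreClient.
LockComponent comp = new LockComponent(LockType.SHARED_READ, LockLevel.TABLE, "mydb");
comp.setTablename("mytable");
LockRequest req = new LockRequest(Arrays.asList(comp), "user", "host.example.com");
LockResponse resp = msClient.lock(req);
while (resp.getState() == LockState.WAITING) {
  resp = msClient.checkLock(resp.getLockid());
}
try {
  // ... perform the DDL operation under the lock ...
} finally {
  msClient.unlock(resp.getLockid());
}
{code}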



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11985) don't store type names in metastore when metastore type names are not used

2015-10-28 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978697#comment-14978697
 ] 

Ashutosh Chauhan commented on HIVE-11985:
-

Although HIVE-12274 will ameliorate the problem, we will still need checks for 
max length, since every RDBMS has a limit on the max length of a varchar. In 
particular, on Oracle, where this problem was found, the length is limited to 
4K, which we have already reached. We can update the length check for other DBs 
once HIVE-12274 lands, but the check is still needed, so I think this patch 
makes sense.

> don't store type names in metastore when metastore type names are not used
> --
>
> Key: HIVE-11985
> URL: https://issues.apache.org/jira/browse/HIVE-11985
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11985.01.patch, HIVE-11985.02.patch, 
> HIVE-11985.03.patch, HIVE-11985.05.patch, HIVE-11985.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12285) Add locking to HCatClient

2015-10-28 Thread Elliot West (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliot West updated HIVE-12285:
---
Description: 
With the introduction of a concurrency model (HIVE-1293) Hive uses locks to 
coordinate  access and updates to both table data and metadata. Within the Hive 
CLI such lock management is seamless. However, Hive provides additional APIs 
that permit interaction with data repositories, namely the HCatalog APIs. 
Currently, operations implemented by this API do not participate with Hive's 
locking scheme. Furthermore, access to the locking mechanisms is not exposed by 
the APIs (as is the case with the Metastore Thrift API) and so users are not 
able to explicitly interact with locks either. This has created a less than 
ideal situation where users of the APIs have no choice but to manipulate these 
data repositories outside of the command of Hive's lock management, potentially 
resulting in situations where data inconsistencies can occur both for external 
processes using the API and for queries executing within Hive.

h3. Scope of work
This ticket is concerned with sections of the HCatalog API that deal with DDL 
type operations using the metastore, not with those whose purpose is to 
read/write table data. A separate issue already exists for adding locking to 
HCat readers and writers (HIVE-6207).

h3. Proposed work
The following work items would serve as a minimum deliverable that would both 
allow API users to effectively work with locks:
* Comprehensively document on the wiki the locks required for various Hive 
operations. At a minimum this should cover all operations exposed by 
{{HCatClient}}. The [Locking design 
document|https://cwiki.apache.org/confluence/display/Hive/Locking] can be used 
as a starting point or perhaps updated.
* Implement methods and types in the {{HCatClient}} API that allow users to 
manipulate Hive locks. For the most part I'd expect these to delegate to the 
metastore API implementations:
** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.lock(LockRequest)}}
** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.checkLock(long)}}
** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.unlock(long)}}
** -{{org.apache.hadoop.hive.metastore.IMetaStoreClient.showLocks()}}-
** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.heartbeat(long, long)}}
** {{org.apache.hadoop.hive.metastore.api.LockComponent}}
** {{org.apache.hadoop.hive.metastore.api.LockRequest}}
** {{org.apache.hadoop.hive.metastore.api.LockResponse}}
** {{org.apache.hadoop.hive.metastore.api.LockLevel}}
** {{org.apache.hadoop.hive.metastore.api.LockType}}
** {{org.apache.hadoop.hive.metastore.api.LockState}}
** -{{org.apache.hadoop.hive.metastore.api.ShowLocksResponse}}-

h3. Additional proposals
Explicit lock management should be fairly simple to add to {{HCatClient}}, 
however it puts the onus on the API user to correctly understand and implement 
code that uses lock in an appropriate manner. Failure to do so may have 
undesirable consequences. With a simpler user model the operations exposed on 
the API would automatically acquire and release the locks that they need. This 
might work well for small numbers of operations, but not perhaps for large 
sequences of invocations. (Do we need to worry about this though as the API 
methods usually accept batches?).  Additionally tasks such as heartbeat 
management could also be handled implicitly for long running sets of 
operations. With these concerns in mind it may also be beneficial to deliver 
some of the following:
* A means to automatically acquire/release appropriate locks for {{HCatClient}} 
operations.
* A component that maintains a lock heartbeat from the client.
* A strategy for switching between manual/automatic lock management, analogous 
to SQL's {{autocommit}} for transactions.

An API for lock and heartbeat management already exists in the HCatalog 
Mutation API (see: {{org.apache.hive.hcatalog.streaming.mutate.client.lock}}). 
It will likely make sense to refactor either this code and/or code that uses it.

  was:
With the introduction of a concurrency model (HIVE-1293) Hive uses locks to 
coordinate  access and updates to both table data and metadata. Within the Hive 
CLI such lock management is seamless. However, Hive provides additional APIs 
that permit interaction with data repositories, namely the HCatalog APIs. 
Currently, operations implemented by this API do not participate with Hive's 
locking scheme. Furthermore, access to the locking mechanisms is not exposed by 
the APIs (as is the case with the Metastore Thrift API) and so users are not 
able to explicitly interact with locks either. This has created a less than 
ideal situation where users of the APIs have no choice but to manipulate these 
data repositories outside of the command of Hive's lock management, potentially 
resulting in situations where data inconsistencies can occur both for external 
processes using the API and for queries executing within Hive.

[jira] [Commented] (HIVE-11306) Add a bloom-1 filter for Hybrid MapJoin spills

2015-10-28 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978609#comment-14978609
 ] 

Gopal V commented on HIVE-11306:


Committed to master, thanks [~wzheng].

> Add a bloom-1 filter for Hybrid MapJoin spills
> --
>
> Key: HIVE-11306
> URL: https://issues.apache.org/jira/browse/HIVE-11306
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Wei Zheng
> Fix For: 2.0.0
>
> Attachments: HIVE-11306.1.patch, HIVE-11306.2.patch, 
> HIVE-11306.3.patch, HIVE-11306.5.patch, HIVE-11306.6.patch
>
>
> HIVE-9277 implemented Spillable joins for Tez, which suffers from a 
> corner-case performance issue when joining wide small tables against a narrow 
> big table (like a user info table join events stream).
> The fact that the wide table is spilled causes extra IO, even though the nDV 
> of the join key might be in the thousands.
> A cheap bloom-1 filter would add a massive performance gain for such queries, 
> massively cutting down on the spill IO costs for the big-table spills.
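
As a rough sketch of the idea (a toy, not the actual HIVE-11306 implementation): a bloom-1 filter is a bitmap probed with a single hash, so a miss proves the key was never inserted and the probe row can skip the spilled partition entirely.

{code}
// Toy bloom-1 filter: one hash function, one bitmap. A negative answer is exact,
// so probe rows whose key cannot be in the spilled small table skip the spill IO.
public class Bloom1Filter {
  private final long[] bits;
  private final int mask;              // requires sizeBits to be a power of two

  public Bloom1Filter(int sizeBits) {  // e.g. 1 << 20
    bits = new long[sizeBits >> 6];
    mask = sizeBits - 1;
  }

  public void add(long keyHash) {
    int pos = (int) (keyHash & mask);
    bits[pos >> 6] |= 1L << (pos & 63);
  }

  public boolean mightContain(long keyHash) {
    int pos = (int) (keyHash & mask);
    return (bits[pos >> 6] & (1L << (pos & 63))) != 0;
  }
}
{code}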



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11306) Add a bloom-1 filter for Hybrid MapJoin spills

2015-10-28 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-11306:
---
Fix Version/s: 2.0.0

> Add a bloom-1 filter for Hybrid MapJoin spills
> --
>
> Key: HIVE-11306
> URL: https://issues.apache.org/jira/browse/HIVE-11306
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Wei Zheng
> Fix For: 2.0.0
>
> Attachments: HIVE-11306.1.patch, HIVE-11306.2.patch, 
> HIVE-11306.3.patch, HIVE-11306.5.patch, HIVE-11306.6.patch
>
>
> HIVE-9277 implemented Spillable joins for Tez, which suffers from a 
> corner-case performance issue when joining wide small tables against a narrow 
> big table (like a user info table join events stream).
> The fact that the wide table is spilled causes extra IO, even though the nDV 
> of the join key might be in the thousands.
> A cheap bloom-1 filter would add a massive performance gain for such queries, 
> massively cutting down on the spill IO costs for the big-table spills.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11564) HBaseSchemaTool should be able to list objects

2015-10-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978636#comment-14978636
 ] 

Hive QA commented on HIVE-11564:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12769071/HIVE-11564.5.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9730 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5828/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5828/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5828/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12769071 - PreCommit-HIVE-TRUNK-Build

> HBaseSchemaTool should be able to list objects
> --
>
> Key: HIVE-11564
> URL: https://issues.apache.org/jira/browse/HIVE-11564
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Metastore
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 2.0.0
>
> Attachments: HIVE-11564.2.patch, HIVE-11564.3.patch, 
> HIVE-11564.4.patch, HIVE-11564.5.patch, HIVE-11564.patch
>
>
> Current HBaseSchemaTool can only fetch objects the user already knows the 
> name of.  It should also be able to list available objects (e.g. list all 
> databases).  
> It is also very user unfriendly in terms of error handling.  That needs to be 
> fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12279) Testcase to verify session temporary files are removed after HIVE-11768

2015-10-28 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978784#comment-14978784
 ] 

Daniel Dai commented on HIVE-12279:
---

I thought about it initially. However, 
ShutdownHookManager.isRegisteredToDeleteOnExit is not exposed. True, I could 
use reflection to do it anyway, but that looks ugly.
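
For reference, the reflection workaround being dismissed would look roughly like the sketch below, assuming isRegisteredToDeleteOnExit is a static method taking a File (hypothetical; the signature is an assumption):

{code}
import java.io.File;
import java.lang.reflect.Method;
import org.apache.hive.common.util.ShutdownHookManager;

final class ShutdownHookReflection {
  // Hypothetical sketch of the "ugly" reflection approach mentioned above.
  static boolean isRegisteredToDeleteOnExit(File file) throws Exception {
    Method m = ShutdownHookManager.class
        .getDeclaredMethod("isRegisteredToDeleteOnExit", File.class);
    m.setAccessible(true);                  // bypass the non-public access modifier
    return (Boolean) m.invoke(null, file);  // static method, so no receiver object
  }
}
{code}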

> Testcase to verify session temporary files are removed after HIVE-11768
> ---
>
> Key: HIVE-12279
> URL: https://issues.apache.org/jira/browse/HIVE-12279
> Project: Hive
>  Issue Type: Test
>  Components: HiveServer2, Test
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 2.0.0
>
> Attachments: HIVE-12279.1.patch
>
>
> We need to make sure HS2 session temporary files are removed after session 
> ends.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12256) Move LLAP registry into llap-client module

2015-10-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978903#comment-14978903
 ] 

Sergey Shelukhin commented on HIVE-12256:
-

Hmm. Wouldn't llap clients put client configs in their own configs? For now 
it's hive, so client configs should go to hive-site, like client-side configs 
for metastore, etc.

> Move LLAP registry into llap-client module
> --
>
> Key: HIVE-12256
> URL: https://issues.apache.org/jira/browse/HIVE-12256
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 2.0.0
>
> Attachments: HIVE-12256.1.patch, HIVE-12256.2.patch, HIVE-12256.2.txt
>
>
> The registry may need to be accessed by the client to figure out the 
> available nodes. (ql module needs access)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12278) Skip logging lineage for explain queries

2015-10-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978805#comment-14978805
 ] 

Hive QA commented on HIVE-12278:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12769075/HIVE-12278.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9711 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_shutdown
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5829/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5829/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5829/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12769075 - PreCommit-HIVE-TRUNK-Build

> Skip logging lineage for explain queries
> 
>
> Key: HIVE-12278
> URL: https://issues.apache.org/jira/browse/HIVE-12278
> Project: Hive
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12278.1.patch
>
>
> For explain queries, we don't generate the lineage info. So we should not try 
> to log it at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12281) Vectorized MapJoin - use Operator::isLogDebugEnabled

2015-10-28 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978915#comment-14978915
 ] 

Ashutosh Chauhan commented on HIVE-12281:
-

isLogEnabled() is an anti-pattern: it's actually slower when logging is enabled, 
since you then pay for both the check and the logging call. Now that we have 
switched to slf4j, we can make use of parameterized messages. 
See: http://www.slf4j.org/faq.html#logging_performance
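
A minimal sketch of the difference (the class and message are illustrative):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class JoinLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(JoinLoggingSketch.class);

  void probe(long rows, long spilled) {
    // Anti-pattern: the guard is an extra call, and once DEBUG is enabled you
    // still pay for the string concatenation on every invocation.
    if (LOG.isDebugEnabled()) {
      LOG.debug("probed " + rows + " rows, spilled " + spilled);
    }

    // Parameterized form: no guard needed; the message is only formatted
    // if DEBUG is actually enabled.
    LOG.debug("probed {} rows, spilled {}", rows, spilled);
  }
}
{code}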

> Vectorized MapJoin - use Operator::isLogDebugEnabled
> 
>
> Key: HIVE-12281
> URL: https://issues.apache.org/jira/browse/HIVE-12281
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-12281.1.patch, vector-map-logging.png
>
>
> !vector-map-logging.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-12281) Vectorized MapJoin - use Operator::isLogDebugEnabled

2015-10-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978927#comment-14978927
 ] 

Sergey Shelukhin edited comment on HIVE-12281 at 10/28/15 6:13 PM:
---

The logging call itself is expensive, at least before log4j2. With log4j2 it's 
supposed to be cheaper, but we haven't really tested that.
Note that in this case logging is disabled, so the profile above is basically 
just the cost of the checks; it has nothing to do with string building.


was (Author: sershe):
The logging call itself is expensive, at least before log4j2. With log4j2 it's 
supposed to be cheaper but we haven't really tested

> Vectorized MapJoin - use Operator::isLogDebugEnabled
> 
>
> Key: HIVE-12281
> URL: https://issues.apache.org/jira/browse/HIVE-12281
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-12281.1.patch, vector-map-logging.png
>
>
> !vector-map-logging.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12281) Vectorized MapJoin - use Operator::isLogDebugEnabled

2015-10-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978927#comment-14978927
 ] 

Sergey Shelukhin commented on HIVE-12281:
-

The logging call itself is expensive, at least before log4j2. With log4j2 it's 
supposed to be cheaper, but we haven't really tested that.

> Vectorized MapJoin - use Operator::isLogDebugEnabled
> 
>
> Key: HIVE-12281
> URL: https://issues.apache.org/jira/browse/HIVE-12281
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-12281.1.patch, vector-map-logging.png
>
>
> !vector-map-logging.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12282) beeline - update command printing in verbose mode

2015-10-28 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978869#comment-14978869
 ] 

Thejas M Nair commented on HIVE-12282:
--

+1

> beeline - update command printing in verbose mode
> -
>
> Key: HIVE-12282
> URL: https://issues.apache.org/jira/browse/HIVE-12282
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 2.0.0
>
> Attachments: HIVE-12282.1.patch, HIVE-12282.2.patch
>
>
> In verbose mode, beeline prints the password used in commandline to STDERR. 
> This is not a good security practice. 
> Issue is in BeeLine.java code -
> {code}
> if (url != null) {
>   String com = "!connect "
>   + url + " "
>   + (user == null || user.length() == 0 ? "''" : user) + " "
>   + (pass == null || pass.length() == 0 ? "''" : pass) + " "
>   + (driver == null ? "" : driver);
>   debug("issuing: " + com);
>   dispatch(com);
> }
> {code}
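
A minimal sketch of the kind of fix implied here, masking the password in the logged string while still dispatching the real command (hypothetical; the committed patch may differ):

{code}
// Hedged sketch: build the real command for dispatch, but log a copy with the
// password replaced by a placeholder so it never reaches STDERR.
if (url != null) {
  String userPart = (user == null || user.length() == 0) ? "''" : user;
  String passPart = (pass == null || pass.length() == 0) ? "''" : pass;
  String driverPart = (driver == null) ? "" : driver;
  String com = "!connect " + url + " " + userPart + " " + passPart + " " + driverPart;
  String comForDebug = "!connect " + url + " " + userPart + " ** " + driverPart;
  debug("issuing: " + comForDebug);
  dispatch(com);
}
{code}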



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12281) Vectorized MapJoin - use Operator::isLogDebugEnabled

2015-10-28 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978907#comment-14978907
 ] 

Gopal V commented on HIVE-12281:


There's an HTTP UI in LLAP which lets you turn on logging for a specific 
Operator class.

It's exactly as fast now, so leave that in place just in case we ever have to 
debug it without restarting?

> Vectorized MapJoin - use Operator::isLogDebugEnabled
> 
>
> Key: HIVE-12281
> URL: https://issues.apache.org/jira/browse/HIVE-12281
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-12281.1.patch, vector-map-logging.png
>
>
> !vector-map-logging.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12235) Improve beeline logging for dynamic service discovery

2015-10-28 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978970#comment-14978970
 ] 

Szehon Ho commented on HIVE-12235:
--

[~vgumashta] do you mind taking another look?

> Improve beeline logging for dynamic service discovery
> -
>
> Key: HIVE-12235
> URL: https://issues.apache.org/jira/browse/HIVE-12235
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 1.2.1
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-12235.2.patch, HIVE-12235.patch
>
>
> It maybe nice to see which host it tried to, and ended up, connecting to.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11356) SMB join on tez fails when one of the tables is empty

2015-10-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978838#comment-14978838
 ] 

Hive QA commented on HIVE-11356:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12769090/HIVE-11356.6.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5830/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5830/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5830/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[INFO] 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-hwi ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-hwi ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-github-source-source/hwi/target/hive-hwi-2.0.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
hive-hwi ---
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-hwi ---
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/hwi/target/hive-hwi-2.0.0-SNAPSHOT.jar
 to 
/data/hive-ptest/working/maven/org/apache/hive/hive-hwi/2.0.0-SNAPSHOT/hive-hwi-2.0.0-SNAPSHOT.jar
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/hwi/pom.xml to 
/data/hive-ptest/working/maven/org/apache/hive/hive-hwi/2.0.0-SNAPSHOT/hive-hwi-2.0.0-SNAPSHOT.pom
[INFO] 
[INFO] 
[INFO] Building Hive ODBC 2.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-odbc ---
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/odbc/target
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/odbc 
(includes = [datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ 
hive-odbc ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-odbc ---
[WARNING] Invalid project model for artifact 
[pentaho-aggdesigner-algorithm:org.pentaho:5.1.5-jhyde]. It will be ignored by 
the remote resources Mojo.
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-odbc ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-odbc ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/odbc/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/odbc/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/odbc/target/tmp/conf
 [copy] Copying 14 files to 
/data/hive-ptest/working/apache-github-source-source/odbc/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
hive-odbc ---
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-odbc ---
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/odbc/pom.xml to 
/data/hive-ptest/working/maven/org/apache/hive/hive-odbc/2.0.0-SNAPSHOT/hive-odbc-2.0.0-SNAPSHOT.pom
[INFO] 
[INFO] 
[INFO] Building Hive Shims Aggregator 2.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-shims-aggregator 
---
[INFO] Deleting 
/data/hive-ptest/working/apache-github-source-source/shims/target
[INFO] Deleting /data/hive-ptest/working/apache-github-source-source/shims 
(includes = [datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ 
hive-shims-aggregator ---
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ 
hive-shims-aggregator ---
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ 
hive-shims-aggregator ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ 
hive-shims-aggregator ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/shims/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/shims/target/warehouse
[mkdir] Created dir: 

[jira] [Updated] (HIVE-12282) beeline - update command printing in verbose mode

2015-10-28 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-12282:
--
Attachment: HIVE-12282.2.patch

Thanks for capturing the typo. Updated.

> beeline - update command printing in verbose mode
> -
>
> Key: HIVE-12282
> URL: https://issues.apache.org/jira/browse/HIVE-12282
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 2.0.0
>
> Attachments: HIVE-12282.1.patch, HIVE-12282.2.patch
>
>
> In verbose mode, beeline prints the password used in commandline to STDERR. 
> This is not a good security practice. 
> Issue is in BeeLine.java code -
> {code}
> if (url != null) {
>   String com = "!connect "
>   + url + " "
>   + (user == null || user.length() == 0 ? "''" : user) + " "
>   + (pass == null || pass.length() == 0 ? "''" : pass) + " "
>   + (driver == null ? "" : driver);
>   debug("issuing: " + com);
>   dispatch(com);
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12256) Move LLAP registry into llap-client module

2015-10-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978893#comment-14978893
 ] 

Siddharth Seth commented on HIVE-12256:
---

Thanks for the review. Will commit in some time.

bq. This brings an interesting question - is it time to rename llap-daemon-site 
to llap-site.xml, since it is now llap-client code?
Should we consider separating client/server configs? This has always been a 
source of confusion in Hadoop.

> Move LLAP registry into llap-client module
> --
>
> Key: HIVE-12256
> URL: https://issues.apache.org/jira/browse/HIVE-12256
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 2.0.0
>
> Attachments: HIVE-12256.1.patch, HIVE-12256.2.patch, HIVE-12256.2.txt
>
>
> The registry may need to be accessed by the client to figure out the 
> available nodes. (ql module needs access)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12284) Merge master to Spark branch 10/28/2015 [Spark Branch]

2015-10-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12284:
---
Attachment: HIVE-12284.2-spark.patch

TestHWISessionManager.testHiveDriver failed on master also. Patch #2 updated 
the test results for Spark.

> Merge master to Spark branch 10/28/2015 [Spark Branch]
> --
>
> Key: HIVE-12284
> URL: https://issues.apache.org/jira/browse/HIVE-12284
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: spark-branch
>
> Attachments: HIVE-12284.1-spark.patch, HIVE-12284.2-spark.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7693) Invalid column ref error in order by when using column alias in select clause and using having

2015-10-28 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-7693:
--
Attachment: HIVE-7693.05.patch

> Invalid column ref error in order by when using column alias in select clause 
> and using having
> --
>
> Key: HIVE-7693
> URL: https://issues.apache.org/jira/browse/HIVE-7693
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Deepesh Khandelwal
>Assignee: Pengcheng Xiong
> Fix For: 2.0.0
>
> Attachments: HIVE-7693.01.patch, HIVE-7693.02.patch, 
> HIVE-7693.03.patch, HIVE-7693.04.patch, HIVE-7693.05.patch
>
>
> Hive CLI session:
> {noformat}
> hive> create table abc(foo int, bar string);
> OK
> Time taken: 0.633 seconds
> hive> select foo as c0, count(*) as c1 from abc group by foo, bar having bar 
> like '%abc%' order by foo;
> FAILED: SemanticException [Error 10004]: Line 1:93 Invalid table alias or 
> column reference 'foo': (possible column names are: c0, c1)
> {noformat}
> Without having clause, the query runs fine, example:
> {code}
> select foo as c0, count(*) as c1 from abc group by foo, bar order by foo;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12230) custom UDF configure() not called in Vectorization mode

2015-10-28 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978782#comment-14978782
 ] 

Jason Dere commented on HIVE-12230:
---

I think this looks good, +1

> custom UDF configure() not called in Vectorization mode
> ---
>
> Key: HIVE-12230
> URL: https://issues.apache.org/jira/browse/HIVE-12230
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12230.01.patch
>
>
> PROBLEM:
> A custom UDF that overrides the configure()
> {code}
> @Override
> public void configure(MapredContext context) {
>   greeting = "Hello ";
> }
> {code}
> In vectorization mode, it is not called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12281) Vectorized MapJoin - use Operator::isLogDebugEnabled

2015-10-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978895#comment-14978895
 ] 

Sergey Shelukhin commented on HIVE-12281:
-

Any reason to not have isLogDebugEnabled static, since LOG is static? +1, can 
be fixed on commit if necessary

> Vectorized MapJoin - use Operator::isLogDebugEnabled
> 
>
> Key: HIVE-12281
> URL: https://issues.apache.org/jira/browse/HIVE-12281
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-12281.1.patch, vector-map-logging.png
>
>
> !vector-map-logging.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12061) add file type support to file metadata by expr call

2015-10-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978906#comment-14978906
 ] 

Sergey Shelukhin commented on HIVE-12061:
-

Not really. The parent issue might need docs; this is just a minor change to an 
unreleased API.

> add file type support to file metadata by expr call
> ---
>
> Key: HIVE-12061
> URL: https://issues.apache.org/jira/browse/HIVE-12061
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.0.0
>
> Attachments: HIVE-12061.01.nogen.patch, HIVE-12061.01.patch, 
> HIVE-12061.02.patch, HIVE-12061.03.nogen.patch, HIVE-12061.03.patch, 
> HIVE-12061.04.patch, HIVE-12061.nogen.patch, HIVE-12061.patch
>
>
> Expr filtering, automatic caching, etc. should be aware of file types for 
> advanced features. For now only ORC is supported, but I want to add a 
> boundary between ORC-specific and general metastore code, that could later be 
> used for other formats if needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12063) Pad Decimal numbers with trailing zeros to the scale of the column

2015-10-28 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978923#comment-14978923
 ] 

Xuefu Zhang commented on HIVE-12063:


[~jdere], would you have time for this? Thanks.

> Pad Decimal numbers with trailing zeros to the scale of the column
> --
>
> Key: HIVE-12063
> URL: https://issues.apache.org/jira/browse/HIVE-12063
> Project: Hive
>  Issue Type: Improvement
>  Components: Types
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-12063.1.patch, HIVE-12063.2.patch, HIVE-12063.patch
>
>
> HIVE-7373 was to address the problems of trimming tailing zeros by Hive, 
> which caused many problems including treating 0.0, 0.00 and so on as 0, which 
> has different precision/scale. Please refer to HIVE-7373 description. 
> However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems 
> remained. HIVE-11835 was resolved recently to address one of the problems, 
> where 0.0, 0.00, and so on cannot be read into decimal(1,1).
> However, HIVE-11835 didn't address the problem of showing as 0 in query 
> result for any decimal values such as 0.0, 0.00, etc. This causes confusion 
> as 0 and 0.0 have different precision/scale than 0.
> The proposal here is to pad zeros for query result to the type's scale. This 
> not only removes the confusion described above, but also aligns with many 
> other DBs. Internal decimal number representation doesn't change, however.
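
As a rough illustration of the proposed display behavior (not the actual Hive patch), padding to the column's declared scale is a formatting-only step:

{code}
import java.math.BigDecimal;

class DecimalPadSketch {
  // Format a decimal value for display using the declared scale of the column,
  // e.g. a decimal(3,2) column renders 0 as "0.00" and 0.5 as "0.50".
  // Padding only increases the scale, so setScale never needs to round here.
  static String formatForColumn(BigDecimal value, int columnScale) {
    return value.setScale(columnScale).toPlainString();
  }

  public static void main(String[] args) {
    System.out.println(formatForColumn(new BigDecimal("0"), 2));    // 0.00
    System.out.println(formatForColumn(new BigDecimal("0.5"), 2));  // 0.50
  }
}
{code}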



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity

2015-10-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979041#comment-14979041
 ] 

Hive QA commented on HIVE-7723:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12769094/HIVE-7723.14.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9711 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape2
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5831/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5831/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5831/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12769094 - PreCommit-HIVE-TRUNK-Build

> Explain plan for complex query with lots of partitions is slow due to 
> in-efficient collection used to find a matching ReadEntity
> 
>
> Key: HIVE-7723
> URL: https://issues.apache.org/jira/browse/HIVE-7723
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Physical Optimizer
>Affects Versions: 0.13.1
>Reporter: Mostafa Mokhtar
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-7723.1.patch, HIVE-7723.10.patch, 
> HIVE-7723.11.patch, HIVE-7723.11.patch, HIVE-7723.12.patch, 
> HIVE-7723.13.patch, HIVE-7723.14.patch, HIVE-7723.2.patch, HIVE-7723.3.patch, 
> HIVE-7723.4.patch, HIVE-7723.5.patch, HIVE-7723.6.patch, HIVE-7723.7.patch, 
> HIVE-7723.8.patch, HIVE-7723.9.patch
>
>
> Explain on TPC-DS query 64 took 11 seconds; when the CLI was profiled, it 
> showed that ReadEntity.equals is taking ~40% of the CPU.
> ReadEntity.equals is called from the snippet below.
> Again and again the set is iterated over to get the actual match; a HashMap 
> is a better option for this case, as Set doesn't have a get method.
> Also, for ReadEntity, equals is case-insensitive while hash is not, which is 
> undesired behavior.
> {code}
> public static ReadEntity addInput(Set<ReadEntity> inputs, ReadEntity 
> newInput) {
> // If the input is already present, make sure the new parent is added to 
> the input.
> if (inputs.contains(newInput)) {
>   for (ReadEntity input : inputs) {
> if (input.equals(newInput)) {
>   if ((newInput.getParents() != null) && 
> (!newInput.getParents().isEmpty())) {
> input.getParents().addAll(newInput.getParents());
> input.setDirect(input.isDirect() || newInput.isDirect());
>   }
>   return input;
> }
>   }
>   assert false;
> } else {
>   inputs.add(newInput);
>   return newInput;
> }
> // make compile happy
> return null;
>   }
> {code}
> This is the query used : 
> {code}
> select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number 
> ,cs1.b_streen_name ,cs1.b_city
>  ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city 
> ,cs1.c_zip ,cs1.syear ,cs1.cnt
>  ,cs1.s1 ,cs1.s2 ,cs1.s3
>  ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt
> from
> (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
> store_name
>  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
> ,ad1.ca_street_name as b_streen_name
>  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
> c_street_number
>  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
> as c_zip
>  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
> as cnt
>  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
> ,sum(ss_coupon_amt) as s3
>   FROM   store_sales
> JOIN store_returns ON store_sales.ss_item_sk = 
> store_returns.sr_item_sk and store_sales.ss_ticket_number = 
> store_returns.sr_ticket_number
> JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
> JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
> JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
> JOIN date_dim d3 ON 
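
Circling back to the addInput snippet above, a hedged sketch of the HashMap-based alternative the description suggests (illustrative only; not the committed patch):

{code}
import java.util.Map;
import org.apache.hadoop.hive.ql.hooks.ReadEntity;

class AddInputSketch {
  // Map-based variant: the existing equal entity is found with one O(1) get
  // instead of iterating the whole Set on every call.
  public static ReadEntity addInput(Map<ReadEntity, ReadEntity> inputs, ReadEntity newInput) {
    ReadEntity existing = inputs.get(newInput);
    if (existing == null) {
      inputs.put(newInput, newInput);
      return newInput;
    }
    if (newInput.getParents() != null && !newInput.getParents().isEmpty()) {
      existing.getParents().addAll(newInput.getParents());
      existing.setDirect(existing.isDirect() || newInput.isDirect());
    }
    return existing;
  }
}
{code}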

[jira] [Updated] (HIVE-12272) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : columnPruner prunes everything when union is the last operator before FS

2015-10-28 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-12272:
---
Attachment: HIVE-12272.02.patch

FS operator should be treated specially.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path) : columnPruner 
> prunes everything when union is the last operator before FS
> ---
>
> Key: HIVE-12272
> URL: https://issues.apache.org/jira/browse/HIVE-12272
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12272.01.patch, HIVE-12272.02.patch
>
>
> To repo, run testCliDriver_unionDistinct_2 with return path on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11616) DelegationTokenSecretManager reuse the same objectstore ,which has cocurrent issue

2015-10-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11616:

Attachment: HIVE-11616.02.patch

Fixing the test, very simple fix.

> DelegationTokenSecretManager reuse the same objectstore ,which has cocurrent 
> issue
> --
>
> Key: HIVE-11616
> URL: https://issues.apache.org/jira/browse/HIVE-11616
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1
>Reporter: wangwenli
>Assignee: Cody Fu
> Fix For: 0.12.1
>
> Attachments: HIVE-11616.01.patch, HIVE-11616.02.patch, 
> HIVE-11616.patch
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Sometimes in the metastore log we get the exception below. After analysis, we 
> found that when HiveMetastore starts, the DelegationTokenSecretManager 
> maintains the same ObjectStore; see here:
> saslServer.startDelegationTokenSecretManager(conf, *baseHandler.getMS()*, 
> ServerMode.METASTORE);
> This leads to the concurrency issue.
> 2015-08-18 20:59:10,520 | ERROR | pool-6-thread-200 | Error occurred during 
> processing of message. | 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:296)
> org.apache.hadoop.hive.thrift.DelegationTokenStore$TokenStoreException: 
> org.datanucleus.transaction.NucleusTransactionException: Invalid state. 
> Transaction has already started
>   at 
> org.apache.hadoop.hive.thrift.DBTokenStore.invokeOnRawStore(DBTokenStore.java:154)
>   at 
> org.apache.hadoop.hive.thrift.DBTokenStore.getToken(DBTokenStore.java:88)
>   at 
> org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:112)
>   at 
> org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:56)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$SaslDigestCallbackHandler.getPassword(HadoopThriftAuthBridge.java:565)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$SaslDigestCallbackHandler.handle(HadoopThriftAuthBridge.java:596)
>   at 
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
>   at 
> com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>   at 
> org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:539)
>   at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283)
>   at 
> org.apache.thrift.transport.HiveTSaslServerTransport.open(HiveTSaslServerTransport.java:133)
>   at 
> org.apache.thrift.transport.HiveTSaslServerTransport$Factory.getTransport(HiveTSaslServerTransport.java:261)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:739)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:736)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1652)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:736)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:268)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.datanucleus.transaction.NucleusTransactionException: Invalid 
> state. Transaction has already started
>   at 
> org.datanucleus.transaction.TransactionManager.begin(TransactionManager.java:47)
>   at org.datanucleus.TransactionImpl.begin(TransactionImpl.java:131)
>   at 
> org.datanucleus.api.jdo.JDOTransaction.internalBegin(JDOTransaction.java:88)
>   at org.datanucleus.api.jdo.JDOTransaction.begin(JDOTransaction.java:80)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.openTransaction(ObjectStore.java:420)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getToken(ObjectStore.java:6455)
>   at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> 
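
Generic remedies for this class of bug are to serialize access to the shared ObjectStore or to give each handler thread its own RawStore. A minimal sketch of the first option, with names modeled on the stack trace above (illustrative only, not the actual HIVE-11616 patch):

{code}
import org.apache.hadoop.hive.metastore.RawStore;

// Illustrative: funnel every token-store call through one lock so two SASL
// handler threads can never interleave openTransaction() on the shared store.
class SynchronizedTokenStoreSketch {
  private final RawStore rawStore;          // the shared store from baseHandler.getMS()
  private final Object rawStoreLock = new Object();

  SynchronizedTokenStoreSketch(RawStore rawStore) {
    this.rawStore = rawStore;
  }

  Object invokeOnRawStore(String methName, Object[] params, Class<?>... paramTypes)
      throws Exception {
    synchronized (rawStoreLock) {
      return rawStore.getClass().getMethod(methName, paramTypes).invoke(rawStore, params);
    }
  }
}
{code}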

[jira] [Commented] (HIVE-12256) Move LLAP registry into llap-client module

2015-10-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979108#comment-14979108
 ] 

Siddharth Seth commented on HIVE-12256:
---

Going into hive-site should be OK. However, should the parameter names be the 
same for client/server usage?

> Move LLAP registry into llap-client module
> --
>
> Key: HIVE-12256
> URL: https://issues.apache.org/jira/browse/HIVE-12256
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 2.0.0
>
> Attachments: HIVE-12256.1.patch, HIVE-12256.2.patch, HIVE-12256.2.txt
>
>
> The registry may need to be accessed by the client to figure out the 
> available nodes. (ql module needs access)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12249) Improve logging with tez

2015-10-28 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979111#comment-14979111
 ] 

Vikram Dixit K commented on HIVE-12249:
---

Test failures unrelated.

> Improve logging with tez
> 
>
> Key: HIVE-12249
> URL: https://issues.apache.org/jira/browse/HIVE-12249
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.2.1
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-12249.1.patch, HIVE-12249.10.patch, 
> HIVE-12249.2.patch, HIVE-12249.3.patch, HIVE-12249.4.patch, 
> HIVE-12249.5.patch, HIVE-12249.6.patch, HIVE-12249.7.patch, 
> HIVE-12249.8.patch, HIVE-12249.9.patch
>
>
> We need to improve logging across the board. TEZ-2851 added a caller context 
> so that one can correlate logs with the application. This jira adds a new 
> configuration for users that can be used to correlate the logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12063) Pad Decimal numbers with trailing zeros to the scale of the column

2015-10-28 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979022#comment-14979022
 ] 

Jason Dere commented on HIVE-12063:
---

Nicely done; the changes look good, I think. It looks like these changes are 
only in the output formatting of the decimal values.
I think this Jira should probably be marked as an incompatible change to warn 
users that the display results will be different from before.
[~hagleitn], any other concerns here?

> Pad Decimal numbers with trailing zeros to the scale of the column
> --
>
> Key: HIVE-12063
> URL: https://issues.apache.org/jira/browse/HIVE-12063
> Project: Hive
>  Issue Type: Improvement
>  Components: Types
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-12063.1.patch, HIVE-12063.2.patch, HIVE-12063.patch
>
>
> HIVE-7373 was to address the problems of trimming tailing zeros by Hive, 
> which caused many problems including treating 0.0, 0.00 and so on as 0, which 
> has different precision/scale. Please refer to HIVE-7373 description. 
> However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems 
> remained. HIVE-11835 was resolved recently to address one of the problems, 
> where 0.0, 0.00, and so on cannot be read into decimal(1,1).
> However, HIVE-11835 didn't address the problem of showing as 0 in query 
> result for any decimal values such as 0.0, 0.00, etc. This causes confusion 
> as 0 and 0.0 have different precision/scale than 0.
> The proposal here is to pad zeros for query result to the type's scale. This 
> not only removes the confusion described above, but also aligns with many 
> other DBs. Internal decimal number representation doesn't change, however.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12284) Merge master to Spark branch 10/28/2015 [Spark Branch]

2015-10-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979131#comment-14979131
 ] 

Hive QA commented on HIVE-12284:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12769343/HIVE-12284.2-spark.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9687 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_context_ngrams
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/986/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/986/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-986/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12769343 - PreCommit-HIVE-SPARK-Build

> Merge master to Spark branch 10/28/2015 [Spark Branch]
> --
>
> Key: HIVE-12284
> URL: https://issues.apache.org/jira/browse/HIVE-12284
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: spark-branch
>
> Attachments: HIVE-12284.1-spark.patch, HIVE-12284.2-spark.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12061) add file type support to file metadata by expr call

2015-10-28 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979666#comment-14979666
 ] 

Lefty Leverenz commented on HIVE-12061:
---

Thanks Sergey.

> add file type support to file metadata by expr call
> ---
>
> Key: HIVE-12061
> URL: https://issues.apache.org/jira/browse/HIVE-12061
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.0.0
>
> Attachments: HIVE-12061.01.nogen.patch, HIVE-12061.01.patch, 
> HIVE-12061.02.patch, HIVE-12061.03.nogen.patch, HIVE-12061.03.patch, 
> HIVE-12061.04.patch, HIVE-12061.nogen.patch, HIVE-12061.patch
>
>
> Expr filtering, automatic caching, etc. should be aware of file types for 
> advanced features. For now only ORC is supported, but I want to add a 
> boundary between ORC-specific and general metastore code, that could later be 
> used for other formats if needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12238) Vectorization: Thread-safety errors in VectorUDFDate

2015-10-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979678#comment-14979678
 ] 

Hive QA commented on HIVE-12238:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12769184/HIVE-12238.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5837/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5837/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5837/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-5837/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 53fc319 HIVE-12276 Fix messages in InvalidTable (Eugene Koifman, 
reviewed by Jason Dere)
+ git clean -f -d
Removing 
beeline/src/test/org/apache/hive/beeline/TestBeelineArgParsing.java.orig
+ git checkout master
Already on 'master'
+ git reset --hard origin/master
HEAD is now at 53fc319 HIVE-12276 Fix messages in InvalidTable (Eugene Koifman, 
reviewed by Jason Dere)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12769184 - PreCommit-HIVE-TRUNK-Build

> Vectorization: Thread-safety errors in VectorUDFDate
> 
>
> Key: HIVE-12238
> URL: https://issues.apache.org/jira/browse/HIVE-12238
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 1.2.1, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-12238.1.patch
>
>
> {code}
> Caused by: java.lang.NumberFormatException: For input string: ""
> at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Long.parseLong(Long.java:601)
> at java.lang.Long.parseLong(Long.java:631)
> at java.text.DigitList.getLong(DigitList.java:195)
> at java.text.DecimalFormat.parse(DecimalFormat.java:2051)
> at java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:1869)
> at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1514)   
>  at java.text.DateFormat.parse(DateFormat.java:364)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFDateString$1.evaluate(VectorUDFDateString.java:48)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.StringUnaryUDF.evaluate(StringUnaryUDF.java:90)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:121)
> at 
> 
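
The trace above is the classic signature of a shared SimpleDateFormat being parsed from multiple threads; the usual fix is one instance per thread. A minimal sketch of that pattern (an assumption, not necessarily the actual HIVE-12238 patch):

{code}
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

class DateParseSketch {
  // SimpleDateFormat keeps mutable parse state, so a single shared instance
  // corrupts itself under concurrency; ThreadLocal gives each thread its own.
  private static final ThreadLocal<SimpleDateFormat> FORMAT =
      new ThreadLocal<SimpleDateFormat>() {
        @Override protected SimpleDateFormat initialValue() {
          return new SimpleDateFormat("yyyy-MM-dd");
        }
      };

  static Date parse(String s) throws ParseException {
    return FORMAT.get().parse(s);
  }
}
{code}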

[jira] [Commented] (HIVE-12257) Enhance ORC FileDump utility to handle flush_length files

2015-10-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979762#comment-14979762
 ] 

Hive QA commented on HIVE-12257:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12768473/HIVE-12257.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9730 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5839/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5839/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5839/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12768473 - PreCommit-HIVE-TRUNK-Build

> Enhance ORC FileDump utility to handle flush_length files
> -
>
> Key: HIVE-12257
> URL: https://issues.apache.org/jira/browse/HIVE-12257
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12257.1.patch
>
>
> The ORC file dump utility currently does not handle delta directories that 
> contain *_flush_length files. These files contain offsets to the footer in the 
> corresponding delta file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12237) Use slf4j as logging facade

2015-10-28 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-12237:
--
Labels: TODOC2.0  (was: )

> Use slf4j as logging facade
> ---
>
> Key: HIVE-12237
> URL: https://issues.apache.org/jira/browse/HIVE-12237
> Project: Hive
>  Issue Type: Task
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-12237.1.patch, HIVE-12237.2.patch, 
> HIVE-12237.3.patch, HIVE-12237.4.patch, HIVE-12237.5.patch, 
> HIVE-12237.6.patch, HIVE-12237.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12282) beeline - update command printing in verbose mode

2015-10-28 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979675#comment-14979675
 ] 

Lefty Leverenz commented on HIVE-12282:
---

+1 for the typo fix

> beeline - update command printing in verbose mode
> -
>
> Key: HIVE-12282
> URL: https://issues.apache.org/jira/browse/HIVE-12282
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 2.0.0
>
> Attachments: HIVE-12282.1.patch, HIVE-12282.2.patch
>
>
> In verbose mode, beeline prints the password used on the command line to STDERR. 
> This is not a good security practice. 
> The issue is in the BeeLine.java code:
> {code}
> if (url != null) {
>   String com = "!connect "
>   + url + " "
>   + (user == null || user.length() == 0 ? "''" : user) + " "
>   + (pass == null || pass.length() == 0 ? "''" : pass) + " "
>   + (driver == null ? "" : driver);
>   debug("issuing: " + com);
>   dispatch(com);
> }
> {code}
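
A simple mitigation (a sketch under my assumptions, not the committed HIVE-12282 
patch) is to build the string that gets logged separately from the one that gets 
dispatched, so verbose mode never echoes the real password:
{code}
// Hypothetical helper, not BeeLine's actual API: the real password is
// replaced by a fixed placeholder in the logged command only.
private static String maskedConnectCommand(String url, String user,
    String pass, String driver) {
  return "!connect " + url + " "
      + (user == null || user.length() == 0 ? "''" : user) + " "
      + (pass == null || pass.length() == 0 ? "''" : "******") + " "
      + (driver == null ? "" : driver);
}
{code}
debug() would then log maskedConnectCommand(url, user, pass, driver), while 
dispatch(com) keeps the real credentials.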



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11987) CompactionTxnHandler.createValidCompactTxnList() can use much less memory

2015-10-28 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979693#comment-14979693
 ] 

Lefty Leverenz commented on HIVE-11987:
---

Acronym clarification:  HWM means high water mark.
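
For scale: at ~120 bytes per txn, 5 million open/aborted txns is roughly 600 MB of 
metastore heap, versus about 40 MB if only the 8-byte txn ids (one Java long each) 
were kept.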

> CompactionTxnHandler.createValidCompactTxnList() can use much less memory
> -
>
> Key: HIVE-11987
> URL: https://issues.apache.org/jira/browse/HIVE-11987
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
>
> This method only needs HWM and list of txn IDs in 'a' state and smallest 'o' 
> txn id.
> It's currently implemented to get the list from TxnHandler.getOpenTxnsInfo(),
> which returns (txn id, state, host, user) for each txn and includes Aborted 
> txns.
> This can easily be 120 bytes or more of overhead per txn (over 1 Java long), 
> which is not an issue in general, but when the system is misconfigured, the 
> number of opened/aborted txns can get into the millions.  This creates 
> unnecessary memory pressure on metastore.
> Should consider fixing this.
> This should be easy to fix since the result of getOpenTxnsInfo() doesn't go 
> over the wire.
> Also, ValidCompactorTxnList doesn't actually need to store the 'o' txn ids, 
> just the 'a' ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11306) Add a bloom-1 filter for Hybrid MapJoin spills

2015-10-28 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979711#comment-14979711
 ] 

Lefty Leverenz commented on HIVE-11306:
---

Great, thanks [~gopalv]!

The doc looks good but needs version information with a link to this JIRA 
issue.  So I'll add that, and you can review it.

> Add a bloom-1 filter for Hybrid MapJoin spills
> --
>
> Key: HIVE-11306
> URL: https://issues.apache.org/jira/browse/HIVE-11306
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Wei Zheng
> Fix For: 2.0.0
>
> Attachments: HIVE-11306.1.patch, HIVE-11306.2.patch, 
> HIVE-11306.3.patch, HIVE-11306.5.patch, HIVE-11306.6.patch
>
>
> HIVE-9277 implemented Spillable joins for Tez, which suffers from a 
> corner-case performance issue when joining wide small tables against a narrow 
> big table (like a user-info table joined against an events stream).
> The fact that the wide table is spilled causes extra IO, even though the nDV 
> of the join key might be in the thousands.
> A cheap bloom-1 filter would add a massive performance gain for such queries, 
> massively cutting down on the spill IO costs for the big-table spills.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12277) Hive macro results on macro_duplicate.q different after adding ORDER BY

2015-10-28 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-12277:
---
Attachment: HIVE-12277.01.patch

Addressed [~jdere]'s comments.

> Hive macro results on macro_duplicate.q different after adding ORDER BY
> ---
>
> Key: HIVE-12277
> URL: https://issues.apache.org/jira/browse/HIVE-12277
> Project: Hive
>  Issue Type: Bug
>  Components: Macros
>Reporter: Jason Dere
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12277.01.patch
>
>
> Added an order-by to the query in macro_duplicate.q:
> {noformat}
> -select math_square(a), math_square(b),factorial(a), factorial(b), 
> math_add(a), math_add(b),int(c) from macro_testing;
> \ No newline at end of file
> +select math_square(a), math_square(b),factorial(a), factorial(b), 
> math_add(a), math_add(b),int(c) from macro_testing order by int(c);
> {noformat}
> And the results from math_add() changed unexpectedly:
> {noformat}
> -1  4   1   2   2   4   3
> -16 25  24  120 8   10  6
> +1  4   1   2   1   4   3
> +16 25  24  120 16  25  6
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12277) Hive macro results on macro_duplicate.q different after adding ORDER BY

2015-10-28 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-12277:
---
Attachment: (was: HIVE-12277.01.patch)

> Hive macro results on macro_duplicate.q different after adding ORDER BY
> ---
>
> Key: HIVE-12277
> URL: https://issues.apache.org/jira/browse/HIVE-12277
> Project: Hive
>  Issue Type: Bug
>  Components: Macros
>Reporter: Jason Dere
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12277.01.patch
>
>
> Added an order-by to the query in macro_duplicate.q:
> {noformat}
> -select math_square(a), math_square(b),factorial(a), factorial(b), 
> math_add(a), math_add(b),int(c) from macro_testing;
> \ No newline at end of file
> +select math_square(a), math_square(b),factorial(a), factorial(b), 
> math_add(a), math_add(b),int(c) from macro_testing order by int(c);
> {noformat}
> And the results from math_add() changed unexpectedly:
> {noformat}
> -1  4   1   2   2   4   3
> -16 25  24  120 8   10  6
> +1  4   1   2   1   4   3
> +16 25  24  120 16  25  6
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12257) Enhance ORC FileDump utility to handle flush_length files

2015-10-28 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979641#comment-14979641
 ] 

Eugene Koifman commented on HIVE-12257:
---

I think FileDump.printData()
should include e.getMessage() in System.err.println("Unable to dump data for 
file: " + file);


I think the getReaderInfo(final Configuration conf, final Path sideFile, final Path 
path) implementation may be unreliable.
See OrcRawRecordMerger.getLastFlushLength.
Instead of relying on NN metadata for the length, it reads in a while loop.  This (I 
believe) is to make sure it reads until EOF even if the NN doesn't yet have the 
latest info.

(I guess ReaderImpl.extractMetaInfoFromFooter can't use the same trick since 
that would be a perf problem.)
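
For reference, the read-until-EOF approach looks roughly like this (a minimal 
sketch assuming the side file is a plain sequence of 8-byte longs; see 
OrcRawRecordMerger.getLastFlushLength for the real code):
{code}
import java.io.EOFException;
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class FlushLengthSketch {
  // Scan 8-byte longs until EOF instead of trusting the file length the NN
  // reports, which may lag behind the data actually written.
  public static long lastFlushLength(FileSystem fs, Path sideFile) throws IOException {
    long last = -1;
    FSDataInputStream in = fs.open(sideFile);
    try {
      while (true) {
        last = in.readLong();  // throws EOFException at end of data
      }
    } catch (EOFException eof) {
      // expected: every complete long in the side file has been consumed
    } finally {
      in.close();
    }
    return last;
  }
}
{code}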

> Enhance ORC FileDump utility to handle flush_length files
> -
>
> Key: HIVE-12257
> URL: https://issues.apache.org/jira/browse/HIVE-12257
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12257.1.patch
>
>
> The ORC file dump utility currently does not handle delta directories that 
> contain *_flush_length files. These files contain offsets to the footer in the 
> corresponding delta file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12257) Enhance ORC FileDump utility to handle flush_length files

2015-10-28 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979650#comment-14979650
 ] 

Prasanth Jayachandran commented on HIVE-12257:
--

Thanks for the review comments. In the next version of the patch, which I have yet 
to complete, I use the file length only if the file is closed, at which point what 
the NN reports will be correct. In the other cases I am already using 
OrcRawRecordMerger.getLastFlushLength. 

Will include your other comments in the next patch.

> Enhance ORC FileDump utility to handle flush_length files
> -
>
> Key: HIVE-12257
> URL: https://issues.apache.org/jira/browse/HIVE-12257
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12257.1.patch
>
>
> The ORC file dump utility currently does not handle delta directories that 
> contain *_flush_length files. These files contain offsets to the footer in the 
> corresponding delta file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7693) Invalid column ref error in order by when using column alias in select clause and using having

2015-10-28 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-7693:
--
Attachment: (was: HIVE-7693.05.patch)

> Invalid column ref error in order by when using column alias in select clause 
> and using having
> --
>
> Key: HIVE-7693
> URL: https://issues.apache.org/jira/browse/HIVE-7693
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Deepesh Khandelwal
>Assignee: Pengcheng Xiong
> Fix For: 2.0.0
>
> Attachments: HIVE-7693.01.patch, HIVE-7693.02.patch, 
> HIVE-7693.03.patch, HIVE-7693.04.patch, HIVE-7693.05.patch
>
>
> Hive CLI session:
> {noformat}
> hive> create table abc(foo int, bar string);
> OK
> Time taken: 0.633 seconds
> hive> select foo as c0, count(*) as c1 from abc group by foo, bar having bar 
> like '%abc%' order by foo;
> FAILED: SemanticException [Error 10004]: Line 1:93 Invalid table alias or 
> column reference 'foo': (possible column names are: c0, c1)
> {noformat}
> Without having clause, the query runs fine, example:
> {code}
> select foo as c0, count(*) as c1 from abc group by foo, bar order by foo;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7693) Invalid column ref error in order by when using column alias in select clause and using having

2015-10-28 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-7693:
--
Attachment: HIVE-7693.05.patch

> Invalid column ref error in order by when using column alias in select clause 
> and using having
> --
>
> Key: HIVE-7693
> URL: https://issues.apache.org/jira/browse/HIVE-7693
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Deepesh Khandelwal
>Assignee: Pengcheng Xiong
> Fix For: 2.0.0
>
> Attachments: HIVE-7693.01.patch, HIVE-7693.02.patch, 
> HIVE-7693.03.patch, HIVE-7693.04.patch, HIVE-7693.05.patch
>
>
> Hive CLI session:
> {noformat}
> hive> create table abc(foo int, bar string);
> OK
> Time taken: 0.633 seconds
> hive> select foo as c0, count(*) as c1 from abc group by foo, bar having bar 
> like '%abc%' order by foo;
> FAILED: SemanticException [Error 10004]: Line 1:93 Invalid table alias or 
> column reference 'foo': (possible column names are: c0, c1)
> {noformat}
> Without having clause, the query runs fine, example:
> {code}
> select foo as c0, count(*) as c1 from abc group by foo, bar order by foo;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11564) HBaseSchemaTool should be able to list objects

2015-10-28 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979841#comment-14979841
 ] 

Lefty Leverenz commented on HIVE-11564:
---

This needs doc, right?  We can give it a TODOC2.0 label, but it should also be 
linked to HIVE-9752 (Documentation for HBase metastore) just to keep track of 
everything in one place.  Or maybe HIVE-9752 is sufficient and this issue 
doesn't need its own TODOC2.0 label.  What do you think, [~alangates]?

> HBaseSchemaTool should be able to list objects
> --
>
> Key: HIVE-11564
> URL: https://issues.apache.org/jira/browse/HIVE-11564
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Metastore
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 2.0.0
>
> Attachments: HIVE-11564.2.patch, HIVE-11564.3.patch, 
> HIVE-11564.4.patch, HIVE-11564.5.patch, HIVE-11564.patch
>
>
> The current HBaseSchemaTool can only fetch objects the user already knows the 
> name of.  It should also be able to list available objects (e.g. list all 
> databases).  
> It is also very user-unfriendly in terms of error handling.  That needs to be 
> fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11987) CompactionTxnHandler.createValidCompactTxnList() can use much less memory

2015-10-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11987:
--
Description: 
This method only needs HWM and list of txn IDs in 'a' state and smallest 'o' 
txn id.

It's currently implemented to get the list from TxnHandler.getOpenTxnsInfo(),
which returns (txn id, state, host, user) for each txn and includes Aborted 
txns.

This can easily be 120 bytes or more of overhead per txn (over 1 Java long), which 
is not an issue in general, but when the system is misconfigured, the number of 
opened/aborted txns can get into the millions.  This creates unnecessary memory 
pressure on metastore.

Should consider fixing this.
This should be easy to fix since the result of getOpenTxnsInfo() doesn't go 
over the wire.

Also, ValidCompactorTxnList doesn't actually need to store the 'o' txn ids, 
just the 'a' ones.

  was:
This method only needs HWM and list of txn IDs in 'o' state.

It's currently implemented to get the list from TxnHandler.getOpenTxnsInfo(),
which returns (txn id, state, host, user) for each txn and includes Aborted 
txns.

This can easily be 120 bytes or more of overhead per txn (over 1 Java long), which 
is not an issue in general, but when the system is misconfigured, the number of 
opened/aborted txns can get into the millions.  This creates unnecessary memory 
pressure on metastore.

Should consider fixing this.


> CompactionTxnHandler.createValidCompactTxnList() can use much less memory
> -
>
> Key: HIVE-11987
> URL: https://issues.apache.org/jira/browse/HIVE-11987
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
>
> This method only needs HWM and list of txn IDs in 'a' state and smallest 'o' 
> txn id.
> It's currently implemented to get the list from TxnHandler.getOpenTxnsInfo(),
> which returns (txn id, state, host, user) for each txn and includes Aborted 
> txns.
> This can easily be 120 bytes or more of overhead per txn (over 1 Java long), 
> which is not an issue in general, but when the system is misconfigured, the 
> number of opened/aborted txns can get into the millions.  This creates 
> unnecessary memory pressure on metastore.
> Should consider fixing this.
> This should be easy to fix since the result of getOpenTxnsInfo() doesn't go 
> over the wire.
> Also, ValidCompactorTxnList doesn't actually need to store the 'o' txn ids, 
> just the 'a' ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12282) beeline - update command printing in verbose mode

2015-10-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979674#comment-14979674
 ] 

Hive QA commented on HIVE-12282:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12769338/HIVE-12282.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9716 tests executed
*Failed tests:*
{noformat}
TestHS2AuthzSessionContext - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vectorization_16.q-mapjoin_mapjoin.q-groupby2.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5836/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5836/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5836/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12769338 - PreCommit-HIVE-TRUNK-Build

> beeline - update command printing in verbose mode
> -
>
> Key: HIVE-12282
> URL: https://issues.apache.org/jira/browse/HIVE-12282
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 2.0.0
>
> Attachments: HIVE-12282.1.patch, HIVE-12282.2.patch
>
>
> In verbose mode, beeline prints the password used on the command line to STDERR. 
> This is not a good security practice. 
> The issue is in the BeeLine.java code:
> {code}
> if (url != null) {
>   String com = "!connect "
>   + url + " "
>   + (user == null || user.length() == 0 ? "''" : user) + " "
>   + (pass == null || pass.length() == 0 ? "''" : pass) + " "
>   + (driver == null ? "" : driver);
>   debug("issuing: " + com);
>   dispatch(com);
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11306) Add a bloom-1 filter for Hybrid MapJoin spills

2015-10-28 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979706#comment-14979706
 ] 

Gopal V commented on HIVE-11306:


[~leftylev]: this patch implements that idea - instead of a TODOC, I filled in 
the details of this implementation in the docs.
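
For readers of that doc: a bloom-1 filter is a one-hash-probe bloom filter that 
touches a single 64-bit word per key. A toy sketch (illustrative only, not the 
VectorMapJoin implementation):
{code}
// Toy bloom-1: one hash picks one word and one bit inside it, so each probe
// costs a single cache line. Illustrative sketch, not Hive's code.
public final class Bloom1Filter {
  private final long[] words;
  private final int mask;

  public Bloom1Filter(int numWords) {   // numWords must be a power of two
    this.words = new long[numWords];
    this.mask = numWords - 1;
  }

  public void add(long hash) {
    words[(int) (hash >>> 6) & mask] |= 1L << (hash & 63);
  }

  public boolean mayContain(long hash) {
    return (words[(int) (hash >>> 6) & mask] & (1L << (hash & 63))) != 0;
  }
}
{code}
A miss on the big-table side proves the row cannot match anything in the spilled 
partition, so it never has to be written out; a hit only means it may match, 
which is where the (small) false-positive cost lives.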

> Add a bloom-1 filter for Hybrid MapJoin spills
> --
>
> Key: HIVE-11306
> URL: https://issues.apache.org/jira/browse/HIVE-11306
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Wei Zheng
> Fix For: 2.0.0
>
> Attachments: HIVE-11306.1.patch, HIVE-11306.2.patch, 
> HIVE-11306.3.patch, HIVE-11306.5.patch, HIVE-11306.6.patch
>
>
> HIVE-9277 implemented Spillable joins for Tez, which suffers from a 
> corner-case performance issue when joining wide small tables against a narrow 
> big table (like a user-info table joined against an events stream).
> The fact that the wide table is spilled causes extra IO, even though the nDV 
> of the join key might be in the thousands.
> A cheap bloom-1 filter would add a massive performance gain for such queries, 
> massively cutting down on the spill IO costs for the big-table spills.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12281) Vectorized MapJoin - use Operator::isLogDebugEnabled

2015-10-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979767#comment-14979767
 ] 

Hive QA commented on HIVE-12281:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12769222/HIVE-12281.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5841/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5841/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5841/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-5841/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 99a043a HIVE-12245: Support column comments for an HBase backed 
table (Chaoyu Tang, reviewed by Jimmy Xiang)
+ git clean -f -d
+ git checkout master
Already on 'master'
+ git reset --hard origin/master
HEAD is now at 99a043a HIVE-12245: Support column comments for an HBase backed 
table (Chaoyu Tang, reviewed by Jimmy Xiang)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12769222 - PreCommit-HIVE-TRUNK-Build

> Vectorized MapJoin - use Operator::isLogDebugEnabled
> 
>
> Key: HIVE-12281
> URL: https://issues.apache.org/jira/browse/HIVE-12281
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-12281.1.patch, vector-map-logging.png
>
>
> !vector-map-logging.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12283) Fix test failures after HIVE-11844 [Spark Branch]

2015-10-28 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979823#comment-14979823
 ] 

Rui Li commented on HIVE-12283:
---

Thanks for the review.

> Fix test failures after HIVE-11844 [Spark Branch]
> -
>
> Key: HIVE-12283
> URL: https://issues.apache.org/jira/browse/HIVE-12283
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Fix For: spark-branch
>
> Attachments: HIVE-12283.1-spark.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11306) Add a bloom-1 filter for Hybrid MapJoin spills

2015-10-28 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979654#comment-14979654
 ] 

Lefty Leverenz commented on HIVE-11306:
---

Does this need documentation in the wiki?  (If so, please add a TODOC2.0 label.)

* [Hybrid Hybrid Grace Hash Join, v1.0 -- Bloom Filter | 
https://cwiki.apache.org/confluence/display/Hive/Hybrid+Hybrid+Grace+Hash+Join%2C+v1.0#HybridHybridGraceHashJoin,v1.0-BloomFilter]

> Add a bloom-1 filter for Hybrid MapJoin spills
> --
>
> Key: HIVE-11306
> URL: https://issues.apache.org/jira/browse/HIVE-11306
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Wei Zheng
> Fix For: 2.0.0
>
> Attachments: HIVE-11306.1.patch, HIVE-11306.2.patch, 
> HIVE-11306.3.patch, HIVE-11306.5.patch, HIVE-11306.6.patch
>
>
> HIVE-9277 implemented Spillable joins for Tez, which suffers from a 
> corner-case performance issue when joining wide small tables against a narrow 
> big table (like a user-info table joined against an events stream).
> The fact that the wide table is spilled causes extra IO, even though the nDV 
> of the join key might be in the thousands.
> A cheap bloom-1 filter would add a massive performance gain for such queries, 
> massively cutting down on the spill IO costs for the big-table spills.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11497) Make sure --orcfiledump utility includes OrcRecordUpdate.AcidStats

2015-10-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979683#comment-14979683
 ] 

Hive QA commented on HIVE-11497:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12769196/HIVE-11497.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5838/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5838/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5838/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-5838/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 53fc319 HIVE-12276 Fix messages in InvalidTable (Eugene Koifman, 
reviewed by Jason Dere)
+ git clean -f -d
+ git checkout master
Already on 'master'
+ git reset --hard origin/master
HEAD is now at 53fc319 HIVE-12276 Fix messages in InvalidTable (Eugene Koifman, 
reviewed by Jason Dere)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12769196 - PreCommit-HIVE-TRUNK-Build

> Make sure --orcfiledump utility includes OrcRecordUpdate.AcidStats
> --
>
> Key: HIVE-11497
> URL: https://issues.apache.org/jira/browse/HIVE-11497
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0, 1.3.0, 2.0.0
>Reporter: Eugene Koifman
>Assignee: Prasanth Jayachandran
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11497-branch-1.patch, HIVE-11497.patch, 
> HIVE-11497.patch
>
>
> OrcRecordUpdater.AcidStats maintains counts on I/U/D events in the file 
> (going back to Hive 0.14).
> The current branch-1 has OrcRecordUpdater.parserAcidStats() to read it, and it 
> should be included in the _orcfiledump_ output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12218) Unable to create a like table for an hbase backed table

2015-10-28 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-12218:
---
Attachment: HIVE-12218-branch-1.patch

Attached a patch for branch-1. It basically resolves the conflict by removing some 
imports in DDLTask.java. [~xuefuz], could you please review it? Thanks 

> Unable to create a like table for an hbase backed table
> ---
>
> Key: HIVE-12218
> URL: https://issues.apache.org/jira/browse/HIVE-12218
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 2.0.0
>
> Attachments: HIVE-12218-branch-1.patch, HIVE-12218.patch
>
>
> For an HBase backed table:
> {code}
> CREATE TABLE hbasetbl (key string, state string, country string, country_id 
> int)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" = "info:state,info:country,info:country_id"
> );
> {code}
> Creating its like table using a query such as 
> create table hbasetbl_like like hbasetbl;
> fails with the error:
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. 
> org.apache.hadoop.hive.ql.metadata.HiveException: must specify an InputFormat 
> class



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7575) GetTables thrift call is very slow

2015-10-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979763#comment-14979763
 ] 

Hive QA commented on HIVE-7575:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12769216/HIVE-7575.5.patch.txt

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5840/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5840/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5840/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-5840/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   53fc319..99a043a  master -> origin/master
+ git reset --hard HEAD
HEAD is now at 53fc319 HIVE-12276 Fix messages in InvalidTable (Eugene Koifman, 
reviewed by Jason Dere)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
+ git reset --hard origin/master
HEAD is now at 99a043a HIVE-12245: Support column comments for an HBase backed 
table (Chaoyu Tang, reviewed by Jimmy Xiang)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12769216 - PreCommit-HIVE-TRUNK-Build

> GetTables thrift call is very slow
> --
>
> Key: HIVE-7575
> URL: https://issues.apache.org/jira/browse/HIVE-7575
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Ashu Pachauri
>Assignee: Navis
> Attachments: HIVE-7575.1.patch.txt, HIVE-7575.2.patch.txt, 
> HIVE-7575.3.patch.txt, HIVE-7575.4.patch.txt, HIVE-7575.5.patch.txt
>
>
> The GetTables thrift call takes a long time when the number of tables is large.
> With around 5000 tables, the call takes around 80 seconds, compared to a "Show 
> Tables" query on the same HiveServer2 instance, which takes 3-7 seconds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7575) GetTables thrift call is very slow

2015-10-28 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7575:

Attachment: HIVE-7575.6.patch.txt

Rebased to trunk & addressed comment

> GetTables thrift call is very slow
> --
>
> Key: HIVE-7575
> URL: https://issues.apache.org/jira/browse/HIVE-7575
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Ashu Pachauri
>Assignee: Navis
> Attachments: HIVE-7575.1.patch.txt, HIVE-7575.2.patch.txt, 
> HIVE-7575.3.patch.txt, HIVE-7575.4.patch.txt, HIVE-7575.5.patch.txt, 
> HIVE-7575.6.patch.txt
>
>
> The GetTables thrift call takes a long time when the number of tables is large.
> With around 5000 tables, the call takes around 80 seconds, compared to a "Show 
> Tables" query on the same HiveServer2 instance, which takes 3-7 seconds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7693) Invalid column ref error in order by when using column alias in select clause and using having

2015-10-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979847#comment-14979847
 ] 

Hive QA commented on HIVE-7693:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12769344/HIVE-7693.05.patch

{color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 9734 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_window
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_resolution
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_having
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown_negative
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_having
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_resolution
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_having
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarDataNucleusUnCaching
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5842/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5842/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5842/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12769344 - PreCommit-HIVE-TRUNK-Build

> Invalid column ref error in order by when using column alias in select clause 
> and using having
> --
>
> Key: HIVE-7693
> URL: https://issues.apache.org/jira/browse/HIVE-7693
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Deepesh Khandelwal
>Assignee: Pengcheng Xiong
> Fix For: 2.0.0
>
> Attachments: HIVE-7693.01.patch, HIVE-7693.02.patch, 
> HIVE-7693.03.patch, HIVE-7693.04.patch, HIVE-7693.05.patch
>
>
> Hive CLI session:
> {noformat}
> hive> create table abc(foo int, bar string);
> OK
> Time taken: 0.633 seconds
> hive> select foo as c0, count(*) as c1 from abc group by foo, bar having bar 
> like '%abc%' order by foo;
> FAILED: SemanticException [Error 10004]: Line 1:93 Invalid table alias or 
> column reference 'foo': (possible column names are: c0, c1)
> {noformat}
> Without having clause, the query runs fine, example:
> {code}
> select foo as c0, count(*) as c1 from abc group by foo, bar order by foo;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7575) GetTables thrift call is very slow

2015-10-28 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979848#comment-14979848
 ] 

Navis commented on HIVE-7575:
-

[~aihuaxu] I wanted the method signature to be simple, but so be it. (I used 
TableMetaData by mistake; I'll change it to TableMeta in the next patch.)

bq. To Yongzhi's question: when we have many databases, the performance of the 
original getTables could be bad since we are making at least one trip for each 
database. Is that right?

It would be one of the root causes. But per HIVE-11702, it still takes much time even 
though getSchema(null) uses just one call to the metastore. The pattern-matching query 
seems much more expensive than expected (even with a simple * pattern). 
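
Roughly, the idea is one metastore round trip that returns a lightweight record 
per table across all databases, instead of one getTables() call per database. A 
hypothetical sketch of the value object (field names illustrative, not the final 
API):
{code}
public class TableMeta {
  private final String dbName;
  private final String tableName;
  private final String tableType;  // e.g. MANAGED_TABLE, VIRTUAL_VIEW

  public TableMeta(String dbName, String tableName, String tableType) {
    this.dbName = dbName;
    this.tableName = tableName;
    this.tableType = tableType;
  }

  public String getDbName() { return dbName; }
  public String getTableName() { return tableName; }
  public String getTableType() { return tableType; }
}
{code}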

> GetTables thrift call is very slow
> --
>
> Key: HIVE-7575
> URL: https://issues.apache.org/jira/browse/HIVE-7575
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Ashu Pachauri
>Assignee: Navis
> Attachments: HIVE-7575.1.patch.txt, HIVE-7575.2.patch.txt, 
> HIVE-7575.3.patch.txt, HIVE-7575.4.patch.txt, HIVE-7575.5.patch.txt, 
> HIVE-7575.6.patch.txt
>
>
> The GetTables thrift call takes a long time when the number of tables is large.
> With around 5000 tables, the call takes around 80 seconds, compared to a "Show 
> Tables" query on the same HiveServer2 instance, which takes 3-7 seconds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11378) Remove hadoop-1 support from master branch

2015-10-28 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979193#comment-14979193
 ] 

Alan Gates commented on HIVE-11378:
---

[~leftylev] I've gone over the above docs and edited them.  Reviews are 
appreciated.

> Remove hadoop-1 support from master branch
> --
>
> Key: HIVE-11378
> URL: https://issues.apache.org/jira/browse/HIVE-11378
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11378.2.patch, HIVE-11378.3.patch, 
> HIVE-11378.4.patch, HIVE-11378.5.patch, HIVE-11378.patch
>
>
> When we branched branch-1, one of the goals was the ability to remove hadoop1 
> support from master.  I propose to do this softly at first by removing it 
> from the poms and removing the 20S implementation of the shims.  
> I am not going to remove the shim layer.  That would be much more disruptive. 
>  Also, I haven't done the homework to see if we could, as there may still be 
> incompatibility issues between various versions of hadoop2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11616) DelegationTokenSecretManager reuse the same objectstore ,which has cocurrent issue

2015-10-28 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979206#comment-14979206
 ] 

Xuefu Zhang commented on HIVE-11616:


Could we update the affects version and, after the fix goes in, the fix version, 
please? The current ones are confusing.

> DelegationTokenSecretManager reuse the same objectstore ,which has cocurrent 
> issue
> --
>
> Key: HIVE-11616
> URL: https://issues.apache.org/jira/browse/HIVE-11616
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1
>Reporter: wangwenli
>Assignee: Cody Fu
> Fix For: 0.12.1
>
> Attachments: HIVE-11616.01.patch, HIVE-11616.02.patch, 
> HIVE-11616.patch
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Sometimes the metastore log will show the exception below. After analysis, we 
> found that:
> when the HiveMetaStore starts, the DelegationTokenSecretManager will keep using 
> the same ObjectStore, see here
> {code}
> saslServer.startDelegationTokenSecretManager(conf, *baseHandler.getMS()*, 
> ServerMode.METASTORE);
> {code}
> This leads to the concurrency issue.
> {code}
> 2015-08-18 20:59:10,520 | ERROR | pool-6-thread-200 | Error occurred during 
> processing of message. | 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:296)
> org.apache.hadoop.hive.thrift.DelegationTokenStore$TokenStoreException: 
> org.datanucleus.transaction.NucleusTransactionException: Invalid state. 
> Transaction has already started
>   at 
> org.apache.hadoop.hive.thrift.DBTokenStore.invokeOnRawStore(DBTokenStore.java:154)
>   at 
> org.apache.hadoop.hive.thrift.DBTokenStore.getToken(DBTokenStore.java:88)
>   at 
> org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:112)
>   at 
> org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:56)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$SaslDigestCallbackHandler.getPassword(HadoopThriftAuthBridge.java:565)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$SaslDigestCallbackHandler.handle(HadoopThriftAuthBridge.java:596)
>   at 
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
>   at 
> com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>   at 
> org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:539)
>   at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283)
>   at 
> org.apache.thrift.transport.HiveTSaslServerTransport.open(HiveTSaslServerTransport.java:133)
>   at 
> org.apache.thrift.transport.HiveTSaslServerTransport$Factory.getTransport(HiveTSaslServerTransport.java:261)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:739)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:736)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1652)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:736)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:268)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.datanucleus.transaction.NucleusTransactionException: Invalid 
> state. Transaction has already started
>   at 
> org.datanucleus.transaction.TransactionManager.begin(TransactionManager.java:47)
>   at org.datanucleus.TransactionImpl.begin(TransactionImpl.java:131)
>   at 
> org.datanucleus.api.jdo.JDOTransaction.internalBegin(JDOTransaction.java:88)
>   at org.datanucleus.api.jdo.JDOTransaction.begin(JDOTransaction.java:80)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.openTransaction(ObjectStore.java:420)
>   at 
> org.apache.hadoop.hive.metastore.ObjectStore.getToken(ObjectStore.java:6455)
>   at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at 

[jira] [Updated] (HIVE-12208) Vectorized JOIN NPE on dynamically partitioned hash-join + map-join

2015-10-28 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-12208:
---
Summary: Vectorized JOIN NPE on dynamically partitioned hash-join + 
map-join  (was: Vectorized JOIN NPE on dynamically partitioned hash-join)

> Vectorized JOIN NPE on dynamically partitioned hash-join + map-join
> ---
>
> Key: HIVE-12208
> URL: https://issues.apache.org/jira/browse/HIVE-12208
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 2.0.0
>Reporter: Gopal V
>
> TPC-DS Q19 with reducer vectorized join optimizations
> {code}
> set hive.optimize.dynamic.partition.hashjoin=true;
> set hive.vectorized.execution.reduce.enabled=true;
> set hive.mapjoin.hybridgrace.hashtable=false;
> select  i_brand_id brand_id, i_brand brand, i_manufact_id, i_manufact,
>   sum(ss_ext_sales_price) ext_price
>  from date_dim, store_sales, item,customer,customer_address,store
>  where date_dim.d_date_sk = store_sales.ss_sold_date_sk
>and store_sales.ss_item_sk = item.i_item_sk
>and i_manager_id=7
>and d_moy=11
>and d_year=1999
>and store_sales.ss_customer_sk = customer.c_customer_sk 
>and customer.c_current_addr_sk = customer_address.ca_address_sk
>and substr(ca_zip,1,5) <> substr(s_zip,1,5) 
>and store_sales.ss_store_sk = store.s_store_sk 
>  group by i_brand
>   ,i_brand_id
>   ,i_manufact_id
>   ,i_manufact
>  order by ext_price desc
>  ,i_brand
>  ,i_brand_id
>  ,i_manufact_id
>  ,i_manufact
> {code}
> possibly a trivial plan setup issue, since the NPE is pretty much immediate.
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:368)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:852)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:603)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:362)
>   ... 19 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerGenerateResultOperator.commonSetup(VectorMapJoinInnerGenerateResultOperator.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:96)
>   ... 22 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12208) Vectorized JOIN NPE on dynamically partitioned hash-join + map-join

2015-10-28 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-12208:
---
Attachment: query82.txt

> Vectorized JOIN NPE on dynamically partitioned hash-join + map-join
> ---
>
> Key: HIVE-12208
> URL: https://issues.apache.org/jira/browse/HIVE-12208
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 2.0.0
>Reporter: Gopal V
> Attachments: query82.txt
>
>
> TPC-DS Q19 with reducer vectorized join optimizations
> {code}
> set hive.optimize.dynamic.partition.hashjoin=true;
> set hive.vectorized.execution.reduce.enabled=true;
> set hive.mapjoin.hybridgrace.hashtable=false;
> select  i_brand_id brand_id, i_brand brand, i_manufact_id, i_manufact,
>   sum(ss_ext_sales_price) ext_price
>  from date_dim, store_sales, item,customer,customer_address,store
>  where date_dim.d_date_sk = store_sales.ss_sold_date_sk
>and store_sales.ss_item_sk = item.i_item_sk
>and i_manager_id=7
>and d_moy=11
>and d_year=1999
>and store_sales.ss_customer_sk = customer.c_customer_sk 
>and customer.c_current_addr_sk = customer_address.ca_address_sk
>and substr(ca_zip,1,5) <> substr(s_zip,1,5) 
>and store_sales.ss_store_sk = store.s_store_sk 
>  group by i_brand
>   ,i_brand_id
>   ,i_manufact_id
>   ,i_manufact
>  order by ext_price desc
>  ,i_brand
>  ,i_brand_id
>  ,i_manufact_id
>  ,i_manufact
> {code}
> possibly a trivial plan setup issue, since the NPE is pretty much immediate.
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:368)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:852)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:603)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:362)
>   ... 19 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerGenerateResultOperator.commonSetup(VectorMapJoinInnerGenerateResultOperator.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:96)
>   ... 22 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12208) Vectorized JOIN NPE on dynamically partitioned hash-join + map-join

2015-10-28 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979246#comment-14979246
 ] 

Gopal V commented on HIVE-12208:


The bug disappears when the broadcast mapjoin has been pushed out of 
Reducer 5.

This is only an error in a scenario where the hashtable is not built for each 
version of the plan - which never happens with single-container Tez.

> Vectorized JOIN NPE on dynamically partitioned hash-join + map-join
> ---
>
> Key: HIVE-12208
> URL: https://issues.apache.org/jira/browse/HIVE-12208
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 2.0.0
>Reporter: Gopal V
> Attachments: query82.txt
>
>
> TPC-DS Q82 with reducer vectorized join optimizations
> {code}
>   Reducer 5 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 2 (CUSTOM_SIMPLE_EDGE), Map 3 
> (BROADCAST_EDGE), Map 4 (CUSTOM_SIMPLE_EDGE)
> {code}
> {code}
> set hive.optimize.dynamic.partition.hashjoin=true;
> set hive.vectorized.execution.reduce.enabled=true;
> set hive.mapjoin.hybridgrace.hashtable=false;
> select  i_item_id
>,i_item_desc
>,i_current_price
>  from item, inventory, date_dim, store_sales
>  where i_current_price between 30 and 30+30
>  and inv_item_sk = i_item_sk
>  and d_date_sk=inv_date_sk
>  and d_date between '2002-05-30' and '2002-07-30'
>  and i_manufact_id in (437,129,727,663)
>  and inv_quantity_on_hand between 100 and 500
>  and ss_item_sk = i_item_sk
>  group by i_item_id,i_item_desc,i_current_price
>  order by i_item_id
>  limit 100
> {code}
> possibly a trivial plan setup issue, since the NPE is pretty much immediate.
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:368)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:852)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:603)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:362)
>   ... 19 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerGenerateResultOperator.commonSetup(VectorMapJoinInnerGenerateResultOperator.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:96)
>   ... 22 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to an inefficient collection used to find a matching ReadEntity

2015-10-28 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-7723:

Attachment: HIVE-7723.15.patch

> Explain plan for complex query with lots of partitions is slow due to 
> an inefficient collection used to find a matching ReadEntity
> 
>
> Key: HIVE-7723
> URL: https://issues.apache.org/jira/browse/HIVE-7723
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Physical Optimizer
>Affects Versions: 0.13.1
>Reporter: Mostafa Mokhtar
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-7723.1.patch, HIVE-7723.10.patch, 
> HIVE-7723.11.patch, HIVE-7723.11.patch, HIVE-7723.12.patch, 
> HIVE-7723.13.patch, HIVE-7723.14.patch, HIVE-7723.15.patch, 
> HIVE-7723.2.patch, HIVE-7723.3.patch, HIVE-7723.4.patch, HIVE-7723.5.patch, 
> HIVE-7723.6.patch, HIVE-7723.7.patch, HIVE-7723.8.patch, HIVE-7723.9.patch
>
>
> Explain on TPC-DS query 64 took 11 seconds; when the CLI was profiled, it 
> showed that ReadEntity.equals is taking ~40% of the CPU.
> ReadEntity.equals is called from the snippet below.
> The set is iterated over again and again to find the actual match; a HashMap 
> is a better option for this case, as Set doesn't have a get method.
> Also, for ReadEntity, equals is case-insensitive while hashCode is 
> case-sensitive, which is an undesired behavior.
> {code}
> public static ReadEntity addInput(Set<ReadEntity> inputs, ReadEntity 
> newInput) {
> // If the input is already present, make sure the new parent is added to 
> the input.
> if (inputs.contains(newInput)) {
>   for (ReadEntity input : inputs) {
> if (input.equals(newInput)) {
>   if ((newInput.getParents() != null) && 
> (!newInput.getParents().isEmpty())) {
> input.getParents().addAll(newInput.getParents());
> input.setDirect(input.isDirect() || newInput.isDirect());
>   }
>   return input;
> }
>   }
>   assert false;
> } else {
>   inputs.add(newInput);
>   return newInput;
> }
> // make compile happy
> return null;
>   }
> {code}
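A minimal sketch of the HashMap-based alternative the description suggests, 
keyed by the entity itself so the already-registered instance can be fetched 
in O(1). This is an illustrative assumption, not the actual HIVE-7723 patch, 
and it presumes equals/hashCode are first made consistent as noted above:

{code}
// Illustrative only: O(1) lookup of the existing ReadEntity, replacing the
// contains() check plus linear scan over the Set. Uses java.util.Map.
public static ReadEntity addInput(Map<ReadEntity, ReadEntity> inputs,
                                  ReadEntity newInput) {
  ReadEntity existing = inputs.get(newInput);  // relies on consistent equals/hashCode
  if (existing != null) {
    if (newInput.getParents() != null && !newInput.getParents().isEmpty()) {
      existing.getParents().addAll(newInput.getParents());
      existing.setDirect(existing.isDirect() || newInput.isDirect());
    }
    return existing;
  }
  inputs.put(newInput, newInput);
  return newInput;
}
{code}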
> This is the query used : 
> {code}
> select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number 
> ,cs1.b_streen_name ,cs1.b_city
>  ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city 
> ,cs1.c_zip ,cs1.syear ,cs1.cnt
>  ,cs1.s1 ,cs1.s2 ,cs1.s3
>  ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt
> from
> (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
> store_name
>  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
> ,ad1.ca_street_name as b_streen_name
>  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
> c_street_number
>  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
> as c_zip
>  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
> as cnt
>  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
> ,sum(ss_coupon_amt) as s3
>   FROM   store_sales
> JOIN store_returns ON store_sales.ss_item_sk = 
> store_returns.sr_item_sk and store_sales.ss_ticket_number = 
> store_returns.sr_ticket_number
> JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
> JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
> JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
> JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
> JOIN store ON store_sales.ss_store_sk = store.s_store_sk
> JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= 
> cd1.cd_demo_sk
> JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = 
> cd2.cd_demo_sk
> JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
> JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = 
> hd1.hd_demo_sk
> JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = 
> hd2.hd_demo_sk
> JOIN customer_address ad1 ON store_sales.ss_addr_sk = 
> ad1.ca_address_sk
> JOIN customer_address ad2 ON customer.c_current_addr_sk = 
> ad2.ca_address_sk
> JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
> JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
> JOIN item ON store_sales.ss_item_sk = item.i_item_sk
> JOIN
>  (select cs_item_sk
> ,sum(cs_ext_list_price) as 
> sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
>   from catalog_sales JOIN catalog_returns
>   ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
> and catalog_sales.cs_order_number = 

[jira] [Commented] (HIVE-12063) Pad Decimal numbers with trailing zeros to the scale of the column

2015-10-28 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979273#comment-14979273
 ] 

Jason Dere commented on HIVE-12063:
---

We should at least have some way to alert users about this change, as it affects 
the output of the results. Some users already complain about Hive's existing 
decimal formatting because it looks different from other RDBMSs when they try to 
validate results; I'm sure some people will want to know if the Hive format 
changes. I guess it can also go into the release notes section of this Jira.

> Pad Decimal numbers with trailing zeros to the scale of the column
> --
>
> Key: HIVE-12063
> URL: https://issues.apache.org/jira/browse/HIVE-12063
> Project: Hive
>  Issue Type: Improvement
>  Components: Types
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-12063.1.patch, HIVE-12063.2.patch, HIVE-12063.patch
>
>
> HIVE-7373 was to address the problems caused by Hive trimming trailing zeros, 
> which included treating 0.0, 0.00, and so on as 0, even though they have 
> different precision/scale. Please refer to the HIVE-7373 description. 
> However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems 
> remained. HIVE-11835 was resolved recently to address one of the problems, 
> where 0.0, 0.00, and so on cannot be read into decimal(1,1).
> However, HIVE-11835 didn't address the problem of decimal values such as 
> 0.0, 0.00, etc. showing as 0 in query results. This causes confusion, as 0.0 
> and 0.00 have different precision/scale than 0.
> The proposal here is to pad query results with zeros to the type's scale. This 
> not only removes the confusion described above, but also aligns with many 
> other DBs. Internal decimal number representation doesn't change, however.
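A minimal illustration of the proposed display semantics using 
java.math.BigDecimal; this is an editorial sketch of the padding behavior, not 
Hive's actual formatting code:

{code}
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalPadDemo {
  // Pad a value's string form to the column's declared scale, so a 0
  // stored in a decimal(3,2) column displays as "0.00" rather than "0".
  // Increasing the scale never rounds, so UNNECESSARY is safe here.
  static String display(BigDecimal v, int columnScale) {
    return v.setScale(columnScale, RoundingMode.UNNECESSARY).toPlainString();
  }

  public static void main(String[] args) {
    System.out.println(display(new BigDecimal("0"), 2));    // prints 0.00
    System.out.println(display(new BigDecimal("1.5"), 3));  // prints 1.500
  }
}
{code}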



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12063) Pad Decimal numbers with trailing zeros to the scale of the column

2015-10-28 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979177#comment-14979177
 ] 

Xuefu Zhang commented on HIVE-12063:


Yes, the display result would be different (for some numbers), but I'm not sure 
why that makes it incompatible. [~jdere], could you elaborate?

> Pad Decimal numbers with trailing zeros to the scale of the column
> --
>
> Key: HIVE-12063
> URL: https://issues.apache.org/jira/browse/HIVE-12063
> Project: Hive
>  Issue Type: Improvement
>  Components: Types
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-12063.1.patch, HIVE-12063.2.patch, HIVE-12063.patch
>
>
> HIVE-7373 was to address the problems caused by Hive trimming trailing zeros, 
> which included treating 0.0, 0.00, and so on as 0, even though they have 
> different precision/scale. Please refer to the HIVE-7373 description. 
> However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems 
> remained. HIVE-11835 was resolved recently to address one of the problems, 
> where 0.0, 0.00, and so on cannot be read into decimal(1,1).
> However, HIVE-11835 didn't address the problem of decimal values such as 
> 0.0, 0.00, etc. showing as 0 in query results. This causes confusion, as 0.0 
> and 0.00 have different precision/scale than 0.
> The proposal here is to pad query results with zeros to the type's scale. This 
> not only removes the confusion described above, but also aligns with many 
> other DBs. Internal decimal number representation doesn't change, however.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12215) Exchange partition does not show outputs field for post/pre execute hooks

2015-10-28 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-12215:

Attachment: HIVE-12215.patch

> Exchange partition does not show outputs field for post/pre execute hooks
> -
>
> Key: HIVE-12215
> URL: https://issues.apache.org/jira/browse/HIVE-12215
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-12215.patch
>
>
> The pre/post execute hook interface has fields that indicate which Hive 
> objects were read from or written to as a result of running the query. For the 
> exchange partition operation, these fields (ReadEntity and WriteEntity) are 
> empty. 
> This is an important issue, as the hook interface may be configured to perform 
> critical warehouse operations.
> See
> {noformat}
> ql/src/test/results/clientpositive/exchange_partition3.q.out
> {noformat}
> {noformat}
> PREHOOK: query: -- This will exchange both partitions hr=1 and hr=2
> ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH 
> TABLE exchange_part_test2
> PREHOOK: type: ALTERTABLE_EXCHANGEPARTITION
> POSTHOOK: query: -- This will exchange both partitions hr=1 and hr=2
> ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH 
> TABLE exchange_part_test2
> POSTHOOK: type: ALTERTABLE_EXCHANGEPARTITION
> {noformat}
> Seems it should also print output fields.
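A minimal sketch of the kind of hook this affects; the class name is 
illustrative, but ExecuteWithHookContext and HookContext are the standard Hive 
hook API, and such a hook is registered via hive.exec.pre.hooks:

{code}
import java.util.Set;
import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
import org.apache.hadoop.hive.ql.hooks.HookContext;
import org.apache.hadoop.hive.ql.hooks.ReadEntity;
import org.apache.hadoop.hive.ql.hooks.WriteEntity;

// Logs the entities each query touches. With the bug described above, an
// ALTER TABLE ... EXCHANGE PARTITION arrives with empty inputs/outputs,
// so the hook cannot see which tables or partitions were affected.
public class LogEntitiesHook implements ExecuteWithHookContext {
  @Override
  public void run(HookContext hookContext) throws Exception {
    Set<ReadEntity> inputs = hookContext.getInputs();
    Set<WriteEntity> outputs = hookContext.getOutputs();
    System.out.println("inputs=" + inputs + " outputs=" + outputs);
  }
}
{code}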



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to an inefficient collection used to find a matching ReadEntity

2015-10-28 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-7723:

Attachment: (was: HIVE-7723.14.patch)

> Explain plan for complex query with lots of partitions is slow due to 
> an inefficient collection used to find a matching ReadEntity
> 
>
> Key: HIVE-7723
> URL: https://issues.apache.org/jira/browse/HIVE-7723
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Physical Optimizer
>Affects Versions: 0.13.1
>Reporter: Mostafa Mokhtar
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-7723.1.patch, HIVE-7723.10.patch, 
> HIVE-7723.11.patch, HIVE-7723.11.patch, HIVE-7723.12.patch, 
> HIVE-7723.13.patch, HIVE-7723.14.patch, HIVE-7723.15.patch, 
> HIVE-7723.2.patch, HIVE-7723.3.patch, HIVE-7723.4.patch, HIVE-7723.5.patch, 
> HIVE-7723.6.patch, HIVE-7723.7.patch, HIVE-7723.8.patch, HIVE-7723.9.patch
>
>
> Explain on TPC-DS query 64 took 11 seconds; when the CLI was profiled, it 
> showed that ReadEntity.equals is taking ~40% of the CPU.
> ReadEntity.equals is called from the snippet below.
> The set is iterated over again and again to find the actual match; a HashMap 
> is a better option for this case, as Set doesn't have a get method.
> Also, for ReadEntity, equals is case-insensitive while hashCode is 
> case-sensitive, which is an undesired behavior.
> {code}
> public static ReadEntity addInput(Set<ReadEntity> inputs, ReadEntity 
> newInput) {
> // If the input is already present, make sure the new parent is added to 
> the input.
> if (inputs.contains(newInput)) {
>   for (ReadEntity input : inputs) {
> if (input.equals(newInput)) {
>   if ((newInput.getParents() != null) && 
> (!newInput.getParents().isEmpty())) {
> input.getParents().addAll(newInput.getParents());
> input.setDirect(input.isDirect() || newInput.isDirect());
>   }
>   return input;
> }
>   }
>   assert false;
> } else {
>   inputs.add(newInput);
>   return newInput;
> }
> // make compile happy
> return null;
>   }
> {code}
> This is the query used : 
> {code}
> select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number 
> ,cs1.b_streen_name ,cs1.b_city
>  ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city 
> ,cs1.c_zip ,cs1.syear ,cs1.cnt
>  ,cs1.s1 ,cs1.s2 ,cs1.s3
>  ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt
> from
> (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
> store_name
>  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
> ,ad1.ca_street_name as b_streen_name
>  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
> c_street_number
>  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
> as c_zip
>  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
> as cnt
>  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
> ,sum(ss_coupon_amt) as s3
>   FROM   store_sales
> JOIN store_returns ON store_sales.ss_item_sk = 
> store_returns.sr_item_sk and store_sales.ss_ticket_number = 
> store_returns.sr_ticket_number
> JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
> JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
> JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
> JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
> JOIN store ON store_sales.ss_store_sk = store.s_store_sk
> JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= 
> cd1.cd_demo_sk
> JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = 
> cd2.cd_demo_sk
> JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
> JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = 
> hd1.hd_demo_sk
> JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = 
> hd2.hd_demo_sk
> JOIN customer_address ad1 ON store_sales.ss_addr_sk = 
> ad1.ca_address_sk
> JOIN customer_address ad2 ON customer.c_current_addr_sk = 
> ad2.ca_address_sk
> JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
> JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
> JOIN item ON store_sales.ss_item_sk = item.i_item_sk
> JOIN
>  (select cs_item_sk
> ,sum(cs_ext_list_price) as 
> sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
>   from catalog_sales JOIN catalog_returns
>   ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
> and 
