[jira] [Updated] (HIVE-14223) beeline should look for jdbc standalone jar in dist/jdbc dir instead of dist/lib

2016-07-12 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14223:
-
Attachment: HIVE-14223.2.patch

> beeline should look for jdbc standalone jar in dist/jdbc dir instead of 
> dist/lib
> 
>
> Key: HIVE-14223
> URL: https://issues.apache.org/jira/browse/HIVE-14223
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.0.1, 2.2.0, 2.1.1
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14223.1.patch, HIVE-14223.2.patch
>
>
> HIVE-13134 changed the jdbc-standalone jar path to dist/jdbc instead of 
> dist/lib. beeline.sh still looks for the jar in dist/lib which throws the 
> following error
> {code}
> ls: cannot access /work/hive2/lib/hive-jdbc-*-standalone.jar: No such file or 
> directory
> {code}
> NO PRECOMMIT TESTS
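The error above comes down to lookup order: the jar moved to dist/jdbc, but the script still only checks dist/lib. A minimal sketch of a fallback lookup, in Python purely for illustration (the helper name is made up; beeline.sh itself does the equivalent with `ls` in shell, and later comments suggest simply dropping the jar from the script instead):

```python
import glob
import os

def find_standalone_jar(hive_home):
    """Return the first hive-jdbc standalone jar found, checking the
    new jdbc/ directory before the legacy lib/ directory."""
    for subdir in ("jdbc", "lib"):
        matches = sorted(glob.glob(
            os.path.join(hive_home, subdir, "hive-jdbc-*-standalone.jar")))
        if matches:
            return matches[0]
    # No jar anywhere: the caller can skip it instead of surfacing an ls error.
    return None
```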



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13873) Column pruning for nested fields

2016-07-12 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374395#comment-15374395
 ] 

Ferdinand Xu commented on HIVE-13873:
-

Thanks [~xuefuz] for your review.

{quote}
1. nested column pruning should go beyond just the select op or groupby op. 
{quote}
Good catch. I will take this into consideration in my next patch.
{quote}
2. Secondly, there may need to be a consolidation/merging process in 
determining the final read schema. For instance,
{noformat}
select msg from t where msg.a='x';
{noformat}
In this case, the projected column should be just msg rather than msg + msg.a.
{quote}
OK, the logic will first check whether a sub-attribute will be filtered. If 
so, the other attributes within the same struct will not be filtered. Will 
update the patch as well.
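The consolidation described in point 2 can be sketched as a small path-pruning step (a hypothetical helper, not the actual patch): a nested path is dropped whenever one of its ancestor structs is already fully projected.

```python
def consolidate(paths):
    """Drop any nested path whose ancestor struct is already fully projected.

    For `select msg from t where msg.a='x'`, both "msg" and "msg.a" are
    requested; since "msg" is read in full, "msg.a" is redundant.
    """
    kept = set(paths)
    for path in paths:
        parts = path.split(".")
        # If any strict prefix of this path is itself requested, the whole
        # struct is read anyway, so this sub-path adds nothing.
        if any(".".join(parts[:i]) in kept for i in range(1, len(parts))):
            kept.discard(path)
    return sorted(kept)
```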

{quote}
3. While it's fine to support just struct at first, we may need to consider 
a more extensible way to pass the projected fields to the reader to support 
other types (array and map). I have no idea on this, so I'd love to hear 
your thoughts.
{quote}
Good suggestion; I have considered it before. Taking an array as an example, 
it will generate a schema like this:
{noformat}
optional group max_nested_map (LIST) {
  repeated group bag {
    optional group array_element (LIST) {
      required binary key (UTF8);
    }
  }
}
{noformat}
The bag and array_element groups are generated on the Parquet side, so Hive 
will not be aware of the full column path. One approach I am considering is 
to serialize the TypeInfo and types that were used to generate the full 
schema, and use them to generate the requested schema. More investigation is 
needed anyway; maybe we could make it happen in another ticket.
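To illustrate the mismatch (with purely hypothetical names based on the schema above, not Hive or Parquet code): Parquet's LIST encoding inserts synthetic `bag`/`array_element` levels that a Hive column path never mentions, so any requested-schema builder would have to splice them in.

```python
def splice_list_levels(hive_parts, list_fields):
    """Map a Hive-side column path onto a Parquet schema path by splicing
    in the synthetic wrapper groups Parquet adds for LIST-typed fields."""
    parquet_parts = []
    for part in hive_parts:
        parquet_parts.append(part)
        if part in list_fields:
            # Parquet's 3-level list encoding wraps elements in these groups.
            parquet_parts.extend(["bag", "array_element"])
    return parquet_parts
```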

> Column pruning for nested fields
> 
>
> Key: HIVE-13873
> URL: https://issues.apache.org/jira/browse/HIVE-13873
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer
>Reporter: Xuefu Zhang
>Assignee: Ferdinand Xu
> Attachments: HIVE-13873.wip.patch
>
>
> Some columnar file formats such as Parquet store the fields of struct types 
> column by column as well, using the encoding described in the Google Dremel 
> paper. It is very common in big data for data to be stored in structs while 
> queries need only a subset of the fields in those structs. However, Hive 
> presently needs to read the whole struct regardless of whether all fields 
> are selected. Therefore, pruning unwanted sub-fields of structs or nested 
> fields at file reading time would be a big performance boost for such 
> scenarios.





[jira] [Comment Edited] (HIVE-14223) beeline should look for jdbc standalone jar in dist/jdbc dir instead of dist/lib

2016-07-12 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374392#comment-15374392
 ] 

Prasanth Jayachandran edited comment on HIVE-14223 at 7/13/16 5:48 AM:
---

Yeah. Looks like that's the case. I haven't seen any issues by not including 
jdbc-standalone*.jar.

I will update the patch to remove it from the beeline script so that it 
doesn't throw the error. 


was (Author: prasanth_j):
Yeah. Looks like that's the case. I haven't seen any issues by not include 
jdbc-standalone*.jar.

I will update the patch to remove it from beeline script so that it doesn't 
throw the error. 

> beeline should look for jdbc standalone jar in dist/jdbc dir instead of 
> dist/lib
> 
>
> Key: HIVE-14223
> URL: https://issues.apache.org/jira/browse/HIVE-14223
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.0.1, 2.2.0, 2.1.1
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14223.1.patch
>
>
> HIVE-13134 changed the jdbc-standalone jar path to dist/jdbc instead of 
> dist/lib. beeline.sh still looks for the jar in dist/lib which throws the 
> following error
> {code}
> ls: cannot access /work/hive2/lib/hive-jdbc-*-standalone.jar: No such file or 
> directory
> {code}
> NO PRECOMMIT TESTS





[jira] [Commented] (HIVE-14223) beeline should look for jdbc standalone jar in dist/jdbc dir instead of dist/lib

2016-07-12 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374392#comment-15374392
 ] 

Prasanth Jayachandran commented on HIVE-14223:
--

Yeah. Looks like that's the case. I haven't seen any issues by not including 
jdbc-standalone*.jar.

I will update the patch to remove it from the beeline script so that it 
doesn't throw the error. 






[jira] [Commented] (HIVE-14223) beeline should look for jdbc standalone jar in dist/jdbc dir instead of dist/lib

2016-07-12 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374378#comment-15374378
 ] 

Gopal V commented on HIVE-14223:


Actually, beeline doesn't need jdbc-standalone.jar to run queries. 

Since the standalone jar has shaded hadoop-* jars, that's generally going to 
complicate the whole "yarn jar" part at the top.






[jira] [Updated] (HIVE-14139) NPE dropping permanent function

2016-07-12 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-14139:
--
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks [~sershe] for the review.

> NPE dropping permanent function
> ---
>
> Key: HIVE-14139
> URL: https://issues.apache.org/jira/browse/HIVE-14139
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
> Fix For: 2.2.0
>
> Attachments: HIVE-14139.1.patch, HIVE-14139.2.patch, 
> HIVE-14139.3.patch, HIVE-14139.4.patch
>
>
> To reproduce:
> 1. Start a CLI session and create a permanent function.
> 2. Exit current CLI session.
> 3. Start a new CLI session and drop the function.
> Stack trace:
> {noformat}
> FAILED: error during drop function: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.removePersistentFunctionUnderLock(Registry.java:513)
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.unregisterFunction(Registry.java:501)
>   at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.unregisterPermanentFunction(FunctionRegistry.java:1532)
>   at 
> org.apache.hadoop.hive.ql.exec.FunctionTask.dropPermanentFunction(FunctionTask.java:228)
>   at 
> org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:95)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1860)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1564)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1316)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1085)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1073)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> {noformat}





[jira] [Updated] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns

2016-07-12 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13974:

Status: Patch Available  (was: In Progress)

> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
> ---
>
> Key: HIVE-13974
> URL: https://issues.apache.org/jira/browse/HIVE-13974
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC, Transactions
>Affects Versions: 2.1.0, 1.3.0, 2.2.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Blocker
> Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, 
> HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, 
> HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, 
> HIVE-13974.09.patch, HIVE-13974.091.patch, HIVE-13974.092.patch
>
>
> Currently, the included columns are based on the fileSchema and not the 
> readerSchema which doesn't work for adding columns to non-last STRUCT data 
> type columns.





[jira] [Updated] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns

2016-07-12 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13974:

Attachment: HIVE-13974.092.patch

> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
> ---
>
> Key: HIVE-13974
> URL: https://issues.apache.org/jira/browse/HIVE-13974
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC, Transactions
>Affects Versions: 1.3.0, 2.1.0, 2.2.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Blocker
> Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, 
> HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, 
> HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, 
> HIVE-13974.09.patch, HIVE-13974.091.patch, HIVE-13974.092.patch
>
>
> Currently, the included columns are based on the fileSchema and not the 
> readerSchema which doesn't work for adding columns to non-last STRUCT data 
> type columns.





[jira] [Commented] (HIVE-14205) Hive doesn't support union type with AVRO file format

2016-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374366#comment-15374366
 ] 

Hive QA commented on HIVE-14205:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12817392/HIVE-14205.1.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10314 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_masking_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_interval_arithmetic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_interval_arithmetic
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/489/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/489/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-489/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12817392 - PreCommit-HIVE-MASTER-Build

> Hive doesn't support union type with AVRO file format
> -
>
> Key: HIVE-14205
> URL: https://issues.apache.org/jira/browse/HIVE-14205
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Yibing Shi
>Assignee: Yibing Shi
> Attachments: HIVE-14205.1.patch
>
>
> Reproduce steps:
> {noformat}
> hive> CREATE TABLE avro_union_test
> > PARTITIONED BY (p int)
> > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> > STORED AS INPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> > OUTPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> > TBLPROPERTIES ('avro.schema.literal'='{
> >"type":"record",
> >"name":"nullUnionTest",
> >"fields":[
> >   {
> >  "name":"value",
> >  "type":[
> > "null",
> > "int",
> > "long"
> >  ],
> >  "default":null
> >   }
> >]
> > }');
> OK
> Time taken: 0.105 seconds
> hive> alter table avro_union_test add partition (p=1);
> OK
> Time taken: 0.093 seconds
> hive> select * from avro_union_test;
> FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: 
> Failed with exception Hive internal error inside 
> isAssignableFromSettablePrimitiveOI void not supported 
> yet.java.lang.RuntimeException: Hive internal error inside 
> isAssignableFromSettablePrimitiveOI void not supported yet.
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettablePrimitiveOI(ObjectInspectorUtils.java:1140)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:1149)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1187)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1220)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1200)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConvertedOI(ObjectInspectorConverters.java:219)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:581)
>   at 
> 

[jira] [Commented] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns

2016-07-12 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374363#comment-15374363
 ] 

Matt McCline commented on HIVE-13974:
-

Ok, restarting with Owen's new HIVE-14004 patch as a base.

> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
> ---
>
> Key: HIVE-13974
> URL: https://issues.apache.org/jira/browse/HIVE-13974
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC, Transactions
>Affects Versions: 1.3.0, 2.1.0, 2.2.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Blocker
> Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, 
> HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, 
> HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, 
> HIVE-13974.09.patch, HIVE-13974.091.patch
>
>
> Currently, the included columns are based on the fileSchema and not the 
> readerSchema which doesn't work for adding columns to non-last STRUCT data 
> type columns.





[jira] [Updated] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns

2016-07-12 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-13974:

Status: In Progress  (was: Patch Available)

> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
> ---
>
> Key: HIVE-13974
> URL: https://issues.apache.org/jira/browse/HIVE-13974
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, ORC, Transactions
>Affects Versions: 2.1.0, 1.3.0, 2.2.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Blocker
> Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, 
> HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, 
> HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, 
> HIVE-13974.09.patch, HIVE-13974.091.patch
>
>
> Currently, the included columns are based on the fileSchema and not the 
> readerSchema which doesn't work for adding columns to non-last STRUCT data 
> type columns.





[jira] [Updated] (HIVE-14195) HiveMetaStoreClient getFunction() does not throw NoSuchObjectException

2016-07-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14195:

   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Committed everywhere. Thanks for the contribution!

> HiveMetaStoreClient getFunction() does not throw NoSuchObjectException
> --
>
> Key: HIVE-14195
> URL: https://issues.apache.org/jira/browse/HIVE-14195
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14195.2.patch, HIVE-14195.patch
>
>
> HiveMetaStoreClient getFunction(dbName, funcName) does not throw 
> NoSuchObjectException when no function with funcName exists in the db. 
> Instead, I need to search the MetaException message for 
> 'NoSuchObjectException'.





[jira] [Updated] (HIVE-14111) better concurrency handling for TezSessionState - part I

2016-07-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14111:

   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Committed everywhere. Thanks for the review!

> better concurrency handling for TezSessionState - part I
> 
>
> Key: HIVE-14111
> URL: https://issues.apache.org/jira/browse/HIVE-14111
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14111.01.patch, HIVE-14111.02.patch, 
> HIVE-14111.03.patch, HIVE-14111.04.patch, HIVE-14111.05.patch, 
> HIVE-14111.06.patch, HIVE-14111.patch, sessionPoolNotes.txt
>
>






[jira] [Resolved] (HIVE-14210) ExecDriver should call jobclient.close() to trigger cleanup

2016-07-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-14210.
-
   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   1.2.2
   1.3.0

Committed everywhere. Thanks for the contribution!

> ExecDriver should call jobclient.close() to trigger cleanup
> ---
>
> Key: HIVE-14210
> URL: https://issues.apache.org/jira/browse/HIVE-14210
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Affects Versions: 1.2.1, 2.0.0, 2.1.0
>Reporter: Thomas Friedrich
>Assignee: Thomas Friedrich
> Fix For: 1.3.0, 1.2.2, 2.2.0, 2.1.1
>
> Attachments: HIVE-14210.1.patch, HIVE-14210.patch
>
>
> We found an issue in a customer environment where the HS2 crashed after a few 
> days and the Java core dump contained several thousand truststore reloader 
> threads:
> "Truststore reloader thread" #126 daemon prio=5 os_prio=0 
> tid=0x7f680d2e3000 nid=0x98fd waiting on 
> condition [0x7f67e482c000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run
> (ReloadingX509TrustManager.java:225)
> at java.lang.Thread.run(Thread.java:745)
> We found the issue to be caused by a bug in Hadoop where the 
> TimelineClientImpl is not destroying the SSLFactory if SSL is enabled in 
> Hadoop and the timeline server is running. I opened YARN-5309 which has more 
> details on the problem, and a patch was submitted a few days back.
> In addition to the changes in Hadoop, there are a couple of Hive changes 
> required:
> - ExecDriver needs to call jobclient.close() to trigger the clean-up of the 
> resources after the submitted job is done/failed
> - Hive needs to pick up a newer release of Hadoop to pick up MAPREDUCE-6618 
> and MAPREDUCE-6621 that fixed issues with calling jobclient.close(). Both 
> fixes are included in Hadoop 2.6.4. 
> However, since we also need to pick up YARN-5309, we need to wait for a new 
> release of Hadoop.
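The first Hive change above (calling jobclient.close() after the job is done or failed) is a close-after-use pattern. A minimal sketch of its shape, in illustrative Python with made-up names (the real change is in ExecDriver's Java code):

```python
import contextlib

class JobClient:
    """Stand-in for a client that owns background resources
    (e.g. a truststore-reloader thread) until close() is called."""
    def __init__(self):
        self.closed = False
    def submit_job(self):
        return "job-1"
    def close(self):
        self.closed = True

def run_job(client):
    # Guarantee close() runs even if submission fails, so background
    # threads do not accumulate across submitted jobs.
    with contextlib.closing(client):
        return client.submit_job()
```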





[jira] [Updated] (HIVE-14218) LLAP: ACL validation fails if the user name is different from principal user name

2016-07-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14218:

   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Committed everywhere. Thanks for the review!

> LLAP: ACL validation fails if the user name is different from principal user 
> name
> -
>
> Key: HIVE-14218
> URL: https://issues.apache.org/jira/browse/HIVE-14218
> Project: Hive
>  Issue Type: Bug
>Reporter: Shraddha Sumit
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14218.patch
>
>






[jira] [Commented] (HIVE-14139) NPE dropping permanent function

2016-07-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374314#comment-15374314
 ] 

Sergey Shelukhin commented on HIVE-14139:
-

+1, sorry for the delay

> NPE dropping permanent function
> ---
>
> Key: HIVE-14139
> URL: https://issues.apache.org/jira/browse/HIVE-14139
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-14139.1.patch, HIVE-14139.2.patch, 
> HIVE-14139.3.patch, HIVE-14139.4.patch
>
>
> To reproduce:
> 1. Start a CLI session and create a permanent function.
> 2. Exit current CLI session.
> 3. Start a new CLI session and drop the function.
> Stack trace:
> {noformat}
> FAILED: error during drop function: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.removePersistentFunctionUnderLock(Registry.java:513)
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.unregisterFunction(Registry.java:501)
>   at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.unregisterPermanentFunction(FunctionRegistry.java:1532)
>   at 
> org.apache.hadoop.hive.ql.exec.FunctionTask.dropPermanentFunction(FunctionTask.java:228)
>   at 
> org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:95)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1860)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1564)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1316)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1085)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1073)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> {noformat}





[jira] [Updated] (HIVE-14223) beeline should look for jdbc standalone jar in dist/jdbc dir instead of dist/lib

2016-07-12 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14223:
-
Description: 
HIVE-13134 changed the jdbc-standalone jar path to dist/jdbc instead of 
dist/lib. beeline.sh still looks for the jar in dist/lib which throws the 
following error

{code}
ls: cannot access /work/hive2/lib/hive-jdbc-*-standalone.jar: No such file or 
directory
{code}

NO PRECOMMIT TESTS

  was:
HIVE-13134 changed the jdbc-standalone jar path to dist/jdbc instead of 
dist/lib. beeline.sh still looks for the jar in dist/lib which throws the 
following error

{code}
ls: cannot access /work/hive2/lib/hive-jdbc-*-standalone.jar: No such file or 
directory
{code}







[jira] [Updated] (HIVE-14223) beeline should look for jdbc standalone jar in dist/jdbc dir instead of dist/lib

2016-07-12 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14223:
-
Attachment: HIVE-14223.1.patch

[~sseth]/[~thejas] Can someone please take a look?






[jira] [Commented] (HIVE-14213) Add timeouts for various components in llap status check

2016-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374252#comment-15374252
 ] 

Hive QA commented on HIVE-14213:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12817342/HIVE-14213.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10314 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_masking_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_interval_arithmetic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_interval_arithmetic
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/488/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/488/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-488/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12817342 - PreCommit-HIVE-MASTER-Build

> Add timeouts for various components in llap status check
> 
>
> Key: HIVE-14213
> URL: https://issues.apache.org/jira/browse/HIVE-14213
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14213.01.patch
>
>
> The llapstatus check connects to various components - YARN, HDFS via Slider, 
> ZooKeeper. If any of these components is down, the command can take a long 
> time to exit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10875) Select query with view in subquery adds underlying table as direct input

2016-07-12 Thread niklaus xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374215#comment-15374215
 ] 

niklaus xiao commented on HIVE-10875:
-

Seems this query has the same issue
{code}
select * from V union all select * from V;
{code}

[~thejas] Can you take a look?

> Select query with view in subquery adds underlying table as direct input
> 
>
> Key: HIVE-10875
> URL: https://issues.apache.org/jira/browse/HIVE-10875
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 1.2.1
>
> Attachments: HIVE-10875.1.patch, HIVE-10875.2.patch
>
>
> In the following case, 
> {code}
> create view V as select * from T;
> select * from (select * from V) A;
> {code}
> The semantic analyzer inputs contain input table T as a direct input instead 
> of adding it as an indirect input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-10875) Select query with view in subquery adds underlying table as direct input

2016-07-12 Thread niklaus xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374215#comment-15374215
 ] 

niklaus xiao edited comment on HIVE-10875 at 7/13/16 2:36 AM:
--

Seems this query has the same issue
{code}
select * from V union all select * from V;
{code}

[~thejas] Can you take a look?


was (Author: niklaus.xiao):
Seems this query has the same issue
{code}
select * from V union all select * from V;
{/code}

[~thejas] Can you take a look?

> Select query with view in subquery adds underlying table as direct input
> 
>
> Key: HIVE-10875
> URL: https://issues.apache.org/jira/browse/HIVE-10875
> Project: Hive
>  Issue Type: Bug
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 1.2.1
>
> Attachments: HIVE-10875.1.patch, HIVE-10875.2.patch
>
>
> In the following case, 
> {code}
> create view V as select * from T;
> select * from (select * from V) A;
> {code}
> The semantic analyzer inputs contain input table T as a direct input instead 
> of adding it as an indirect input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13873) Column pruning for nested fields

2016-07-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374184#comment-15374184
 ] 

Xuefu Zhang commented on HIVE-13873:


[~Ferd], thanks for working on this. The patch looks good as an initial cut. 
Here are a couple of preliminary thoughts to share with you:

1. Nested column pruning should go beyond just the select or group-by 
operators. For instance, 
{code}
select msg.a from t where msg.b = 'x';
{code}
In this case, the Parquet reader should read only a and b from the msg field. 
Thus, I think we need to consider expressions from more operators.

2. Secondly, we may need a consolidation/merging step when determining the 
final read schema. For instance,
{code}
select msg from t where msg.a='x';
{code}
In this case, the projected column should be just msg rather than msg + msg.a.

3. While it's fine to support just structs at first, we may need to find a 
more extensible way to pass the projected fields to the reader so we can 
support other types (array and map). I have no concrete idea on this yet, so 
I'd love to hear your thoughts.
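The consolidation in point 2 (msg + msg.a collapsing to just msg) amounts to dropping any projected path whose ancestor struct is already projected whole. A minimal sketch in Python, with a hypothetical helper name and dotted-string paths standing in for whatever representation Hive's optimizer actually uses:

```python
def prune_nested_paths(paths):
    """Consolidate projected nested-field paths: if a whole struct (e.g. 'msg')
    is projected, its sub-paths (e.g. 'msg.a') are redundant, because the
    reader must load the full struct anyway."""
    kept = []
    # Visit shorter (ancestor) paths first, so 'msg' is considered before 'msg.a'.
    for p in sorted(paths, key=lambda s: s.count('.')):
        if not any(p == k or p.startswith(k + '.') for k in kept):
            kept.append(p)
    return kept

# "select msg from t where msg.a = 'x'" projects msg and msg.a -> just msg.
print(prune_nested_paths(['msg', 'msg.a']))    # ['msg']
# "select msg.a from t where msg.b = 'x'" keeps both leaf paths.
print(prune_nested_paths(['msg.a', 'msg.b']))  # ['msg.a', 'msg.b']
```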


> Column pruning for nested fields
> 
>
> Key: HIVE-13873
> URL: https://issues.apache.org/jira/browse/HIVE-13873
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer
>Reporter: Xuefu Zhang
>Assignee: Ferdinand Xu
> Attachments: HIVE-13873.wip.patch
>
>
> Some columnar file formats such as Parquet store the fields of a struct type 
> column by column as well, using the encoding described in the Google Dremel 
> paper. It's very common in big data for data to be stored in structs while 
> queries need only a subset of the fields in those structs. However, presently 
> Hive still needs to read the whole struct regardless of whether all fields 
> are selected. Therefore, pruning unwanted sub-fields in structs or nested 
> fields at file-reading time would be a big performance boost for such 
> scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14219) LLAP external client on secure cluster: Protocol interface org.apache.hadoop.hive.llap.protocol.LlapTaskUmbilicalProtocol is not known

2016-07-12 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374165#comment-15374165
 ] 

Jason Dere commented on HIVE-14219:
---

yep.

> LLAP external client on secure cluster: Protocol interface 
> org.apache.hadoop.hive.llap.protocol.LlapTaskUmbilicalProtocol is not known
> --
>
> Key: HIVE-14219
> URL: https://issues.apache.org/jira/browse/HIVE-14219
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-14219.1.patch
>
>
> {noformat}
> 2016-07-07T23:10:35,249 INFO  [TaskHeartbeatThread[]]: task.TezTaskRunner2 
> (:()) - TaskReporter reporter error which will cause the task to fail
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  Protocol interface 
> org.apache.hadoop.hive.llap.protocol.LlapTaskUmbilicalProtocol is not known.
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1551)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1495)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1395)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:241)
>   at com.sun.proxy.$Proxy39.heartbeat(Unknown Source)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.heartbeat(LlapTaskReporter.java:280)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:202)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:139)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14219) LLAP external client on secure cluster: Protocol interface org.apache.hadoop.hive.llap.protocol.LlapTaskUmbilicalProtocol is not known

2016-07-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374154#comment-15374154
 ] 

Sergey Shelukhin commented on HIVE-14219:
-

+1. It's the same constant value, right?

> LLAP external client on secure cluster: Protocol interface 
> org.apache.hadoop.hive.llap.protocol.LlapTaskUmbilicalProtocol is not known
> --
>
> Key: HIVE-14219
> URL: https://issues.apache.org/jira/browse/HIVE-14219
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-14219.1.patch
>
>
> {noformat}
> 2016-07-07T23:10:35,249 INFO  [TaskHeartbeatThread[]]: task.TezTaskRunner2 
> (:()) - TaskReporter reporter error which will cause the task to fail
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  Protocol interface 
> org.apache.hadoop.hive.llap.protocol.LlapTaskUmbilicalProtocol is not known.
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1551)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1495)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1395)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:241)
>   at com.sun.proxy.$Proxy39.heartbeat(Unknown Source)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.heartbeat(LlapTaskReporter.java:280)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:202)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:139)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14137) Hive on Spark throws FileAlreadyExistsException for jobs with multiple empty tables

2016-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374137#comment-15374137
 ] 

Hive QA commented on HIVE-14137:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12817305/HIVE-14137.6.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10315 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_masking_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_interval_arithmetic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_interval_arithmetic
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/487/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/487/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-487/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12817305 - PreCommit-HIVE-MASTER-Build

> Hive on Spark throws FileAlreadyExistsException for jobs with multiple empty 
> tables
> ---
>
> Key: HIVE-14137
> URL: https://issues.apache.org/jira/browse/HIVE-14137
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14137.1.patch, HIVE-14137.2.patch, 
> HIVE-14137.3.patch, HIVE-14137.4.patch, HIVE-14137.5.patch, 
> HIVE-14137.6.patch, HIVE-14137.patch
>
>
> The following queries:
> {code}
> -- Setup
> drop table if exists empty1;
> create table empty1 (col1 bigint) stored as parquet tblproperties 
> ('parquet.compress'='snappy');
> drop table if exists empty2;
> create table empty2 (col1 bigint, col2 bigint) stored as parquet 
> tblproperties ('parquet.compress'='snappy');
> drop table if exists empty3;
> create table empty3 (col1 bigint) stored as parquet tblproperties 
> ('parquet.compress'='snappy');
> -- All empty HDFS directories.
> -- Fails with [08S01]: Error while processing statement: FAILED: Execution 
> Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask.
> select empty1.col1
> from empty1
> inner join empty2
> on empty2.col1 = empty1.col1
> inner join empty3
> on empty3.col1 = empty2.col2;
> -- Two empty HDFS directories.
> -- Create an empty file in HDFS.
> insert into empty1 select * from empty1 where false;
> -- Same query fails with [08S01]: Error while processing statement: FAILED: 
> Execution Error, return code 3 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask.
> select empty1.col1
> from empty1
> inner join empty2
> on empty2.col1 = empty1.col1
> inner join empty3
> on empty3.col1 = empty2.col2;
> -- One empty HDFS directory.
> -- Create an empty file in HDFS.
> insert into empty2 select * from empty2 where false;
> -- Same query succeeds.
> select empty1.col1
> from empty1
> inner join empty2
> on empty2.col1 = empty1.col1
> inner join empty3
> on empty3.col1 = empty2.col2;
> {code}
> Will result in the following exception:
> {code}
> org.apache.hadoop.fs.FileAlreadyExistsException: 
> /tmp/hive/hive/1f3837aa-9407-4780-92b1-42a66d205139/hive_2016-06-24_15-45-23_206_79177714958655528-2/-mr-10004/0/emptyFile
>  for client 172.26.14.151 already exists
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2784)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2676)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2561)
>   at 
> 

[jira] [Commented] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER

2016-07-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374131#comment-15374131
 ] 

Ashutosh Chauhan commented on HIVE-14221:
-

Let's not change the config in HiveConf, but only in data/conf/hive-site.xml 
for tests.

> set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
> 
>
> Key: HIVE-14221
> URL: https://issues.apache.org/jira/browse/HIVE-14221
> Project: Hive
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-14221.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-12 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374022#comment-15374022
 ] 

Eugene Koifman commented on HIVE-13369:
---

The attached patch checks "base_n" files against ValidTxnList to see if there 
are any open txns with id < n.
If so, it looks for a different base_n file.
If it runs out of base files, it checks whether there are delta files still 
present that contain all the requisite history.
If not, it raises an error.
This is suitable to ensure correctness for the current autoCommit=true mode.
Strictly speaking, this analysis should only care about 'open' txns, but 
ValidTxnList doesn't distinguish between open and aborted, so this may generate 
a false-positive error. (In practice it's very unlikely.)

[~owen.omalley] could you review please?
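The selection rule described above - take the newest base_n whose id range contains no excluded txn, otherwise fall back - can be sketched as follows. This is a hedged illustration with made-up names, representing base dirs by their txn id n and the ValidTxnList by an HWM plus an exception list; it is not AcidUtils' actual API:

```python
def choose_base(bases, hwm, exceptions):
    """Pick the 'best' base_n: the largest n visible under the snapshot, i.e.
    n <= hwm with no excluded txn id <= n (an excluded txn's result would be
    baked into base_n).  Returns None if no base is usable; the caller must
    then verify that the deltas alone cover the requisite history, else error."""
    for n in sorted(bases, reverse=True):
        if n <= hwm and not any(e <= n for e in exceptions):
            return n
    return None

# ValidTxnList(20:16): HWM=20, txn 16 excluded.  base_17 contains txn 16,
# so the reader falls back to base_5 (and picks up delta_16_16/delta_17_17).
print(choose_base([5, 17], hwm=20, exceptions=[16]))  # 5
print(choose_base([17], hwm=20, exceptions=[16]))     # None
```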

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, 
> HIVE-13369.3.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct, but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto-commit mode, but with multi-statement txns it's possible.
> Suppose some long-running txn starts and locks in a snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).
> ==
> Here is a more concrete example.  Let's say the files for table A are as 
> follows, created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example, the user ran a major compaction)
> Let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>.
> Assume that all txns <= 20 commit.
> The reader can't use base_17 because it contains the result of txn 16.  So it 
> should choose base_5 as "TxnBase bestBase" in _getChildState()_.
> Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in the _Directory_ object.  This would represent an acceptable 
> snapshot for such a reader.
> The issue arises if the Cleaner process is running at the same time.  It will 
> see everything with txnid < 17 as obsolete.  Then it will check lock manager 
> state and decide to delete (as there may not be any locks in the LM for table 
> A).  The order in which the files are deleted is undefined right now.  It may 
> delete delta_16_16 and delta_17_17 first, and right at this moment a read 
> request with ValidTxnList(20:16) arrives (such a snapshot may have been 
> locked in by some multi-stmt txn that started some time ago).  It acquires 
> locks after the Cleaner checks LM state and calls getAcidState().  This 
> request will choose base_5, but it won't see delta_16_16 and delta_17_17 and 
> thus returns a snapshot w/o the modifications made by those txns.
> [This is not possible currently, since we only support autoCommit=true.  The 
> reason is that a query (0) opens a txn (if appropriate), (1) acquires locks, 
> (2) locks in the snapshot.  The Cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus, for the duration of 
> the transaction, nothing will be deleted, so it's safe to use base_5.]
> This is a subtle race condition, but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList, throwing 
> an exception if there is an exception <= x.
> 2. A better option is to keep 2 exception lists, aborted and open, and only 
> throw if there is an open txn <= x.  Compaction throws away data from 
> aborted txns, so there is no harm in using a base with aborted txns in its 
> range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the Cleaner from cleaning any delta with an id range that includes 
> this open txn id for any txn that is still running.  This has the drawback 
> of potentially delaying GC of old files for arbitrarily long periods, so 
> this should be a user config choice.  The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.

[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Status: Patch Available  (was: Open)

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, 
> HIVE-13369.3.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct, but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto-commit mode, but with multi-statement txns it's possible.
> Suppose some long-running txn starts and locks in a snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).
> ==
> Here is a more concrete example.  Let's say the files for table A are as 
> follows, created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example, the user ran a major compaction)
> Let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>.
> Assume that all txns <= 20 commit.
> The reader can't use base_17 because it contains the result of txn 16.  So it 
> should choose base_5 as "TxnBase bestBase" in _getChildState()_.
> Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in the _Directory_ object.  This would represent an acceptable 
> snapshot for such a reader.
> The issue arises if the Cleaner process is running at the same time.  It will 
> see everything with txnid < 17 as obsolete.  Then it will check lock manager 
> state and decide to delete (as there may not be any locks in the LM for table 
> A).  The order in which the files are deleted is undefined right now.  It may 
> delete delta_16_16 and delta_17_17 first, and right at this moment a read 
> request with ValidTxnList(20:16) arrives (such a snapshot may have been 
> locked in by some multi-stmt txn that started some time ago).  It acquires 
> locks after the Cleaner checks LM state and calls getAcidState().  This 
> request will choose base_5, but it won't see delta_16_16 and delta_17_17 and 
> thus returns a snapshot w/o the modifications made by those txns.
> [This is not possible currently, since we only support autoCommit=true.  The 
> reason is that a query (0) opens a txn (if appropriate), (1) acquires locks, 
> (2) locks in the snapshot.  The Cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus, for the duration of 
> the transaction, nothing will be deleted, so it's safe to use base_5.]
> This is a subtle race condition, but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList, throwing 
> an exception if there is an exception <= x.
> 2. A better option is to keep 2 exception lists, aborted and open, and only 
> throw if there is an open txn <= x.  Compaction throws away data from 
> aborted txns, so there is no harm in using a base with aborted txns in its 
> range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the Cleaner from cleaning any delta with an id range that includes 
> this open txn id for any txn that is still running.  This has the drawback 
> of potentially delaying GC of old files for arbitrarily long periods, so 
> this should be a user config choice.  The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note: if 2 deltas have overlapping ID ranges, then 1 must be a subset 
> of the other



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Attachment: HIVE-13369.3.patch

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch, 
> HIVE-13369.3.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct, but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto-commit mode, but with multi-statement txns it's possible.
> Suppose some long-running txn starts and locks in a snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).
> ==
> Here is a more concrete example.  Let's say the files for table A are as 
> follows, created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example, the user ran a major compaction)
> Let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>.
> Assume that all txns <= 20 commit.
> The reader can't use base_17 because it contains the result of txn 16.  So it 
> should choose base_5 as "TxnBase bestBase" in _getChildState()_.
> Then the rest of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in the _Directory_ object.  This would represent an acceptable 
> snapshot for such a reader.
> The issue arises if the Cleaner process is running at the same time.  It will 
> see everything with txnid < 17 as obsolete.  Then it will check lock manager 
> state and decide to delete (as there may not be any locks in the LM for table 
> A).  The order in which the files are deleted is undefined right now.  It may 
> delete delta_16_16 and delta_17_17 first, and right at this moment a read 
> request with ValidTxnList(20:16) arrives (such a snapshot may have been 
> locked in by some multi-stmt txn that started some time ago).  It acquires 
> locks after the Cleaner checks LM state and calls getAcidState().  This 
> request will choose base_5, but it won't see delta_16_16 and delta_17_17 and 
> thus returns a snapshot w/o the modifications made by those txns.
> [This is not possible currently, since we only support autoCommit=true.  The 
> reason is that a query (0) opens a txn (if appropriate), (1) acquires locks, 
> (2) locks in the snapshot.  The Cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus, for the duration of 
> the transaction, nothing will be deleted, so it's safe to use base_5.]
> This is a subtle race condition, but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList, throwing 
> an exception if there is an exception <= x.
> 2. A better option is to keep 2 exception lists, aborted and open, and only 
> throw if there is an open txn <= x.  Compaction throws away data from 
> aborted txns, so there is no harm in using a base with aborted txns in its 
> range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the Cleaner from cleaning any delta with an id range that includes 
> this open txn id for any txn that is still running.  This has the drawback 
> of potentially delaying GC of old files for arbitrarily long periods, so 
> this should be a user config choice.  The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note: if 2 deltas have overlapping ID ranges, then 1 must be a subset 
> of the other



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13369) AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing the "best" base file

2016-07-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13369:
--
Status: Open  (was: Patch Available)

> AcidUtils.getAcidState() is not paying attention toValidTxnList when choosing 
> the "best" base file
> --
>
> Key: HIVE-13369
> URL: https://issues.apache.org/jira/browse/HIVE-13369
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13369.1.patch, HIVE-13369.2.patch
>
>
> The JavaDoc on getAcidState() reads, in part:
> "Note that because major compactions don't
>preserve the history, we can't use a base directory that includes a
>transaction id that we must exclude."
> which is correct but there is nothing in the code that does this.
> And if we detect a situation where txn X must be excluded but and there are 
> deltas that contain X, we'll have to abort the txn.  This can't (reasonably) 
> happen with auto commit mode, but with multi statement txns it's possible.
> Suppose some long running txn starts and lock in snapshot at 17 (HWM).  An 
> hour later it decides to access some partition for which all txns < 20 (for 
> example) have already been compacted (i.e. GC'd).  
> ==
> Here is a more concrete example.  Let's say the file for table A are as 
> follows and created in the order listed.
> delta_4_4
> delta_5_5
> delta_4_5
> base_5
> delta_16_16
> delta_17_17
> base_17  (for example user ran major compaction)
> let's say getAcidState() is called with ValidTxnList(20:16), i.e. with HWM=20 
> and ExceptionList=<16>
> Assume that all txns <= 20 commit.
> Reader can't use base_17 because it has result of txn16.  So it should chose 
> base_5 "TxnBase bestBase" in _getChildState()_.
> Then the reset of the logic in _getAcidState()_ should choose delta_16_16 and 
> delta_17_17 in _Directory_ object.  This would represent acceptable snapshot 
> for such reader.
> The issue is if at the same time the Cleaner process is running.  It will see 
> everything with txnid<17 as obsolete.  Then it will check lock manger state 
> and decide to delete (as there may not be any locks in LM for table A).  The 
> order in which the files are deleted is undefined right now.  It may delete 
> delta_16_16 and delta_17_17 first and right at this moment the read request 
> with ValidTxnList(20:16) arrives (such snapshot may have bee locked in by 
> some multi-stmt txn that started some time ago.  It acquires locks after the 
> Cleaner checks LM state and calls getAcidState(). This request will choose 
> base_5 but it won't see delta_16_16 and delta_17_17 and thus return the 
> snapshot w/o modifications made by those txns.
> [This is not possible currently since we only support autoCommit=true.  The 
> reason is the a query (0) opens txn (if appropriate), (1) acquires locks, (2) 
> locks in the snapshot.  The cleaner won't delete anything for a given 
> compaction (partition) if there are locks on it.  Thus for duration of the 
> transaction, nothing will be deleted so it's safe to use base_5]
> This is a subtle race condition but possible.
> 1. So the safest thing to do to ensure correctness is to use the latest 
> base_x as the "best" and check against exceptions in ValidTxnList and throw 
> an exception if there is an exception <=x.
> 2. A better option is to keep 2 exception lists, aborted and open, and only 
> throw if there is an open txn <= x.  Compaction throws away data from aborted 
> txns, so there is no harm in using a base with aborted txns in its range.
> 3. You could make each txn record the lowest open txn id at its start and 
> prevent the cleaner from cleaning any delta whose id range includes 
> this open txn id for any txn that is still running.  This has the drawback of 
> potentially delaying GC of old files for arbitrarily long periods, so this 
> should be a user config choice.  The implementation is not trivial.
> I would go with 1 now and do 2/3 together with multi-statement txn work.
> Side note:  if 2 deltas have overlapping ID ranges, then one must be a subset of 
> the other
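Option 1 above can be sketched roughly as follows. This is a self-contained illustration under stated assumptions: the class and method names are invented, and the real logic lives in _getAcidState()_/_getChildState()_, which work on directory listings rather than plain arrays.

```java
// Hypothetical sketch of option 1: pick the most recent base_x, but refuse to
// use any base whose txn range contains an exception (open/aborted txn) from
// the reader's ValidTxnList, since that base may bake in data the snapshot
// must not see.
public class BestBaseSketch {

    /** Returns the txn id of the usable base, or -1 if no base qualifies. */
    static long chooseBestBase(long[] baseTxnIds, long highWaterMark, long[] exceptions) {
        long best = -1;
        for (long base : baseTxnIds) {
            if (base > highWaterMark) continue;   // base written beyond the snapshot's HWM
            for (long e : exceptions) {
                if (e <= base) {
                    // Option 1: an excluded txn falls inside the base's range -> fail fast.
                    throw new IllegalStateException(
                        "base_" + base + " may contain txn " + e + " excluded from the snapshot");
                }
            }
            best = Math.max(best, base);
        }
        return best;
    }

    public static void main(String[] args) {
        // ValidTxnList(20:16): HWM=20, exception list = {16}
        try {
            chooseBestBase(new long[]{5, 17}, 20, new long[]{16});
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage()); // base_17 overlaps txn 16
        }
        // With no exceptions, the latest base (base_17) is usable.
        System.out.println("best=" + chooseBestBase(new long[]{5, 17}, 20, new long[]{}));
    }
}
```

With the example from the comment (base_5, base_17, ValidTxnList(20:16)), this rejects base_17 rather than silently falling back to base_5, which is exactly the safety/availability trade-off option 1 accepts.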



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14213) Add timeouts for various components in llap status check

2016-07-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373976#comment-15373976
 ] 

Sergey Shelukhin commented on HIVE-14213:
-

What I mean is that you use the new configs (or defaults) to set the component 
configs. Why not have the user set the component configs directly, and set them 
to defaults only if not already set?

> Add timeouts for various components in llap status check
> 
>
> Key: HIVE-14213
> URL: https://issues.apache.org/jira/browse/HIVE-14213
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14213.01.patch
>
>
> The llapstatus check connects to various components - YARN, HDFS via Slider, 
> ZooKeeper. If any of these components is down, the command can take a 
> long time to exit.





[jira] [Commented] (HIVE-14209) Add some logging info for session and operation management

2016-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373944#comment-15373944
 ] 

Hive QA commented on HIVE-14209:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12817264/HIVE-14209.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 29 failed/errored test(s), 10315 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_masking_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_interval_arithmetic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_gby_empty
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_udf_udaf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_union
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_windowing
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cte_mat_5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge12
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_orc_vec_mapwork_part_all_primitive
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_text_nonvec_mapwork_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_temp_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union_with_udf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_interval_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_interval_arithmetic
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_not
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/486/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/486/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-486/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 29 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12817264 - PreCommit-HIVE-MASTER-Build

> Add some logging info for session and operation management
> --
>
> Key: HIVE-14209
> URL: https://issues.apache.org/jira/browse/HIVE-14209
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Minor
> Attachments: HIVE-14209.1.patch
>
>
> It's hard to track session and operation open/close events in a multi-user 
> environment. Add some logging info.





[jira] [Commented] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER

2016-07-12 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373923#comment-15373923
 ] 

Pengcheng Xiong commented on HIVE-14221:


[~ashutoshc], could u take a look? Thanks.

> set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
> 
>
> Key: HIVE-14221
> URL: https://issues.apache.org/jira/browse/HIVE-14221
> Project: Hive
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-14221.01.patch
>
>






[jira] [Commented] (HIVE-14213) Add timeouts for various components in llap status check

2016-07-12 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373913#comment-15373913
 ] 

Siddharth Seth commented on HIVE-14213:
---

bq. Why do we need a separate set of config settings? 
That's really only for the case where the defaults are incorrect. A bunch of these 
settings are common to multiple commands, e.g. the yarn logs command or yarn 
application -list would use the retry parameters. Similarly for various dfs 
commands. I don't think the main config settings can be changed in 
yarn-site/core-site just for this command - hence the new config variables.

bq. On the same note, if the component settings are already set, and the new 
ones are not set, this will override them with defaults. Perhaps we can just 
have the default constants for the original parameters (from YARN etc.), and 
set them if not already set? If the user wants to change them they can just set 
the originals too
Didn't quite understand this. If the new configs are set - they'll be used. 
Otherwise the defaults will be used. The defaults are supposed to be good 
enough.
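The two alternatives being debated can be sketched as follows. The key names below are invented for illustration; they are not actual Hive or YARN configuration keys, and the real implementation reads a Hadoop `Configuration` rather than a plain map.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the two config strategies discussed above (key names are made up).
public class ConfFallbackSketch {
    // Approach in the patch: a new llapstatus-specific key wins when set;
    // otherwise a hard-coded default is used, ignoring the component's own key.
    static int timeoutFromNewKey(Map<String, String> conf) {
        String v = conf.get("hive.llapstatus.yarn.timeout.ms");
        return v != null ? Integer.parseInt(v) : 5000; // default assumed "good enough"
    }

    // Approach suggested in review: reuse the component's own key, falling
    // back to a default only when the user hasn't set it anywhere.
    static int timeoutFromComponentKey(Map<String, String> conf) {
        String v = conf.get("yarn.client.some-timeout.ms");
        return v != null ? Integer.parseInt(v) : 5000;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("yarn.client.some-timeout.ms", "1000"); // user tuned the component key
        System.out.println(timeoutFromNewKey(conf));        // 5000: ignores the user's setting
        System.out.println(timeoutFromComponentKey(conf));  // 1000: honors the user's setting
    }
}
```

The sketch makes the review concern concrete: with a separate key, a user-tuned component setting is silently overridden by the default unless the user also learns the new key.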

> Add timeouts for various components in llap status check
> 
>
> Key: HIVE-14213
> URL: https://issues.apache.org/jira/browse/HIVE-14213
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14213.01.patch
>
>
> The llapstatus check connects to various components - YARN, HDFS via Slider, 
> ZooKeeper. If any of these components is down, the command can take a 
> long time to exit.





[jira] [Updated] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER

2016-07-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14221:
---
Status: Patch Available  (was: Open)

> set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
> 
>
> Key: HIVE-14221
> URL: https://issues.apache.org/jira/browse/HIVE-14221
> Project: Hive
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-14221.01.patch
>
>






[jira] [Updated] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER

2016-07-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14221:
---
Attachment: HIVE-14221.01.patch

> set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
> 
>
> Key: HIVE-14221
> URL: https://issues.apache.org/jira/browse/HIVE-14221
> Project: Hive
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-14221.01.patch
>
>






[jira] [Assigned] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER

2016-07-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-14221:
--

Assignee: Pengcheng Xiong

> set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
> 
>
> Key: HIVE-14221
> URL: https://issues.apache.org/jira/browse/HIVE-14221
> Project: Hive
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
>






[jira] [Commented] (HIVE-14063) beeline to auto connect to the HiveServer2

2016-07-12 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373903#comment-15373903
 ] 

Vihang Karajgaonkar commented on HIVE-14063:


Adding a password to the file could be a security risk, so we should not support 
passwords in this configuration file. In a Kerberos environment, the file could 
provide the principal. In an LDAP environment, the user should be prompted for 
the password when the connection is initiated. In the default mode of none, an 
empty password should suffice to connect to HS2 (so we can skip the password 
prompt for that mode too).

> beeline to auto connect to the HiveServer2
> --
>
> Key: HIVE-14063
> URL: https://issues.apache.org/jira/browse/HIVE-14063
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: beeline.conf.template
>
>
> Currently one has to give a jdbc:hive2 url in order for Beeline to connect to a 
> hiveserver2 instance. It would be great if Beeline could get the info somehow 
> (from a properties file at a well-known location?) and connect automatically 
> if the user doesn't specify such a url. If the properties file is not present, 
> then beeline would expect the user to provide the url and credentials using 
> !connect or ./beeline -u .. commands.
> While Beeline is flexible (being a mere JDBC client), most environments would 
> have just a single HS2. Having users manually connect to it via either 
> "beeline ~/.propsfile", -u, or !connect statements degrades the user 
> experience.





[jira] [Commented] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor

2016-07-12 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373876#comment-15373876
 ] 

Wei Zheng commented on HIVE-13934:
--

Attached patch 8. Many tests will still fail because Hive currently uses Tez 
0.8.3, while TEZ-3286 is available only in 0.8.4. [~sseth] [~hitesh]

> Configure Tez to make nocondiional task size memory available for the 
> Processor
> ---
>
> Key: HIVE-13934
> URL: https://issues.apache.org/jira/browse/HIVE-13934
> Project: Hive
>  Issue Type: Bug
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13934.1.patch, HIVE-13934.2.patch, 
> HIVE-13934.3.patch, HIVE-13934.4.patch, HIVE-13934.6.patch, 
> HIVE-13934.7.patch, HIVE-13934.8.patch
>
>
> Currently, noconditionaltasksize is not validated against the container size, 
> the reservations made in the container by Tez for Inputs / Outputs etc.
> Check this at compile time to see if enough memory is available, or set up 
> the vertex to reserve additional memory for the Processor.
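The compile-time check described above can be sketched as follows. The constants and method names are illustrative only; the real logic consults HiveConf and Tez resource settings rather than raw byte counts.

```java
// Sketch of validating noconditionaltasksize against the container budget
// (illustrative; not Hive's actual implementation).
public class NoConditionalTaskSizeCheck {
    /**
     * Returns the memory (bytes) the map-join hash table may actually use,
     * clamping the configured noconditionaltasksize to what remains in the
     * container after Tez's Input/Output reservations.
     */
    static long effectiveTaskSize(long containerBytes, long tezReservedBytes,
                                  long noConditionalTaskSize) {
        long available = containerBytes - tezReservedBytes;
        if (noConditionalTaskSize > available) {
            // Alternative to clamping: configure the vertex to reserve extra
            // memory for the Processor so the configured size fits.
            return available;
        }
        return noConditionalTaskSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // Configured size fits within the container: used as-is.
        System.out.println(effectiveTaskSize(4096 * mb, 1024 * mb, 512 * mb) / mb);
        // Configured size exceeds what's left after reservations: clamped.
        System.out.println(effectiveTaskSize(2048 * mb, 1024 * mb, 2048 * mb) / mb);
    }
}
```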





[jira] [Updated] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor

2016-07-12 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13934:
-
Status: Patch Available  (was: Open)

> Configure Tez to make nocondiional task size memory available for the 
> Processor
> ---
>
> Key: HIVE-13934
> URL: https://issues.apache.org/jira/browse/HIVE-13934
> Project: Hive
>  Issue Type: Bug
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13934.1.patch, HIVE-13934.2.patch, 
> HIVE-13934.3.patch, HIVE-13934.4.patch, HIVE-13934.6.patch, 
> HIVE-13934.7.patch, HIVE-13934.8.patch
>
>
> Currently, noconditionaltasksize is not validated against the container size, 
> the reservations made in the container by Tez for Inputs / Outputs etc.
> Check this at compile time to see if enough memory is available, or set up 
> the vertex to reserve additional memory for the Processor.





[jira] [Updated] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor

2016-07-12 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13934:
-
Status: Open  (was: Patch Available)

> Configure Tez to make nocondiional task size memory available for the 
> Processor
> ---
>
> Key: HIVE-13934
> URL: https://issues.apache.org/jira/browse/HIVE-13934
> Project: Hive
>  Issue Type: Bug
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13934.1.patch, HIVE-13934.2.patch, 
> HIVE-13934.3.patch, HIVE-13934.4.patch, HIVE-13934.6.patch, 
> HIVE-13934.7.patch, HIVE-13934.8.patch
>
>
> Currently, noconditionaltasksize is not validated against the container size, 
> the reservations made in the container by Tez for Inputs / Outputs etc.
> Check this at compile time to see if enough memory is available, or set up 
> the vertex to reserve additional memory for the Processor.





[jira] [Updated] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor

2016-07-12 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13934:
-
Attachment: HIVE-13934.8.patch

> Configure Tez to make nocondiional task size memory available for the 
> Processor
> ---
>
> Key: HIVE-13934
> URL: https://issues.apache.org/jira/browse/HIVE-13934
> Project: Hive
>  Issue Type: Bug
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13934.1.patch, HIVE-13934.2.patch, 
> HIVE-13934.3.patch, HIVE-13934.4.patch, HIVE-13934.6.patch, 
> HIVE-13934.7.patch, HIVE-13934.8.patch
>
>
> Currently, noconditionaltasksize is not validated against the container size, 
> the reservations made in the container by Tez for Inputs / Outputs etc.
> Check this at compile time to see if enough memory is available, or set up 
> the vertex to reserve additional memory for the Processor.





[jira] [Commented] (HIVE-4797) Hive Lead/Lag OLAP Not Functioning

2016-07-12 Thread Derrick Austin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373839#comment-15373839
 ] 

Derrick Austin commented on HIVE-4797:
--

It appears to be a documentation error. With LAG/LEAD, pass the offset and 
default as function parameters instead of specifying a window frame.

For example, the following appears to work correctly:

LAG(my_field, 1, 0) OVER (ORDER BY b) AS lag_1_month_ago,
LAG(my_field, 12, 0) OVER (ORDER BY b) AS lag_12_months_ago

> Hive Lead/Lag OLAP Not Functioning
> --
>
> Key: HIVE-4797
> URL: https://issues.apache.org/jira/browse/HIVE-4797
> Project: Hive
>  Issue Type: Bug
>  Components: OLAP
>Affects Versions: 0.11.0
> Environment: Linux version 2.6.18-308.24.1.el5 
> (mockbu...@x86-022.build.eng.bos.redhat.com) (gcc version 4.1.2 20080704 (Red 
> Hat 4.1.2-52))
> Java 1.6.0_31
> Hadoop 1.2.0
> Hive 0.11.0
>Reporter: Joshua Lee
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Unable to use built in LAG/LEAD functionality. Following the example in 
> documentation at 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
>  leads to error. This leads me to believe that this is a bug rather than 
> something wrong with my query. Specifically:
> -- Set up database
> hive> create table lag_test(a int, b int, c string, d string) row format 
> delimited fields terminated by "\t";
> -- load test data using local file
> -- Run test query
> hive> SELECT a, LEAD(a) OVER (PARTITION BY b ORDER BY C ROWS BETWEEN CURRENT 
> ROW AND 1 FOLLOWING) FROM lag_test; -- copied from documentation
> FAILED: SemanticException Failed to breakup Windowing invocations into 
> Groups. At least 1 group must only depend on input columns. Also check for 
> circular dependencies.
> Underlying error: Expecting left window frame boundary for function 
> LEAD((TOK_TABLE_OR_COL a)) 
> org.apache.hadoop.hive.ql.parse.WindowingSpec$WindowSpec@39fe9830 as _wcol0 
> to be unbounded. Found : 0





[jira] [Updated] (HIVE-13258) LLAP: Add hdfs bytes read and spilled bytes to tez print summary

2016-07-12 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13258:
-
Attachment: HIVE-13258.6.patch

orc_llap.q was previously never part of the MiniLlap tests. The golden file was a 
bit outdated; updated it in this patch. Also fixed other related failures.

> LLAP: Add hdfs bytes read and spilled bytes to tez print summary
> 
>
> Key: HIVE-13258
> URL: https://issues.apache.org/jira/browse/HIVE-13258
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13258.1.patch, HIVE-13258.1.patch, 
> HIVE-13258.2.patch, HIVE-13258.3.patch, HIVE-13258.4.patch, 
> HIVE-13258.5.patch, HIVE-13258.5.patch, HIVE-13258.6.patch, 
> llap-fs-counters-full-cache-hit.png, llap-fs-counters.png
>
>
> When printing counters to the console, it will be useful to also print hdfs bytes 
> read and spilled bytes, which will help with debugging issues faster. 





[jira] [Commented] (HIVE-14074) RELOAD FUNCTION should update dropped functions

2016-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373745#comment-15373745
 ] 

Hive QA commented on HIVE-14074:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12817255/HIVE-14074.03.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10314 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_masking_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_interval_arithmetic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_interval_arithmetic
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/485/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/485/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-485/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12817255 - PreCommit-HIVE-MASTER-Build

> RELOAD FUNCTION should update dropped functions
> ---
>
> Key: HIVE-14074
> URL: https://issues.apache.org/jira/browse/HIVE-14074
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.1
>Reporter: Abdullah Yousufi
>Assignee: Abdullah Yousufi
> Fix For: 2.2.0
>
> Attachments: HIVE-14074.01.patch, HIVE-14074.02.patch, 
> HIVE-14074.03.patch
>
>
> Due to HIVE-2573, functions are stored in a per-session registry and only 
> loaded in from the metastore when hs2 or hive cli is started. Running RELOAD 
> FUNCTION in the current session is a way to force a reload of the functions, 
> so that changes that occurred in other running sessions will be reflected in 
> the current session, without having to restart the current session. However, 
> while functions that are created in other sessions will now appear in the 
> current session, functions that have been dropped are not removed from the 
> current session's registry. It seems inconsistent that created functions are 
> updated while dropped functions are not.
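The consistent behavior argued for here, with reload removing dropped functions as well as picking up new ones, can be sketched as follows. The names are illustrative; Hive's actual per-session FunctionRegistry and metastore calls are more involved.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: a per-session function registry whose reload treats the
// metastore listing as the source of truth. (Made-up names; not Hive's API.)
public class RegistrySketch {
    final Map<String, String> functions = new HashMap<>(); // name -> impl class

    void reload(Map<String, String> metastoreSnapshot) {
        // Remove functions dropped in other sessions...
        functions.keySet().retainAll(metastoreSnapshot.keySet());
        // ...and pick up functions created in other sessions.
        functions.putAll(metastoreSnapshot);
    }

    public static void main(String[] args) {
        RegistrySketch r = new RegistrySketch();
        r.functions.put("f1", "com.example.F1");
        r.functions.put("f2", "com.example.F2");

        Map<String, String> snapshot = new HashMap<>();
        snapshot.put("f2", "com.example.F2");
        snapshot.put("f3", "com.example.F3"); // f3 created elsewhere; f1 dropped elsewhere

        r.reload(snapshot);
        System.out.println(r.functions.keySet()); // contains f2 and f3, not f1
    }
}
```

The `retainAll` step is what the issue says is missing: without it, reload only ever adds, so dropped functions linger in the session registry.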





[jira] [Commented] (HIVE-14007) Replace ORC module with ORC release

2016-07-12 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373722#comment-15373722
 ] 

Owen O'Malley commented on HIVE-14007:
--

Those variables actually control Hive's use of ORC rather than ORC itself. Thus 
those variables only exist in Hive and aren't duplicated.

> Replace ORC module with ORC release
> ---
>
> Key: HIVE-14007
> URL: https://issues.apache.org/jira/browse/HIVE-14007
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.2.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
> Attachments: HIVE-14007.patch, HIVE-14007.patch, HIVE-14007.patch
>
>
> This completes moving the core ORC reader & writer to the ORC project.





[jira] [Commented] (HIVE-14004) Minor compaction produces ArrayIndexOutOfBoundsException: 7 in SchemaEvolution.getFileType

2016-07-12 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373699#comment-15373699
 ] 

Owen O'Malley commented on HIVE-14004:
--

I should give more details. The problem was that OrcInputFormat was modifying 
the passed-in Options object and that ACID was reusing the same Options object 
across the deltas. Thus, when some of the delta files had fewer columns, the 
include array wasn't long enough.
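The failure mode described, mutating a shared options object that is reused across delta readers, can be illustrated generically. The class and method names below are invented for the sketch; this is not the actual ORC Reader.Options API.

```java
import java.util.Arrays;

// Generic illustration of the bug: a shared, mutable options object is reused
// across delta files, so a resize done for one file leaks into the next.
public class SharedOptionsSketch {
    static class Options {
        boolean[] include;                       // which columns to read
        Options(boolean[] include) { this.include = include; }
    }

    // Buggy: mutates the caller's Options to match this file's column count.
    static boolean[] openBuggy(Options opts, int fileColumns) {
        opts.include = Arrays.copyOf(opts.include, fileColumns);
        return opts.include;
    }

    // Fixed: work on a private copy, leaving the caller's Options untouched.
    static boolean[] openFixed(Options opts, int fileColumns) {
        return Arrays.copyOf(opts.include, fileColumns);
    }

    public static void main(String[] args) {
        Options shared = new Options(new boolean[]{true, true, true, true});
        openBuggy(shared, 2);                      // a delta with fewer columns shrinks the array
        System.out.println(shared.include.length); // 2: the next delta's reads now run off the end

        Options shared2 = new Options(new boolean[]{true, true, true, true});
        openFixed(shared2, 2);
        System.out.println(shared2.include.length); // 4: subsequent deltas are unaffected
    }
}
```

In the buggy variant the shrunken array is what later produces the ArrayIndexOutOfBoundsException when a wider delta indexes past its end; the fix is simply to never mutate shared state owned by the caller.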

> Minor compaction produces ArrayIndexOutOfBoundsException: 7 in 
> SchemaEvolution.getFileType
> --
>
> Key: HIVE-14004
> URL: https://issues.apache.org/jira/browse/HIVE-14004
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Owen O'Malley
> Attachments: HIVE-14004.01.patch, HIVE-14004.02.patch, 
> HIVE-14004.03.patch, HIVE-14004.patch
>
>
> Easiest way to repro is to add TestTxnCommands2
> {noformat}
>   @Test
>   public void testCompactWithDelete() throws Exception {
> int[][] tableData = {{1,2},{3,4}};
> runStatementOnDriver("insert into " + Table.ACIDTBL + "(a,b) " + 
> makeValuesClause(tableData));
> runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MAJOR'");
> Worker t = new Worker();
> t.setThreadId((int) t.getId());
> t.setHiveConf(hiveConf);
> AtomicBoolean stop = new AtomicBoolean();
> AtomicBoolean looped = new AtomicBoolean();
> stop.set(true);
> t.init(stop, looped);
> t.run();
> runStatementOnDriver("delete from " + Table.ACIDTBL + " where b = 4");
> runStatementOnDriver("update " + Table.ACIDTBL + " set b = -2 where b = 
> 2");
> runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MINOR'");
> t.run();
>   }
> {noformat}
> to TestTxnCommands2 and run it.
> Test won't fail but if you look 
> in target/tmp/log/hive.log for the following exception (from Minor 
> compaction).
> {noformat}
> 2016-06-09T18:36:39,071 WARN  [Thread-190[]]: mapred.LocalJobRunner 
> (LocalJobRunner.java:run(560)) - job_local1233973168_0005
> java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) 
> ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) 
> [hadoop-mapreduce-client-common-2.6.1.jar:?]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.orc.impl.SchemaEvolution.getFileType(SchemaEvolution.java:67) 
> ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2031)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.(TreeReaderFactory.java:1716)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.(TreeReaderFactory.java:1716)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.RecordReaderImpl.(RecordReaderImpl.java:208) 
> ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.(RecordReaderImpl.java:63)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:365) 
> ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.(OrcRawRecordMerger.java:207)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:508)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:1977)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:630)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:609)
>  ~[classes/:?]
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) 
> ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) 
> ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
> ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
> at 
> 

[jira] [Commented] (HIVE-14004) Minor compaction produces ArrayIndexOutOfBoundsException: 7 in SchemaEvolution.getFileType

2016-07-12 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373668#comment-15373668
 ] 

Prasanth Jayachandran commented on HIVE-14004:
--

+1

> Minor compaction produces ArrayIndexOutOfBoundsException: 7 in 
> SchemaEvolution.getFileType
> --
>
> Key: HIVE-14004
> URL: https://issues.apache.org/jira/browse/HIVE-14004
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Owen O'Malley
> Attachments: HIVE-14004.01.patch, HIVE-14004.02.patch, 
> HIVE-14004.03.patch, HIVE-14004.patch
>
>
> Easiest way to repro is to add TestTxnCommands2
> {noformat}
>   @Test
>   public void testCompactWithDelete() throws Exception {
> int[][] tableData = {{1,2},{3,4}};
> runStatementOnDriver("insert into " + Table.ACIDTBL + "(a,b) " + 
> makeValuesClause(tableData));
> runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MAJOR'");
> Worker t = new Worker();
> t.setThreadId((int) t.getId());
> t.setHiveConf(hiveConf);
> AtomicBoolean stop = new AtomicBoolean();
> AtomicBoolean looped = new AtomicBoolean();
> stop.set(true);
> t.init(stop, looped);
> t.run();
> runStatementOnDriver("delete from " + Table.ACIDTBL + " where b = 4");
> runStatementOnDriver("update " + Table.ACIDTBL + " set b = -2 where b = 
> 2");
> runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MINOR'");
> t.run();
>   }
> {noformat}
> to TestTxnCommands2 and run it.
> Test won't fail but if you look 
> in target/tmp/log/hive.log for the following exception (from Minor 
> compaction).
> {noformat}
> 2016-06-09T18:36:39,071 WARN  [Thread-190[]]: mapred.LocalJobRunner 
> (LocalJobRunner.java:run(560)) - job_local1233973168_0005
> java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) 
> ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) 
> [hadoop-mapreduce-client-common-2.6.1.jar:?]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.orc.impl.SchemaEvolution.getFileType(SchemaEvolution.java:67) 
> ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2031)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:208) 
> ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:63)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:365) 
> ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:207)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:508)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:1977)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:630)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:609)
>  ~[classes/:?]
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) 
> ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) 
> ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
> ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>  ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> ~[?:1.7.0_71]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
> 
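The ArrayIndexOutOfBoundsException above comes from SchemaEvolution.getFileType indexing the file schema's type array with an id derived from the (longer) reader schema. A minimal, hypothetical Python model of that lookup and of the kind of bounds check that avoids it — the function name, the mapping, and the type lists are illustrative stand-ins, not the actual ORC code:

```python
# Hypothetical model of a schema-evolution type lookup: reader column ids
# map to file column ids; indexing past the end of the file's type list
# without a check reproduces the ArrayIndexOutOfBoundsException seen above.
def get_file_type(file_types, reader_to_file_id, reader_id):
    """Return the file type for a reader column, or None when the file
    schema has no matching column (e.g. ACID metadata wraps the row)."""
    file_id = reader_to_file_id.get(reader_id, -1)
    if file_id < 0 or file_id >= len(file_types):
        return None  # column absent from the file schema; do not index
    return file_types[file_id]

file_types = ["struct", "int", "bigint", "int", "bigint", "struct", "int"]  # 7 entries
mapping = {0: 0, 1: 1, 6: 7}  # reader column 6 maps past the file schema
assert get_file_type(file_types, mapping, 1) == "int"
assert get_file_type(file_types, mapping, 6) is None  # raw file_types[7] would raise
```

The guard simply refuses to index when the mapped id falls outside the file schema, which is the situation a minor compaction of an ACID table can create.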

[jira] [Updated] (HIVE-14219) LLAP external client on secure cluster: Protocol interface org.apache.hadoop.hive.llap.protocol.LlapTaskUmbilicalProtocol is not known

2016-07-12 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-14219:
--
Status: Patch Available  (was: Open)

> LLAP external client on secure cluster: Protocol interface 
> org.apache.hadoop.hive.llap.protocol.LlapTaskUmbilicalProtocol is not known
> --
>
> Key: HIVE-14219
> URL: https://issues.apache.org/jira/browse/HIVE-14219
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-14219.1.patch
>
>
> {noformat}
> 2016-07-07T23:10:35,249 INFO  [TaskHeartbeatThread[]]: task.TezTaskRunner2 
> (:()) - TaskReporter reporter error which will cause the task to fail
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  Protocol interface 
> org.apache.hadoop.hive.llap.protocol.LlapTaskUmbilicalProtocol is not known.
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1551)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1495)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1395)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:241)
>   at com.sun.proxy.$Proxy39.heartbeat(Unknown Source)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.heartbeat(LlapTaskReporter.java:280)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:202)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:139)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14219) LLAP external client on secure cluster: Protocol interface org.apache.hadoop.hive.llap.protocol.LlapTaskUmbilicalProtocol is not known

2016-07-12 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-14219:
--
Attachment: HIVE-14219.1.patch

Attaching patch. I ended up creating a separate PolicyProvider, using MR 
constants, to avoid having to pull in llap-tez/tez dependencies.

> LLAP external client on secure cluster: Protocol interface 
> org.apache.hadoop.hive.llap.protocol.LlapTaskUmbilicalProtocol is not known
> --
>
> Key: HIVE-14219
> URL: https://issues.apache.org/jira/browse/HIVE-14219
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-14219.1.patch
>
>
> {noformat}
> 2016-07-07T23:10:35,249 INFO  [TaskHeartbeatThread[]]: task.TezTaskRunner2 
> (:()) - TaskReporter reporter error which will cause the task to fail
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  Protocol interface 
> org.apache.hadoop.hive.llap.protocol.LlapTaskUmbilicalProtocol is not known.
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1551)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1495)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1395)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:241)
>   at com.sun.proxy.$Proxy39.heartbeat(Unknown Source)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.heartbeat(LlapTaskReporter.java:280)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:202)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:139)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14004) Minor compaction produces ArrayIndexOutOfBoundsException: 7 in SchemaEvolution.getFileType

2016-07-12 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-14004:
-
Attachment: HIVE-14004.patch

Here's the one-line fix.

> Minor compaction produces ArrayIndexOutOfBoundsException: 7 in 
> SchemaEvolution.getFileType
> --
>
> Key: HIVE-14004
> URL: https://issues.apache.org/jira/browse/HIVE-14004
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Owen O'Malley
> Attachments: HIVE-14004.01.patch, HIVE-14004.02.patch, 
> HIVE-14004.03.patch, HIVE-14004.patch
>
>
> Easiest way to repro is to add the following test
> {noformat}
>   @Test
>   public void testCompactWithDelete() throws Exception {
> int[][] tableData = {{1,2},{3,4}};
> runStatementOnDriver("insert into " + Table.ACIDTBL + "(a,b) " + 
> makeValuesClause(tableData));
> runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MAJOR'");
> Worker t = new Worker();
> t.setThreadId((int) t.getId());
> t.setHiveConf(hiveConf);
> AtomicBoolean stop = new AtomicBoolean();
> AtomicBoolean looped = new AtomicBoolean();
> stop.set(true);
> t.init(stop, looped);
> t.run();
> runStatementOnDriver("delete from " + Table.ACIDTBL + " where b = 4");
> runStatementOnDriver("update " + Table.ACIDTBL + " set b = -2 where b = 
> 2");
> runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MINOR'");
> t.run();
>   }
> {noformat}
> to TestTxnCommands2 and run it.
> The test won't fail, but look in target/tmp/log/hive.log for the following 
> exception (from the Minor compaction):
> {noformat}
> 2016-06-09T18:36:39,071 WARN  [Thread-190[]]: mapred.LocalJobRunner 
> (LocalJobRunner.java:run(560)) - job_local1233973168_0005
> java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) 
> ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) 
> [hadoop-mapreduce-client-common-2.6.1.jar:?]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.orc.impl.SchemaEvolution.getFileType(SchemaEvolution.java:67) 
> ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2031)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:208) 
> ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:63)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:365) 
> ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:207)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:508)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:1977)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:630)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:609)
>  ~[classes/:?]
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) 
> ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) 
> ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
> ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>  ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> ~[?:1.7.0_71]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
> 

[jira] [Assigned] (HIVE-14004) Minor compaction produces ArrayIndexOutOfBoundsException: 7 in SchemaEvolution.getFileType

2016-07-12 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HIVE-14004:


Assignee: Owen O'Malley  (was: Matt McCline)

> Minor compaction produces ArrayIndexOutOfBoundsException: 7 in 
> SchemaEvolution.getFileType
> --
>
> Key: HIVE-14004
> URL: https://issues.apache.org/jira/browse/HIVE-14004
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Owen O'Malley
> Attachments: HIVE-14004.01.patch, HIVE-14004.02.patch, 
> HIVE-14004.03.patch, HIVE-14004.patch
>
>
> Easiest way to repro is to add the following test
> {noformat}
>   @Test
>   public void testCompactWithDelete() throws Exception {
> int[][] tableData = {{1,2},{3,4}};
> runStatementOnDriver("insert into " + Table.ACIDTBL + "(a,b) " + 
> makeValuesClause(tableData));
> runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MAJOR'");
> Worker t = new Worker();
> t.setThreadId((int) t.getId());
> t.setHiveConf(hiveConf);
> AtomicBoolean stop = new AtomicBoolean();
> AtomicBoolean looped = new AtomicBoolean();
> stop.set(true);
> t.init(stop, looped);
> t.run();
> runStatementOnDriver("delete from " + Table.ACIDTBL + " where b = 4");
> runStatementOnDriver("update " + Table.ACIDTBL + " set b = -2 where b = 
> 2");
> runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MINOR'");
> t.run();
>   }
> {noformat}
> to TestTxnCommands2 and run it.
> The test won't fail, but look in target/tmp/log/hive.log for the following 
> exception (from the Minor compaction):
> {noformat}
> 2016-06-09T18:36:39,071 WARN  [Thread-190[]]: mapred.LocalJobRunner 
> (LocalJobRunner.java:run(560)) - job_local1233973168_0005
> java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) 
> ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) 
> [hadoop-mapreduce-client-common-2.6.1.jar:?]
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.orc.impl.SchemaEvolution.getFileType(SchemaEvolution.java:67) 
> ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2031)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077)
>  ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:208) 
> ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:63)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:365) 
> ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:207)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:508)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:1977)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:630)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:609)
>  ~[classes/:?]
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) 
> ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) 
> ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
> ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
> at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>  ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> ~[?:1.7.0_71]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
> 

[jira] [Commented] (HIVE-13822) TestPerfCliDriver throws warning in StatsSetupConst that JsonParser cannot parse COLUMN_STATS

2016-07-12 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373655#comment-15373655
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-13822:
--

These are the 2 errors associated with the failures:
{code}
2016-07-12T13:40:26,485 ERROR [e9d17784-f33f-4422-8fe3-748aeba57491 main] 
ql.Driver: FAILED: RuntimeException Invalid Stats number of null > no of tuples
java.lang.RuntimeException: Invalid Stats number of null > no of tuples
at 
org.apache.hadoop.hive.ql.optimizer.calcite.stats.FilterSelectivityEstimator.visitCall(FilterSelectivityEstimator.java:97)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.stats.FilterSelectivityEstimator.visitCall(FilterSelectivityEstimator.java:43)
at org.apache.calcite.rex.RexCall.accept(RexCall.java:108)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.stats.FilterSelectivityEstimator.computeConjunctionSelectivity(FilterSelectivityEstimator.java:214)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.stats.FilterSelectivityEstimator.visitCall(FilterSelectivityEstimator.java:75)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.stats.FilterSelectivityEstimator.visitCall(FilterSelectivityEstimator.java:43)
at org.apache.calcite.rex.RexCall.accept(RexCall.java:108)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.stats.FilterSelectivityEstimator.estimateSelectivity(FilterSelectivityEstimator.java:54)
at 
org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdSelectivity.getSelectivity(HiveRelMdSelectivity.java:55)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$1$1.invoke(ReflectiveRelMetadataProvider.java:213)
at com.sun.proxy.$Proxy123.getSelectivity(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.calcite.rel.metadata.ChainedRelMetadataProvider$ChainedInvocationHandler.invoke(ChainedRelMetadataProvider.java:112)
at com.sun.proxy.$Proxy123.getSelectivity(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.calcite.rel.metadata.ChainedRelMetadataProvider$ChainedInvocationHandler.invoke(ChainedRelMetadataProvider.java:112)
at com.sun.proxy.$Proxy123.getSelectivity(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.calcite.rel.metadata.CachingRelMetadataProvider$CachingInvocationHandler.invoke(CachingRelMetadataProvider.java:129)
at com.sun.proxy.$Proxy123.getSelectivity(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.calcite.rel.metadata.ChainedRelMetadataProvider$ChainedInvocationHandler.invoke(ChainedRelMetadataProvider.java:112)
at com.sun.proxy.$Proxy123.getSelectivity(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.calcite.rel.metadata.CachingRelMetadataProvider$CachingInvocationHandler.invoke(CachingRelMetadataProvider.java:129)
at com.sun.proxy.$Proxy123.getSelectivity(Unknown Source)
at 
org.apache.calcite.rel.metadata.RelMetadataQuery.getSelectivity(RelMetadataQuery.java:234)
at 
org.apache.calcite.rel.metadata.RelMdUtil.estimateFilteredRows(RelMdUtil.java:718)
  

[jira] [Commented] (HIVE-14219) LLAP external client on secure cluster: Protocol interface org.apache.hadoop.hive.llap.protocol.LlapTaskUmbilicalProtocol is not known

2016-07-12 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373642#comment-15373642
 ] 

Jason Dere commented on HIVE-14219:
---

According to [~sershe], the LlapTaskUmbilicalExternalClient requires a 
server.refreshServiceAcl() call, similar to LlapTaskCommunicator.

> LLAP external client on secure cluster: Protocol interface 
> org.apache.hadoop.hive.llap.protocol.LlapTaskUmbilicalProtocol is not known
> --
>
> Key: HIVE-14219
> URL: https://issues.apache.org/jira/browse/HIVE-14219
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> {noformat}
> 2016-07-07T23:10:35,249 INFO  [TaskHeartbeatThread[]]: task.TezTaskRunner2 
> (:()) - TaskReporter reporter error which will cause the task to fail
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
>  Protocol interface 
> org.apache.hadoop.hive.llap.protocol.LlapTaskUmbilicalProtocol is not known.
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1551)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1495)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1395)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:241)
>   at com.sun.proxy.$Proxy39.heartbeat(Unknown Source)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.heartbeat(LlapTaskReporter.java:280)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:202)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter$HeartbeatCallable.call(LlapTaskReporter.java:139)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
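The "Protocol interface ... is not known" failure above means the RPC server's service-level authorization table has no entry for LlapTaskUmbilicalProtocol; calling server.refreshServiceAcl() with a PolicyProvider that lists the protocol registers it. A rough sketch of that mechanism in Python — the class and method names mirror the Hadoop concepts but are stand-ins, not the real org.apache.hadoop.security.authorize API:

```python
# Stand-in model of Hadoop service-level authorization: a server only
# accepts calls on protocols registered through a PolicyProvider via
# refreshServiceAcl(); unregistered protocols fail with "is not known".
UMBILICAL = "org.apache.hadoop.hive.llap.protocol.LlapTaskUmbilicalProtocol"

class LlapUmbilicalPolicyProvider:
    def get_services(self):
        # protocol name -> principals allowed to call it (illustrative ACL)
        return {UMBILICAL: {"hive"}}

class RpcServer:
    def __init__(self):
        self._acls = {}
    def refresh_service_acl(self, provider):
        self._acls.update(provider.get_services())
    def authorize(self, protocol, user):
        if protocol not in self._acls:
            raise PermissionError("Protocol interface %s is not known." % protocol)
        if user not in self._acls[protocol]:
            raise PermissionError("User %s is not authorized." % user)

server = RpcServer()
try:
    server.authorize(UMBILICAL, "hive")   # before the fix: protocol unknown
except PermissionError as e:
    print(e)
server.refresh_service_acl(LlapUmbilicalPolicyProvider())
server.authorize(UMBILICAL, "hive")       # after the fix: succeeds
```

This mirrors why the patch adds a PolicyProvider on the external-client umbilical server: until the protocol is registered, every secure heartbeat is rejected before it reaches the handler.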


[jira] [Commented] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-12 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373638#comment-15373638
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-13995:
--

https://reviews.apache.org/r/49965/

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> a query does not have a filter on the partition column, the generated 
> metastore queries contain a large IN clause listing all the partition names. 
> Most RDBMS systems have trouble optimizing large IN clauses, and even when a 
> good index plan is chosen, comparing against 1800+ string values will not lead 
> to the best execution time.
> When all partitions are chosen, omitting the partition list and filtering only 
> on the table and column names will generate the same result set, as long as 
> there are no concurrent modifications (adding/dropping partitions) to the 
> partition list of the Hive table.
> For example, for TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 seconds in MySQL. The following is output from the 
> MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it's also possible to simply 
> specify the range, since Hive gets an ordered list of partition names. This 
> performs as well as the earlier query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. The columns in the 
> projection list of the Hive query are listed here; it is not clear whether 
> statistics for these columns are required for Hive query optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
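Because the metastore hands back partition names in sorted order, the huge IN list above can be collapsed to a range predicate selecting the same rows (barring concurrent add/drop of partitions). A small, hypothetical Python sketch of that rewrite — the helper name and the threshold are illustrative, not part of the actual patch:

```python
# Illustrative rewrite: emit a range predicate instead of a huge IN list
# when the (sorted) partition name list exceeds a threshold.
def partition_predicate(names, max_in_list=16):
    if not names:
        return "1 = 1"  # no partition filter at all
    if len(names) <= max_in_list:
        return '"PARTITION_NAME" in (%s)' % ", ".join("'%s'" % n for n in names)
    # names arrive sorted, so the min/max pair selects the same rows as the
    # full list, as long as no partitions are added or dropped concurrently
    return ('"PARTITION_NAME" >= \'%s\' and "PARTITION_NAME" <= \'%s\''
            % (names[0], names[-1]))

few = ["cs_sold_date_sk=2450815", "cs_sold_date_sk=2450816"]
many = ["cs_sold_date_sk=%d" % d for d in range(2450815, 2452655)]
assert "in (" in partition_predicate(few)
assert partition_predicate(many).startswith('"PARTITION_NAME" >= ')
```

The generated range clause matches the third query in the description, which MySQL executes roughly as fast as dropping the partition filter entirely.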


[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-12 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Attachment: HIVE-13995.2.patch

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> a query does not have a filter on the partition column, the generated 
> metastore queries contain a large IN clause listing all the partition names. 
> Most RDBMS systems have trouble optimizing large IN clauses, and even when a 
> good index plan is chosen, comparing against 1800+ string values will not lead 
> to the best execution time.
> When all partitions are chosen, omitting the partition list and filtering only 
> on the table and column names will generate the same result set, as long as 
> there are no concurrent modifications (adding/dropping partitions) to the 
> partition list of the Hive table.
> For example, for TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 seconds in MySQL. The following is output from the 
> MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it's also possible to simply 
> specify the range, since Hive gets an ordered list of partition names. This 
> performs as well as the earlier query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. Columns in the 
> projection list of the Hive query are mentioned here. It is not clear whether 
> statistics for these columns are required for Hive query optimization.
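The range rewrite above can be sketched as follows. This is a hypothetical helper, not Hive's actual metastore code; it assumes the caller already holds the ordered partition-name list that Hive has, and the range predicate is only equivalent to the full IN list when no other partition names sort into that range (e.g. no concurrent partition additions).

```python
def stats_query(db, table, columns, partition_names):
    """Sketch: build the PART_COL_STATS aggregation using a range predicate
    over partition names instead of a huge IN list. Equivalent to the IN-list
    form only when no other partitions sort between the min and max names."""
    parts = sorted(partition_names)
    cols = ", ".join("'%s'" % c for c in columns)
    return (
        'select count("COLUMN_NAME") from "PART_COL_STATS" '
        'where "DB_NAME" = \'%s\' and "TABLE_NAME" = \'%s\' '
        'and "COLUMN_NAME" in (%s) '
        'and "PARTITION_NAME" >= \'%s\' and "PARTITION_NAME" <= \'%s\' '
        'group by "PARTITION_NAME"' % (db, table, cols, parts[0], parts[-1])
    )
```

Since the query text no longer grows with the partition count, the RDBMS sees two simple comparisons on the indexed PARTITION_NAME column instead of 1800+ string equality checks.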



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-12 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Status: Patch Available  (was: Open)

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query does not have a filter on the partition column, the generated 
> metastore queries have a large IN clause listing all the partition names. Most 
> RDBMS systems have trouble optimizing a large IN clause, and even when a good 
> index plan is chosen, comparing against 1800+ string values will not lead to 
> the best execution time.
> When all partitions are chosen, not specifying the partition list and having 
> filters only on table and column name will generate the same result set, as 
> long as there are no concurrent modifications to the partition list of the 
> Hive table (adding/dropping partitions).
> For example, for TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 seconds in MySQL. The following is output from the 
> MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it's also possible to simply 
> list the range, since Hive gets an ordered list of partition names. This 
> performs equally well as the earlier query.
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. Columns in the 
> projection list of the Hive query are mentioned here. It is not clear whether 
> statistics for these columns are required for Hive query optimization.





[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-12 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Status: Open  (was: Patch Available)

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query does not have a filter on the partition column, the generated 
> metastore queries have a large IN clause listing all the partition names. Most 
> RDBMS systems have trouble optimizing a large IN clause, and even when a good 
> index plan is chosen, comparing against 1800+ string values will not lead to 
> the best execution time.
> When all partitions are chosen, not specifying the partition list and having 
> filters only on table and column name will generate the same result set, as 
> long as there are no concurrent modifications to the partition list of the 
> Hive table (adding/dropping partitions).
> For example, for TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 seconds in MySQL. The following is output from the 
> MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it's also possible to simply 
> list the range, since Hive gets an ordered list of partition names. This 
> performs equally well as the earlier query.
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. Columns in the 
> projection list of the Hive query are mentioned here. It is not clear whether 
> statistics for these columns are required for Hive query optimization.





[jira] [Commented] (HIVE-14218) LLAP: ACL validation fails if the user name is different from principal user name

2016-07-12 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373630#comment-15373630
 ] 

Prasanth Jayachandran commented on HIVE-14218:
--

+1

> LLAP: ACL validation fails if the user name is different from principal user 
> name
> -
>
> Key: HIVE-14218
> URL: https://issues.apache.org/jira/browse/HIVE-14218
> Project: Hive
>  Issue Type: Bug
>Reporter: Shraddha Sumit
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14218.patch
>
>






[jira] [Updated] (HIVE-14180) Disable LlapZookeeperRegistry ZK auth setup for external clients

2016-07-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14180:

Assignee: Jason Dere  (was: Sergey Shelukhin)

> Disable LlapZookeeperRegistry ZK auth setup for external clients
> 
>
> Key: HIVE-14180
> URL: https://issues.apache.org/jira/browse/HIVE-14180
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-14180.02.patch, HIVE-14180.1.patch
>
>
> {noformat}
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> java.io.IOException: Llap Kerberos keytab is empty
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
> at 
> org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getClient(LlapRegistryService.java:67)
> at 
> org.apache.hadoop.hive.llap.LlapBaseInputFormat.getServiceInstance(LlapBaseInputFormat.java:238)
> at 
> org.apache.hadoop.hive.llap.LlapBaseInputFormat.getRecordReader(LlapBaseInputFormat.java:142)
> at 
> org.apache.hadoop.hive.llap.LlapRowInputFormat.getRecordReader(LlapRowInputFormat.java:51)
> {noformat}
> When using the LLAP ZK registry in environments other than the LLAP daemon 
> (such as external LLAP clients), there should be a way to skip this ZK auth setup.





[jira] [Commented] (HIVE-14180) Disable LlapZookeeperRegistry ZK auth setup for external clients

2016-07-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373601#comment-15373601
 ] 

Sergey Shelukhin commented on HIVE-14180:
-

This can probably be committed without HiveQA...

> Disable LlapZookeeperRegistry ZK auth setup for external clients
> 
>
> Key: HIVE-14180
> URL: https://issues.apache.org/jira/browse/HIVE-14180
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14180.02.patch, HIVE-14180.1.patch
>
>
> {noformat}
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> java.io.IOException: Llap Kerberos keytab is empty
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
> at 
> org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getClient(LlapRegistryService.java:67)
> at 
> org.apache.hadoop.hive.llap.LlapBaseInputFormat.getServiceInstance(LlapBaseInputFormat.java:238)
> at 
> org.apache.hadoop.hive.llap.LlapBaseInputFormat.getRecordReader(LlapBaseInputFormat.java:142)
> at 
> org.apache.hadoop.hive.llap.LlapRowInputFormat.getRecordReader(LlapRowInputFormat.java:51)
> {noformat}
> When using the LLAP ZK registry in environments other than the LLAP daemon 
> (such as external LLAP clients), there should be a way to skip this ZK auth setup.





[jira] [Updated] (HIVE-14180) Disable LlapZookeeperRegistry ZK auth setup for external clients

2016-07-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14180:

Attachment: HIVE-14180.02.patch

again for HiveQA

> Disable LlapZookeeperRegistry ZK auth setup for external clients
> 
>
> Key: HIVE-14180
> URL: https://issues.apache.org/jira/browse/HIVE-14180
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14180.02.patch, HIVE-14180.1.patch
>
>
> {noformat}
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> java.io.IOException: Llap Kerberos keytab is empty
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
> at 
> org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getClient(LlapRegistryService.java:67)
> at 
> org.apache.hadoop.hive.llap.LlapBaseInputFormat.getServiceInstance(LlapBaseInputFormat.java:238)
> at 
> org.apache.hadoop.hive.llap.LlapBaseInputFormat.getRecordReader(LlapBaseInputFormat.java:142)
> at 
> org.apache.hadoop.hive.llap.LlapRowInputFormat.getRecordReader(LlapRowInputFormat.java:51)
> {noformat}
> When using the LLAP ZK registry in environments other than the LLAP daemon 
> (such as external LLAP clients), there should be a way to skip this ZK auth setup.





[jira] [Assigned] (HIVE-14180) Disable LlapZookeeperRegistry ZK auth setup for external clients

2016-07-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-14180:
---

Assignee: Sergey Shelukhin  (was: Jason Dere)

> Disable LlapZookeeperRegistry ZK auth setup for external clients
> 
>
> Key: HIVE-14180
> URL: https://issues.apache.org/jira/browse/HIVE-14180
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14180.1.patch
>
>
> {noformat}
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> java.io.IOException: Llap Kerberos keytab is empty
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
> at 
> org.apache.hadoop.hive.llap.registry.impl.LlapRegistryService.getClient(LlapRegistryService.java:67)
> at 
> org.apache.hadoop.hive.llap.LlapBaseInputFormat.getServiceInstance(LlapBaseInputFormat.java:238)
> at 
> org.apache.hadoop.hive.llap.LlapBaseInputFormat.getRecordReader(LlapBaseInputFormat.java:142)
> at 
> org.apache.hadoop.hive.llap.LlapRowInputFormat.getRecordReader(LlapRowInputFormat.java:51)
> {noformat}
> When using the LLAP ZK registry in environments other than the LLAP daemon 
> (such as external LLAP clients), there should be a way to skip this ZK auth setup.





[jira] [Commented] (HIVE-14111) better concurrency handling for TezSessionState - part I

2016-07-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373587#comment-15373587
 ] 

Sergey Shelukhin commented on HIVE-14111:
-

Probably not:
{noformat}
org.apache.hadoop.ipc.RemoteException: Cannot renew lease for 
DFSClient_NONMAPREDUCE_-930074842_1. Name node is in safe mode.
{noformat}

> better concurrency handling for TezSessionState - part I
> 
>
> Key: HIVE-14111
> URL: https://issues.apache.org/jira/browse/HIVE-14111
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14111.01.patch, HIVE-14111.02.patch, 
> HIVE-14111.03.patch, HIVE-14111.04.patch, HIVE-14111.05.patch, 
> HIVE-14111.06.patch, HIVE-14111.patch, sessionPoolNotes.txt
>
>






[jira] [Commented] (HIVE-12706) Incorrect output from from_utc_timestamp()/to_utc_timestamp when local timezone has DST

2016-07-12 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373579#comment-15373579
 ] 

Jason Dere commented on HIVE-12706:
---

I noticed you already commented in HIVE-14161 about the difference between EST 
and America/New_York, but I think the case was similar here as well: the time 
zone had to be one that actually switches during DST (like America/New_York) 
to see the issue.

> Incorrect output from from_utc_timestamp()/to_utc_timestamp when local 
> timezone has DST
> ---
>
> Key: HIVE-12706
> URL: https://issues.apache.org/jira/browse/HIVE-12706
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 2.0.0
>
> Attachments: HIVE-12706.1.patch
>
>
> We get wrong output with the local time zone set to PST (which observes DST). I 
> don't think this happens when the local time zone does not observe DST.
> {noformat}
> select from_utc_timestamp('2015-03-28 17:00:00', 'Europe/London')
> 2015-03-28 17:00:00
> select from_utc_timestamp('2015-03-28 18:00:00', 'Europe/London')
> 2015-03-28 19:00:00  <= Wrong, should be 2015-03-28 18:00:00
> select from_utc_timestamp('2015-03-28 19:00:00', 'Europe/London')
> 2015-03-28 20:00:00 <= Wrong, should be 2015-03-28 19:00:00
> {noformat}
> Also to_utc_timestamp():
> {noformat}
> select to_utc_timestamp('2015-03-28 17:00:00', 'Europe/London')
> 2015-03-28 17:00:00
> select to_utc_timestamp('2015-03-28 18:00:00', 'Europe/London')
> 2015-03-28 17:00:00 <= Wrong
> select to_utc_timestamp('2015-03-28 19:00:00', 'Europe/London')
> 2015-03-28 18:00:00 <= Wrong
> select to_utc_timestamp('2015-03-28 20:00:00', 'Europe/London')
> 2015-03-28 19:00:00 <= Wrong
> {noformat}
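The expected values above can be reproduced with a DST-aware conversion. This is a minimal Python sketch using the standard-library zoneinfo module, not Hive's actual UDF implementation; Europe/London switched to BST at 01:00 UTC on 2015-03-29, so times on 2015-03-28 should convert with a zero offset:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

FMT = "%Y-%m-%d %H:%M:%S"

def from_utc_timestamp(ts: str, tz: str) -> str:
    """Interpret ts as UTC wall-clock time and render it in zone tz."""
    utc = datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc)
    return utc.astimezone(ZoneInfo(tz)).strftime(FMT)

def to_utc_timestamp(ts: str, tz: str) -> str:
    """Interpret ts as wall-clock time in zone tz and render it in UTC."""
    local = datetime.strptime(ts, FMT).replace(tzinfo=ZoneInfo(tz))
    return local.astimezone(timezone.utc).strftime(FMT)
```

Because the offset is looked up per instant rather than assumed constant, the same inputs before and after the DST switch come out correctly, which is the behavior the buggy outputs above violate.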





[jira] [Updated] (HIVE-14210) ExecDriver should call jobclient.close() to trigger cleanup

2016-07-12 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-14210:

Summary: ExecDriver should call jobclient.close() to trigger cleanup  (was: 
ExecDriver should call SSLFactory truststore reloader threads leaking in 
HiveServer2)

> ExecDriver should call jobclient.close() to trigger cleanup
> ---
>
> Key: HIVE-14210
> URL: https://issues.apache.org/jira/browse/HIVE-14210
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Affects Versions: 1.2.1, 2.0.0, 2.1.0
>Reporter: Thomas Friedrich
>Assignee: Thomas Friedrich
> Attachments: HIVE-14210.1.patch, HIVE-14210.patch
>
>
> We found an issue in a customer environment where HS2 crashed after a few 
> days and the Java core dump contained several thousand truststore 
> reloader threads:
> "Truststore reloader thread" #126 daemon prio=5 os_prio=0 
> tid=0x7f680d2e3000 nid=0x98fd waiting on 
> condition [0x7f67e482c000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run
> (ReloadingX509TrustManager.java:225)
> at java.lang.Thread.run(Thread.java:745)
> We found the issue to be caused by a bug in Hadoop where the 
> TimelineClientImpl is not destroying the SSLFactory if SSL is enabled in 
> Hadoop and the timeline server is running. I opened YARN-5309 which has more 
> details on the problem, and a patch was submitted a few days back.
> In addition to the changes in Hadoop, there are a couple of Hive changes 
> required:
> - ExecDriver needs to call jobclient.close() to trigger the clean-up of the 
> resources after the submitted job is done/failed
> - Hive needs to pick up a newer release of Hadoop to pick up MAPREDUCE-6618 
> and MAPREDUCE-6621 that fixed issues with calling jobclient.close(). Both 
> fixes are included in Hadoop 2.6.4. 
> However, since we also need to pick up YARN-5309, we need to wait for a new 
> release of Hadoop.
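The close-to-stop-the-reloader pattern can be illustrated with a self-contained sketch. `Client` here is a hypothetical stand-in for JobClient, not Hadoop's actual class: it starts a background reloader thread on construction, and `close()` is the step that, per this issue, ExecDriver must trigger so the thread does not leak:

```python
import threading

class Client:
    """Hypothetical stand-in for JobClient with SSL enabled: spawns a
    background 'truststore reloader' thread that runs until close()."""

    def __init__(self):
        self._stop = threading.Event()
        self._reloader = threading.Thread(
            target=self._run, daemon=True, name="Truststore reloader thread")
        self._reloader.start()

    def _run(self):
        # Stand-in for the periodic truststore-reload loop.
        while not self._stop.wait(0.05):
            pass

    def close(self):
        # Signal the reloader to exit and wait for it; skipping this
        # leaves one extra thread behind per client created.
        self._stop.set()
        self._reloader.join()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
```

Creating many such clients without calling `close()` accumulates one sleeping thread per client, which is exactly the several-thousand-thread pattern seen in the core dump.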





[jira] [Updated] (HIVE-14210) ExecDriver should call SSLFactory truststore reloader threads leaking in HiveServer2

2016-07-12 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-14210:

Summary: ExecDriver should call SSLFactory truststore reloader threads 
leaking in HiveServer2  (was: SSLFactory truststore reloader threads leaking in 
HiveServer2)

> ExecDriver should call SSLFactory truststore reloader threads leaking in 
> HiveServer2
> 
>
> Key: HIVE-14210
> URL: https://issues.apache.org/jira/browse/HIVE-14210
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Affects Versions: 1.2.1, 2.0.0, 2.1.0
>Reporter: Thomas Friedrich
>Assignee: Thomas Friedrich
> Attachments: HIVE-14210.1.patch, HIVE-14210.patch
>
>
> We found an issue in a customer environment where HS2 crashed after a few 
> days and the Java core dump contained several thousand truststore 
> reloader threads:
> "Truststore reloader thread" #126 daemon prio=5 os_prio=0 
> tid=0x7f680d2e3000 nid=0x98fd waiting on 
> condition [0x7f67e482c000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run
> (ReloadingX509TrustManager.java:225)
> at java.lang.Thread.run(Thread.java:745)
> We found the issue to be caused by a bug in Hadoop where the 
> TimelineClientImpl is not destroying the SSLFactory if SSL is enabled in 
> Hadoop and the timeline server is running. I opened YARN-5309 which has more 
> details on the problem, and a patch was submitted a few days back.
> In addition to the changes in Hadoop, there are a couple of Hive changes 
> required:
> - ExecDriver needs to call jobclient.close() to trigger the clean-up of the 
> resources after the submitted job is done/failed
> - Hive needs to pick up a newer release of Hadoop to pick up MAPREDUCE-6618 
> and MAPREDUCE-6621 that fixed issues with calling jobclient.close(). Both 
> fixes are included in Hadoop 2.6.4. 
> However, since we also need to pick up YARN-5309, we need to wait for a new 
> release of Hadoop.





[jira] [Commented] (HIVE-14111) better concurrency handling for TezSessionState - part I

2016-07-12 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373545#comment-15373545
 ] 

Siddharth Seth commented on HIVE-14111:
---

+1.
Is the test failure on 
TestMiniTezCliDriver-tez_self_join.q-filter_join_breaktask.q-vector_decimal_precision.q-and-12-more
 (did not produce a TEST-*.xml file) related? It looks new.

> better concurrency handling for TezSessionState - part I
> 
>
> Key: HIVE-14111
> URL: https://issues.apache.org/jira/browse/HIVE-14111
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14111.01.patch, HIVE-14111.02.patch, 
> HIVE-14111.03.patch, HIVE-14111.04.patch, HIVE-14111.05.patch, 
> HIVE-14111.06.patch, HIVE-14111.patch, sessionPoolNotes.txt
>
>






[jira] [Commented] (HIVE-13258) LLAP: Add hdfs bytes read and spilled bytes to tez print summary

2016-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373538#comment-15373538
 ] 

Hive QA commented on HIVE-13258:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12817258/HIVE-13258.5.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10311 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_llap_acid
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_llap_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_llap_uncompressed
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_ppd_basic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_ppd_basic
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/484/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/484/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-484/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12817258 - PreCommit-HIVE-MASTER-Build

> LLAP: Add hdfs bytes read and spilled bytes to tez print summary
> 
>
> Key: HIVE-13258
> URL: https://issues.apache.org/jira/browse/HIVE-13258
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13258.1.patch, HIVE-13258.1.patch, 
> HIVE-13258.2.patch, HIVE-13258.3.patch, HIVE-13258.4.patch, 
> HIVE-13258.5.patch, HIVE-13258.5.patch, llap-fs-counters-full-cache-hit.png, 
> llap-fs-counters.png
>
>
> When printing counters to console it will be useful to print hdfs bytes read 
> and spilled bytes which will help with debugging issues faster. 





[jira] [Commented] (HIVE-14210) SSLFactory truststore reloader threads leaking in HiveServer2

2016-07-12 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373534#comment-15373534
 ] 

Vaibhav Gumashta commented on HIVE-14210:
-

+1 from my side. Should we modify the jira title to reflect the change in the 
patch? 

> SSLFactory truststore reloader threads leaking in HiveServer2
> -
>
> Key: HIVE-14210
> URL: https://issues.apache.org/jira/browse/HIVE-14210
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Affects Versions: 1.2.1, 2.0.0, 2.1.0
>Reporter: Thomas Friedrich
>Assignee: Thomas Friedrich
> Attachments: HIVE-14210.1.patch, HIVE-14210.patch
>
>
> We found an issue in a customer environment where HS2 crashed after a few 
> days and the Java core dump contained several thousand truststore 
> reloader threads:
> "Truststore reloader thread" #126 daemon prio=5 os_prio=0 
> tid=0x7f680d2e3000 nid=0x98fd waiting on 
> condition [0x7f67e482c000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run
> (ReloadingX509TrustManager.java:225)
> at java.lang.Thread.run(Thread.java:745)
> We found the issue to be caused by a bug in Hadoop where the 
> TimelineClientImpl is not destroying the SSLFactory if SSL is enabled in 
> Hadoop and the timeline server is running. I opened YARN-5309 which has more 
> details on the problem, and a patch was submitted a few days back.
> In addition to the changes in Hadoop, there are a couple of Hive changes 
> required:
> - ExecDriver needs to call jobclient.close() to trigger the clean-up of the 
> resources after the submitted job is done/failed
> - Hive needs to pick up a newer release of Hadoop to pick up MAPREDUCE-6618 
> and MAPREDUCE-6621 that fixed issues with calling jobclient.close(). Both 
> fixes are included in Hadoop 2.6.4. 
> However, since we also need to pick up YARN-5309, we need to wait for a new 
> release of Hadoop.





[jira] [Updated] (HIVE-14218) LLAP: ACL validation fails if the user name is different from principal user name

2016-07-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14218:

Reporter: Shraddha Sumit  (was: Sergey Shelukhin)

> LLAP: ACL validation fails if the user name is different from principal user 
> name
> -
>
> Key: HIVE-14218
> URL: https://issues.apache.org/jira/browse/HIVE-14218
> Project: Hive
>  Issue Type: Bug
>Reporter: Shraddha Sumit
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14218.patch
>
>






[jira] [Updated] (HIVE-14218) LLAP: ACL validation fails if the user name is different from principal user name

2016-07-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14218:

Status: Patch Available  (was: Open)

> LLAP: ACL validation fails if the user name is different from principal user 
> name
> -
>
> Key: HIVE-14218
> URL: https://issues.apache.org/jira/browse/HIVE-14218
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14218.patch
>
>






[jira] [Updated] (HIVE-14218) LLAP: ACL validation fails if the user name is different from principal user name

2016-07-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14218:

Attachment: HIVE-14218.patch

[~prasanth_j] [~jdere] can you take a look?
Apparently, in some cases the short user name from UGI becomes the current 
system user (may be related to super.startThread?), and the ACL check fails if 
it's not the same as the principal.

> LLAP: ACL validation fails if the user name is different from principal user 
> name
> -
>
> Key: HIVE-14218
> URL: https://issues.apache.org/jira/browse/HIVE-14218
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14218.patch
>
>






[jira] [Updated] (HIVE-14212) hbase_queries result out of date on branch-2.1

2016-07-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14212:

Issue Type: Test  (was: Bug)

> hbase_queries result out of date on branch-2.1
> --
>
> Key: HIVE-14212
> URL: https://issues.apache.org/jira/browse/HIVE-14212
> Project: Hive
>  Issue Type: Test
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Trivial
> Fix For: 2.1.1
>
> Attachments: HIVE-14212-branch-2.1.patch
>
>






[jira] [Updated] (HIVE-14212) hbase_queries result out of date on branch-2.1

2016-07-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14212:

   Resolution: Fixed
Fix Version/s: 2.1.1
   Status: Resolved  (was: Patch Available)

Committed. Thanks!

> hbase_queries result out of date on branch-2.1
> --
>
> Key: HIVE-14212
> URL: https://issues.apache.org/jira/browse/HIVE-14212
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Trivial
> Fix For: 2.1.1
>
> Attachments: HIVE-14212-branch-2.1.patch
>
>






[jira] [Updated] (HIVE-14188) LLAPIF: wrong user field is used from the token

2016-07-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14188:

   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Committed. Thanks for the review!

> LLAPIF: wrong user field is used from the token
> ---
>
> Key: HIVE-14188
> URL: https://issues.apache.org/jira/browse/HIVE-14188
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14188.patch, HIVE-14188.patch
>
>
> realUser is not set in all cases for delegation tokens; we should use the 
> owner instead.





[jira] [Commented] (HIVE-14111) better concurrency handling for TezSessionState - part I

2016-07-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373466#comment-15373466
 ] 

Sergey Shelukhin commented on HIVE-14111:
-

[~sseth] ping?

> better concurrency handling for TezSessionState - part I
> 
>
> Key: HIVE-14111
> URL: https://issues.apache.org/jira/browse/HIVE-14111
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14111.01.patch, HIVE-14111.02.patch, 
> HIVE-14111.03.patch, HIVE-14111.04.patch, HIVE-14111.05.patch, 
> HIVE-14111.06.patch, HIVE-14111.patch, sessionPoolNotes.txt
>
>






[jira] [Updated] (HIVE-13704) Don't call DistCp.execute() instead of DistCp.run()

2016-07-12 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-13704:
---
Fix Version/s: 2.1.1

> Don't call DistCp.execute() instead of DistCp.run()
> ---
>
> Key: HIVE-13704
> URL: https://issues.apache.org/jira/browse/HIVE-13704
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Harsh J
>Assignee: Sergio Peña
>Priority: Critical
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-13704.1.patch
>
>
> HIVE-11607 switched DistCp from using {{run}} to {{execute}}. The {{run}} 
> method runs added logic that drives the state of {{SimpleCopyListing}} which 
> runs in the driver, and of {{CopyCommitter}} which runs in the job runtime.
> When Hive ends up running DistCp for copy work (Between non matching FS or 
> between encrypted/non-encrypted zones, for sizes above a configured value) 
> this state not being set causes wrong paths to appear on the target (subdirs 
> named after the file, instead of just the file).
> Hive should call DistCp's Tool {{run}} method and not the {{execute}} method 
> directly, to not skip the target exists flag that the {{setTargetPathExists}} 
> call would set:
> https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126
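The difference between calling run() and calling execute() directly can be sketched with a toy Tool-like class. This is a simplified stand-in, not DistCp's actual code: run() performs setup (standing in for setTargetPathExists) before delegating, so invoking execute() directly skips that state and the copy logic misbehaves.

```java
// Hypothetical sketch of the Tool contract: run() does setup, then delegates.
class CopyTool {
    boolean targetPathExists; // set by run(), consulted by the copy logic
    boolean setupDone;

    int run(String[] args) {
        targetPathExists = true; // stands in for setTargetPathExists()
        setupDone = true;
        return execute();
    }

    int execute() {
        // CopyCommitter-style logic behaves differently when setup was skipped:
        // without the flag, the copy lays out subdirs named after the file.
        return setupDone ? 0 : -1;
    }
}
```

Calling the entry point that includes the setup step, rather than the inner method, is the whole fix proposed here.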





[jira] [Updated] (HIVE-13704) Don't call DistCp.execute() instead of DistCp.run()

2016-07-12 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-13704:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Thanks [~ashutoshc] for the review.


> Don't call DistCp.execute() instead of DistCp.run()
> ---
>
> Key: HIVE-13704
> URL: https://issues.apache.org/jira/browse/HIVE-13704
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Harsh J
>Assignee: Sergio Peña
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-13704.1.patch
>
>
> HIVE-11607 switched DistCp from using {{run}} to {{execute}}. The {{run}} 
> method runs added logic that drives the state of {{SimpleCopyListing}} which 
> runs in the driver, and of {{CopyCommitter}} which runs in the job runtime.
> When Hive ends up running DistCp for copy work (Between non matching FS or 
> between encrypted/non-encrypted zones, for sizes above a configured value) 
> this state not being set causes wrong paths to appear on the target (subdirs 
> named after the file, instead of just the file).
> Hive should call DistCp's Tool {{run}} method and not the {{execute}} method 
> directly, to not skip the target exists flag that the {{setTargetPathExists}} 
> call would set:
> https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126





[jira] [Commented] (HIVE-14213) Add timeouts for various components in llap status check

2016-07-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373402#comment-15373402
 ] 

Sergey Shelukhin commented on HIVE-14213:
-

Why do we need a separate set of config settings? On the same note, if the 
component settings are already set and the new ones are not, this will 
override them with defaults.
Perhaps we can just have default constants for the original parameters (from 
YARN etc.) and set them if not already set? If the user wants to change them, 
they can just set the originals too.
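The "set only if not already set" suggestion might look like this sketch (the config keys are made up for illustration):

```java
// Sketch: apply a default for an upstream (e.g. YARN/ZooKeeper) timeout key
// only when the user has not already set it, so user values are never clobbered.
class ConfDefaults {
    static void setIfUnset(java.util.Map<String, String> conf, String key, String dflt) {
        conf.putIfAbsent(key, dflt); // no-op when the key is already present
    }
}
```

With this approach no parallel set of Hive-specific keys is needed; users who want different timeouts just set the original component properties.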

> Add timeouts for various components in llap status check
> 
>
> Key: HIVE-14213
> URL: https://issues.apache.org/jira/browse/HIVE-14213
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14213.01.patch
>
>
> The llapstatus check connects to various components - YARN, HDFS via Slider, 
> ZooKeeper. If any of these components is down, the command can take a long 
> time to exit.





[jira] [Commented] (HIVE-11855) Beeline display is not correct when content contains invisible character like \r

2016-07-12 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373396#comment-15373396
 ] 

Vihang Karajgaonkar commented on HIVE-11855:


I tried adding the fields as given in the summary and it shows the expected 
results. Can you confirm whether this still happens with the latest beeline?

{noformat}
$ cat /tmp/test.txt
200 value1^Mvalue2


0: jdbc:hive2://localhost:1> desc test;
+-----------+------------+----------+--+
| col_name  | data_type  | comment  |
+-----------+------------+----------+--+
| id        | int        |          |
| value     | string     |          |
+-----------+------------+----------+--+
2 rows selected (0.075 seconds)
0: jdbc:hive2://localhost:1> load data local inpath '/tmp/test.txt' into 
table test;
No rows affected (0.262 seconds)
0: jdbc:hive2://localhost:1> select * from test;
+----------+-----------------+--+
| test.id  | test.value      |
+----------+-----------------+--+
| 200      | value1^Mvalue2  |
+----------+-----------------+--+
1 row selected (0.123 seconds)
{noformat}

> Beeline display is not correct when content contains invisible character like 
> \r
> 
>
> Key: HIVE-11855
> URL: https://issues.apache.org/jira/browse/HIVE-11855
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.1.1, 1.2.1
>Reporter: Victor JMelo
>Priority: Minor
>
> Field content like this:
> {quote}
> col1_part1^Mcol1_part2
> {quote}
> create table statement:
> {quote}
> create table foo(c1 string);
> {quote}
> After load the data into table foo using sqoop, beeline displays:
> {quote}
> 0: jdbc:hive2://localhost:1/default> select * from foo;
> +-+
> | foo.c1  |
> +-+
> col1_part2t1
>   |
> +-+
> {quote}
> The first part of the data "col1_part1" is overwritten by the second part 
> "col1_part2"
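For illustration, the overwrite happens because the terminal interprets the raw carriage return as "move to column 0". One possible client-side mitigation (not Beeline's actual code; the class and method are hypothetical) is to escape control characters before display, the way `cat -v` renders ^M:

```java
// Sketch: render control characters visibly instead of letting the terminal
// interpret them, so "a\rb" displays as "a^Mb" rather than "b" over "a".
class ControlCharEscaper {
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x20 && c != '\t') {
                sb.append('^').append((char) (c + 0x40)); // \r -> ^M, \n -> ^J
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

The later comment in this thread shows beeline printing `value1^Mvalue2`, which is consistent with this kind of escaping being applied somewhere in the display path.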





[jira] [Commented] (HIVE-13704) Don't call DistCp.execute() instead of DistCp.run()

2016-07-12 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373390#comment-15373390
 ] 

Sergio Peña commented on HIVE-13704:


The tests are not related.
Most of the tests are also failing on previous jobs, except for 
TestSessionManagerMetrics.testThreadPoolMetrics, but neither the output nor 
the test looks related to the distcp change.
I will commit this patch.

> Don't call DistCp.execute() instead of DistCp.run()
> ---
>
> Key: HIVE-13704
> URL: https://issues.apache.org/jira/browse/HIVE-13704
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Harsh J
>Assignee: Sergio Peña
>Priority: Critical
> Attachments: HIVE-13704.1.patch
>
>
> HIVE-11607 switched DistCp from using {{run}} to {{execute}}. The {{run}} 
> method runs added logic that drives the state of {{SimpleCopyListing}} which 
> runs in the driver, and of {{CopyCommitter}} which runs in the job runtime.
> When Hive ends up running DistCp for copy work (Between non matching FS or 
> between encrypted/non-encrypted zones, for sizes above a configured value) 
> this state not being set causes wrong paths to appear on the target (subdirs 
> named after the file, instead of just the file).
> Hive should call DistCp's Tool {{run}} method and not the {{execute}} method 
> directly, to not skip the target exists flag that the {{setTargetPathExists}} 
> call would set:
> https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126





[jira] [Assigned] (HIVE-14217) Druid integration

2016-07-12 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-14217:
--

Assignee: Jesus Camacho Rodriguez

> Druid integration
> -
>
> Key: HIVE-14217
> URL: https://issues.apache.org/jira/browse/HIVE-14217
> Project: Hive
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Jesus Camacho Rodriguez
>
> Allow Hive to query data in Druid





[jira] [Updated] (HIVE-14158) deal with derived column names

2016-07-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14158:
---
Attachment: HIVE-14158.05.patch

> deal with derived column names
> --
>
> Key: HIVE-14158
> URL: https://issues.apache.org/jira/browse/HIVE-14158
> Project: Hive
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-14158.01.patch, HIVE-14158.02.patch, 
> HIVE-14158.03.patch, HIVE-14158.04.patch, HIVE-14158.05.patch
>
>






[jira] [Commented] (HIVE-14158) deal with derived column names

2016-07-12 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373367#comment-15373367
 ] 

Pengcheng Xiong commented on HIVE-14158:


Had a clean run in internal ptest. Pushed to master and cherry-picked to 2.1. 
Thanks [~ashutoshc] for the review!

> deal with derived column names
> --
>
> Key: HIVE-14158
> URL: https://issues.apache.org/jira/browse/HIVE-14158
> Project: Hive
>  Issue Type: Sub-task
>  Components: Security
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-14158.01.patch, HIVE-14158.02.patch, 
> HIVE-14158.03.patch, HIVE-14158.04.patch
>
>






[jira] [Commented] (HIVE-14195) HiveMetaStoreClient getFunction() does not throw NoSuchObjectException

2016-07-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373361#comment-15373361
 ] 

Sergey Shelukhin commented on HIVE-14195:
-

+1 on updated patch, will commit later today

> HiveMetaStoreClient getFunction() does not throw NoSuchObjectException
> --
>
> Key: HIVE-14195
> URL: https://issues.apache.org/jira/browse/HIVE-14195
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
> Attachments: HIVE-14195.2.patch, HIVE-14195.patch
>
>
> HiveMetaStoreClient getFunction(dbName, funcName) does not throw 
> NoSuchObjectException when no function with funcName exists in the db. 
> Instead, I need to search the MetaException message for 
> 'NoSuchObjectException'.
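The contrast between the current string-matching workaround and a typed exception can be sketched as follows. The exception classes here are simplified stand-ins for the Thrift-generated ones, and the lookup logic is a toy:

```java
// Stand-ins for the Thrift-generated metastore exceptions.
class MetaException extends Exception {
    MetaException(String msg) { super(msg); }
}
class NoSuchObjectException extends MetaException {
    NoSuchObjectException(String msg) { super(msg); }
}

class FunctionLookup {
    // Brittle workaround: only works while the message embeds the class name.
    static boolean isNotFound(MetaException e) {
        return e.getMessage() != null && e.getMessage().contains("NoSuchObjectException");
    }

    // Preferred: throw the typed exception so callers can catch it directly.
    static String lookup(String funcName) throws MetaException {
        if (!"upper".equals(funcName)) {
            throw new NoSuchObjectException("Function " + funcName + " does not exist");
        }
        return funcName;
    }
}
```

With the typed exception the caller writes an ordinary catch block instead of parsing message text, which is what the patch here enables.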





[jira] [Resolved] (HIVE-10928) Concurrent Beeline Connections can not work on different databases

2016-07-12 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved HIVE-10928.

Resolution: Cannot Reproduce

> Concurrent Beeline Connections can not work on different databases
> --
>
> Key: HIVE-10928
> URL: https://issues.apache.org/jira/browse/HIVE-10928
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 0.14.0
>Reporter: chirag aggarwal
>
> The concurrent beeline connections are not able to work on different 
> databases. If one connection calls 'use abc', then all the connections start 
> working on database 'abc'.





[jira] [Commented] (HIVE-10928) Concurrent Beeline Connections can not work on different databases

2016-07-12 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373356#comment-15373356
 ] 

Vihang Karajgaonkar commented on HIVE-10928:


This seems to be working in the latest beeline from the master branch. Can you 
please check again? Please reopen the issue if you think there is something 
that still needs to be fixed.

Session 1:
{noformat}
0: jdbc:hive2://localhost:1> use default;
No rows affected (0.055 seconds)
0: jdbc:hive2://localhost:1> show tables;
+-----------------+--+
| tab_name        |
+-----------------+--+
| likes           |
| longkeyvalues   |
| names           |
| src2            |
| t1              |
| t2              |
| t3              |
| table_3         |
+-----------------+--+
8 rows selected (0.077 seconds)
0: jdbc:hive2://localhost:1> select * from likes;
+-----------+--------------+--+
| likes.id  | likes.thing  |
+-----------+--------------+--+
| 1         | chocolate    |
| 1         | car          |
| 1         | games        |
| 1         | chess        |
| 2         | cake         |
| 2         | shopping     |
| 5         | cricket      |
| 7         | travel       |
| 3         | hiking       |
+-----------+--------------+--+
9 rows selected (0.219 seconds)
{noformat}


Concurrent Session 2:
{noformat}
0: jdbc:hive2://localhost:1> use parquet;
No rows affected (0.043 seconds)
0: jdbc:hive2://localhost:1> show tables;
+-----------+--+
| tab_name  |
+-----------+--+
| test      |
+-----------+--+
1 row selected (0.058 seconds)
0: jdbc:hive2://localhost:1> select * from test;
+----------+--+
| test.id  |
+----------+--+
| 2000.0   |
+----------+--+
1 row selected (0.538 seconds)
0: jdbc:hive2://localhost:1>
{noformat}

> Concurrent Beeline Connections can not work on different databases
> --
>
> Key: HIVE-10928
> URL: https://issues.apache.org/jira/browse/HIVE-10928
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 0.14.0
>Reporter: chirag aggarwal
>
> The concurrent beeline connections are not able to work on different 
> databases. If one connection calls 'use abc', then all the connections start 
> working on database 'abc'.





[jira] [Updated] (HIVE-14198) Refactor aux jar related code to make them more consistent

2016-07-12 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14198:

Attachment: (was: HIVE-14198.1.patch)

> Refactor aux jar related code to make them more consistent
> --
>
> Key: HIVE-14198
> URL: https://issues.apache.org/jira/browse/HIVE-14198
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14198.1.patch
>
>
> There are some redundancy and inconsistency between hive.aux.jar.paths and 
> hive.reloadable.aux.jar.paths and also between MR and spark. 
> Refactor the code to reuse the same code.





[jira] [Updated] (HIVE-14198) Refactor aux jar related code to make them more consistent

2016-07-12 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14198:

Attachment: HIVE-14198.1.patch

> Refactor aux jar related code to make them more consistent
> --
>
> Key: HIVE-14198
> URL: https://issues.apache.org/jira/browse/HIVE-14198
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Affects Versions: 2.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14198.1.patch
>
>
> There are some redundancy and inconsistency between hive.aux.jar.paths and 
> hive.reloadable.aux.jar.paths and also between MR and spark. 
> Refactor the code to reuse the same code.





[jira] [Updated] (HIVE-14187) JDOPersistenceManager objects remain cached if MetaStoreClient#close is not called

2016-07-12 Thread Mohit Sabharwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohit Sabharwal updated HIVE-14187:
---
Attachment: HIVE-14187.patch

> JDOPersistenceManager objects remain cached if MetaStoreClient#close is not 
> called
> --
>
> Key: HIVE-14187
> URL: https://issues.apache.org/jira/browse/HIVE-14187
> Project: Hive
>  Issue Type: Bug
>Reporter: Mohit Sabharwal
>Assignee: Mohit Sabharwal
> Attachments: HIVE-14187.patch, HIVE-14187.patch
>
>
> JDOPersistenceManager objects are cached in JDOPersistenceManagerFactory by 
> DataNucleus.
> A new JDOPersistenceManager object gets created for every HMS thread since 
> ObjectStore is a thread local.
> In non-embedded metastore mode, JDOPersistenceManager associated with a 
> thread only gets cleaned up if IMetaStoreClient#close is called by the client 
> (which calls ObjectStore#shutdown which calls JDOPersistenceManager#close 
> which in turn removes the object from cache in 
> JDOPersistenceManagerFactory#releasePersistenceManager
> https://github.com/datanucleus/datanucleus-api-jdo/blob/master/src/main/java/org/datanucleus/api/jdo/JDOPersistenceManagerFactory.java#L1271),
>  i.e. the object will remain cached if client does not call close.
> For example: If one interrupts out of hive CLI shell (instead of using 
> 'exit;' command), SessionState#close does not get called, and hence 
> IMetaStoreClient#close does not get called.
> Instead of relying the client to call close, it's cleaner to automatically 
> perform RawStore related cleanup at the server end via deleteContext() which 
> gets called when the server detects a lost/closed connection.
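A minimal sketch of that server-side cleanup idea — track the per-connection store and release it from deleteContext() when the connection drops, instead of trusting the client to call close(). All names are illustrative stand-ins:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: a registry keyed by connection id; deleteContext() is the hook the
// server invokes when it detects a lost/closed connection.
class ConnectionRegistry {
    static class RawStoreHandle {
        boolean closed;
        void shutdown() { closed = true; } // stands in for ObjectStore#shutdown
    }

    private final Map<Long, RawStoreHandle> perConnection = new ConcurrentHashMap<>();

    RawStoreHandle open(long connectionId) {
        return perConnection.computeIfAbsent(connectionId, id -> new RawStoreHandle());
    }

    // Called by the server on connection loss; frees the cached
    // JDOPersistenceManager analogue even if the client never called close().
    void deleteContext(long connectionId) {
        RawStoreHandle h = perConnection.remove(connectionId);
        if (h != null) {
            h.shutdown();
        }
    }

    int liveConnections() { return perConnection.size(); }
}
```

This way an interrupted CLI session (the ctrl-C case mentioned above) still gets its RawStore resources reclaimed.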





[jira] [Commented] (HIVE-13704) Don't call DistCp.execute() instead of DistCp.run()

2016-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373309#comment-15373309
 ] 

Hive QA commented on HIVE-13704:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12817250/HIVE-13704.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10309 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
org.apache.hive.service.cli.session.TestSessionManagerMetrics.testThreadPoolMetrics
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/483/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/483/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-483/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12817250 - PreCommit-HIVE-MASTER-Build

> Don't call DistCp.execute() instead of DistCp.run()
> ---
>
> Key: HIVE-13704
> URL: https://issues.apache.org/jira/browse/HIVE-13704
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Harsh J
>Assignee: Sergio Peña
>Priority: Critical
> Attachments: HIVE-13704.1.patch
>
>
> HIVE-11607 switched DistCp from using {{run}} to {{execute}}. The {{run}} 
> method runs added logic that drives the state of {{SimpleCopyListing}} which 
> runs in the driver, and of {{CopyCommitter}} which runs in the job runtime.
> When Hive ends up running DistCp for copy work (Between non matching FS or 
> between encrypted/non-encrypted zones, for sizes above a configured value) 
> this state not being set causes wrong paths to appear on the target (subdirs 
> named after the file, instead of just the file).
> Hive should call DistCp's Tool {{run}} method and not the {{execute}} method 
> directly, to not skip the target exists flag that the {{setTargetPathExists}} 
> call would set:
> https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126





[jira] [Comment Edited] (HIVE-13644) Remove hardcoded groovy.grape.report.downloads=true from DependencyResolver

2016-07-12 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373230#comment-15373230
 ] 

Anthony Hsu edited comment on HIVE-13644 at 7/12/16 4:53 PM:
-

[~ashutoshc]: Thanks for reviewing and committing this!

[~leftylev]: I don't think this requires Wiki documentation. This change just 
removed some unnecessary logging output that was impossible to programmatically 
disable (it was hardcoded to be enabled in the code). Thanks for checking, 
though!


was (Author: erwaman):
[~leftylev]: I don't think this requires Wiki documentation. This change just 
removed some unnecessary logging output that was impossible to programmatically 
disable (it was hardcoded to be enabled in the code). Thanks for checking, 
though!

> Remove hardcoded groovy.grape.report.downloads=true from DependencyResolver
> ---
>
> Key: HIVE-13644
> URL: https://issues.apache.org/jira/browse/HIVE-13644
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Fix For: 2.2.0
>
> Attachments: HIVE-13644.1.patch
>
>
> Currently, in Hive's 
> [DependencyResolver.java|https://github.com/apache/hive/blob/8dd1d1966f2f0b86604b4e991ebc865224f42b41/ql/src/java/org/apache/hadoop/hive/ql/util/DependencyResolver.java#L176],
>  the system property {{groovy.grape.report.downloads}} is hardcoded to 
> {{true}} and there is no way to override it and disable the logging. We 
> should remove this hardcoded value and allow users to configure it as they 
> see fit.
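A sketch of the proposed behavior — consult the property with a default instead of forcing it — assuming a helper like this (the helper class itself is hypothetical; the property name is the real one from the issue):

```java
// Sketch: honor an externally set value for the Grape download-report flag
// instead of hardcoding it to "true" as the old code did.
class GrapeConfig {
    static String reportDownloads() {
        // Default only applies when the user has not set the property.
        return System.getProperty("groovy.grape.report.downloads", "false");
    }
}
```

Users who still want the download logging can pass `-Dgroovy.grape.report.downloads=true` on the command line.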





[jira] [Commented] (HIVE-13644) Remove hardcoded groovy.grape.report.downloads=true from DependencyResolver

2016-07-12 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373230#comment-15373230
 ] 

Anthony Hsu commented on HIVE-13644:


[~leftylev]: I don't think this requires Wiki documentation. This change just 
removed some unnecessary logging output that was impossible to programmatically 
disable (it was hardcoded to be enabled in the code). Thanks for checking, 
though!

> Remove hardcoded groovy.grape.report.downloads=true from DependencyResolver
> ---
>
> Key: HIVE-13644
> URL: https://issues.apache.org/jira/browse/HIVE-13644
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Fix For: 2.2.0
>
> Attachments: HIVE-13644.1.patch
>
>
> Currently, in Hive's 
> [DependencyResolver.java|https://github.com/apache/hive/blob/8dd1d1966f2f0b86604b4e991ebc865224f42b41/ql/src/java/org/apache/hadoop/hive/ql/util/DependencyResolver.java#L176],
>  the system property {{groovy.grape.report.downloads}} is hardcoded to 
> {{true}} and there is no way to override it and disable the logging. We 
> should remove this hardcoded value and allow users to configure it as they 
> see fit.





[jira] [Commented] (HIVE-13966) DbNotificationListener: can loose DDL operation notifications

2016-07-12 Thread Rahul Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373197#comment-15373197
 ] 

Rahul Sharma commented on HIVE-13966:
-

Working on adding test cases; it looks a bit tricky, as we need to prevent the 
entry to the notification log in order to check the complete metastore 
operation rollback. Will update with a new patch soon.

> DbNotificationListener: can loose DDL operation notifications
> -
>
> Key: HIVE-13966
> URL: https://issues.apache.org/jira/browse/HIVE-13966
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Nachiket Vaidya
>Assignee: Rahul Sharma
>Priority: Critical
> Attachments: HIVE-13966.1.patch
>
>
> The code for each API in HiveMetaStore.java is like this:
> 1. openTransaction()
> 2. -- operation--
> 3. commit() or rollback() based on result of the operation.
> 4. add entry to notification log (unconditionally)
> If the operation fails (in step 2), we still add an entry to the notification 
> log. Found this issue in testing.
> It is still acceptable, as this is the false-positive case.
> If the operation is successful but adding to the notification log fails, the 
> user will get a MetaException. It will not roll back the operation, as it is 
> already committed. We need to handle this case so that we will not have false 
> negatives.
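The false-positive/false-negative ordering problem can be sketched with a toy transaction (all names are stand-ins, not HiveMetaStore's actual code): enqueue the notification during the operation and publish it only as part of the same commit, so a failed operation logs nothing and a successful one cannot lose its event.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the notification entry is written inside the transaction boundary
// rather than unconditionally after commit()/rollback().
class MetaStoreOp {
    final List<String> committedEvents = new ArrayList<>();

    boolean runDdl(boolean opSucceeds) {
        List<String> pendingEvents = new ArrayList<>();
        // openTransaction()
        if (opSucceeds) {
            pendingEvents.add("CREATE_TABLE");     // enqueue, don't publish yet
            committedEvents.addAll(pendingEvents); // commit(): op + log atomically
            return true;
        }
        // rollback(): pending events are simply dropped
        return false;
    }
}
```

In the real metastore the equivalent would be performing the DbNotificationListener insert within the same open transaction as the DDL operation.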





[jira] [Commented] (HIVE-11402) HS2 - add an option to disallow parallel query execution within a single Session

2016-07-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373097#comment-15373097
 ] 

Hive QA commented on HIVE-11402:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12817230/HIVE-11402.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10294 tests 
executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-tez_joins_explain.q-vector_data_types.q-tez_dynpart_hashjoin_1.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/482/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/482/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-482/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12817230 - PreCommit-HIVE-MASTER-Build

> HS2 - add an option to disallow parallel query execution within a single 
> Session
> 
>
> Key: HIVE-11402
> URL: https://issues.apache.org/jira/browse/HIVE-11402
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Thejas M Nair
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11402.01.patch, HIVE-11402.02.patch, 
> HIVE-11402.03.patch, HIVE-11402.patch
>
>
> HiveServer2 currently allows concurrent queries to be run in a single 
> session. However, every HS2 session has an associated SessionState object, 
> and the use of SessionState in many places assumes that only one thread is 
> using it, i.e., it is not thread safe.
> There are many places where SessionState thread safety needs to be 
> addressed, and until then we should serialize all query execution for a 
> single HS2 session. -This problem can become more visible with HIVE-4239 now 
> allowing parallel query compilation.-
> Note that running queries in parallel for a single session is not 
> straightforward with JDBC; you need to spawn another thread, as the 
> Statement.execute calls are blocking. I believe ODBC has a non-blocking query 
> execution API, and Hue is another well known application that shares sessions 
> for all queries that a user runs.





[jira] [Updated] (HIVE-14215) Displaying inconsistent CPU usage data with MR execution engine

2016-07-12 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-14215:
--
Status: Patch Available  (was: Open)

Tested the patch with the same sleep changes, and it is working.

> Displaying inconsistent CPU usage data with MR execution engine
> ---
>
> Key: HIVE-14215
> URL: https://issues.apache.org/jira/browse/HIVE-14215
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
> Attachments: HIVE-14215.patch
>
>
> If the MR task finishes after printing the cumulative CPU time, it is 
> possible to print inconsistent CPU usage information.
> Correct one:
> {noformat}
> 2016-07-12 11:31:42,961 Stage-3 map = 0%,  reduce = 0%
> 2016-07-12 11:31:48,237 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 2.5 
> sec
> MapReduce Total cumulative CPU time: 2 seconds 500 msec
> Ended Job = job_1468321038188_0003
> MapReduce Jobs Launched: 
> Stage-Stage-3: Map: 1   Cumulative CPU: 2.5 sec   HDFS Read: 5864 HDFS Write: 
> 103 SUCCESS
> Total MapReduce CPU Time Spent: 2 seconds 500 msec
> {noformat}
> One type of inconsistent data (easily reproducible one):
> {noformat}
> 2016-07-12 11:39:00,540 Stage-3 map = 0%,  reduce = 0%
> Ended Job = job_1468321038188_0004
> MapReduce Jobs Launched: 
> Stage-Stage-3: Map: 1   Cumulative CPU: 2.51 sec   HDFS Read: 5864 HDFS 
> Write: 103 SUCCESS
> Total MapReduce CPU Time Spent: 2 seconds 510 msec
> {noformat}





[jira] [Updated] (HIVE-14215) Displaying inconsistent CPU usage data with MR execution engine

2016-07-12 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-14215:
--
Attachment: HIVE-14215.patch

Not sure how to create a unit test for this situation, but created a simple 
patch which reorders the CPU time calculation.
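The reordering idea can be illustrated with a toy model. This is not the actual progress-loop code from Hive; the class and method names below are assumptions made purely for the sketch:

```java
// Toy model of the race discussed above: the job can finish between reading
// the CPU counter for the progress line and producing the final summary.
// Reading the counter once and reusing that value for both outputs keeps
// them consistent. All names here are hypothetical.
class CpuReportSketch {
    // Stand-in for a counter that the running job keeps updating.
    static long cpuMillisFromCounters = 2500;

    static String progressLine(long cpuMillis) {
        return "Cumulative CPU " + (cpuMillis / 1000.0) + " sec";
    }

    static String summaryLine(long cpuMillis) {
        return "Total MapReduce CPU Time Spent: " + (cpuMillis / 1000.0) + " sec";
    }

    // Reordered version: a single read feeds both lines, so the progress
    // line and the final summary cannot disagree even if the counter moves.
    static String[] reportOnce() {
        long cpu = cpuMillisFromCounters;   // single read per iteration
        return new String[] { progressLine(cpu), summaryLine(cpu) };
    }
}
```

The design point is that the inconsistency comes from two separate counter reads straddling job completion; collapsing them into one read per loop iteration removes the window.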

> Displaying inconsistent CPU usage data with MR execution engine
> ---
>
> Key: HIVE-14215
> URL: https://issues.apache.org/jira/browse/HIVE-14215
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
> Attachments: HIVE-14215.patch
>
>
> If the MR task finishes after printing the cumulative CPU time, it is 
> possible to print inconsistent CPU usage information.
> Correct one:
> {noformat}
> 2016-07-12 11:31:42,961 Stage-3 map = 0%,  reduce = 0%
> 2016-07-12 11:31:48,237 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 2.5 
> sec
> MapReduce Total cumulative CPU time: 2 seconds 500 msec
> Ended Job = job_1468321038188_0003
> MapReduce Jobs Launched: 
> Stage-Stage-3: Map: 1   Cumulative CPU: 2.5 sec   HDFS Read: 5864 HDFS Write: 
> 103 SUCCESS
> Total MapReduce CPU Time Spent: 2 seconds 500 msec
> {noformat}
> One type of inconsistent data (easily reproducible one):
> {noformat}
> 2016-07-12 11:39:00,540 Stage-3 map = 0%,  reduce = 0%
> Ended Job = job_1468321038188_0004
> MapReduce Jobs Launched: 
> Stage-Stage-3: Map: 1   Cumulative CPU: 2.51 sec   HDFS Read: 5864 HDFS 
> Write: 103 SUCCESS
> Total MapReduce CPU Time Spent: 2 seconds 510 msec
> {noformat}





[jira] [Commented] (HIVE-14215) Displaying inconsistent CPU usage data with MR execution engine

2016-07-12 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372954#comment-15372954
 ] 

Peter Vary commented on HIVE-14215:
---

To reproduce this consistently, I had to put a Thread.sleep at the end of the 
while loop in the method progress(ExecDriverTaskHandle th), after this (line 
373):
{noformat}
  console.printInfo(output);
  task.setStatusMessage(output);
  reportTime = System.currentTimeMillis();
{noformat}

This raises the occurrence of the rare situation where the job finishes after 
the CPU time is generated but before the while loop's condition check.

> Displaying inconsistent CPU usage data with MR execution engine
> ---
>
> Key: HIVE-14215
> URL: https://issues.apache.org/jira/browse/HIVE-14215
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
>
> If the MR task finishes after printing the cumulative CPU time, it is 
> possible to print inconsistent CPU usage information.
> Correct one:
> {noformat}
> 2016-07-12 11:31:42,961 Stage-3 map = 0%,  reduce = 0%
> 2016-07-12 11:31:48,237 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 2.5 
> sec
> MapReduce Total cumulative CPU time: 2 seconds 500 msec
> Ended Job = job_1468321038188_0003
> MapReduce Jobs Launched: 
> Stage-Stage-3: Map: 1   Cumulative CPU: 2.5 sec   HDFS Read: 5864 HDFS Write: 
> 103 SUCCESS
> Total MapReduce CPU Time Spent: 2 seconds 500 msec
> {noformat}
> One type of inconsistent data (easily reproducible one):
> {noformat}
> 2016-07-12 11:39:00,540 Stage-3 map = 0%,  reduce = 0%
> Ended Job = job_1468321038188_0004
> MapReduce Jobs Launched: 
> Stage-Stage-3: Map: 1   Cumulative CPU: 2.51 sec   HDFS Read: 5864 HDFS 
> Write: 103 SUCCESS
> Total MapReduce CPU Time Spent: 2 seconds 510 msec
> {noformat}





[jira] [Commented] (HIVE-14187) JDOPersistenceManager objects remain cached if MetaStoreClient#close is not called

2016-07-12 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372939#comment-15372939
 ] 

Mohit Sabharwal commented on HIVE-14187:


[~vgumashta], do you have more input on the patch? Thanks.

 

> JDOPersistenceManager objects remain cached if MetaStoreClient#close is not 
> called
> --
>
> Key: HIVE-14187
> URL: https://issues.apache.org/jira/browse/HIVE-14187
> Project: Hive
>  Issue Type: Bug
>Reporter: Mohit Sabharwal
>Assignee: Mohit Sabharwal
> Attachments: HIVE-14187.patch
>
>
> JDOPersistenceManager objects are cached in JDOPersistenceManagerFactory by 
> DataNuclues.
> A new JDOPersistenceManager object gets created for every HMS thread since 
> ObjectStore is a thread local.
> In non-embedded metastore mode, JDOPersistenceManager associated with a 
> thread only gets cleaned up if IMetaStoreClient#close is called by the client 
> (which calls ObjectStore#shutdown which calls JDOPersistenceManager#close 
> which in turn removes the object from cache in 
> JDOPersistenceManagerFactory#releasePersistenceManager
> https://github.com/datanucleus/datanucleus-api-jdo/blob/master/src/main/java/org/datanucleus/api/jdo/JDOPersistenceManagerFactory.java#L1271),
>  i.e. the object will remain cached if client does not call close.
> For example: If one interrupts out of hive CLI shell (instead of using 
> 'exit;' command), SessionState#close does not get called, and hence 
> IMetaStoreClient#close does not get called.
> Instead of relying on the client to call close, it's cleaner to automatically 
> perform RawStore-related cleanup at the server end via deleteContext(), which 
> gets called when the server detects a lost/closed connection.
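The proposed server-side cleanup can be sketched without the Thrift dependency. In the real server, deleteContext is a TServerEventHandler hook invoked on a dropped connection; everything else below (the RawStore interface, the thread-local name) is an illustrative assumption, not the actual patch:

```java
// Sketch of the cleanup idea: when the server notices a closed connection,
// shut down the thread-local RawStore so its JDOPersistenceManager is
// released from the DataNucleus cache even if the client never called
// IMetaStoreClient#close(). Names here are simplified assumptions.
class MetastoreCleanupSketch {
    interface RawStore { void shutdown(); }

    // Stand-in for the per-thread ObjectStore held by an HMS handler thread.
    static final ThreadLocal<RawStore> THREAD_RAW_STORE = new ThreadLocal<>();

    // Mirrors what a TServerEventHandler#deleteContext implementation would do.
    static void deleteContext() {
        RawStore rs = THREAD_RAW_STORE.get();
        if (rs != null) {
            rs.shutdown();               // closes the JDOPersistenceManager,
            THREAD_RAW_STORE.remove();   // evicting it from the factory cache
        }
    }
}
```

The design point matches the comment: cleanup is driven by a server-side connection event rather than by client cooperation, so an interrupted CLI session can no longer leak a cached JDOPersistenceManager.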




