[jira] [Updated] (HIVE-14901) HiveServer2: Use user supplied fetch size to determine #rows serialized in tasks

2016-10-06 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-14901:

Affects Version/s: 2.1.0

> HiveServer2: Use user supplied fetch size to determine #rows serialized in 
> tasks
> 
>
> Key: HIVE-14901
> URL: https://issues.apache.org/jira/browse/HIVE-14901
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC, ODBC
>Affects Versions: 2.1.0
>Reporter: Vaibhav Gumashta
>
> Currently, we use {{hive.server2.thrift.resultset.max.fetch.size}} to decide 
> the max number of rows that we write in tasks. However, we should ideally use 
> the user-supplied value (which can be extracted from the request parameter of 
> ThriftCLIService.FetchResults) to decide how many rows to serialize in a blob 
> in the tasks. We should, however, use 
> {{hive.server2.thrift.resultset.max.fetch.size}} as an upper bound on it, so 
> that we don't run out of memory (OOM) in tasks and HS2.
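> A one-line sketch of the proposed bound (hypothetical variable names):
> {code}
> int rowsPerBlob = Math.min(userRequestedFetchSize, maxFetchSizeConf);
> {code}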



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14901) HiveServer2: Use user supplied fetch size to determine #rows serialized in tasks

2016-10-06 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551250#comment-15551250
 ] 

Vaibhav Gumashta commented on HIVE-14901:
-

cc [~thejas] [~holmanl] [~ziyangz] [~kliew]

> HiveServer2: Use user supplied fetch size to determine #rows serialized in 
> tasks
> 
>
> Key: HIVE-14901
> URL: https://issues.apache.org/jira/browse/HIVE-14901
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC, ODBC
>Affects Versions: 2.1.0
>Reporter: Vaibhav Gumashta
>
> Currently, we use {{hive.server2.thrift.resultset.max.fetch.size}} to decide 
> the max number of rows that we write in tasks. However, we should ideally use 
> the user-supplied value (which can be extracted from the request parameter of 
> ThriftCLIService.FetchResults) to decide how many rows to serialize in a blob 
> in the tasks. We should, however, use 
> {{hive.server2.thrift.resultset.max.fetch.size}} as an upper bound on it, so 
> that we don't run out of memory (OOM) in tasks and HS2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14861) Support precedence for set operator using parentheses

2016-10-06 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551127#comment-15551127
 ] 

Pengcheng Xiong commented on HIVE-14861:


[~ashutoshc], the change looks OK to me (in some cases it even improved the 
explain output). Could you take a look? Thanks.

> Support precedence for set operator using parentheses
> -
>
> Key: HIVE-14861
> URL: https://issues.apache.org/jira/browse/HIVE-14861
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14861.01.patch, HIVE-14861.02.patch
>
>
> We should support precedence for set operators by using parentheses. For 
> example:
> {code}
> select * from src union all (select * from src union select * from src);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14876) make the number of rows to fetch from various HS2 clients/servers configurable

2016-10-06 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551259#comment-15551259
 ] 

Vaibhav Gumashta commented on HIVE-14876:
-

The following are the details of the fetch RPC path from JDBC to HS2, which 
should also clear up the confusion over 
{{hive.server2.thrift.resultset.max.fetch.size}}:
When we create a new connection, we use a default value for the fetch size if 
it is not specified in the connection string by the end user. In 
{{HiveConnection}}:
{code}
private int fetchSize = HiveStatement.DEFAULT_FETCH_SIZE;
{code}

If however, a user specifies the fetch size by using a connection string like 
this: {{jdbc:hive2://localhost:1/default;fetchSize=1}}, we override the 
default value with the user supplied value. In {{HiveConnection}}:
{code}
if (sessConfMap.containsKey(JdbcConnectionParams.FETCH_SIZE)) {
  fetchSize =
      Integer.parseInt(sessConfMap.get(JdbcConnectionParams.FETCH_SIZE));
}
{code}

When we run {{HiveStatement.execute}}, we set the fetch size on the result 
set. In {{HiveStatement.execute}}:
{code}
resultSet = new HiveQueryResultSet.Builder(this).setClient(client)
    .setSessionHandle(sessHandle).setStmtHandle(stmtHandle)
    .setMaxRows(maxRows).setFetchSize(fetchSize)
    .setScrollable(isScrollableResultset)
    .build();
{code}

Finally, when we issue a fetch RPC request, we send this value as part of the 
request. In {{HiveQueryResultSet.next}}:
{code}
TFetchResultsReq fetchReq = new TFetchResultsReq(stmtHandle,
    orientation, fetchSize);
{code}

On the server side, the fetch request hits {{ThriftCLIService.FetchResults}}:
{code}
RowSet rowSet = cliService.fetchResults(
  new OperationHandle(req.getOperationHandle()),
  FetchOrientation.getFetchOrientation(req.getOrientation()),
  req.getMaxRows(),
  FetchType.getFetchType(req.getFetchType()));
{code}
The request eventually reaches {{SQLOperation.getNextRowSet}}, which receives 
the fetch size specified in the RPC as a parameter.

Apologies for the confusion regarding 
{{hive.server2.thrift.resultset.max.fetch.size}}: it is only used when 
ThriftJDBCSerde writes resultsets in tasks, to decide how many rows to 
serialize in a blob. I have created a jira for resolving the confusion and 
shall have a patch out soon: HIVE-14901. Meanwhile, to increase the default 
fetch size for the code path that doesn't use ThriftJDBCSerde, we should bump 
the value of HiveStatement.DEFAULT_FETCH_SIZE on the driver side.
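For illustration, a minimal end-to-end client sketch (assuming a local HS2 on 
the default port 10000; the fetchSize value here is just an example):
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FetchSizeExample {
  public static void main(String[] args) throws Exception {
    // fetchSize in the URL overrides HiveStatement.DEFAULT_FETCH_SIZE
    String url = "jdbc:hive2://localhost:10000/default;fetchSize=5000";
    try (Connection conn = DriverManager.getConnection(url);
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT * FROM src")) {
      while (rs.next()) {
        // HiveQueryResultSet issues TFetchResultsReq with maxRows=5000,
        // so rows cross the wire in batches of (at most) 5000
      }
    }
  }
}
{code}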

cc [~ziyangz]:  you might want to follow the discussion here.

> make the number of rows to fetch from various HS2 clients/servers configurable
> --
>
> Key: HIVE-14876
> URL: https://issues.apache.org/jira/browse/HIVE-14876
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14876.patch
>
>
> Right now, it's hardcoded to a variety of values



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11394) Enhance EXPLAIN display for vectorization

2016-10-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551274#comment-15551274
 ] 

Hive QA commented on HIVE-11394:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12831883/HIVE-11394.04.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1414/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1414/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1414/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2016-10-06 08:08:46.989
+ [[ -n /usr/java/jdk1.8.0_25 ]]
+ export JAVA_HOME=/usr/java/jdk1.8.0_25
+ JAVA_HOME=/usr/java/jdk1.8.0_25
+ export 
PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-1414/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2016-10-06 08:08:46.991
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at e1fa278 HIVE-14896 : Stabilize golden files for currently 
failing tests
+ git clean -f -d
Removing b/
Removing ql/src/test/queries/clientpositive/union_paren.q
Removing ql/src/test/results/clientpositive/union_paren.q.out
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at e1fa278 HIVE-14896 : Stabilize golden files for currently 
failing tests
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2016-10-06 08:08:48.096
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
error: patch failed: 
ql/src/test/results/clientpositive/vector_join_part_col_char.q.out:108
error: ql/src/test/results/clientpositive/vector_join_part_col_char.q.out: 
patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12831883 - PreCommit-HIVE-Build

> Enhance EXPLAIN display for vectorization
> -
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, 
> HIVE-11394.03.patch, HIVE-11394.04.patch
>
>
> Add detail to the EXPLAIN output showing why Map and Reduce work is not 
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\] \[SUMMARY|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (whether vectorization 
> is enabled) and a summary of Map and Reduce work.
> If the optional clauses are omitted, the defaults are non-ONLY and SUMMARY.
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization 
> sections)
> It is the same as EXPLAIN VECTORIZATION SUMMARY.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
> …
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> …
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: decimal_date_test
>   Statistics: Num rows: 12288 Data size: 2467616 Basic stats: 
> COMPLETE Column stats: NONE
>   

[jira] [Commented] (HIVE-14861) Support precedence for set operator using parentheses

2016-10-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551112#comment-15551112
 ] 

Hive QA commented on HIVE-14861:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12831879/HIVE-14861.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 10657 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_lineage2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[complex_alias]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constant_prop_1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_part7]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[optimize_nullscan]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_offcbo]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_ppr]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_or_replace_view6]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[ptf_negative_AmbiguousWindowDefn]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[ptf_negative_NoWindowDefn]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[unionClusterBy]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[unionDistributeBy]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[unionLimit]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[unionOrderBy]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[unionSortBy]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1413/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1413/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1413/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12831879 - PreCommit-HIVE-Build

> Support precedence for set operator using parentheses
> -
>
> Key: HIVE-14861
> URL: https://issues.apache.org/jira/browse/HIVE-14861
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14861.01.patch, HIVE-14861.02.patch
>
>
> We should support precedence for set operators by using parentheses. For 
> example:
> {code}
> select * from src union all (select * from src union select * from src);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14660) ArrayIndexOutOfBoundsException on delete

2016-10-06 Thread Benjamin BONNET (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551513#comment-15551513
 ] 

Benjamin BONNET commented on HIVE-14660:


Actually, every bucket gets a writer: even though you have only one reducer, 
every bucket is covered, by that same writer. And indeed, writers are designed 
to cover several buckets (see the comment "// Find the bucket id, and switch 
buckets if need to" in the FileSinkOperator source code).
In my opinion, there is just a small bug in the way the bucket switch is 
implemented.
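To illustrate the suspected pattern (a hypothetical minimal sketch, not the 
actual FileSinkOperator code):
{code}
// The writer slot is advanced on every bucket switch instead of being
// derived from the bucket id itself.
int numBuckets = 5;
Object[] writers = new Object[numBuckets];
int idx = -1;
for (int bucketId : new int[] {0, 1, 2, 3, 4, 0}) { // bucket 0 comes back
  idx++;                         // sixth switch -> idx == 5
  Object writer = writers[idx];  // ArrayIndexOutOfBoundsException: 5
}
// A fix would be to index by bucket id instead: writers[bucketId]
{code}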
Regards


> ArrayIndexOutOfBoundsException on delete
> 
>
> Key: HIVE-14660
> URL: https://issues.apache.org/jira/browse/HIVE-14660
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor, Transactions
>Affects Versions: 1.2.1
>Reporter: Benjamin BONNET
>Assignee: Benjamin BONNET
> Attachments: HIVE-14660.1-banch-1.2.patch
>
>
> Hi,
> DELETE on an ACID table may fail with an ArrayIndexOutOfBoundsException.
> That bug occurs in the Reduce phase when there are fewer reducers than the 
> number of table buckets.
> In order to reproduce, create a simple ACID table :
> {code:sql}
> CREATE TABLE test (`cle` bigint,`valeur` string)
>  PARTITIONED BY (`annee` string)
>  CLUSTERED BY (cle) INTO 5 BUCKETS
>  TBLPROPERTIES ('transactional'='true');
> {code}
> Populate it with rows distributed among all buckets, with random values and 
> a few partitions.
> Force the number of reducers to be less than the number of buckets:
> {code:sql}
> set mapred.reduce.tasks=1;
> {code}
> Then execute a delete that will remove many lines from all the buckets.
> {code:sql}
> DELETE FROM test WHERE valeur<'some_value';
> {code}
> Then you will get an ArrayIndexOutOfBoundsException:
> {code}
> 2016-08-22 21:21:02,500 [FATAL] [TezChild] |tez.ReduceRecordSource|: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) 
> {"key":{"reducesinkkey0":{"transactionid":119,"bucketid":0,"rowid":0}},"value":{"_col0":"4"}}
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:252)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:769)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)
> ... 17 more
> {code}
> Adding logs into FileSinkOperator, one sees the operator deal with buckets 
> 0, 1, 2, 3, 4, then 0 again, and it fails at line 769: each time you switch 
> buckets, you move forward in an array of 5 elements (the number of buckets). 
> So when you get bucket 0 for the second time, you run off the end of the 
> array...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14799) Query operation are not thread safe during its cancellation

2016-10-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551537#comment-15551537
 ] 

Hive QA commented on HIVE-14799:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12831889/HIVE-14799.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10656 tests 
executed
*Failed tests:*
{noformat}
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1415/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1415/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1415/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12831889 - PreCommit-HIVE-Build

> Query operation are not thread safe during its cancellation
> ---
>
> Key: HIVE-14799
> URL: https://issues.apache.org/jira/browse/HIVE-14799
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-14799.1.patch, HIVE-14799.2.patch, HIVE-14799.patch
>
>
> When a query is cancelled, either via Beeline (Ctrl-C) or the API call 
> TCLIService.Client.CancelOperation, SQLOperation.cancel is invoked in a 
> different thread from the one running the query, to close/destroy its 
> encapsulated Driver object. Neither SQLOperation nor Driver is thread-safe, 
> which can sometimes result in runtime exceptions like NPE. The errors from 
> the running query are also not handled properly, which can leave some 
> resources (files, locks, etc.) uncleaned after the query terminates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11394) Enhance EXPLAIN display for vectorization

2016-10-06 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11394:

Attachment: HIVE-11394.05.patch

> Enhance EXPLAIN display for vectorization
> -
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, 
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch
>
>
> Add detail to the EXPLAIN output showing why Map and Reduce work is not 
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\] \[SUMMARY|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (whether vectorization 
> is enabled) and a summary of Map and Reduce work.
> If the optional clauses are omitted, the defaults are non-ONLY and SUMMARY.
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization 
> sections)
> It is the same as EXPLAIN VECTORIZATION SUMMARY.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
> …
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> …
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: decimal_date_test
>   Statistics: Num rows: 12288 Data size: 2467616 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: cdate BETWEEN 1969-12-30 AND 1970-01-02 (type: 
> boolean)
> Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
> Select Operator
>   expressions: cdate (type: date)
>   outputColumnNames: _col0
>   Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: date)
> sort order: +
> Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Map Vectorization:
> enabled: true
> enabledConditionsMet: 
> hive.vectorized.use.vectorized.input.format IS true
> groupByVectorOutput: true
> inputFileFormats: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reducer 2 
> Execution mode: vectorized, llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> groupByVectorOutput: true
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reduce Operator Tree:
>   Select Operator
> expressions: KEY.reducesinkkey0 (type: date)
> outputColumnNames: _col0
> Statistics: Num rows: 6144 Data size: 1233808 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 6144 Data size: 1233808 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> EXPLAIN VECTORIZATION DETAIL
> (Note the added  Select Vectorization, Group By Vectorization, Reduce Sink 
> Vectorization sections in this example)
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
> …
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> …
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
> 

[jira] [Updated] (HIVE-11394) Enhance EXPLAIN display for vectorization

2016-10-06 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11394:

Status: In Progress  (was: Patch Available)

> Enhance EXPLAIN display for vectorization
> -
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, 
> HIVE-11394.03.patch, HIVE-11394.04.patch
>
>
> Add detail to the EXPLAIN output showing why Map and Reduce work is not 
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\] \[SUMMARY|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (whether vectorization 
> is enabled) and a summary of Map and Reduce work.
> If the optional clauses are omitted, the defaults are non-ONLY and SUMMARY.
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization 
> sections)
> It is the same as EXPLAIN VECTORIZATION SUMMARY.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
> …
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> …
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: decimal_date_test
>   Statistics: Num rows: 12288 Data size: 2467616 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: cdate BETWEEN 1969-12-30 AND 1970-01-02 (type: 
> boolean)
> Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
> Select Operator
>   expressions: cdate (type: date)
>   outputColumnNames: _col0
>   Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: date)
> sort order: +
> Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Map Vectorization:
> enabled: true
> enabledConditionsMet: 
> hive.vectorized.use.vectorized.input.format IS true
> groupByVectorOutput: true
> inputFileFormats: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reducer 2 
> Execution mode: vectorized, llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> groupByVectorOutput: true
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reduce Operator Tree:
>   Select Operator
> expressions: KEY.reducesinkkey0 (type: date)
> outputColumnNames: _col0
> Statistics: Num rows: 6144 Data size: 1233808 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 6144 Data size: 1233808 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> EXPLAIN VECTORIZATION DETAIL
> (Note the added  Select Vectorization, Group By Vectorization, Reduce Sink 
> Vectorization sections in this example)
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
> …
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> …
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>  

[jira] [Updated] (HIVE-14903) from_utc_time function issue for CET daylight savings

2016-10-06 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-14903:

Description: 
Based on https://en.wikipedia.org/wiki/Central_European_Summer_Time, the summer 
time is between 1:00 UTC on the last Sunday of March and 1:00 UTC on the last 
Sunday of October; see the test cases below:

Impala:

{code}
select from_utc_timestamp('2016-10-30 00:30:00','CET');
Query: select from_utc_timestamp('2016-10-30 00:30:00','CET')
+--------------------------------------------------+
| from_utc_timestamp('2016-10-30 00:30:00', 'cet') |
+--------------------------------------------------+
| 2016-10-30 01:30:00                              |
+--------------------------------------------------+
{code}

Hive:

{code}
select from_utc_timestamp('2016-10-30 00:30:00','CET');
INFO  : OK
+------------------------+--+
|          _c0           |
+------------------------+--+
| 2016-10-30 01:30:00.0  |
+------------------------+--+
{code}

MySQL:

{code}
mysql> SELECT CONVERT_TZ( '2016-10-30 00:30:00', 'UTC', 'CET' );
+---------------------------------------------------+
| CONVERT_TZ( '2016-10-30 00:30:00', 'UTC', 'CET' ) |
+---------------------------------------------------+
| 2016-10-30 02:30:00                               |
+---------------------------------------------------+
{code}

At 00:30 UTC, daylight saving time has not yet ended, so the time difference 
should still be 2 hours rather than 1. MySQL returned the correct result.

At 1:30 UTC, the results are correct:

Impala:

{code}
Query: select from_utc_timestamp('2016-10-30 01:30:00','CET')
+--------------------------------------------------+
| from_utc_timestamp('2016-10-30 01:30:00', 'cet') |
+--------------------------------------------------+
| 2016-10-30 02:30:00                              |
+--------------------------------------------------+
Fetched 1 row(s) in 0.01s
{code}

Hive:

{code}
+------------------------+--+
|          _c0           |
+------------------------+--+
| 2016-10-30 02:30:00.0  |
+------------------------+--+
1 row selected (0.252 seconds)
{code}

MySQL:

{code}
mysql> SELECT CONVERT_TZ( '2016-10-30 01:30:00', 'UTC', 'CET' );
+---------------------------------------------------+
| CONVERT_TZ( '2016-10-30 01:30:00', 'UTC', 'CET' ) |
+---------------------------------------------------+
| 2016-10-30 02:30:00                               |
+---------------------------------------------------+
1 row in set (0.00 sec)
{code}

Seems like a bug.
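For reference, a small JDK java.time check of the expected behavior (a sketch, 
not Hive code; "CET" here is resolved from the tz database):
{code}
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class CetDstCheck {
  public static void main(String[] args) {
    // 2016-10-30 00:30 UTC is before the 01:00 UTC switch from CEST to CET,
    // so the local offset should still be +02:00, i.e. 02:30 local time.
    ZonedDateTime utc = LocalDateTime.parse("2016-10-30T00:30:00")
        .atZone(ZoneOffset.UTC);
    System.out.println(utc.withZoneSameInstant(ZoneId.of("CET")));
    // prints 2016-10-30T02:30+02:00[CET], matching the MySQL result
  }
}
{code}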

  was:
Based on https://en.wikipedia.org/wiki/Central_European_Summer_Time, the summer 
time is between 1:00 UTC on the last Sunday of March and 1:00 on the last 
Sunday of October, see test case below:

Impala:

{code}
[host-10-17-101-195.coe.cloudera.com:25003] > select from_utc_timestamp('2016-10-30 00:30:00','CET');
Query: select from_utc_timestamp('2016-10-30 00:30:00','CET')
+--------------------------------------------------+
| from_utc_timestamp('2016-10-30 00:30:00', 'cet') |
+--------------------------------------------------+
| 2016-10-30 01:30:00                              |
+--------------------------------------------------+
{code}

Hive:

{code}
0: jdbc:hive2://host-10-17-101-195.coe.cloude> select from_utc_timestamp('2016-10-30 00:30:00','CET');
INFO  : OK
+------------------------+--+
|          _c0           |
+------------------------+--+
| 2016-10-30 01:30:00.0  |
+------------------------+--+
{code}

MySQL:

{code}
mysql> SELECT CONVERT_TZ( '2016-10-30 00:30:00', 'UTC', 'CET' );
+---------------------------------------------------+
| CONVERT_TZ( '2016-10-30 00:30:00', 'UTC', 'CET' ) |
+---------------------------------------------------+
| 2016-10-30 02:30:00                               |
+---------------------------------------------------+
{code}

At 00:30AM UTC, the daylight saving has not finished so the time different 
should still be 2 hours rather than 1. MySQL returned correct result

At 1:30, results are correct:

Impala:

{code}
Query: select from_utc_timestamp('2016-10-30 01:30:00','CET')
+--------------------------------------------------+
| from_utc_timestamp('2016-10-30 01:30:00', 'cet') |
+--------------------------------------------------+
| 2016-10-30 02:30:00                              |
+--------------------------------------------------+
Fetched 1 row(s) in 0.01s
{code}

Hive:

{code}
+------------------------+--+
|          _c0           |
+------------------------+--+
| 2016-10-30 02:30:00.0  |
+------------------------+--+
1 row selected (0.252 seconds)
{code}

MySQL:

{code}
mysql> SELECT CONVERT_TZ( '2016-10-30 01:30:00', 'UTC', 'CET' );
+---------------------------------------------------+
| CONVERT_TZ( '2016-10-30 01:30:00', 'UTC', 'CET' ) |
+---------------------------------------------------+
| 2016-10-30 02:30:00                               |
+---------------------------------------------------+
1 row in set (0.00 sec)
{code}

Seems like a bug.

[jira] [Commented] (HIVE-14903) from_utc_time function issue for CET daylight savings

2016-10-06 Thread Eric Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551569#comment-15551569
 ] 

Eric Lin commented on HIVE-14903:
-

Both Hive and Impala seem to have the issue, so I have also created 
https://issues.cloudera.org/browse/IMPALA-4250

> from_utc_time function issue for CET daylight savings
> -
>
> Key: HIVE-14903
> URL: https://issues.apache.org/jira/browse/HIVE-14903
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.0.1
>Reporter: Eric Lin
>Priority: Minor
>
> Based on https://en.wikipedia.org/wiki/Central_European_Summer_Time, the 
> summer time is between 1:00 UTC on the last Sunday of March and 1:00 UTC on 
> the last Sunday of October; see the test cases below:
> Impala:
> {code}
> select from_utc_timestamp('2016-10-30 00:30:00','CET');
> Query: select from_utc_timestamp('2016-10-30 00:30:00','CET')
> +--------------------------------------------------+
> | from_utc_timestamp('2016-10-30 00:30:00', 'cet') |
> +--------------------------------------------------+
> | 2016-10-30 01:30:00                              |
> +--------------------------------------------------+
> {code}
> Hive:
> {code}
> select from_utc_timestamp('2016-10-30 00:30:00','CET');
> INFO  : OK
> +------------------------+--+
> |          _c0           |
> +------------------------+--+
> | 2016-10-30 01:30:00.0  |
> +------------------------+--+
> {code}
> MySQL:
> {code}
> mysql> SELECT CONVERT_TZ( '2016-10-30 00:30:00', 'UTC', 'CET' );
> +---------------------------------------------------+
> | CONVERT_TZ( '2016-10-30 00:30:00', 'UTC', 'CET' ) |
> +---------------------------------------------------+
> | 2016-10-30 02:30:00                               |
> +---------------------------------------------------+
> {code}
> At 00:30 UTC, daylight saving time has not yet ended, so the time difference 
> should still be 2 hours rather than 1. MySQL returned the correct result.
> At 1:30 UTC, the results are correct:
> Impala:
> {code}
> Query: select from_utc_timestamp('2016-10-30 01:30:00','CET')
> +--------------------------------------------------+
> | from_utc_timestamp('2016-10-30 01:30:00', 'cet') |
> +--------------------------------------------------+
> | 2016-10-30 02:30:00                              |
> +--------------------------------------------------+
> Fetched 1 row(s) in 0.01s
> {code}
> Hive:
> {code}
> +------------------------+--+
> |          _c0           |
> +------------------------+--+
> | 2016-10-30 02:30:00.0  |
> +------------------------+--+
> 1 row selected (0.252 seconds)
> {code}
> MySQL:
> {code}
> mysql> SELECT CONVERT_TZ( '2016-10-30 01:30:00', 'UTC', 'CET' );
> +---------------------------------------------------+
> | CONVERT_TZ( '2016-10-30 01:30:00', 'UTC', 'CET' ) |
> +---------------------------------------------------+
> | 2016-10-30 02:30:00                               |
> +---------------------------------------------------+
> 1 row in set (0.00 sec)
> {code}
> Seems like a bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14875) Enhancement and refactoring of TestLdapAtnProviderWithMiniDS

2016-10-06 Thread Illya Yalovyy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552081#comment-15552081
 ] 

Illya Yalovyy commented on HIVE-14875:
--

[~ctang.ma], [~aihuaxu], [~szehon],

Could you please take a look at this CR:
https://reviews.apache.org/r/52487/

> Enhancement and refactoring of TestLdapAtnProviderWithMiniDS
> 
>
> Key: HIVE-14875
> URL: https://issues.apache.org/jira/browse/HIVE-14875
> Project: Hive
>  Issue Type: Test
>  Components: Authentication, Tests
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
> Attachments: HIVE-14875.1.patch
>
>
> This makes the following enhancements to TestLdapAtnProviderWithMiniDS:
>  
> * Extract defined ldifs to a resource file. 
> * Remove unneeded attributes defined in each ldif entry such as:
>   * sn (Surname) and givenName from group entries
>   * distinguishedName from all entries as this attribute serves more
> as a parent type of many other attributes.
> * Remove setting ExtensibleObject as an objectClass for all ldap entries
>   as that is not needed. This objectClass would allow for adding any
>   attribute to an entry.
> * Add missing uid attribute to group entries whose dn refer to a uid
>   attribute
> * Add missing uidObject objectClass to entries that have the uid attribute
> * Explicitly set organizationalPerson objectClass to user entries as
>   they are using inetOrgPerson objectClass which is a subclass of
>   the organizationalPerson objectClass
> * Create indexes on cn and uid attributes as they are commonly
> queried.
> * Removed unused variables and imports.
> * Fixed givenName for user3.
> * Other minor code clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13306) Better Decimal vectorization

2016-10-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551796#comment-15551796
 ] 

Hive QA commented on HIVE-13306:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12831891/HIVE-13306.3.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10725 tests 
executed
*Failed tests:*
{noformat}
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1416/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1416/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1416/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12831891 - PreCommit-HIVE-Build

> Better Decimal vectorization
> 
>
> Key: HIVE-13306
> URL: https://issues.apache.org/jira/browse/HIVE-13306
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-13306.1.patch, HIVE-13306.2.patch, 
> HIVE-13306.3.patch
>
>
> Decimal Vectorization Requirements
> • Today, the LongColumnVector, DoubleColumnVector, BytesColumnVector, and 
> TimestampColumnVector classes store the data as primitive Java data types 
> (long, double, or byte arrays) for efficiency.
> • DecimalColumnVector is different - it has an array of Object references 
> to HiveDecimal objects.
> • The HiveDecimal object uses an internal BigDecimal object for its 
> implementation. Further, BigDecimal itself uses an internal BigInteger, and 
> BigInteger uses an int array. That is 4 objects in total.
> • And HiveDecimal is an immutable object, which means arithmetic and 
> other operations produce a new HiveDecimal object with 3 new objects 
> underneath.
> • A major reason vectorization is fast is that the ColumnVector classes, 
> except DecimalColumnVector, do not have to allocate additional memory per 
> row. This avoids the memory fragmentation and pressure on the Java garbage 
> collector that DecimalColumnVector can generate. It is very significant.
> • What can be done with DecimalColumnVector to make it much more 
> efficient?
> o Design several new decimal classes that allow the caller to manage the 
> decimal storage.
> o If it takes N int values to store a decimal (e.g. N=1..5), then a new 
> DecimalColumnVector would have an int[] of length N*1024 (where 1024 is the 
> default column vector size).
> o Why store a decimal in separate int values?
> • Java does not support 128-bit integers.
> • Java does not support unsigned integers.
> • In order to multiply a decimal represented in a long, you need twice 
> the storage (i.e. 128 bits), so you need to represent the parts in 32-bit 
> integers.
> • But since we do not have unsigned integers, you can really only do 
> multiplications on N-1 bits, i.e. 31 bits.
> • So, 5 ints are needed to store decimals of up to 38 digits.
> o It makes sense to have just one algorithm for decimals rather than one 
> for HiveDecimal and another for DecimalColumnVector. So, make HiveDecimal 
> store N int values, too.
> o A lower-level primitive decimal class would accept decimals stored as 
> int arrays and produce results into int arrays. It would be used by 
> HiveDecimal and DecimalColumnVector.
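> As a hedged sketch of the 31-bit limb arithmetic described above (hypothetical 
> helper, not Hive code):
> {code}
> // Multiply two non-negative numbers stored as little-endian 31-bit limbs.
> // 31 bits per int limb (not 32) so that limb * limb + carry never
> // overflows Java's signed 64-bit long.
> static int[] multiply(int[] a, int[] b) {
>   final int LIMB_BITS = 31;
>   final long LIMB_MASK = (1L << LIMB_BITS) - 1;
>   int[] r = new int[a.length + b.length];
>   for (int i = 0; i < a.length; i++) {
>     long carry = 0;
>     for (int j = 0; j < b.length; j++) {
>       long cur = (long) r[i + j] + (long) a[i] * b[j] + carry;
>       r[i + j] = (int) (cur & LIMB_MASK);  // low 31 bits stay in this limb
>       carry = cur >>> LIMB_BITS;           // high bits move to the next limb
>     }
>     r[i + b.length] = (int) carry;
>   }
>   return r;
> }
> {code}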



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-2263) Wrong value is retrieved by ResultSet.getColumnDisplaySize() for SMALLINT and FLOAT Datatype

2016-10-06 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S resolved HIVE-2263.
-
Resolution: Cannot Reproduce

Closing as Cannot Reproduce, since several releases have shipped after this 
bug was reported.

> Wrong value is retrieved by ResultSet.getColumnDisplaySize() for SMALLINT and 
> FLOAT Datatype
> 
>
> Key: HIVE-2263
> URL: https://issues.apache.org/jira/browse/HIVE-2263
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 0.5.0, 0.7.1
> Environment: Linux. Hadoop-20.1 and Hive-0.7.1
>Reporter: Rohith Sharma K S
>Priority: Minor
>
> 1) Create a table with a smallint datatype:
> create table test(a smallint, b int, c bigint);
> 2) ResultSet result = select a,b,c from test;
> 3) When I try to get getColumnDisplaySize() for the columns:
> i.e. ResultSet.getColumnDisplaySize(a) is 32 (the default value)
>   ResultSet.getColumnDisplaySize(b) is 16
>   ResultSet.getColumnDisplaySize(c) is 32
> But in the code, the default is returned:
> {noformat}
> public int getColumnDisplaySize(int column) throws SQLException {
>   // taking a stab at appropriate values
>   switch (getColumnType(column)) {
>   case Types.VARCHAR:
>   case Types.BIGINT:
>     return 32;
>   case Types.TINYINT:
>     return 2;
>   case Types.BOOLEAN:
>     return 8;
>   case Types.DOUBLE:
>   case Types.INTEGER:
>     return 16;
>   default:
>     return 32;
>   }
> }
> {noformat}
>   
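> For illustration only, the missing branches might look like this 
> (illustrative display widths, not values Hive necessarily adopted):
> {noformat}
> case Types.SMALLINT:
>   return 6;   // e.g. "-32768"
> case Types.FLOAT:
>   return 16;  // same width as DOUBLE here, purely illustrative
> {noformat}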



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14875) Enhancement and refactoring of TestLdapAtnProviderWithMiniDS

2016-10-06 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552835#comment-15552835
 ] 

Aihua Xu commented on HIVE-14875:
-

The change looks cleaner. +1.

> Enhancement and refactoring of TestLdapAtnProviderWithMiniDS
> 
>
> Key: HIVE-14875
> URL: https://issues.apache.org/jira/browse/HIVE-14875
> Project: Hive
>  Issue Type: Test
>  Components: Authentication, Tests
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
> Attachments: HIVE-14875.1.patch
>
>
> This makes the following enhancements to TestLdapAtnProviderWithMiniDS:
>  
> * Extract defined ldifs to a resource file. 
> * Remove unneeded attributes defined in each ldif entry such as:
>   * sn (Surname) and givenName from group entries
>   * distinguishedName from all entries as this attribute serves more
> as a parent type of many other attributes.
> * Remove setting ExtensibleObject as an objectClass for all ldap entries
>   as that is not needed. This objectClass would allow for adding any
>   attribute to an entry.
> * Add missing uid attribute to group entries whose dn refer to a uid
>   attribute
> * Add missing uidObject objectClass to entries that have the uid attribute
> * Explicitly set organizationalPerson objectClass to user entries as
>   they are using inetOrgPerson objectClass which is a subclass of
>   the organizationalPerson objectClass
> * Create indexes on cn and uid attributes as they are commonly
> queried.
> * Removed unused variables and imports.
> * Fixed givenName for user3.
> * Other minor code clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14875) Enhancement and refactoring of TestLdapAtnProviderWithMiniDS

2016-10-06 Thread Illya Yalovyy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552888#comment-15552888
 ] 

Illya Yalovyy commented on HIVE-14875:
--

Thank you!

> Enhancement and refactoring of TestLdapAtnProviderWithMiniDS
> 
>
> Key: HIVE-14875
> URL: https://issues.apache.org/jira/browse/HIVE-14875
> Project: Hive
>  Issue Type: Test
>  Components: Authentication, Tests
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
> Attachments: HIVE-14875.1.patch
>
>
> This makes the following enhancements to TestLdapAtnProviderWithMiniDS:
>  
> * Extract defined ldifs to a resource file. 
> * Remove unneeded attributes defined in each ldif entry such as:
>   * sn (Surname) and givenName from group entries
>   * distinguishedName from all entries as this attribute serves more
> as a parent type of many other attributes.
> * Remove setting ExtensibleObject as an objectClass for all ldap entries
>   as that is not needed. This objectClass would allow for adding any
>   attribute to an entry.
> * Add missing uid attribute to group entries whose dn refer to a uid
>   attribute
> * Add missing uidObject objectClass to entries that have the uid attribute
> * Explicitly set organizationalPerson objectClass to user entries as
>   they are using inetOrgPerson objectClass which is a subclass of
>   the organizationalPerson objectClass
> * Create indexes on cn and uid attributes as they are commonly
> queried.
> * Removed unused variables and imports.
> * Fixed givenName for user3.
> * Other minor code clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14875) Enhancement and refactoring of TestLdapAtnProviderWithMiniDS

2016-10-06 Thread Illya Yalovyy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552887#comment-15552887
 ] 

Illya Yalovyy commented on HIVE-14875:
--

Thank you!

> Enhancement and refactoring of TestLdapAtnProviderWithMiniDS
> 
>
> Key: HIVE-14875
> URL: https://issues.apache.org/jira/browse/HIVE-14875
> Project: Hive
>  Issue Type: Test
>  Components: Authentication, Tests
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
> Attachments: HIVE-14875.1.patch
>
>
> This makes the following enhancements to TestLdapAtnProviderWithMiniDS:
>  
> * Extract defined ldifs to a resource file. 
> * Remove unneeded attributes defined in each ldif entry such as:
>   * sn (Surname) and givenName from group entries
>   * distinguishedName from all entries as this attribute serves more
> as a parent type of many other attributes.
> * Remove setting ExtensibleObject as an objectClass for all ldap entries
>   as that is not needed. This objectClass would allow for adding any
>   attribute to an entry.
> * Add missing uid attribute to group entries whose dn refer to a uid
>   attribute
> * Add missing uidObject objectClass to entries that have the uid attribute
> * Explicitly set organizationalPerson objectClass to user entries as
>   they are using inetOrgPerson objectClass which is a subclass of
>   the organizationalPerson objectClass
> * Create indexes on cn and uid attributes as they are commonly
> queried.
> * Removed unused variables and imports.
> * Fixed givenName for user3.
> * Other minor code clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-14875) Enhancement and refactoring of TestLdapAtnProviderWithMiniDS

2016-10-06 Thread Illya Yalovyy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Illya Yalovyy updated HIVE-14875:
-
Comment: was deleted

(was: Thank you!)

> Enhancement and refactoring of TestLdapAtnProviderWithMiniDS
> 
>
> Key: HIVE-14875
> URL: https://issues.apache.org/jira/browse/HIVE-14875
> Project: Hive
>  Issue Type: Test
>  Components: Authentication, Tests
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
> Attachments: HIVE-14875.1.patch
>
>
> This makes the following enhancements to TestLdapAtnProviderWithMiniDS:
>  
> * Extract defined ldifs to a resource file. 
> * Remove unneeded attributes defined in each ldif entry such as:
>   * sn (Surname) and givenName from group entries
>   * distinguishedName from all entries as this attribute serves more
> as a parent type of many other attributes.
> * Remove setting ExtensibleObject as an objectClass for all ldap entries
>   as that is not needed. This objectClass would allow for adding any
>   attribute to an entry.
> * Add missing uid attribute to group entries whose dn refer to a uid
>   attribute
> * Add missing uidObject objectClass to entries that have the uid attribute
> * Explicitly set organizationalPerson objectClass to user entries as
>   they are using inetOrgPerson objectClass which is a subclass of
>   the organizationalPerson objectClass
> * Create indexes on cn and uid attributes as they are commonly
> queried.
> * Removed unused variables and imports.
> * Fixed givenName for user3.
> * Other minor code clean up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14876) make the number of rows to fetch from various HS2 clients/servers configurable

2016-10-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553032#comment-15553032
 ] 

Sergey Shelukhin commented on HIVE-14876:
-

[~vgumashta] this patch actually adds a config setting and makes use of it 
everywhere (originally, I was going to add a separate setting).
"Bumping up" the hardcoded values is not an option (note there are 3 different 
hardcoded values).
Also, not everyone uses JDBC; there's also (at least) ODBC, and maybe other 
callers.

> make the number of rows to fetch from various HS2 clients/servers configurable
> --
>
> Key: HIVE-14876
> URL: https://issues.apache.org/jira/browse/HIVE-14876
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14876.patch
>
>
> Right now, it's hardcoded to a variety of values



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14721) Fix TestJdbcWithMiniHS2 runtime

2016-10-06 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-14721:

Attachment: HIVE-14721.4.patch

> Fix TestJdbcWithMiniHS2 runtime
> ---
>
> Key: HIVE-14721
> URL: https://issues.apache.org/jira/browse/HIVE-14721
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-14721.1.patch, HIVE-14721.2.patch, 
> HIVE-14721.3.patch, HIVE-14721.3.patch, HIVE-14721.3.patch, 
> HIVE-14721.4.patch, HIVE-14721.4.patch
>
>
> Currently 450s



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14721) Fix TestJdbcWithMiniHS2 runtime

2016-10-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553232#comment-15553232
 ] 

Hive QA commented on HIVE-14721:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12832030/HIVE-14721.4.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10658 tests 
executed
*Failed tests:*
{noformat}
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testSessionScratchDirs
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testUdfBlackListOverride
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testUdfWhiteBlackList
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1421/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1421/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1421/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12832030 - PreCommit-HIVE-Build

> Fix TestJdbcWithMiniHS2 runtime
> ---
>
> Key: HIVE-14721
> URL: https://issues.apache.org/jira/browse/HIVE-14721
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-14721.1.patch, HIVE-14721.2.patch, 
> HIVE-14721.3.patch, HIVE-14721.3.patch, HIVE-14721.3.patch, 
> HIVE-14721.4.patch, HIVE-14721.4.patch
>
>
> Currently 450s



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14640) handle hive.merge.*files in select queries

2016-10-06 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14640:

Attachment: HIVE-14640.WIP.patch

I am very slow to understand the merge code... this so far handles the simple 
path correctly but doesn't commit. It seems like the final MoveTask doesn't do 
anything table-related because it has an LFD. I was expecting the original 
MoveTask to still execute and load the table.

Also, it seems like an explain query creates a write ID, which I probably won't 
fix; the merge with ACID will resolve that.

> handle hive.merge.*files in select queries
> --
>
> Key: HIVE-14640
> URL: https://issues.apache.org/jira/browse/HIVE-14640
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14640.WIP.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14640) handle hive.merge.*files in select queries

2016-10-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553279#comment-15553279
 ] 

Sergey Shelukhin edited comment on HIVE-14640 at 10/6/16 9:45 PM:
--

I am very slow to understand the merge code... this so far handles the simple 
path correctly but doesn't commit. It seems like the final MoveTask doesn't do 
anything table-related because it has an LFD. I was expecting the original 
MoveTask to still execute and load the table. Will continue looking next week.

Also, it seems like an explain query creates a write ID, which I probably won't 
fix; the merge with ACID will resolve that.


was (Author: sershe):
I am very slow to understand the merge code... this so far handles the simple 
path correctly but doesn't commit. It seems like the final MoveTask doesn't do 
anything table-related due to having a LFD. I was expecting the original 
MoveTask to still execute and load the table. 

Also it seems like explain query creates a write ID, which I probably won't 
fix, merge with ACID will resolve that.

> handle hive.merge.*files in select queries
> --
>
> Key: HIVE-14640
> URL: https://issues.apache.org/jira/browse/HIVE-14640
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14640.WIP.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14550) HiveServer2: enable ThriftJDBCBinarySerde use by default

2016-10-06 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552716#comment-15552716
 ] 

Vaibhav Gumashta commented on HIVE-14550:
-

Thanks for the analysis, [~ziyangz]. I think we'll need to submit the patch 
again after resolving the other 2 open issues in the parent.

> HiveServer2: enable ThriftJDBCBinarySerde use by default
> 
>
> Key: HIVE-14550
> URL: https://issues.apache.org/jira/browse/HIVE-14550
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC, ODBC
>Affects Versions: 2.1.0
>Reporter: Vaibhav Gumashta
>Assignee: Ziyang Zhao
> Attachments: HIVE-14550.1.patch
>
>
> We've covered all items in HIVE-12427 and created HIVE-14549 for part2 of the 
> effort. Before closing the umbrella jira, we should enable this feature by 
> default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14799) Query operations are not thread safe during cancellation

2016-10-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553026#comment-15553026
 ] 

Sergey Shelukhin commented on HIVE-14799:
-

New patch mostly looks good. I was suggesting we move to the model where only 
one thread from the driver manages all the resources except before it's 
started, but I guess the current model also works.
There's a typo in the description ("until").
+1
[~ashutoshc] do you also want to take a look?

> Query operations are not thread safe during cancellation
> ---
>
> Key: HIVE-14799
> URL: https://issues.apache.org/jira/browse/HIVE-14799
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-14799.1.patch, HIVE-14799.2.patch, HIVE-14799.patch
>
>
> When a query is cancelled either via Beeline (Ctrl-C) or the API call 
> TCLIService.Client.CancelOperation, SQLOperation.cancel is invoked in a 
> different thread from the one running the query, to close/destroy its 
> encapsulated Driver object. Neither SQLOperation nor Driver is thread-safe, 
> which can sometimes result in runtime exceptions like NPE. The errors from 
> the running query are also not handled properly, probably leaving some 
> resources (files, locks, etc.) uncleaned after the query terminates.
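
To make the failure mode concrete, here is a minimal, self-contained Java 
sketch of one way to close the race (this is not the attached patch; class and 
method names are hypothetical): the query thread and the cancelling thread 
both try to tear down the Driver, and an atomic flag ensures the teardown runs 
exactly once.

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: the query thread and the cancelling thread race to
// release the Driver's resources; an atomic flag makes the release happen
// exactly once, whichever thread gets there first.
public class CancellableOperation {
  private final AtomicBoolean closed = new AtomicBoolean(false);

  /** Called by the query thread on normal completion or failure. */
  public void finish() {
    releaseOnce();
  }

  /** Called from another thread on Ctrl-C / CancelOperation. */
  public void cancel(Thread queryThread) {
    queryThread.interrupt(); // ask the running query to stop
    releaseOnce();
  }

  private void releaseOnce() {
    // compareAndSet guarantees files/locks are cleaned up exactly once,
    // avoiding the NPEs seen when both threads tear down shared state.
    if (closed.compareAndSet(false, true)) {
      // close files, release locks, destroy the Driver here
    }
  }
}
{code}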



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14889) Beeline leaks sensitive environment variables of HiveServer2 when you type set;

2016-10-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553080#comment-15553080
 ] 

Hive QA commented on HIVE-14889:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12832011/HIVE-14889.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10661 tests 
executed
*Failed tests:*
{noformat}
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1419/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1419/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1419/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12832011 - PreCommit-HIVE-Build

> Beeline leaks sensitive environment variables of HiveServer2 when you type 
> set;
> ---
>
> Key: HIVE-14889
> URL: https://issues.apache.org/jira/browse/HIVE-14889
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-14889.1.patch, HIVE-14889.2.patch
>
>
> When you type set; Beeline prints all the environment variables, including 
> passwords, which could be a major security risk. E.g., 
> HADOOP_CREDSTORE_PASSWORD below is leaked.
> {noformat}
> | env:HADOOP_CREDSTORE_PASSWORD=password |
> | env:HADOOP_DATANODE_OPTS=-Dhadoop.security.logger=ERROR,RFAS  |
> | env:HADOOP_HOME_WARN_SUPPRESS=true |
> | env:HADOOP_IDENT_STRING=vihang |
> | env:HADOOP_PID_DIR=|
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14889) Beeline leaks sensitive environment variables of HiveServer2 when you type set;

2016-10-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552998#comment-15552998
 ] 

Sergio Peña commented on HIVE-14889:


LGTM +1

> Beeline leaks sensitive environment variables of HiveServer2 when you type 
> set;
> ---
>
> Key: HIVE-14889
> URL: https://issues.apache.org/jira/browse/HIVE-14889
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-14889.1.patch, HIVE-14889.2.patch
>
>
> When you type set; Beeline prints all the environment variables, including 
> passwords, which could be a major security risk. E.g., 
> HADOOP_CREDSTORE_PASSWORD below is leaked.
> {noformat}
> | env:HADOOP_CREDSTORE_PASSWORD=password |
> | env:HADOOP_DATANODE_OPTS=-Dhadoop.security.logger=ERROR,RFAS  |
> | env:HADOOP_HOME_WARN_SUPPRESS=true |
> | env:HADOOP_IDENT_STRING=vihang |
> | env:HADOOP_PID_DIR=|
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14550) HiveServer2: enable ThriftJDBCBinarySerde use by default

2016-10-06 Thread Ziyang Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552706#comment-15552706
 ] 

Ziyang Zhao commented on HIVE-14550:


All failed tests passed in my local run; they seem unrelated.

> HiveServer2: enable ThriftJDBCBinarySerde use by default
> 
>
> Key: HIVE-14550
> URL: https://issues.apache.org/jira/browse/HIVE-14550
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC, ODBC
>Affects Versions: 2.1.0
>Reporter: Vaibhav Gumashta
>Assignee: Ziyang Zhao
> Attachments: HIVE-14550.1.patch
>
>
> We've covered all items in HIVE-12427 and created HIVE-14549 for part2 of the 
> effort. Before closing the umbrella jira, we should enable this feature by 
> default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14721) Fix TestJdbcWithMiniHS2 runtime

2016-10-06 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-14721:

Attachment: HIVE-14721.4.patch

> Fix TestJdbcWithMiniHS2 runtime
> ---
>
> Key: HIVE-14721
> URL: https://issues.apache.org/jira/browse/HIVE-14721
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-14721.1.patch, HIVE-14721.2.patch, 
> HIVE-14721.3.patch, HIVE-14721.3.patch, HIVE-14721.3.patch, HIVE-14721.4.patch
>
>
> Currently 450s



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14901) HiveServer2: Use user supplied fetch size to determine #rows serialized in tasks

2016-10-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553034#comment-15553034
 ] 

Sergey Shelukhin commented on HIVE-14901:
-

We don't even really use the config in all cases; see HIVE-14876.

> HiveServer2: Use user supplied fetch size to determine #rows serialized in 
> tasks
> 
>
> Key: HIVE-14901
> URL: https://issues.apache.org/jira/browse/HIVE-14901
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC, ODBC
>Affects Versions: 2.1.0
>Reporter: Vaibhav Gumashta
>
> Currently, we use {{hive.server2.thrift.resultset.max.fetch.size}} to decide 
> the max number of rows that we write in tasks. However, we should ideally use 
> the user supplied value (which can be extracted from the 
> ThriftCLIService.FetchResults' request parameter) to decide how many rows to 
> serialize in a blob in the tasks. We should however use 
> {{hive.server2.thrift.resultset.max.fetch.size}} to have an upper bound on 
> it, so that we don't go OOM in tasks and HS2. 
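
The intended logic is essentially a clamp: honor the client's requested fetch 
size, but never exceed the server-side maximum. A minimal Java sketch of that 
rule (illustrative only; names are hypothetical, not the eventual patch):

{code:java}
// Hypothetical sketch: clamp the client-requested fetch size to the
// server-side maximum so task and HS2 memory stays bounded.
public final class FetchSizeUtil {
  private FetchSizeUtil() {}

  public static long effectiveFetchSize(long requested, long serverMax) {
    if (requested <= 0) {
      return serverMax; // no client preference: fall back to the server max
    }
    return Math.min(requested, serverMax);
  }

  public static void main(String[] args) {
    long serverMax = 10_000; // e.g. hive.server2.thrift.resultset.max.fetch.size
    System.out.println(effectiveFetchSize(500, serverMax));    // 500
    System.out.println(effectiveFetchSize(50_000, serverMax)); // 10000
  }
}
{code}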



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13589) beeline - support prompt for password with '-u' option

2016-10-06 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553068#comment-15553068
 ] 

Vihang Karajgaonkar commented on HIVE-13589:


Hey [~Ferd], can you please review? I incorporated the review comments you 
gave earlier on the patch I shared with you and Ke Jia. Thanks!

> beeline - support prompt for password with '-u' option
> --
>
> Key: HIVE-13589
> URL: https://issues.apache.org/jira/browse/HIVE-13589
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Thejas M Nair
>Assignee: Vihang Karajgaonkar
> Fix For: 2.2.0
>
> Attachments: HIVE-13589.1.patch, HIVE-13589.2.patch, 
> HIVE-13589.3.patch, HIVE-13589.4.patch, HIVE-13589.5.patch, 
> HIVE-13589.6.patch, HIVE-13589.7.patch, HIVE-13589.8.patch, HIVE-13589.9.patch
>
>
> Specifying the connection string using command-line options in Beeline is 
> convenient, as it gets saved in shell command history and is easy to 
> retrieve from there.
> However, specifying the password on the command line is not secure, as it 
> gets displayed on screen and saved in the history.
> It should be possible to specify '-p' without an argument to make Beeline 
> prompt for the password.
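
For reference, prompting without echoing is straightforward with 
java.io.Console; a minimal sketch of the idea (illustrative only, not the 
patch):

{code:java}
import java.io.Console;
import java.util.Arrays;

// Hypothetical sketch: read the password interactively so it never appears
// on screen or in shell history.
public class PasswordPrompt {
  public static void main(String[] args) {
    Console console = System.console();
    if (console == null) {
      throw new IllegalStateException("no interactive console available");
    }
    char[] password = console.readPassword("Enter password: ");
    try {
      // hand the password to the JDBC connection here
    } finally {
      Arrays.fill(password, '\0'); // scrub it from memory afterwards
    }
  }
}
{code}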



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14721) Fix TestJdbcWithMiniHS2 runtime

2016-10-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553086#comment-15553086
 ] 

Hive QA commented on HIVE-14721:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12832017/HIVE-14721.4.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1420/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1420/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1420/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2016-10-06 20:25:00.235
+ [[ -n /usr/java/jdk1.8.0_25 ]]
+ export JAVA_HOME=/usr/java/jdk1.8.0_25
+ JAVA_HOME=/usr/java/jdk1.8.0_25
+ export 
PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/java/jdk1.8.0_25/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-1420/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2016-10-06 20:25:00.237
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at e1fa278 HIVE-14896 : Stabilize golden files for currently 
failing tests
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at e1fa278 HIVE-14896 : Stabilize golden files for currently 
failing tests
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2016-10-06 20:25:01.282
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
fatal: git diff header lacks filename information when removing 0 leading 
pathname components (line 4)
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12832017 - PreCommit-HIVE-Build

> Fix TestJdbcWithMiniHS2 runtime
> ---
>
> Key: HIVE-14721
> URL: https://issues.apache.org/jira/browse/HIVE-14721
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-14721.1.patch, HIVE-14721.2.patch, 
> HIVE-14721.3.patch, HIVE-14721.3.patch, HIVE-14721.3.patch, HIVE-14721.4.patch
>
>
> Currently 450s



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14555) JDBC:ClassNotFoundException when executing a map join query with UDF

2016-10-06 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552534#comment-15552534
 ] 

Aihua Xu commented on HIVE-14555:
-

[~hizero] We are also seeing such an issue. I don't quite follow what causes 
it. Can you explain in more detail? Thanks.

> JDBC:ClassNotFoundException when executing a map join query with UDF
> 
>
> Key: HIVE-14555
> URL: https://issues.apache.org/jira/browse/HIVE-14555
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.1.0
>Reporter: hizero
>Assignee: hizero
> Fix For: 1.1.0
>
> Attachments: HIVE-14555.patch
>
>
> When I submit a map join query with a UDF using JDBC, it sometimes throws:
> Error while compiling statement: FAILED: SemanticException Generate Map Join 
> Task Error: Unable to find class: com.kingnetdc.hive.udf.FilterByMap 
> Serialization trace: genericUDF 
> (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc) colExprMap 
> (org.apache.hadoop.hive.ql.exec.SelectOperator) childOperators 
> (org.apache.hadoop.hive.ql.exec.FilterOperator) childOperators 
> (org.apache.hadoop.hive.ql.exec.JoinOperator) reducer 
> (org.apache.hadoop.hive.ql.plan.ReduceWork) reduceWork 
> (org.apache.hadoop.hive.ql.plan.MapredWork)
> I have found that it fails at cloning the plan when invoking 
> Utilities.deserializePlan.
> An existing thread deals with the query, and its static thread-local 
> variable cloningQueryPlanKryo is initialized at most once per thread. When 
> this thread registers the UDFs set in aux_jar_paths, it won't reinitialize 
> cloningQueryPlanKryo.
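
The failure mode generalizes to any thread-local serializer that captures the 
thread's context classloader once. A minimal, self-contained Java sketch of 
one way out (illustrative; the Serializer class below is a stand-in, not 
Hive's Kryo wrapper): re-check the classloader on every borrow and rebuild the 
cached instance when it has changed.

{code:java}
// Hypothetical sketch: a thread-local serializer bound to the classloader
// that existed when it was first created will miss classes from jars added
// later (e.g. UDF jars from aux_jar_paths). Re-reading the context
// classloader on each borrow avoids the stale cache.
public class ReloadableSerializerPool {
  // stand-in for a Kryo instance bound to a classloader
  static class Serializer {
    final ClassLoader loader;
    Serializer(ClassLoader loader) { this.loader = loader; }
  }

  private static final ThreadLocal<Serializer> POOL =
      ThreadLocal.withInitial(() ->
          new Serializer(Thread.currentThread().getContextClassLoader()));

  /** Returns a serializer that can see the thread's current classloader. */
  static Serializer borrow() {
    Serializer s = POOL.get();
    ClassLoader current = Thread.currentThread().getContextClassLoader();
    if (s.loader != current) {     // aux jars were added since creation
      s = new Serializer(current); // rebuild instead of reusing the stale one
      POOL.set(s);
    }
    return s;
  }

  public static void main(String[] args) {
    System.out.println(borrow().loader); // same loader until it changes
  }
}
{code}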



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13690) Shade guava in hive-exec fat jar

2016-10-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552671#comment-15552671
 ] 

Hive QA commented on HIVE-13690:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12802303/HIVE-13690.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 705 failed/errored test(s), 6341 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
org.apache.hadoop.hive.cli.TestCliDriverMethods.testProcessSelectDatabase
org.apache.hadoop.hive.cli.TestCliDriverMethods.testprocessInitFiles
org.apache.hadoop.hive.cli.TestCompareCliDriver.org.apache.hadoop.hive.cli.TestCompareCliDriver
org.apache.hadoop.hive.cli.TestContribCliDriver.org.apache.hadoop.hive.cli.TestContribCliDriver
org.apache.hadoop.hive.cli.TestContribNegativeCliDriver.org.apache.hadoop.hive.cli.TestContribNegativeCliDriver
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver
org.apache.hadoop.hive.cli.TestHBaseCliDriver.org.apache.hadoop.hive.cli.TestHBaseCliDriver
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[cascade_dbdrop]
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[cascade_dbdrop_hadoop20]
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver[generatehfiles_require_family_path]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
org.apache.hadoop.hive.cli.TestNegativeCliDriver.org.apache.hadoop.hive.cli.TestNegativeCliDriver
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver
org.apache.hadoop.hive.cli.TestPerfCliDriver.org.apache.hadoop.hive.cli.TestPerfCliDriver
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
org.apache.hadoop.hive.hooks.TestHs2Hooks.testHookContexts
org.apache.hadoop.hive.metastore.TestFilterHooks.org.apache.hadoop.hive.metastore.TestFilterHooks
org.apache.hadoop.hive.metastore.TestMarkPartition.testMarkingPartitionSet
org.apache.hadoop.hive.metastore.TestMarkPartitionRemote.testMarkingPartitionSet
org.apache.hadoop.hive.metastore.TestMetaStoreEndFunctionListener.testEndFunctionListener
org.apache.hadoop.hive.metastore.TestMetaStoreEventListener.testListener
org.apache.hadoop.hive.metastore.TestMetaStoreEventListenerOnlyOnCommit.testEventStatus
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMetaDataCounts
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testMethodCounts
org.apache.hadoop.hive.metastore.TestMetastoreVersion.testMetastoreVersion
org.apache.hadoop.hive.metastore.TestMetastoreVersion.testVersionCompatibility
org.apache.hadoop.hive.metastore.TestMetastoreVersion.testVersionMatching
org.apache.hadoop.hive.metastore.TestMetastoreVersion.testVersionMisMatch
org.apache.hadoop.hive.metastore.hbase.TestHBaseMetastoreMetrics.testMetaDataCounts
org.apache.hadoop.hive.metastore.hbase.TestHBaseMetastoreSql.alterRename
org.apache.hadoop.hive.metastore.hbase.TestHBaseMetastoreSql.alterRenamePartitioned
org.apache.hadoop.hive.metastore.hbase.TestHBaseMetastoreSql.database
org.apache.hadoop.hive.metastore.hbase.TestHBaseMetastoreSql.describeNonpartitionedTable
org.apache.hadoop.hive.metastore.hbase.TestHBaseMetastoreSql.grant
org.apache.hadoop.hive.metastore.hbase.TestHBaseMetastoreSql.insertIntoPartitionTable
org.apache.hadoop.hive.metastore.hbase.TestHBaseMetastoreSql.insertIntoTable
org.apache.hadoop.hive.metastore.hbase.TestHBaseMetastoreSql.partitionedTable
org.apache.hadoop.hive.metastore.hbase.TestHBaseMetastoreSql.role
org.apache.hadoop.hive.metastore.hbase.TestHBaseMetastoreSql.table
org.apache.hadoop.hive.ql.TestAcidOnTez.testMapJoinOnMR
org.apache.hadoop.hive.ql.TestAcidOnTez.testMapJoinOnTez
org.apache.hadoop.hive.ql.TestAcidOnTez.testMergeJoinOnMR
org.apache.hadoop.hive.ql.TestAcidOnTez.testMergeJoinOnTez
org.apache.hadoop.hive.ql.TestAcidOnTezWithSplitUpdate.testMapJoinOnMR
org.apache.hadoop.hive.ql.TestAcidOnTezWithSplitUpdate.testMapJoinOnTez
org.apache.hadoop.hive.ql.TestAcidOnTezWithSplitUpdate.testMergeJoinOnMR
org.apache.hadoop.hive.ql.TestAcidOnTezWithSplitUpdate.testMergeJoinOnTez
org.apache.hadoop.hive.ql.TestCreateUdfEntities.testUdfWithDfsResource

[jira] [Updated] (HIVE-14889) Beeline leaks sensitive environment variables of HiveServer2 when you type set;

2016-10-06 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-14889:
---
Attachment: HIVE-14889.2.patch

Updating the patch so that the messaging is more consistent with other hidden 
configs.

> Beeline leaks sensitive environment variables of HiveServer2 when you type 
> set;
> ---
>
> Key: HIVE-14889
> URL: https://issues.apache.org/jira/browse/HIVE-14889
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-14889.1.patch, HIVE-14889.2.patch
>
>
> When you type set; Beeline prints all the environment variables, including 
> passwords, which could be a major security risk. E.g., 
> HADOOP_CREDSTORE_PASSWORD below is leaked.
> {noformat}
> | env:HADOOP_CREDSTORE_PASSWORD=password |
> | env:HADOOP_DATANODE_OPTS=-Dhadoop.security.logger=ERROR,RFAS  |
> | env:HADOOP_HOME_WARN_SUPPRESS=true |
> | env:HADOOP_IDENT_STRING=vihang |
> | env:HADOOP_PID_DIR=|
> {noformat}
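
As an illustration of the masking approach, here is a minimal Java sketch (not 
the actual patch; the variable names in the hidden set are only examples): 
sensitive entries are rendered with a placeholder instead of their value, 
mirroring how hidden configs are handled.

{code:java}
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: render "env:" entries for set; but mask any variable
// whose name is on a hidden list.
public class EnvMasker {
  // example entries only; a real list would come from configuration
  private static final Set<String> HIDDEN =
      Set.of("HADOOP_CREDSTORE_PASSWORD", "HIVE_SERVER2_KEYSTORE_PASSWORD");

  static String render(String name, String value) {
    return "env:" + name + "=" + (HIDDEN.contains(name) ? "<hidden>" : value);
  }

  public static void main(String[] args) {
    for (Map.Entry<String, String> e : System.getenv().entrySet()) {
      System.out.println(render(e.getKey(), e.getValue()));
    }
  }
}
{code}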



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14904) Hive: Constant fold VALUES() clauses instead of erroring out

2016-10-06 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552619#comment-15552619
 ] 

Gopal V commented on HIVE-14904:


Because of the way this is implemented in the SemanticAnalyzer *before* the 
semantic information is all established, this might be a whole yak shaving 
exercise.

> Hive: Constant fold VALUES() clauses instead of erroring out
> 
>
> Key: HIVE-14904
> URL: https://issues.apache.org/jira/browse/HIVE-14904
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.2.1, 2.1.0, 2.2.0
>Reporter: Gopal V
>Priority: Minor
>
> {code}
> hive> create temporary table foo (a int);
> hive> insert into foo values((1+1));
> FAILED: SemanticException [Error 10293]: Unable to create temp file for 
> insert values Expression of type + not supported in insert/values
> {code}
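
What the title asks for is ordinary constant folding over the VALUES 
expression tree. A toy, self-contained Java sketch of the idea (purely 
illustrative; it does not use Hive's ExprNode classes):

{code:java}
// Hypothetical sketch: fold an all-constant expression such as (1+1) down
// to a literal before the VALUES row is materialized, instead of rejecting
// it with a SemanticException.
public class ValuesFolder {
  interface Expr { Object fold(); }

  record Literal(Object value) implements Expr {
    public Object fold() { return value; }
  }

  record Plus(Expr left, Expr right) implements Expr {
    public Object fold() {
      Object l = left.fold(), r = right.fold();
      if (l instanceof Integer a && r instanceof Integer b) {
        return a + b; // constant-fold to a plain literal
      }
      throw new IllegalStateException("not foldable in VALUES()");
    }
  }

  public static void main(String[] args) {
    Expr e = new Plus(new Literal(1), new Literal(1));
    System.out.println(e.fold()); // prints 2, what values((1+1)) needs
  }
}
{code}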



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14099) Hive security authorization can be disabled by users

2016-10-06 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-14099:
-
Description: 
If we enable:

hive.security.authorization.enabled=true in hive-site.xml

this setting can be disabled by users at their Hive prompt. There should be a 
hardcoded setting in the configs.

The other thing is, once we enable authorization, the tables that were created 
before enabling it lose access, as they don't have authorization defined. How 
can this situation be tackled in Hive?

Note that this issue does not affect the SQL standard or Ranger authorization 
plugins.

  was:
If we enable:

hive.security.authorization.enabled=true in hive-site.xml

this setting can be disabled by users at their Hive prompt. There should be a 
hardcoded setting in the configs.

The other thing is, once we enable authorization, the tables that were created 
before enabling it lose access, as they don't have authorization defined. How 
can this situation be tackled in Hive?


> Hive security authorization can be disabled by users
> 
>
> Key: HIVE-14099
> URL: https://issues.apache.org/jira/browse/HIVE-14099
> Project: Hive
>  Issue Type: Improvement
>  Components: Authorization
>Affects Versions: 0.13.1
>Reporter: Prashant Kumar Singh
>Assignee: Aihua Xu
> Fix For: 2.2.0
>
> Attachments: HIVE-14099.1.patch
>
>
> If we enable:
> hive.security.authorization.enabled=true in hive-site.xml
> this setting can be disabled by users at their Hive prompt. There should be 
> a hardcoded setting in the configs.
> The other thing is, once we enable authorization, the tables that were 
> created before enabling it lose access, as they don't have authorization 
> defined. How can this situation be tackled in Hive?
> Note that this issue does not affect the SQL standard or Ranger 
> authorization plugins.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14099) Hive security authorization can be disabled by users

2016-10-06 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552496#comment-15552496
 ] 

Thejas M Nair commented on HIVE-14099:
--

Adding a note that this does not affect the SQL Standard or Ranger 
authorization plugins. They both use a config whitelist for the set of 
configs that are allowed to be modified at runtime.

With SQL std auth or ranger you would get an error message like the following -
{code}
0: jdbc:hive2://localhost:1/default> set 
hive.security.authorization.enabled=false;
Error: Error while processing statement: Cannot modify 
hive.security.authorization.enabled at runtime. It is not in list of params 
that are allowed to be modified at runtime (state=42000,code=1)
{code}

This issue would affect [legacy authorization 
mode|https://cwiki.apache.org/confluence/display/Hive/Hive+Default+Authorization+-+Legacy+Mode],
 which is inherently insecure.

Also, trying to secure the Hive CLI this way is meaningless: you can specify 
any config options on the command line to override the settings, point it at 
a different config directly, or even read the data directly from HDFS.
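
A minimal Java sketch of the whitelist mechanism described above (illustrative 
only; the entries shown are examples, not Hive's actual whitelist):

{code:java}
import java.util.Set;

// Hypothetical sketch: a "set" command is rejected unless the parameter is
// on the runtime-modifiable whitelist.
public class SetCommandGuard {
  // example entries only
  private static final Set<String> MODIFIABLE_AT_RUNTIME =
      Set.of("hive.exec.parallel", "hive.execution.engine");

  static void set(String key, String value) {
    if (!MODIFIABLE_AT_RUNTIME.contains(key)) {
      throw new IllegalArgumentException("Cannot modify " + key
          + " at runtime. It is not in list of params that are allowed to be"
          + " modified at runtime");
    }
    // apply the setting here
  }

  public static void main(String[] args) {
    try {
      set("hive.security.authorization.enabled", "false");
    } catch (IllegalArgumentException e) {
      System.out.println("Error: " + e.getMessage()); // mirrors the error above
    }
  }
}
{code}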



> Hive security authorization can be disabled by users
> 
>
> Key: HIVE-14099
> URL: https://issues.apache.org/jira/browse/HIVE-14099
> Project: Hive
>  Issue Type: Improvement
>  Components: Authorization
>Affects Versions: 0.13.1
>Reporter: Prashant Kumar Singh
>Assignee: Aihua Xu
> Fix For: 2.2.0
>
> Attachments: HIVE-14099.1.patch
>
>
> If we enable:
> hive.security.authorization.enabled=true in hive-site.xml
> this setting can be disabled by users at their Hive prompt. There should be 
> a hardcoded setting in the configs.
> The other thing is, once we enable authorization, the tables that were 
> created before enabling it lose access, as they don't have authorization 
> defined. How can this situation be tackled in Hive?
> Note that this issue does not affect the SQL standard or Ranger 
> authorization plugins.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14855) test patch

2016-10-06 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14855:
--
Attachment: HIVE-14855.4.patch

> test patch
> --
>
> Key: HIVE-14855
> URL: https://issues.apache.org/jira/browse/HIVE-14855
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14855.2.patch, HIVE-14855.3.patch, 
> HIVE-14855.4.patch, HIVE-14855.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14861) Support precedence for set operator using parentheses

2016-10-06 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553848#comment-15553848
 ] 

Ashutosh Chauhan commented on HIVE-14861:
-

+1

> Support precedence for set operator using parentheses
> -
>
> Key: HIVE-14861
> URL: https://issues.apache.org/jira/browse/HIVE-14861
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14861.01.patch, HIVE-14861.02.patch
>
>
> We should support precedence for set operators by using parentheses. For 
> example:
> {code}
> select * from src union all (select * from src union select * from src);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11394) Enhance EXPLAIN display for vectorization

2016-10-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553979#comment-15553979
 ] 

Hive QA commented on HIVE-11394:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12832072/HIVE-11394.06.patch

{color:green}SUCCESS:{color} +1 due to 132 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 10656 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas_colname]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_complex_all]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_udf1]
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_bulk]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_complex_all]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_between_in]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_cast_constant]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_count_distinct]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_decimal_aggregate]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_distinct_2]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_groupby_3]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_mapjoin_reduce]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_orderby_5]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_string_concat]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_13]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_14]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_15]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_16]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_9]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf]
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_shufflejoin]
org.apache.hadoop.hive.llap.security.TestLlapSignerImpl.testSigning
org.apache.hadoop.hive.ql.optimizer.physical.TestVectorizer.testValidateNestedExpressions
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1423/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1423/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-Build-1423/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 24 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12832072 - PreCommit-HIVE-Build

> Enhance EXPLAIN display for vectorization
> -
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, 
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch, 
> HIVE-11394.06.patch
>
>
> Add detail to the EXPLAIN output showing why a Map and Reduce work is not 
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\] \[SUMMARY|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (is vectorization 
> enabled) and a summary of Map and Reduce work.
> The optional clause defaults are not ONLY and SUMMARY.
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization 
> sections)
> It is the same as EXPLAIN VECTORIZATION SUMMARY.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
> …
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> …
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: decimal_date_test
>   Statistics: Num rows: 12288 Data size: 2467616 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: cdate BETWEEN 1969-12-30 AND 1970-01-02 (type: 
> boolean)
> Statistics: Num rows: 6144 Data 

[jira] [Commented] (HIVE-14474) Create datasource in Druid from Hive

2016-10-06 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553992#comment-15553992
 ] 

Ashutosh Chauhan commented on HIVE-14474:
-

Can you create an RB request for it? It seems like the GH request isn't up to date.

> Create datasource in Druid from Hive
> 
>
> Key: HIVE-14474
> URL: https://issues.apache.org/jira/browse/HIVE-14474
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14474.01.patch, HIVE-14474.02.patch, 
> HIVE-14474.03.patch, HIVE-14474.04.patch, HIVE-14474.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In the initial implementation proposed in this issue, we will write the 
> results of the query to HDFS (or the location specified in the CTAS 
> statement), and submit a HadoopIndexing task to the Druid overlord. The task 
> will contain the path where the data was stored; it will read it and create 
> the segments in Druid. Once this is done, the results are removed from Hive.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "my_query_based_datasource")
> AS <select query>;
> {code}
> This statement stores the results of <select query> in a Druid 
> datasource named 'my_query_based_datasource'. One of the columns of the query 
> needs to be the time dimension, which is mandatory in Druid. In particular, 
> we use the same convention that is used for Druid: there needs to be a 
> column named '\_\_time' in the result of the executed query, which will act 
> as the time dimension column in Druid. Currently, the time dimension column 
> needs to be of 'timestamp' type.
> This initial implementation interacts with the Druid API as it is currently 
> exposed to the user. In a follow-up issue, we should propose an 
> implementation that integrates more tightly with Druid. In particular, we 
> would like to store segments directly in Druid from Hive, thus avoiding the 
> overhead of writing Hive results to HDFS and then launching an MR job that 
> basically reads them again to create the segments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14873) Add UDF for extraction of 'day of week'

2016-10-06 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15554105#comment-15554105
 ] 

Lefty Leverenz commented on HIVE-14873:
---

Doc note:  This needs to be documented in the wiki.  Added a TODOC2.2 label.

* [Hive Operators and UDFs -- Date Functions | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions]
* [DDL -- Non-reserved Keywords | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Non-reservedKeywords]

This JIRA issue also needs a description -- the syntax doesn't appear in any of 
the comments.  A release note would be nice, too.

> Add UDF for extraction of 'day of week'
> ---
>
> Key: HIVE-14873
> URL: https://issues.apache.org/jira/browse/HIVE-14873
> Project: Hive
>  Issue Type: Bug
>  Components: Parser, UDF
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-14873.01.patch, HIVE-14873.02.patch, 
> HIVE-14873.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14579) Add support for date extract

2016-10-06 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15554107#comment-15554107
 ] 

Lefty Leverenz commented on HIVE-14579:
---

The new keywords quarter & week also need to be documented here:

* [DDL -- Non-reserved Keywords | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Non-reservedKeywords]

> Add support for date extract
> 
>
> Key: HIVE-14579
> URL: https://issues.apache.org/jira/browse/HIVE-14579
> Project: Hive
>  Issue Type: Sub-task
>  Components: UDF
>Reporter: Ashutosh Chauhan
>Assignee: Jesus Camacho Rodriguez
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-14579.01.patch, HIVE-14579.patch, HIVE-14579.patch
>
>
> https://www.postgresql.org/docs/9.1/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14877) Move slow CliDriver tests to MiniLlap

2016-10-06 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14877:
-
Attachment: HIVE-14877.4.patch

> Move slow CliDriver tests to MiniLlap
> -
>
> Key: HIVE-14877
> URL: https://issues.apache.org/jira/browse/HIVE-14877
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14877.1.patch, HIVE-14877.2.patch, 
> HIVE-14877.3.patch, HIVE-14877.4.patch
>
>
> When analyzing the test runtimes, there are many CliDriver tests that show 
> up as stragglers. Most of these tests are not really testing the execution 
> engine. For example, special_character_in_tabnames_1.q is the slowest test 
> case: it takes 419s in CliDriver but only 62s in MiniLlap. Similarly, there 
> are many test cases that could benefit from faster runtimes. We should 
> consider moving the tests that are not testing the execution engine to 
> MiniLlap (assuming it provides a significant performance benefit).
> Here is the list of top 100 slow tests based on build #1055
> ||QFiles||TestCliDriver elapsed time (s)||
> |special_character_in_tabnames_1.q|419.229|
> |unionDistinct_1.q|278.583|
> |vector_leftsemi_mapjoin.q|232.313|
> |join_filters.q|172.436|
> |escape2.q|167.503|
> |archive_excludeHadoop20.q|163.522|
> |escape1.q|130.217|
> |lineage3.q|110.935|
> |insert_into_with_schema.q|107.345|
> |auto_join_filters.q|104.331|
> |windowing.q|99.622|
> |index_compact_binary_search.q|97.637|
> |cbo_rp_windowing_2.q|95.108|
> |vectorized_ptf.q|93.397|
> |dynpart_sort_optimization_acid.q|91.831|
> |partition_multilevels.q|90.392|
> |ptf.q|89.115|
> |sample_islocalmode_hook.q|88.293|
> |udaf_collect_set_2.q|84.725|
> |skewjoin.q|84.588|
> |lineage2.q|84.187|
> |correlationoptimizer1.q|80.367|
> |dynpart_sort_optimization.q|77.07|
> |orc_ppd_decimal.q|75.523|
> |orc_ppd_schema_evol_3a.q|75.352|
> |groupby_sort_skew_1_23.q|75.342|
> |cbo_rp_lineage2.q|75.283|
> |parquet_ppd_decimal.q|74.063|
> |sample_islocalmode_hook_use_metadata.q|73.988|
> |orc_analyze.q|73.803|
> |join_nulls.q|72.417|
> |semijoin.q|70.403|
> |correlationoptimizer6.q|69.151|
> |table_access_keys_stats.q|68.699|
> |autoColumnStats_2.q|68.632|
> |cbo_join.q|68.325|
> |cbo_rp_join.q|68.317|
> |sample10.q|64.513|
> |mergejoin.q|63.647|
> |multi_insert_move_tasks_share_dependencies.q|62.079|
> |union_view.q|61.772|
> |autoColumnStats_1.q|61.246|
> |groupby_sort_1_23.q|61.129|
> |pcr.q|59.546|
> |vectorization_short_regress.q|58.775|
> |auto_sortmerge_join_9.q|58.3|
> |correlationoptimizer2.q|56.591|
> |alter_merge_stats_orc.q|55.202|
> |vector_join30.q|54.85|
> |selectDistinctStar.q|53.981|
> |vector_decimal_udf.q|53.879|
> |auto_join30.q|53.762|
> |subquery_notin.q|52.879|
> |cbo_rp_subq_not_in.q|52.609|
> |cbo_rp_gby.q|51.866|
> |cbo_subq_not_in.q|51.672|
> |cbo_gby.q|50.361|
> |infer_bucket_sort.q|49.158|
> |ptf_streaming.q|48.484|
> |join_1to1.q|48.268|
> |load_dyn_part5.q|47.796|
> |limit_join_transpose.q|47.517|
> |ppd_windowing2.q|47.318|
> |dynpart_sort_opt_vectorization.q|47.208|
> |vector_number_compare_projection.q|47.024|
> |correlationoptimizer4.q|45.472|
> |orc_ppd_date.q|45.19|
> |global_limit.q|44.438|
> |union_top_level.q|44.229|
> |llap_partitioned.q|44.139|
> |orc_ppd_timestamp.q|43.617|
> |parquet_ppd_date.q|43.539|
> |multiMapJoin2.q|43.036|
> |parquet_ppd_timestamp.q|42.665|
> |vector_partitioned_date_time.q|42.511|
> |auto_sortmerge_join_8.q|42.377|
> |create_view.q|42.23|
> |windowing_windowspec2.q|42.202|
> |multiMapJoin1.q|41.176|
> |vector_decimal_2.q|41.026|
> |bucket_groupby.q|40.565|
> |rcfile_merge2.q|39.782|
> |index_compact_2.q|39.765|
> |join_nullsafe.q|39.698|
> |vector_join_filters.q|39.343|
> |cbo_rp_auto_join1.q|39.308|
> |vector_auto_smb_mapjoin_14.q|39.17|
> |vector_udf1.q|38.988|
> |rcfile_createas1.q|38.932|
> |cbo_rp_semijoin.q|38.675|
> |auto_join_nulls.q|38.519|
> |cbo_rp_unionDistinct_2.q|37.815|
> |union_remove_26.q|37.672|
> |rcfile_merge3.q|37.373|
> |rcfile_merge4.q|37.194|
> |bucketsortoptimize_insert_2.q|37.187|
> |cbo_limit.q|37.038|
> |auto_sortmerge_join_6.q|36.663|
> |join43.q|36.656|



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14855) test patch

2016-10-06 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15554017#comment-15554017
 ] 

Eugene Koifman commented on HIVE-14855:
---

Patch 4 actually ran: 
https://builds.apache.org/job/PreCommit-HIVE-Build/1422/testReport/

All failed tests:

|Test Name|Duration|Age|
|org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdate.testMerge|12 sec|1|
|org.apache.hadoop.hive.ql.TestTxnCommands2.testMerge|11 sec|1|
|org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testMerge|12 sec|1|
|org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching|2.1 sec|64|


> test patch
> --
>
> Key: HIVE-14855
> URL: https://issues.apache.org/jira/browse/HIVE-14855
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14855.2.patch, HIVE-14855.3.patch, 
> HIVE-14855.4.patch, HIVE-14855.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11394) Enhance EXPLAIN display for vectorization

2016-10-06 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11394:

Attachment: HIVE-11394.06.patch

> Enhance EXPLAIN display for vectorization
> -
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, 
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch, 
> HIVE-11394.06.patch
>
>
> Add detail to the EXPLAIN output showing why a Map and Reduce work is not 
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\] \[SUMMARY|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (is vectorization 
> enabled) and a summary of Map and Reduce work.
> The optional clause defaults are not ONLY and SUMMARY.
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization 
> sections)
> It is the same as EXPLAIN VECTORIZATION SUMMARY.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
> …
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> …
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: decimal_date_test
>   Statistics: Num rows: 12288 Data size: 2467616 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: cdate BETWEEN 1969-12-30 AND 1970-01-02 (type: 
> boolean)
> Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
> Select Operator
>   expressions: cdate (type: date)
>   outputColumnNames: _col0
>   Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: date)
> sort order: +
> Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Map Vectorization:
> enabled: true
> enabledConditionsMet: 
> hive.vectorized.use.vectorized.input.format IS true
> groupByVectorOutput: true
> inputFileFormats: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reducer 2 
> Execution mode: vectorized, llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> groupByVectorOutput: true
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reduce Operator Tree:
>   Select Operator
> expressions: KEY.reducesinkkey0 (type: date)
> outputColumnNames: _col0
> Statistics: Num rows: 6144 Data size: 1233808 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 6144 Data size: 1233808 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> EXPLAIN VECTORIZATION DETAIL
> (Note the added  Select Vectorization, Group By Vectorization, Reduce Sink 
> Vectorization sections in this example)
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
> …
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> …
>   Vertices:
> Map 1 
> Map Operator Tree:
> 

[jira] [Commented] (HIVE-14866) Set hive.limit.optimize.enable to true by default

2016-10-06 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553986#comment-15553986
 ] 

Ashutosh Chauhan commented on HIVE-14866:
-

Are test failures related?

> Set hive.limit.optimize.enable to true by default
> -
>
> Key: HIVE-14866
> URL: https://issues.apache.org/jira/browse/HIVE-14866
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14866.patch
>
>
> Currently, we set up the global limit for the query in two different places 
> through two different variables: in SemanticAnalyzer, and through an 
> optimization rule, GlobalLimitOptimizer (the latter is off by default).
> This leads to several problems that I have observed:
> - The global limit might not be set for very simple queries, e.g., if the 
> query does not contain an RS (ReduceSink). GlobalLimitOptimizer would set 
> the limit in this case, but as stated above, it is off by default.
> - Some other optimizations are not checking both variables, thus missing 
> opportunities.
> - The variable set by SemanticAnalyzer does not take the query's offset into 
> account, which I think might lead to incorrect results if the FetchOptimizer 
> kicks in (not verified yet). GlobalLimitOptimizer does take the offset into 
> account.
> This issue is to set hive.limit.optimize.enable to _true_ by default, i.e., 
> use GlobalLimitOptimizer, and thus get rid of the variable set by 
> SemanticAnalyzer. Maybe there are some gaps (cases covered by the 
> SemanticAnalyzer alternative and not covered by GlobalLimitOptimizer) that 
> we will need to work on.
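
The offset point is easy to see with a small Java sketch (illustrative; names 
are hypothetical): if the optimizer fetches only 'limit' rows, a query with an 
offset would come up short.

{code:java}
// Hypothetical sketch: for "LIMIT 10 OFFSET 5" the fetch budget must be
// offset + limit = 15 rows; fetching only 10 and then skipping 5 would
// return too few results.
public final class GlobalLimitBudget {
  private GlobalLimitBudget() {}

  public static long rowsToFetch(long limit, long offset) {
    if (limit < 0) {
      return -1; // no limit set
    }
    return Math.addExact(offset, limit); // guard against overflow
  }

  public static void main(String[] args) {
    System.out.println(rowsToFetch(10, 5)); // 15, not 10
  }
}
{code}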



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14858) Analyze command should support custom input formats

2016-10-06 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15554041#comment-15554041
 ] 

Lefty Leverenz commented on HIVE-14858:
---

Should this be documented in the wiki?

* [Statistics in Hive – Existing Tables – ANALYZE | 
https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables–ANALYZE]

> Analyze command should support custom input formats
> ---
>
> Key: HIVE-14858
> URL: https://issues.apache.org/jira/browse/HIVE-14858
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-14858.1.patch
>
>
> Currently, the analyze command with partialscan or noscan only applies to 
> OrcInputFormat and MapredParquetInputFormat. However, custom input formats 
> that extend these two should also be able to use the same command to 
> collect stats.
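
The fix boils down to testing assignability rather than class equality. A 
self-contained Java sketch (the classes here are stand-ins, not Hive's real 
input formats):

{code:java}
// Hypothetical sketch: decide ANALYZE ... partialscan/noscan eligibility by
// assignability, so input formats that extend the ORC or Parquet formats
// qualify, not only the exact classes.
public class StatsEligibility {
  static class OrcInputFormat {}  // stand-in for the real class
  static class MyCustomOrcFormat extends OrcInputFormat {}

  static boolean supportsPartialScan(Class<?> inputFormat) {
    // Class.equals(...) would reject subclasses; isAssignableFrom accepts them
    return OrcInputFormat.class.isAssignableFrom(inputFormat);
  }

  public static void main(String[] args) {
    System.out.println(supportsPartialScan(MyCustomOrcFormat.class)); // true
  }
}
{code}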



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14873) Add UDF for extraction of 'day of week'

2016-10-06 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-14873:
--
Labels: TODOC2.2  (was: )

> Add UDF for extraction of 'day of week'
> ---
>
> Key: HIVE-14873
> URL: https://issues.apache.org/jira/browse/HIVE-14873
> Project: Hive
>  Issue Type: Bug
>  Components: Parser, UDF
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-14873.01.patch, HIVE-14873.02.patch, 
> HIVE-14873.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11394) Enhance EXPLAIN display for vectorization

2016-10-06 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11394:

Status: In Progress  (was: Patch Available)

> Enhance EXPLAIN display for vectorization
> -
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, 
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch
>
>
> Add detail to the EXPLAIN output showing why a Map and Reduce work is not 
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\] \[SUMMARY|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (is vectorization 
> enabled) and a summary of Map and Reduce work.
> The optional clause defaults are not ONLY and SUMMARY.
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization 
> sections)
> It is the same as EXPLAIN VECTORIZATION SUMMARY.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
> …
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> …
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: decimal_date_test
>   Statistics: Num rows: 12288 Data size: 2467616 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: cdate BETWEEN 1969-12-30 AND 1970-01-02 (type: 
> boolean)
> Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
> Select Operator
>   expressions: cdate (type: date)
>   outputColumnNames: _col0
>   Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: date)
> sort order: +
> Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Map Vectorization:
> enabled: true
> enabledConditionsMet: 
> hive.vectorized.use.vectorized.input.format IS true
> groupByVectorOutput: true
> inputFileFormats: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reducer 2 
> Execution mode: vectorized, llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> groupByVectorOutput: true
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reduce Operator Tree:
>   Select Operator
> expressions: KEY.reducesinkkey0 (type: date)
> outputColumnNames: _col0
> Statistics: Num rows: 6144 Data size: 1233808 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 6144 Data size: 1233808 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> EXPLAIN VECTORIZATION DETAIL
> (Note the added  Select Vectorization, Group By Vectorization, Reduce Sink 
> Vectorization sections in this example)
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
> …
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> …
>   Vertices:
> Map 1 
> Map Operator Tree:
> 

[jira] [Updated] (HIVE-11394) Enhance EXPLAIN display for vectorization

2016-10-06 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11394:

Status: Patch Available  (was: In Progress)

> Enhance EXPLAIN display for vectorization
> -
>
> Key: HIVE-11394
> URL: https://issues.apache.org/jira/browse/HIVE-11394
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-11394.01.patch, HIVE-11394.02.patch, 
> HIVE-11394.03.patch, HIVE-11394.04.patch, HIVE-11394.05.patch, 
> HIVE-11394.06.patch
>
>
> Add detail to the EXPLAIN output showing why a Map and Reduce work is not 
> vectorized.
> New syntax is: EXPLAIN VECTORIZATION \[ONLY\] \[SUMMARY|DETAIL\]
> The ONLY option suppresses most non-vectorization elements.
> SUMMARY shows vectorization information for the PLAN (is vectorization 
> enabled) and a summary of Map and Reduce work.
> The optional clause defaults are not ONLY and SUMMARY.
> Here are some examples:
> EXPLAIN VECTORIZATION example:
> (Note the PLAN VECTORIZATION, Map Vectorization, Reduce Vectorization 
> sections)
> It is the same as EXPLAIN VECTORIZATION SUMMARY.
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
> …
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> …
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: decimal_date_test
>   Statistics: Num rows: 12288 Data size: 2467616 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: cdate BETWEEN 1969-12-30 AND 1970-01-02 (type: 
> boolean)
> Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
> Select Operator
>   expressions: cdate (type: date)
>   outputColumnNames: _col0
>   Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col0 (type: date)
> sort order: +
> Statistics: Num rows: 6144 Data size: 1233808 Basic 
> stats: COMPLETE Column stats: NONE
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Map Vectorization:
> enabled: true
> enabledConditionsMet: 
> hive.vectorized.use.vectorized.input.format IS true
> groupByVectorOutput: true
> inputFileFormats: 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reducer 2 
> Execution mode: vectorized, llap
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> groupByVectorOutput: true
> allNative: false
> usesVectorUDFAdaptor: false
> vectorized: true
> Reduce Operator Tree:
>   Select Operator
> expressions: KEY.reducesinkkey0 (type: date)
> outputColumnNames: _col0
> Statistics: Num rows: 6144 Data size: 1233808 Basic stats: 
> COMPLETE Column stats: NONE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 6144 Data size: 1233808 Basic stats: 
> COMPLETE Column stats: NONE
>   table:
>   input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> {code}
> EXPLAIN VECTORIZATION DETAIL
> (Note the added Select Vectorization, Group By Vectorization, and Reduce Sink 
> Vectorization sections in this example.)
> {code}
> PLAN VECTORIZATION:
>   enabled: true
>   enabledConditionsMet: [hive.vectorized.execution.enabled IS true]
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
> …
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> …
>   Vertices:
> Map 1 
> Map Operator 

[jira] [Resolved] (HIVE-14892) Tests that explicitly submit jobs via child process are slow

2016-10-06 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-14892.
--
Resolution: Fixed

> Tests that explicitly submit jobs via child process are slow
> ---
>
> Key: HIVE-14892
> URL: https://issues.apache.org/jira/browse/HIVE-14892
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 2.2.0
>
> Attachments: HIVE-14892.1.patch, HIVE-14892.2.patch
>
>
> sample_islocalmode_hook.q
> sample_islocalmode_hook_use_metadata.q
> archive_excludeHadoop20.q
> These and several other tests are slow because they set 
> hive.exec.submitviachild=true.
> We can keep a few of them for coverage and move the slow ones over.
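> For reference, a minimal sketch of flipping this flag programmatically (the 
> class name here is hypothetical; HiveConf extends Hadoop's Configuration, so 
> the plain string-key setter applies):
> {code}
> import org.apache.hadoop.hive.conf.HiveConf;
> 
> public class SubmitViaChildToggle {
>   public static void main(String[] args) {
>     HiveConf conf = new HiveConf();
>     // true forces each job to be submitted from a forked child JVM,
>     // which is what makes these tests slow.
>     conf.setBoolean("hive.exec.submitviachild", true);
>     System.out.println(conf.get("hive.exec.submitviachild"));
>   }
> }
> {code}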



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-13284) Make ORC Reader resilient to 0 length files

2016-10-06 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-13284.
--
Resolution: Fixed

Fixed in ORC-103

> Make ORC Reader resilient to 0 length files
> ---
>
> Key: HIVE-13284
> URL: https://issues.apache.org/jira/browse/HIVE-13284
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> HIVE-13040 creates 0-length ORC files. Reading such files throws the 
> following exception. ORC is resilient to corrupt footers, but not to 
> 0-length files.
> {code}
> Processing data file file:/app/warehouse/concat_incompat/00_0 [length: 0]
> Exception in thread "main" java.lang.IndexOutOfBoundsException
>   at java.nio.Buffer.checkIndex(Buffer.java:540)
>   at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
>   at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:510)
>   at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:361)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:83)
>   at 
> org.apache.hadoop.hive.ql.io.orc.FileDump.getReader(FileDump.java:239)
>   at 
> org.apache.hadoop.hive.ql.io.orc.FileDump.printMetaDataImpl(FileDump.java:312)
>   at 
> org.apache.hadoop.hive.ql.io.orc.FileDump.printMetaData(FileDump.java:291)
>   at org.apache.hadoop.hive.ql.io.orc.FileDump.main(FileDump.java:138)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {code}
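> A minimal sketch of the kind of guard the fix implies, assuming a Hadoop 
> FileSystem handle (class and method names here are illustrative; the actual 
> change landed in ORC-103):
> {code}
> import java.io.IOException;
> 
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> 
> public class ZeroLengthGuard {
>   // A 0-length file has no footer to parse; a resilient reader should
>   // detect this up front and fall back to an empty reader instead of
>   // indexing into a buffer that holds no bytes.
>   static boolean isZeroLengthFile(FileSystem fs, Path path) throws IOException {
>     FileStatus status = fs.getFileStatus(path);
>     return status.getLen() == 0;
>   }
> 
>   public static void main(String[] args) throws IOException {
>     Path path = new Path(args[0]);
>     FileSystem fs = path.getFileSystem(new Configuration());
>     System.out.println(isZeroLengthFile(fs, path));
>   }
> }
> {code}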



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-2271) Inconsistency between Information Document and Actual Behavior for Built-In Aggregate Functions (UDAF) Return Type.

2016-10-06 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S resolved HIVE-2271.
-
Resolution: Cannot Reproduce

Closing as Cannot Reproduce: several releases have shipped since this bug was 
filed and, interestingly, no one has looked at it.

> Inconsistency between Information Document and Actual Behavior for Built-In 
> Aggregate Functions (UDAF) Return Type. 
> --
>
> Key: HIVE-2271
> URL: https://issues.apache.org/jira/browse/HIVE-2271
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation, Query Processor
>Affects Versions: 0.5.0, 0.6.0, 0.7.0, 0.7.1
> Environment: SuSE-Linux-11
>Reporter: Rohith Sharma K S
>
> I followed the Information Document for executing UDAFs like 
> MIN(), MAX(), SUM(), AVG(), and COUNT() at the link below:
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-BuiltinAggregateFunctions%28UDAF%29]
>  
> I observed a mismatch between the return types stated in the Information 
> Document and the actual behavior.
> 1.) The return type for min(colName)/max(colName) is documented as DOUBLE, 
> but the query actually returns the data type of the column passed to 
> min()/max().
> Ex:
> 1). create table test(a int, b smallint, c string);
> 2). select min(a) from test;
> 3). ResultSet.getMetaData().getColumnTypeName(1)
>   Output: int
>   Expected: double (according to the Information Document)
> 2.) The return type for sum(colName) is documented as DOUBLE, but it is 
> always returned as BIGINT.
> Ex:
> 1). create table test(a int, b smallint, c string);
> 2). select sum(a) from test;
> 3). ResultSet.getMetaData().getColumnTypeName(1)
>   Output: BIGINT
>   Expected: double (according to the Information Document)
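> A self-contained sketch of the reported check (the JDBC URL is a 
> placeholder; the table matches the examples above):
> {code}
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.ResultSetMetaData;
> import java.sql.Statement;
> 
> public class UdafReturnTypeCheck {
>   public static void main(String[] args) throws Exception {
>     try (Connection conn = DriverManager.getConnection(
>              "jdbc:hive2://localhost:10000/default");
>          Statement stmt = conn.createStatement();
>          ResultSet rs = stmt.executeQuery("SELECT min(a), sum(a) FROM test")) {
>       ResultSetMetaData md = rs.getMetaData();
>       // Per the report: min(a) echoes the column's own type ("int"),
>       // and sum(a) reports "bigint" -- neither is the documented "double".
>       System.out.println(md.getColumnTypeName(1));
>       System.out.println(md.getColumnTypeName(2));
>     }
>   }
> }
> {code}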



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)