[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-07-25 Thread Saket Saurabh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Attachment: HIVE-14233.03.patch

Fixes the vectorized row batch's initial schema to be based on the base 
reader's schema, so that projected columns are accounted for.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch 
> one row at a time before it is passed up the operator pipeline. This 
> row-by-row stitching was necessary because the ACID insert/update/delete 
> events from various delta files had to be merged together before the current 
> version of a given row could be determined. HIVE-14035 has enabled us to 
> break away from that limitation by splitting ACID update events into a 
> combination of delete+insert; in fact, it now enables us to create splits on 
> delta files.
> Building on top of HIVE-14035, this JIRA proposes to remove this earlier 
> bottleneck in the vectorized code path for ACID by reading row batches 
> directly from the underlying ORC files and avoiding any stitching 
> altogether. Once a row batch is read from the split (which may be on a 
> base/delta file), deleted rows are filtered out by cross-referencing the 
> batch against a data structure that tracks only delete events (found in the 
> delete_delta files). This will yield a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations on 
> top of it in the future.
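
For illustration, a minimal sketch of the stitching-free filtering step 
described above, assuming a hypothetical DeleteEventRegistry lookup (the 
interface, the DeleteFilter class, and the txns/buckets/rowIds arrays are 
illustrative names, not Hive APIs):
{code}
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

// Illustrative only: DeleteEventRegistry is a hypothetical stand-in for the
// structure that tracks delete events read from the delete_delta files.
interface DeleteEventRegistry {
  boolean isDeleted(long originalTxn, int bucket, long rowId);
}

class DeleteFilter {
  // Shrink a vectorized batch in place by dropping rows that have a matching
  // delete event, instead of re-stitching the batch one row at a time.
  static void filterDeletedRows(VectorizedRowBatch batch, DeleteEventRegistry deletes,
      long[] txns, int[] buckets, long[] rowIds) {
    int newSize = 0;
    for (int i = 0; i < batch.size; i++) {
      int row = batch.selectedInUse ? batch.selected[i] : i;
      if (!deletes.isDeleted(txns[row], buckets[row], rowIds[row])) {
        batch.selected[newSize++] = row;  // keep only surviving rows
      }
    }
    batch.size = newSize;        // downstream operators see the filtered batch
    batch.selectedInUse = true;  // selected[] now maps logical to physical rows
  }
}
{code}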





[jira] [Commented] (HIVE-14323) Reduce number of FS permissions and redundant FS operations

2016-07-25 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15393192#comment-15393192
 ] 

Rui Li commented on HIVE-14323:
---

It's a little different from the FS specification [~cnauroth] mentioned, but 
the JavaDoc of FileSystem doesn't specify whether to return false or to throw 
an exception if the file to be deleted doesn't exist. I'm not sure all 
implementations comply with the specification.
{code}
  /** Delete a file.
   *
   * @param f the path to delete.
   * @param recursive if path is a directory and set to 
   * true, the directory is deleted else throws an exception. In
   * case of a file the recursive can be set to either true or false. 
   * @return  true if delete is successful else false. 
   * @throws IOException
   */
  public abstract boolean delete(Path f, boolean recursive) throws IOException;
{code}
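
A small defensive pattern that sidesteps the ambiguity (a sketch, not Hive 
code): treat both a false return and a FileNotFoundException as "the path is 
already gone", which also makes a follow-up exists() lookup unnecessary.
{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class FsDeleteUtil {
  // Returns true only if this call actually removed something.
  static boolean deleteIfExists(FileSystem fs, Path p) throws IOException {
    try {
      return fs.delete(p, true /* recursive */);
    } catch (FileNotFoundException e) {
      // Some implementations throw instead of returning false.
      return false;
    }
  }
}
{code}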

> Reduce number of FS permissions and redundant FS operations
> ---
>
> Key: HIVE-14323
> URL: https://issues.apache.org/jira/browse/HIVE-14323
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14323.1.patch
>
>
> Some examples are given below.
> 1. When creating a stage directory, FileUtils sets the directory permissions 
> by running a set of chgrp and chmod commands. In systems like S3, this is 
> not relevant.
> 2. In some cases, fs.delete() is followed by fs.exists(). The exists() check 
> may be redundant here (lookup ops are expensive in systems like S3).





[jira] [Commented] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used

2016-07-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15393160#comment-15393160
 ] 

Hive QA commented on HIVE-14170:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12818937/HIVE-14170.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10356 tests 
executed
*Failed tests:*
{noformat}
TestMsgBusConnection - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/642/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/642/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-642/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12818937 - PreCommit-HIVE-MASTER-Build

> Beeline IncrementalRows should buffer rows and incrementally re-calculate 
> width if TableOutputFormat is used
> 
>
> Key: HIVE-14170
> URL: https://issues.apache.org/jira/browse/HIVE-14170
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch, 
> HIVE-14170.3.patch, HIVE-14170.4.patch
>
>
> If {{--incremental}} is specified in Beeline, rows are meant to be printed 
> out immediately. However, if {{TableOutputFormat}} is used with this option, 
> the formatting can look really off.
> The reason is that {{IncrementalRows}} does not do a global calculation of 
> the optimal width for {{TableOutputFormat}} (it can't, because it only sees 
> one row at a time). The output of {{BufferedRows}} looks much better because 
> it can do this global calculation.
> If both {{--incremental}} and {{TableOutputFormat}} are used, the width 
> should be re-calculated every "x" rows ("x" can be configurable, defaulting 
> to 1000).
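
A minimal sketch of the proposal (illustrative; not Beeline's actual 
IncrementalRows/TableOutputFormat API): buffer a window of rows, compute 
column widths over that buffer, print, and repeat, so widths are re-derived 
every "x" rows.
{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class WindowedTablePrinter {
  // Buffer rows in windows and recompute column widths per window.
  static void print(Iterator<String[]> rows, int window) {
    List<String[]> buffer = new ArrayList<>();
    while (rows.hasNext()) {
      buffer.add(rows.next());
      if (buffer.size() == window) {
        flush(buffer);
      }
    }
    flush(buffer);  // print any remainder smaller than one window
  }

  private static void flush(List<String[]> batch) {
    if (batch.isEmpty()) return;
    int cols = batch.get(0).length;
    int[] width = new int[cols];
    for (String[] r : batch) {            // width pass over this window only
      for (int c = 0; c < cols; c++) {
        width[c] = Math.max(width[c], r[c] == null ? 4 : Math.max(1, r[c].length()));
      }
    }
    for (String[] r : batch) {            // print pass
      StringBuilder sb = new StringBuilder();
      for (int c = 0; c < cols; c++) {
        sb.append(String.format("| %-" + width[c] + "s ", r[c]));
      }
      System.out.println(sb.append('|'));
    }
    batch.clear();                        // widths reset for the next window
  }
}
{code}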





[jira] [Commented] (HIVE-14296) Session count is not decremented when HS2 clients do not shutdown cleanly.

2016-07-25 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15393132#comment-15393132
 ] 

Mohit Sabharwal commented on HIVE-14296:


+1 as well

> Session count is not decremented when HS2 clients do not shutdown cleanly.
> --
>
> Key: HIVE-14296
> URL: https://issues.apache.org/jira/browse/HIVE-14296
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-14296.2.patch, HIVE-14296.patch
>
>
> When a JDBC client like beeline abruptly disconnects from HS2, the session 
> gets closed on the server side, but the session count reported in the logs 
> is incorrect: it never gets decremented.
> For example, I created 6 connections from the same instance of beeline to HS2.
> {code}
> 2016-07-20T15:05:17,987  INFO [HiveServer2-Handler-Pool: Thread-40] 
> thrift.ThriftCLIService: Opened a session SessionHandle 
> [28b225ee-204f-4b3e-b4fd-0039ef8e276e], current sessions: 1
> .
> 2016-07-20T15:05:24,239  INFO [HiveServer2-Handler-Pool: Thread-45] 
> thrift.ThriftCLIService: Opened a session SessionHandle 
> [1d267de8-ff9a-4e76-ac5c-e82c871588e7], current sessions: 2
> .
> 2016-07-20T15:05:25,710  INFO [HiveServer2-Handler-Pool: Thread-50] 
> thrift.ThriftCLIService: Opened a session SessionHandle 
> [04d53deb-8965-464b-aa3f-7042304cfb54], current sessions: 3
> .
> 2016-07-20T15:05:26,795  INFO [HiveServer2-Handler-Pool: Thread-55] 
> thrift.ThriftCLIService: Opened a session SessionHandle 
> [b4bb8b86-74e1-4e3c-babb-674d34ad1caf], current sessions: 4
> 2016-07-20T15:05:28,160  INFO [HiveServer2-Handler-Pool: Thread-60] 
> thrift.ThriftCLIService: Opened a session SessionHandle 
> [6d3c3ed9-fadb-4673-8c15-3315b7e2995d], current sessions: 5
> .
> 2016-07-20T15:05:29,136  INFO [HiveServer2-Handler-Pool: Thread-65] 
> thrift.ThriftCLIService: Opened a session SessionHandle 
> [88b630c0-f272-427d-8263-febfef8d], current sessions: 6
> {code}
> When I CNTRL-C the beeline process, in the HS2 logs I see
> {code}
> 2016-07-20T15:11:37,858  INFO [HiveServer2-Handler-Pool: Thread-55] 
> thrift.ThriftCLIService: Session disconnected without closing properly. 
> 2016-07-20T15:11:37,858  INFO [HiveServer2-Handler-Pool: Thread-40] 
> thrift.ThriftCLIService: Session disconnected without closing properly. 
> 2016-07-20T15:11:37,858  INFO [HiveServer2-Handler-Pool: Thread-65] 
> thrift.ThriftCLIService: Session disconnected without closing properly. 
> 2016-07-20T15:11:37,858  INFO [HiveServer2-Handler-Pool: Thread-60] 
> thrift.ThriftCLIService: Session disconnected without closing properly. 
> 2016-07-20T15:11:37,859  INFO [HiveServer2-Handler-Pool: Thread-50] 
> thrift.ThriftCLIService: Session disconnected without closing properly. 
> 2016-07-20T15:11:37,859  INFO [HiveServer2-Handler-Pool: Thread-45] 
> thrift.ThriftCLIService: Session disconnected without closing properly. 
> 2016-07-20T15:11:37,859  INFO [HiveServer2-Handler-Pool: Thread-55] 
> thrift.ThriftCLIService: Closing the session: SessionHandle 
> [b4bb8b86-74e1-4e3c-babb-674d34ad1caf]
> 2016-07-20T15:11:37,859  INFO [HiveServer2-Handler-Pool: Thread-40] 
> thrift.ThriftCLIService: Closing the session: SessionHandle 
> [28b225ee-204f-4b3e-b4fd-0039ef8e276e]
> 2016-07-20T15:11:37,859  INFO [HiveServer2-Handler-Pool: Thread-65] 
> thrift.ThriftCLIService: Closing the session: SessionHandle 
> [88b630c0-f272-427d-8263-febfef8d]
> 2016-07-20T15:11:37,859  INFO [HiveServer2-Handler-Pool: Thread-60] 
> thrift.ThriftCLIService: Closing the session: SessionHandle 
> [6d3c3ed9-fadb-4673-8c15-3315b7e2995d]
> 2016-07-20T15:11:37,859  INFO [HiveServer2-Handler-Pool: Thread-45] 
> thrift.ThriftCLIService: Closing the session: SessionHandle 
> [1d267de8-ff9a-4e76-ac5c-e82c871588e7]
> 2016-07-20T15:11:37,859  INFO [HiveServer2-Handler-Pool: Thread-50] 
> thrift.ThriftCLIService: Closing the session: SessionHandle 
> [04d53deb-8965-464b-aa3f-7042304cfb54]
> {code}
> The next time I connect to HS2 via beeline, I see
> {code}
> 2016-07-20T15:14:33,679  INFO [HiveServer2-Handler-Pool: Thread-50] 
> thrift.ThriftCLIService: Client protocol version: HIVE_CLI_SERVICE_PROTOCOL_V8
> 2016-07-20T15:14:33,710  INFO [HiveServer2-Handler-Pool: Thread-50] 
> session.SessionState: Created HDFS directory: 
> /tmp/hive/hive/d47759e8-df3a-4504-9f28-99ff5247352c
> 2016-07-20T15:14:33,725  INFO [HiveServer2-Handler-Pool: Thread-50] 
> session.SessionState: Created local directory: 
> /var/folders/_3/0w477k4j5bjd6h967rw4vflwgp/T/ngangam/d47759e8-df3a-4504-9f28-99ff5247352c
> 2016-07-20T15:14:33,735  INFO [HiveServer2-Handler-Pool: Thread-50] 
> session.SessionState: Created HDFS directory: 
> 

[jira] [Commented] (HIVE-14281) Issue in decimal multiplication

2016-07-25 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15393127#comment-15393127
 ] 

Chaoyu Tang commented on HIVE-14281:


Thanks [~sershe] for chiming in. I think the issue here might be slightly 
different from that in HIVE-13098. For the example in HIVE-13098, select 99 
as decimal(5,0) should ideally throw an exception, since 99 exceeds the 
decimal(5,0) data range. For decimal multiplication, I think the result 
should be implicitly rounded (or rounded via a configuration?) as long as it 
is within the range supported by the specified decimal. Otherwise, the use of 
decimals in multiplication will be dramatically limited by the supported 
range. For the example in 
https://issues.apache.org/jira/browse/HIVE-14281?focusedCommentId=15384793&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15384793,
 the result data range is reduced to (-100, 100). Actually, Hive cast also 
supports rounding for decimals: select cast(999.999 as decimal(5,0)) returns 
1000.

> Issue in decimal multiplication
> ---
>
> Key: HIVE-14281
> URL: https://issues.apache.org/jira/browse/HIVE-14281
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>
> {code}
> CREATE TABLE test (a DECIMAL(38,18), b DECIMAL(38,18));
> INSERT OVERWRITE TABLE test VALUES (20, 20);
> SELECT a*b from test
> {code}
> The returned result is NULL (instead of 400).
> This is because Hive adds the scales of the operands, so the type of a*b is 
> set to decimal(38,36). Hive cannot handle this case properly (e.g. by 
> rounding).
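
For reference, the arithmetic behind the NULL, assuming the usual rule Hive 
applies for multiplication result types (precision = p1 + p2 + 1, scale = 
s1 + s2, capped at a maximum precision of 38):
{noformat}
precision = 38 + 38 + 1 = 77  -> capped at 38
scale     = 18 + 18     = 36
a*b is therefore decimal(38,36), leaving 38 - 36 = 2 integer digits;
20 * 20 = 400 needs 3 integer digits, so the value cannot be
represented and NULL is returned.
{noformat}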





[jira] [Updated] (HIVE-14204) Optimize loading dynamic partitions

2016-07-25 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-14204:

Status: Patch Available  (was: Open)

> Optimize loading dynamic partitions 
> 
>
> Key: HIVE-14204
> URL: https://issues.apache.org/jira/browse/HIVE-14204
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14204.1.patch, HIVE-14204.3.patch, 
> HIVE-14204.4.patch, HIVE-14204.6.patch
>
>
> A lot of time is spent loading a dynamically partitioned dataset 
> sequentially on the driver side. E.g., a simple dynamic-partitioned load 
> like the following takes 300+ seconds:
> {noformat}
> INSERT INTO web_sales_test partition(ws_sold_date_sk) select * from 
> tpcds_bin_partitioned_orc_200.web_sales;
> Time taken to load dynamic partitions: 309.22 seconds
> {noformat}
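
An illustrative sketch of the optimization direction (not the actual patch): 
fan the per-partition work out to a thread pool instead of looping 
sequentially in the driver.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelPartitionLoader {
  // Each Runnable stands in for the move/metastore work of one partition.
  static void loadAll(List<Runnable> partitionLoads, int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (Runnable load : partitionLoads) {
        futures.add(pool.submit(load));
      }
      for (Future<?> f : futures) {
        f.get();  // surface the first failure instead of swallowing it
      }
    } finally {
      pool.shutdown();
    }
  }
}
{code}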





[jira] [Commented] (HIVE-14323) Reduce number of FS permissions and redundant FS operations

2016-07-25 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15393106#comment-15393106
 ] 

Rajesh Balamohan commented on HIVE-14323:
-

Thanks [~cnauroth] for the review. Uploaded the revised patch to the review 
board (https://reviews.apache.org/r/50434/diff/1#index_header).



> Reduce number of FS permissions and redundant FS operations
> ---
>
> Key: HIVE-14323
> URL: https://issues.apache.org/jira/browse/HIVE-14323
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14323.1.patch
>
>
> Some examples are given below.
> 1. When creating a stage directory, FileUtils sets the directory permissions 
> by running a set of chgrp and chmod commands. In systems like S3, this is 
> not relevant.
> 2. In some cases, fs.delete() is followed by fs.exists(). The exists() check 
> may be redundant here (lookup ops are expensive in systems like S3).





[jira] [Updated] (HIVE-14318) Vectorization: LIKE should use matches() instead of find(0)

2016-07-25 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-14318:
---
Resolution: Invalid
Status: Resolved  (was: Patch Available)

Testing this issue surfaced a different bug; this issue itself is unlikely to 
be fixed because ComplexChecker is used in exactly the same way by both LIKE 
and RLIKE.

> Vectorization: LIKE should use matches() instead of find(0)
> ---
>
> Key: HIVE-14318
> URL: https://issues.apache.org/jira/browse/HIVE-14318
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 1.2.1, 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-14318.1.patch
>
>
> Checking for a match instead of using find() would allow the matcher to 
> exit early instead of looking for sub-sequences beyond the first non-match.
> In UDFLike.java, the complex pattern checker uses matches(), while the 
> vectorized version uses find(0), which is more expensive.
> {code}
> Benchmark                            Mode  Cnt    Score    Error  Units
> RegexBench.testGreedyRegexHit        avgt    5  379.316 ± 32.444  ns/op
> RegexBench.testGreedyRegexHitCheck   avgt    5  344.895 ± 15.436  ns/op
> RegexBench.testGreedyRegexMiss       avgt    5  497.193 ± 18.168  ns/op
> RegexBench.testGreedyRegexMissCheck  avgt    5  171.872 ±  8.588  ns/op
> {code}
> A miss is nearly ~3x more expensive per row with the .find(0) version than 
> with the .matches() check version.
> The hit scenario is nearly the same.
> The lazy pattern is slower when there's a hit (because matches() runs the 
> check to the end), but ~2x faster when there's a miss.
> {code}
> RegexBench.testLazyRegexHit          avgt    5   78.398 ±  6.007  ns/op
> RegexBench.testLazyRegexHitCheck     avgt    5  120.557 ±  4.396  ns/op
> RegexBench.testLazyRegexMiss         avgt    5  387.594 ± 25.672  ns/op
> RegexBench.testLazyRegexMissCheck    avgt    5  154.489 ± 13.622  ns/op
> {code}
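
The difference is easy to reproduce outside Hive (a standalone illustration, 
not the checker code):
{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LikeCheckDemo {
  public static void main(String[] args) {
    Matcher m = Pattern.compile("foo.*bar").matcher("foo baz baz baz");
    // find(0) retries the match at every start position before giving up;
    // matches() is anchored to the whole input, so a miss fails sooner.
    System.out.println(m.find(0));    // false, after scanning all positions
    System.out.println(m.matches());  // false, rejected without the rescan
  }
}
{code}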





[jira] [Commented] (HIVE-9482) Hive parquet timestamp compatibility

2016-07-25 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15393035#comment-15393035
 ] 

Rui Li commented on HIVE-9482:
--

Hi [~szehon], is there a follow-on task for the write path?

> Hive parquet timestamp compatibility
> 
>
> Key: HIVE-9482
> URL: https://issues.apache.org/jira/browse/HIVE-9482
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.15.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Fix For: 1.2.0
>
> Attachments: HIVE-9482.2.patch, HIVE-9482.patch, HIVE-9482.patch, 
> parquet_external_time.parq
>
>
> In the current Hive implementation, timestamps are stored in UTC (converted 
> from the current timezone), based on the original parquet timestamp spec.
> However, we have found that this is not compatible with other tools, and 
> after some investigation it is not how other file formats, or even some 
> databases, behave (Hive's Timestamp is closer to a 'timestamp without 
> timezone' datatype).
> This is the first part of the fix, which restores compatibility with 
> parquet-timestamp files generated by external tools by skipping the 
> conversion on read.
> A later fix will change the write path to not convert, and stop the 
> read-conversion even for files written by Hive itself.
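
The incompatibility is easy to see with a standalone illustration (java.time 
used here for clarity; this is not Hive's reader code): a "timestamp without 
time zone" value that a writer normalizes to UTC shifts when a reader in a 
different zone converts it back.
{code}
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class TimestampShiftDemo {
  public static void main(String[] args) {
    LocalDateTime written = LocalDateTime.of(2016, 7, 25, 12, 0);
    // Writer normalizes the wall-clock value to UTC...
    ZonedDateTime utc = written.atZone(ZoneId.of("America/Los_Angeles"))
                               .withZoneSameInstant(ZoneId.of("UTC"));
    // ...and a reader in another zone converts back to a different value.
    LocalDateTime readBack =
        utc.withZoneSameInstant(ZoneId.of("Asia/Shanghai")).toLocalDateTime();
    System.out.println(written);   // 2016-07-25T12:00
    System.out.println(readBack);  // 2016-07-26T03:00
  }
}
{code}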





[jira] [Commented] (HIVE-12727) refactor Hive strict checks to be more granular, allow order by no limit and no partition filter by default for now

2016-07-25 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15393004#comment-15393004
 ] 

Chao Sun commented on HIVE-12727:
-

OK, got it. So there are 3 cases:
# hive.mapred.mode is not set: the values of the 3 new configurations are used
# hive.mapred.mode is set to 'strict': the strict checks apply, and the 3 
configurations are ignored
# hive.mapred.mode is set to something other than 'strict': the strict checks 
are skipped, and the 3 configurations are ignored.

Perhaps the documentation could be more explicit about this. Also, in 
[here|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.mapred.mode]
 the default value is still strict - it should be changed to nonstrict.

[~sershe] do you know how we can unset {{hive.mapred.mode}} so as to use this 
feature?

> refactor Hive strict checks to be more granular, allow order by no limit and 
> no partition filter by default for now
> ---
>
> Key: HIVE-12727
> URL: https://issues.apache.org/jira/browse/HIVE-12727
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Blocker
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-12727.01.patch, HIVE-12727.02.patch, 
> HIVE-12727.03.patch, HIVE-12727.04.patch, HIVE-12727.05.patch, 
> HIVE-12727.06.patch, HIVE-12727.07.patch, HIVE-12727.patch
>
>
> Making strict mode the default recently appears to have broken many normal 
> queries, such as some TPCDS benchmark queries, e.g. Q85:
> Response message: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: SemanticException [Error 10041]: No partition 
> predicate found for Alias "web_sales" Table "web_returns"
> We should remove this restriction from strict mode, or change the default 
> back to non-strict. Perhaps make a 3-value parameter, nonstrict, semistrict, 
> and strict, for backward compat for people who are relying on strict already.





[jira] [Commented] (HIVE-14303) CommonJoinOperator.checkAndGenObject should return directly at CLOSE state to avoid NPE if ExecReducer.close is called twice.

2016-07-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392991#comment-15392991
 ] 

Hive QA commented on HIVE-14303:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12819986/HIVE-14303.000.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 188 failed/errored test(s), 10343 tests 
executed
*Failed tests:*
{noformat}
TestColumn - did not produce a TEST-*.xml file
TestHiveSQLException - did not produce a TEST-*.xml file
TestMsgBusConnection - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketizedhiveinputformat_auto
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_auto_join1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_filters
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_nulls
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_nullsafe
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_join_partition_key
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_16
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_17
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sort_merge_join_desc_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sort_merge_join_desc_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sort_merge_join_desc_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sort_merge_join_desc_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sort_merge_join_desc_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_auto_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_bucketmapjoin1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_5
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_4
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_cte_mat_5
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_hybridgrace_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_mrr
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_dynpart_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_join_hash
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_join_tests
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_joins_explain

[jira] [Comment Edited] (HIVE-12727) refactor Hive strict checks to be more granular, allow order by no limit and no partition filter by default for now

2016-07-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392966#comment-15392966
 ] 

Sergey Shelukhin edited comment on HIVE-12727 at 7/26/16 1:17 AM:
--

Yeah, that was the intent... maintaining backward compat for mapred mode. The 
new configs are used only when mapred.mode is not set (hence getting it with 
the null-default).


was (Author: sershe):
Yeah, that was the intent... maintaining backward compat for strict mode.

> refactor Hive strict checks to be more granular, allow order by no limit and 
> no partition filter by default for now
> ---
>
> Key: HIVE-12727
> URL: https://issues.apache.org/jira/browse/HIVE-12727
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Blocker
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-12727.01.patch, HIVE-12727.02.patch, 
> HIVE-12727.03.patch, HIVE-12727.04.patch, HIVE-12727.05.patch, 
> HIVE-12727.06.patch, HIVE-12727.07.patch, HIVE-12727.patch
>
>
> Making strict mode the default recently appears to have broken many normal 
> queries, such as some TPCDS benchmark queries, e.g. Q85:
> Response message: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: SemanticException [Error 10041]: No partition 
> predicate found for Alias "web_sales" Table "web_returns"
> We should remove this restriction from strict mode, or change the default 
> back to non-strict. Perhaps make a 3-value parameter, nonstrict, semistrict, 
> and strict, for backward compat for people who are relying on strict already.





[jira] [Commented] (HIVE-12727) refactor Hive strict checks to be more granular, allow order by no limit and no partition filter by default for now

2016-07-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392966#comment-15392966
 ] 

Sergey Shelukhin commented on HIVE-12727:
-

Yeah, that was the intent... maintaining backward compat for strict mode.

> refactor Hive strict checks to be more granular, allow order by no limit and 
> no partition filter by default for now
> ---
>
> Key: HIVE-12727
> URL: https://issues.apache.org/jira/browse/HIVE-12727
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Blocker
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-12727.01.patch, HIVE-12727.02.patch, 
> HIVE-12727.03.patch, HIVE-12727.04.patch, HIVE-12727.05.patch, 
> HIVE-12727.06.patch, HIVE-12727.07.patch, HIVE-12727.patch
>
>
> Making strict mode the default recently appears to have broken many normal 
> queries, such as some TPCDS benchmark queries, e.g. Q85:
> Response message: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: SemanticException [Error 10041]: No partition 
> predicate found for Alias "web_sales" Table "web_returns"
> We should remove this restriction from strict mode, or change the default 
> back to non-strict. Perhaps make a 3-value parameter, nonstrict, semistrict, 
> and strict, for backward compat for people who are relying on strict already.





[jira] [Updated] (HIVE-10022) Authorization checks for non existent file/directory should not be recursive

2016-07-25 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10022:

Attachment: HIVE-10022.9.patch

Updated .9.patch. The difference between .8.patch and .9.patch is at: 
https://gist.github.com/khorgath/52530044d4e046dca2f27acaf9def443

> Authorization checks for non existent file/directory should not be recursive
> 
>
> Key: HIVE-10022
> URL: https://issues.apache.org/jira/browse/HIVE-10022
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 0.14.0
>Reporter: Pankit Thapar
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-10022.2.patch, HIVE-10022.3.patch, 
> HIVE-10022.4.patch, HIVE-10022.5.patch, HIVE-10022.6.patch, 
> HIVE-10022.7.patch, HIVE-10022.8.patch, HIVE-10022.9.patch, HIVE-10022.patch
>
>
> I am testing a query like:
> set hive.test.authz.sstd.hs2.mode=true;
> set 
> hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest;
> set 
> hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateConfigUserAuthenticator;
> set hive.security.authorization.enabled=true;
> set user.name=user1;
> create table auth_noupd(i int) clustered by (i) into 2 buckets stored as orc 
> location '${OUTPUT}' TBLPROPERTIES ('transactional'='true');
> Now, in the above query, since authorization is enabled, we end up calling 
> doAuthorizationV2(), which ultimately calls 
> SQLAuthorizationUtils.getPrivilegesFromFS(), which calls a recursive method, 
> FileUtils.isActionPermittedForFileHierarchy(), with the object (or an 
> ancestor of the object) we are trying to authorize if the object does not 
> exist.
> The logic in FileUtils.isActionPermittedForFileHierarchy() is a DFS.
> Now assume we have a path a/b/c/d that we are trying to authorize.
> If a/b/c/d does not exist, we would call 
> FileUtils.isActionPermittedForFileHierarchy() with, say, a/b/ (assuming 
> a/b/c also does not exist).
> If the subtree under a/b has millions of files, then 
> FileUtils.isActionPermittedForFileHierarchy() is going to check file 
> permissions on each of those objects.
> I do not completely understand why we have to check file permissions on all 
> the objects in a branch of the tree that we are not trying to read from or 
> write to.
> We could instead check the file permission on the ancestor that exists and, 
> if it matches what we expect, return true.
> Please confirm whether this is a bug so that I can submit a patch; 
> otherwise, let me know what I am missing.
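
A sketch of the non-recursive alternative suggested above (illustrative, not 
the patch itself): walk up from the missing path to the first existing 
ancestor and check permissions on that single ancestor, instead of doing a 
DFS over the whole subtree.
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class AncestorCheck {
  // O(path depth) lookups instead of O(number of files in the subtree).
  static Path firstExistingAncestor(FileSystem fs, Path p) throws IOException {
    Path cur = p;
    while (cur != null && !fs.exists(cur)) {
      cur = cur.getParent();
    }
    return cur;  // caller checks the required action on this ancestor only
  }
}
{code}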





[jira] [Commented] (HIVE-14322) Postgres db issues after Datanucleus 4.x upgrade

2016-07-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392955#comment-15392955
 ] 

Sergey Shelukhin commented on HIVE-14322:
-

It appears we can also set datanucleus.rdbms.initializeColumnInfo=NONE to 
address this (see the thread). I wonder if it has any other repercussions.
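
For reference, the workaround would presumably look like this in 
hive-site.xml (a sketch; the property name is the one given in the comment 
above):
{noformat}
<property>
  <name>datanucleus.rdbms.initializeColumnInfo</name>
  <value>NONE</value>
</property>
{noformat}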

> Postgres db issues after Datanucleus 4.x upgrade
> 
>
> Key: HIVE-14322
> URL: https://issues.apache.org/jira/browse/HIVE-14322
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0, 2.0.1
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14322.1.patch
>
>
> With the upgrade to  datanucleus 4.x versions in HIVE-6113, hive does not 
> work properly with postgres.
> The nullable fields in the database contain the string "NULL::character 
> varying" instead of real NULL values. This causes various issues.
> One example is -
> {code}
> hive> create table t(i int);
> OK
> Time taken: 1.9 seconds
> hive> create view v as select * from t;
> OK
> Time taken: 0.542 seconds
> hive> select * from v;
> FAILED: SemanticException Unable to fetch table v. 
> java.net.URISyntaxException: Relative path in absolute URI: 
> NULL::character%20varying
> {code}





[jira] [Commented] (HIVE-14331) Task should set exception for failed map reduce job.

2016-07-25 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392946#comment-15392946
 ] 

Jimmy Xiang commented on HIVE-14331:


+1

> Task should set exception for failed map reduce job.
> 
>
> Key: HIVE-14331
> URL: https://issues.apache.org/jira/browse/HIVE-14331
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 2.1.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-14331.000.patch
>
>
> Task should set the exception for a failed map reduce job, so that the 
> exception can be seen in HookContext.





[jira] [Commented] (HIVE-12727) refactor Hive strict checks to be more granular, allow order by no limit and no partition filter by default for now

2016-07-25 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392935#comment-15392935
 ] 

Xuefu Zhang commented on HIVE-12727:


My understanding is that hive.mapred.mode=strict is deprecated, being 
replaced by the three new configurations. To use the new configurations, 
hive.mapred.mode cannot be strict. If it's nonstrict, then the three new 
configurations are checked.

As to the null value, I'm not sure it can ever be null. Nevertheless, the 
above logic should stand.

> refactor Hive strict checks to be more granular, allow order by no limit and 
> no partition filter by default for now
> ---
>
> Key: HIVE-12727
> URL: https://issues.apache.org/jira/browse/HIVE-12727
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Blocker
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-12727.01.patch, HIVE-12727.02.patch, 
> HIVE-12727.03.patch, HIVE-12727.04.patch, HIVE-12727.05.patch, 
> HIVE-12727.06.patch, HIVE-12727.07.patch, HIVE-12727.patch
>
>
> Making strict mode the default recently appears to have broken many normal 
> queries, such as some TPCDS benchmark queries, e.g. Q85:
> Response message: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: SemanticException [Error 10041]: No partition 
> predicate found for Alias "web_sales" Table "web_returns"
> We should remove this restriction from strict mode, or change the default 
> back to non-strict. Perhaps make a 3-value parameter, nonstrict, semistrict, 
> and strict, for backward compat for people who are relying on strict already.





[jira] [Comment Edited] (HIVE-14204) Optimize loading dynamic partitions

2016-07-25 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392933#comment-15392933
 ] 

Rajesh Balamohan edited comment on HIVE-14204 at 7/26/16 12:46 AM:
---

Thanks [~ashutoshc]. Will upload the latest patch from review board here for 
jenkins to pick up.


was (Author: rajesh.balamohan):
Thanks [~ashutoshc]. Will upload the latest patch here for jenkins to pick up.

> Optimize loading dynamic partitions 
> 
>
> Key: HIVE-14204
> URL: https://issues.apache.org/jira/browse/HIVE-14204
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14204.1.patch, HIVE-14204.3.patch, 
> HIVE-14204.4.patch, HIVE-14204.6.patch
>
>
> A lot of time is spent loading a dynamically partitioned dataset 
> sequentially on the driver side. E.g., a simple dynamic-partitioned load 
> like the following takes 300+ seconds:
> {noformat}
> INSERT INTO web_sales_test partition(ws_sold_date_sk) select * from 
> tpcds_bin_partitioned_orc_200.web_sales;
> Time taken to load dynamic partitions: 309.22 seconds
> {noformat}





[jira] [Updated] (HIVE-14204) Optimize loading dynamic partitions

2016-07-25 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-14204:

Attachment: HIVE-14204.6.patch

> Optimize loading dynamic partitions 
> 
>
> Key: HIVE-14204
> URL: https://issues.apache.org/jira/browse/HIVE-14204
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14204.1.patch, HIVE-14204.3.patch, 
> HIVE-14204.4.patch, HIVE-14204.6.patch
>
>
> A lot of time is spent loading a dynamically partitioned dataset 
> sequentially on the driver side. E.g., a simple dynamic-partitioned load 
> like the following takes 300+ seconds:
> {noformat}
> INSERT INTO web_sales_test partition(ws_sold_date_sk) select * from 
> tpcds_bin_partitioned_orc_200.web_sales;
> Time taken to load dynamic partitions: 309.22 seconds
> {noformat}





[jira] [Updated] (HIVE-14204) Optimize loading dynamic partitions

2016-07-25 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-14204:

Status: Open  (was: Patch Available)

Thanks [~ashutoshc]. Will upload the latest patch here for jenkins to pick up.

> Optimize loading dynamic partitions 
> 
>
> Key: HIVE-14204
> URL: https://issues.apache.org/jira/browse/HIVE-14204
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14204.1.patch, HIVE-14204.3.patch, 
> HIVE-14204.4.patch
>
>
> A lot of time is spent loading a dynamically partitioned dataset 
> sequentially on the driver side. E.g., a simple dynamic-partitioned load 
> like the following takes 300+ seconds:
> {noformat}
> INSERT INTO web_sales_test partition(ws_sold_date_sk) select * from 
> tpcds_bin_partitioned_orc_200.web_sales;
> Time taken to load dynamic partitions: 309.22 seconds
> {noformat}





[jira] [Commented] (HIVE-12727) refactor Hive strict checks to be more granular, allow order by no limit and no partition filter by default for now

2016-07-25 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392921#comment-15392921
 ] 

Chao Sun commented on HIVE-12727:
-

I'm actually a little confused about this. From the code:
{code}
private static String makeMessage(String what, ConfVars setting) {
  return what + " are disabled for safety reasons. If you know what you are doing, please make"
      + " sure that " + setting.varname + " is set to false and that "
      + ConfVars.HIVEMAPREDMODE.varname + " is not set to 'strict' to enable them.";
}
{code}
it seems like if {{hive.mapred.mode}} is NOT set to 'strict', then these 3 new 
configurations are activated.

However, in another piece of code:
{code}
private static boolean isAllowed(Configuration conf, ConfVars setting) {
  String mode = HiveConf.getVar(conf, ConfVars.HIVEMAPREDMODE, null);
  return (mode != null) ? !"strict".equals(mode) : !HiveConf.getBoolVar(conf, setting);
}
{code}
it seems like as long as {{hive.mapred.mode}} is not null AND not 'strict', 
then the above 3 configurations are disabled, i.e., their values are ignored.
Is this intentional?
(BTW, how can I set {{hive.mapred.mode}} to null? It seems Hive just loads 
the default value, which is 'nonstrict'.)

> refactor Hive strict checks to be more granular, allow order by no limit and 
> no partition filter by default for now
> ---
>
> Key: HIVE-12727
> URL: https://issues.apache.org/jira/browse/HIVE-12727
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Blocker
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-12727.01.patch, HIVE-12727.02.patch, 
> HIVE-12727.03.patch, HIVE-12727.04.patch, HIVE-12727.05.patch, 
> HIVE-12727.06.patch, HIVE-12727.07.patch, HIVE-12727.patch
>
>
> Making strict mode the default recently appears to have broken many normal 
> queries, such as some TPCDS benchmark queries, e.g. Q85:
> Response message: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: SemanticException [Error 10041]: No partition 
> predicate found for Alias "web_sales" Table "web_returns"
> We should remove this restriction from strict mode, or change the default 
> back to non-strict. Perhaps make a 3-value parameter, nonstrict, semistrict, 
> and strict, for backward compat for people who are relying on strict already.





[jira] [Commented] (HIVE-14332) Reduce logging from VectorMapOperator

2016-07-25 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392916#comment-15392916
 ] 

Matt McCline commented on HIVE-14332:
-

[~sseth] Can you give this a quick +1? Thanks.

> Reduce logging from VectorMapOperator
> -
>
> Key: HIVE-14332
> URL: https://issues.apache.org/jira/browse/HIVE-14332
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14332.01.patch
>
>
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator: VectorMapOperator 
> path: 
> hdfs://cn108-10.l42scl.hortonworks.com:8020/apps/hive/warehouse/tpcds_bin_partitioned_orc_200.db/store_sales/ss_sold_date_sk=2451710,
>  read type VECTORIZED_INPUT_FILE_FORMAT, vector deserialize type NONE, 
> aliases store_sales
> Lines like this repeat all over the log. This gets really big with a large 
> number of partitions. 6MB of logs per node for a 30 task query running for 20 
> seconds on a 3 node cluster.
> Instead of logging this line - can we have a consolidated log / logging only 
> if something abnormal happens ... or a shorter log message.
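
One possible shape for the consolidated approach (a sketch, not the attached 
patch): keep the per-path detail at DEBUG and emit a single summary line per 
operator.
{code}
import java.util.concurrent.atomic.AtomicLong;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class PathLogConsolidator {
  private static final Logger LOG = LoggerFactory.getLogger(PathLogConsolidator.class);
  private final AtomicLong pathCount = new AtomicLong();

  void onPath(String path, String readType) {
    pathCount.incrementAndGet();
    if (LOG.isDebugEnabled()) {                 // detail only when asked for
      LOG.debug("VectorMapOperator path: {}, read type {}", path, readType);
    }
  }

  void onClose() {                              // one consolidated INFO line
    LOG.info("VectorMapOperator processed {} paths", pathCount.get());
  }
}
{code}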





[jira] [Updated] (HIVE-14332) Reduce logging from VectorMapOperator

2016-07-25 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14332:

Attachment: HIVE-14332.01.patch

> Reduce logging from VectorMapOperator
> -
>
> Key: HIVE-14332
> URL: https://issues.apache.org/jira/browse/HIVE-14332
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14332.01.patch
>
>
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator: VectorMapOperator 
> path: 
> hdfs://cn108-10.l42scl.hortonworks.com:8020/apps/hive/warehouse/tpcds_bin_partitioned_orc_200.db/store_sales/ss_sold_date_sk=2451710,
>  read type VECTORIZED_INPUT_FILE_FORMAT, vector deserialize type NONE, 
> aliases store_sales
> Lines like this repeat all over the log. This gets really big with a large 
> number of partitions. 6MB of logs per node for a 30 task query running for 20 
> seconds on a 3 node cluster.
> Instead of logging this line - can we have a consolidated log / logging only 
> if something abnormal happens ... or a shorter log message.





[jira] [Updated] (HIVE-14332) Reduce logging from VectorMapOperator

2016-07-25 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14332:

Status: Patch Available  (was: Open)

> Reduce logging from VectorMapOperator
> -
>
> Key: HIVE-14332
> URL: https://issues.apache.org/jira/browse/HIVE-14332
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14332.01.patch
>
>
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator: VectorMapOperator 
> path: 
> hdfs://cn108-10.l42scl.hortonworks.com:8020/apps/hive/warehouse/tpcds_bin_partitioned_orc_200.db/store_sales/ss_sold_date_sk=2451710,
>  read type VECTORIZED_INPUT_FILE_FORMAT, vector deserialize type NONE, 
> aliases store_sales
> Lines like this repeat all over the log. This gets really big with a large 
> number of partitions. 6MB of logs per node for a 30 task query running for 20 
> seconds on a 3 node cluster.
> Instead of logging this line - can we have a consolidated log / logging only 
> if something abnormal happens ... or a shorter log message.





[jira] [Commented] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching

2016-07-25 Thread Shannon Ladymon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392890#comment-15392890
 ] 

Shannon Ladymon commented on HIVE-7926:
---

[~asears], thank you for creating the LLAP page and adding in an overview 
description of it.  I've gone ahead and copied the rest of the design document 
into it as well as made some edits to the overview:
* [ LLAP | https://cwiki.apache.org/confluence/display/Hive/LLAP]

The page will need to be revised since it still reads like a design document 
and is missing a few sections, but it's a start.

[~sershe] and Andrew, feel free to make any modifications you'd like to what I 
added today to fix/clarify the information.

> long-lived daemons for query fragment execution, I/O and caching
> 
>
> Key: HIVE-7926
> URL: https://issues.apache.org/jira/browse/HIVE-7926
> Project: Hive
>  Issue Type: New Feature
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: LLAPdesigndocument.pdf
>
>
> We are proposing a new execution model for Hive that is a combination of 
> existing process-based tasks and long-lived daemons running on worker nodes. 
> These nodes can take care of efficient I/O, caching and query fragment 
> execution, while heavy lifting like most joins, ordering, etc. can be handled 
> by tasks.
> The proposed model is not a 2-system solution for small and large queries; 
> nor is it a separate execution engine like MR or Tez. It can be used by any 
> Hive execution engine, if support is added; in the future, even external 
> products (e.g. Pig) can use it.
> The document with high-level design we are proposing will be attached shortly.





[jira] [Comment Edited] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching

2016-07-25 Thread Shannon Ladymon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392890#comment-15392890
 ] 

Shannon Ladymon edited comment on HIVE-7926 at 7/26/16 12:05 AM:
-

[~asears], thank you for creating the LLAP page and adding in an overview 
description of it.  I've gone ahead and copied the rest of the design document 
into it as well as made some edits to the overview:
* [LLAP | https://cwiki.apache.org/confluence/display/Hive/LLAP]

The page will need to be revised since it still reads like a design document 
and is missing a few sections, but it's a start.

[~sershe] and Andrew, feel free to make any modifications you'd like to what I 
added today to fix/clarify the information.


was (Author: sladymon):
[~asears], thank you for creating the LLAP page and adding in an overview 
description of it.  I've gone ahead and copied the rest of the design document 
into it as well as made some edits to the overview:
* [ LLAP | https://cwiki.apache.org/confluence/display/Hive/LLAP]

The page will need to be revised since it still reads like a design document 
and is missing a few sections, but it's a start.

[~sershe] and Andrew, feel free to make any modifications you'd like to what I 
added today to fix/clarify the information.

> long-lived daemons for query fragment execution, I/O and caching
> 
>
> Key: HIVE-7926
> URL: https://issues.apache.org/jira/browse/HIVE-7926
> Project: Hive
>  Issue Type: New Feature
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: LLAPdesigndocument.pdf
>
>
> We are proposing a new execution model for Hive that is a combination of 
> existing process-based tasks and long-lived daemons running on worker nodes. 
> These nodes can take care of efficient I/O, caching and query fragment 
> execution, while heavy lifting like most joins, ordering, etc. can be handled 
> by tasks.
> The proposed model is not a 2-system solution for small and large queries; 
> nor is it a separate execution engine like MR or Tez. It can be used by any 
> Hive execution engine, if support is added; in the future, even external 
> products (e.g. Pig) can use it.
> The document with high-level design we are proposing will be attached shortly.





[jira] [Commented] (HIVE-14316) TestLlapTokenChecker.testCheckPermissions, testGetToken fail

2016-07-25 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392881#comment-15392881
 ] 

Siddharth Seth commented on HIVE-14316:
---

+1 pending precommit.

> TestLlapTokenChecker.testCheckPermissions, testGetToken fail
> 
>
> Key: HIVE-14316
> URL: https://issues.apache.org/jira/browse/HIVE-14316
> Project: Hive
>  Issue Type: Test
>Reporter: Siddharth Seth
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14316.patch
>
>
> cc [~sershe]





[jira] [Commented] (HIVE-10022) Authorization checks for non existent file/directory should not be recursive

2016-07-25 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392878#comment-15392878
 ] 

Thejas M Nair commented on HIVE-10022:
--

+1, changes look good. 
There seem to be some spaces missing in a couple of places: after "," in some 
places, and before "==" in (fileStatus== null). Can you please take care of 
that before commit?



> Authorization checks for non existent file/directory should not be recursive
> 
>
> Key: HIVE-10022
> URL: https://issues.apache.org/jira/browse/HIVE-10022
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 0.14.0
>Reporter: Pankit Thapar
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-10022.2.patch, HIVE-10022.3.patch, 
> HIVE-10022.4.patch, HIVE-10022.5.patch, HIVE-10022.6.patch, 
> HIVE-10022.7.patch, HIVE-10022.8.patch, HIVE-10022.patch
>
>
> I am testing a query like:
> set hive.test.authz.sstd.hs2.mode=true;
> set 
> hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest;
> set 
> hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateConfigUserAuthenticator;
> set hive.security.authorization.enabled=true;
> set user.name=user1;
> create table auth_noupd(i int) clustered by (i) into 2 buckets stored as orc 
> location '${OUTPUT}' TBLPROPERTIES ('transactional'='true');
> Now, in the above query, since authorization is enabled, we end up calling 
> doAuthorizationV2(), which ultimately calls 
> SQLAuthorizationUtils.getPrivilegesFromFS(), which calls a recursive method, 
> FileUtils.isActionPermittedForFileHierarchy(), with the object (or an 
> ancestor of the object) we are trying to authorize if the object does not 
> exist.
> The logic in FileUtils.isActionPermittedForFileHierarchy() is a DFS.
> Now assume we have a path a/b/c/d that we are trying to authorize.
> If a/b/c/d does not exist, we would call 
> FileUtils.isActionPermittedForFileHierarchy() with, say, a/b/ (assuming 
> a/b/c also does not exist).
> If the subtree under a/b has millions of files, then 
> FileUtils.isActionPermittedForFileHierarchy() is going to check file 
> permissions on each of those objects.
> I do not completely understand why we have to check file permissions on all 
> the objects in a branch of the tree that we are not trying to read from or 
> write to.
> We could instead check the file permission on the ancestor that exists and, 
> if it matches what we expect, return true.
> Please confirm whether this is a bug so that I can submit a patch; 
> otherwise, let me know what I am missing.





[jira] [Commented] (HIVE-14303) CommonJoinOperator.checkAndGenObject should return directly at CLOSE state to avoid NPE if ExecReducer.close is called twice.

2016-07-25 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392870#comment-15392870
 ] 

Xuefu Zhang commented on HIVE-14303:


Would it be more logical to check whether reducer is null at line #459?
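
For reference, the guard proposed in the issue title amounts to an early 
return (a sketch; the surrounding operator state handling is simplified):
{code}
// In CommonJoinOperator.checkAndGenObject(): make the second close() a
// no-op instead of dereferencing the already-cleared storage array.
if (state == State.CLOSE) {
  return;
}
{code}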

> CommonJoinOperator.checkAndGenObject should return directly at CLOSE state to 
> avoid NPE if ExecReducer.close is called twice.
> -
>
> Key: HIVE-14303
> URL: https://issues.apache.org/jira/browse/HIVE-14303
> Project: Hive
>  Issue Type: Bug
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.1.0
>
> Attachments: HIVE-14303.0.patch
>
>
> CommonJoinOperator.checkAndGenObject should return directly in the CLOSE 
> state to avoid an NPE if ExecReducer.close is called twice. ExecReducer 
> implements the Closeable interface, so ExecReducer.close can be called 
> multiple times. We saw the following NPE, which, due to this bug, hid the 
> real exception.
> {code}
> Error: java.lang.RuntimeException: Hive Runtime Error while closing 
> operators: null
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296)
> at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
> at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:718)
> at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284)
> ... 8 more
> {code}
> The code from ReduceTask.runOldReducer:
> {code}
>   reducer.close(); //line 453
>   reducer = null;
>   
>   out.close(reporter);
>   out = null;
> } finally {
>   IOUtils.cleanup(LOG, reducer);// line 459
>   closeQuietly(out, reporter);
> }
> {code}
> Based on the above stack trace and code, reducer.close() is called twice: 
> because the exception happened when reducer.close() was called for the first 
> time at line 453, the code exited before reducer was set to null. The 
> NullPointerException is triggered when reducer.close() is called for the 
> second time in IOUtils.cleanup at line 459, and it hides the 
> real exception that happened when reducer.close() was called for the first 
> time at line 453.
> The reason for the NPE is:
> The first reducer.close called CommonJoinOperator.closeOp, which clears 
> {{storage}}
> {code}
> Arrays.fill(storage, null);
> {code}
> The second reducer.close generated the NPE because {{storage[alias]}} had been 
> set to null by the first reducer.close.
> The following reducer log can give more proof:
> {code}
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.SelectOperator: 1 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.SelectOperator: 2 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[4]: records written - 
> 53466
> 2016-07-14 22:25:11,555 ERROR [main] ExecReducer: Hit error while closing 
> operators - failing tree
> 2016-07-14 22:25:11,649 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.RuntimeException: Hive Runtime Error 
> while closing operators: null
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296)
>   at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>   at 

[jira] [Commented] (HIVE-14322) Postgres db issues after Datanucleus 4.x upgrade

2016-07-25 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392865#comment-15392865
 ] 

Thejas M Nair commented on HIVE-14322:
--

[~sershe] Sorry, I should have been clearer.
I didn't mean that actually auto-creating the schema made the difference. What I 
meant was that using a schema created with schematool, but running the metastore 
with the -hiveconf datanucleus.schema.autoCreateColumns=true option, made the 
difference.
 

> Postgres db issues after Datanucleus 4.x upgrade
> 
>
> Key: HIVE-14322
> URL: https://issues.apache.org/jira/browse/HIVE-14322
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0, 2.0.1
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14322.1.patch
>
>
> With the upgrade to DataNucleus 4.x versions in HIVE-6113, Hive does not 
> work properly with Postgres.
> The nullable fields in the database contain the string "NULL::character varying" 
> instead of real NULL values. This causes various issues.
> One example is -
> {code}
> hive> create table t(i int);
> OK
> Time taken: 1.9 seconds
> hive> create view v as select * from t;
> OK
> Time taken: 0.542 seconds
> hive> select * from v;
> FAILED: SemanticException Unable to fetch table v. 
> java.net.URISyntaxException: Relative path in absolute URI: 
> NULL::character%20varying
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14322) Postgres db issues after Datanucleus 4.x upgrade

2016-07-25 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-14322:
-
Status: Patch Available  (was: Open)

> Postgres db issues after Datanucleus 4.x upgrade
> 
>
> Key: HIVE-14322
> URL: https://issues.apache.org/jira/browse/HIVE-14322
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.1, 2.1.0, 2.0.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14322.1.patch
>
>
> With the upgrade to DataNucleus 4.x versions in HIVE-6113, Hive does not 
> work properly with Postgres.
> The nullable fields in the database contain the string "NULL::character varying" 
> instead of real NULL values. This causes various issues.
> One example is -
> {code}
> hive> create table t(i int);
> OK
> Time taken: 1.9 seconds
> hive> create view v as select * from t;
> OK
> Time taken: 0.542 seconds
> hive> select * from v;
> FAILED: SemanticException Unable to fetch table v. 
> java.net.URISyntaxException: Relative path in absolute URI: 
> NULL::character%20varying
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14322) Postgres db issues after Datanucleus 4.x upgrade

2016-07-25 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-14322:
-
Attachment: HIVE-14322.1.patch

Patch for rolling back the DataNucleus (DN) upgrade.

> Postgres db issues after Datanucleus 4.x upgrade
> 
>
> Key: HIVE-14322
> URL: https://issues.apache.org/jira/browse/HIVE-14322
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0, 2.0.1
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-14322.1.patch
>
>
> With the upgrade to DataNucleus 4.x versions in HIVE-6113, Hive does not 
> work properly with Postgres.
> The nullable fields in the database contain the string "NULL::character varying" 
> instead of real NULL values. This causes various issues.
> One example is -
> {code}
> hive> create table t(i int);
> OK
> Time taken: 1.9 seconds
> hive> create view v as select * from t;
> OK
> Time taken: 0.542 seconds
> hive> select * from v;
> FAILED: SemanticException Unable to fetch table v. 
> java.net.URISyntaxException: Relative path in absolute URI: 
> NULL::character%20varying
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14293) PerfLogger.openScopes should be transient

2016-07-25 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-14293:
--
Attachment: HIVE-14293.4.patch

For retest.

> PerfLogger.openScopes should be transient
> -
>
> Key: HIVE-14293
> URL: https://issues.apache.org/jira/browse/HIVE-14293
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-14293.1.patch, HIVE-14293.2.patch, 
> HIVE-14293.3.patch, HIVE-14293.4.patch
>
>
> See the following exception when running Hive e2e tests:
> {code}
> 0: jdbc:hive2://nat-r6-ojss-hsihs2-1.openstac> SELECT s.name, s2.age, s.gpa, 
> v.registration, v2.contributions FROM student s INNER JOIN voter v ON (s.name 
> = v.name) INNER JOIN student s2 ON (s2.age = v.age and s.name = s2.name) 
> INNER JOIN voter v2 ON (v2.name = s2.name and v2.age = s2.age) WHERE v2.age = 
> s.age ORDER BY s.name, s2.age, s.gpa, v.registration, v2.contributions;
> INFO  : Compiling 
> command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8): 
> SELECT s.name, s2.age, s.gpa, v.registration, v2.contributions FROM student s 
> INNER JOIN voter v ON (s.name = v.name) INNER JOIN student s2 ON (s2.age = 
> v.age and s.name = s2.name) INNER JOIN voter v2 ON (v2.name = s2.name and 
> v2.age = s2.age) WHERE v2.age = s.age ORDER BY s.name, s2.age, s.gpa, 
> v.registration, v2.contributions
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:s.name, 
> type:string, comment:null), FieldSchema(name:s2.age, type:int, comment:null), 
> FieldSchema(name:s.gpa, type:double, comment:null), 
> FieldSchema(name:v.registration, type:string, comment:null), 
> FieldSchema(name:v2.contributions, type:float, comment:null)], 
> properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8); 
> Time taken: 1.165 seconds
> INFO  : Executing 
> command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8): 
> SELECT s.name, s2.age, s.gpa, v.registration, v2.contributions FROM student s 
> INNER JOIN voter v ON (s.name = v.name) INNER JOIN student s2 ON (s2.age = 
> v.age and s.name = s2.name) INNER JOIN voter v2 ON (v2.name = s2.name and 
> v2.age = s2.age) WHERE v2.age = s.age ORDER BY s.name, s2.age, s.gpa, 
> v.registration, v2.contributions
> INFO  : Query ID = hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8
> INFO  : Total jobs = 1
> INFO  : Launching Job 1 out of 1
> INFO  : Starting task [Stage-1:MAPRED] in serial mode
> INFO  : Session is already open
> INFO  : Dag name: SELECT s.name, s2.age, sv2.contributions(Stage-1)
> ERROR : Failed to execute tez graph.
> java.lang.RuntimeException: Error caching map.xml: 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.util.ConcurrentModificationException
> Serialization trace:
> classes (sun.misc.Launcher$AppClassLoader)
> classloader (java.security.ProtectionDomain)
> context (java.security.AccessControlContext)
> acc (org.apache.hadoop.hive.ql.exec.UDFClassLoader)
> classLoader (org.apache.hadoop.hive.conf.HiveConf)
> conf (org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics)
> metrics 
> (org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics$CodahaleMetricsScope)
> openScopes (org.apache.hadoop.hive.ql.log.PerfLogger)
> perfLogger (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:582) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:516) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:601) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:1147) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.build(TezTask.java:390) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:164) 
> [hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) 
> 
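
The change itself is small; a sketch of the fix (the field name comes from the 
serialization trace above, while the map's value type is assumed here for 
illustration):
{code}
// in org.apache.hadoop.hive.ql.log.PerfLogger: marking the scope map
// transient stops Kryo from following PerfLogger -> metrics scope ->
// HiveConf -> classloader while the MapWork plan is being serialized
private transient Map<String, MetricsScope> openScopes = new HashMap<>();
{code}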

[jira] [Updated] (HIVE-14303) CommonJoinOperator.checkAndGenObject should return directly at CLOSE state to avoid NPE if ExecReducer.close is called twice.

2016-07-25 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated HIVE-14303:
-
Attachment: HIVE-14303.0.patch

> CommonJoinOperator.checkAndGenObject should return directly at CLOSE state to 
> avoid NPE if ExecReducer.close is called twice.
> -
>
> Key: HIVE-14303
> URL: https://issues.apache.org/jira/browse/HIVE-14303
> Project: Hive
>  Issue Type: Bug
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.1.0
>
> Attachments: HIVE-14303.0.patch
>
>
> CommonJoinOperator.checkAndGenObject should return directly at CLOSE state to 
> avoid an NPE if ExecReducer.close is called twice. ExecReducer.close implements 
> the Closeable interface and can be called multiple times. We saw 
> the following NPE, which hid the real exception, due to this bug.
> {code}
> Error: java.lang.RuntimeException: Hive Runtime Error while closing 
> operators: null
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296)
> at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
> at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:718)
> at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284)
> ... 8 more
> {code}
> The code from ReduceTask.runOldReducer:
> {code}
>   reducer.close(); //line 453
>   reducer = null;
>   
>   out.close(reporter);
>   out = null;
> } finally {
>   IOUtils.cleanup(LOG, reducer);// line 459
>   closeQuietly(out, reporter);
> }
> {code}
> Based on the above stack trace and code, reducer.close() is called twice: 
> because the exception happened when reducer.close() was called for the first 
> time at line 453, the code exited before reducer was set to null. The 
> NullPointerException is triggered when reducer.close() is called for the 
> second time in IOUtils.cleanup at line 459, and it hides the 
> real exception that happened when reducer.close() was called for the first 
> time at line 453.
> The reason for the NPE is:
> The first reducer.close called CommonJoinOperator.closeOp, which clears 
> {{storage}}
> {code}
> Arrays.fill(storage, null);
> {code}
> The second reducer.close generated the NPE because {{storage[alias]}} had been 
> set to null by the first reducer.close.
> The following reducer log can give more proof:
> {code}
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.SelectOperator: 1 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.SelectOperator: 2 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[4]: records written - 
> 53466
> 2016-07-14 22:25:11,555 ERROR [main] ExecReducer: Hit error while closing 
> operators - failing tree
> 2016-07-14 22:25:11,649 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.RuntimeException: Hive Runtime Error 
> while closing operators: null
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296)
>   at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at 

[jira] [Updated] (HIVE-14326) Merging outer joins without conditions can lead to wrong results

2016-07-25 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14326:
---
   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master, branch-2.1. Thanks [~ashutoshc] for the review!

> Merging outer joins without conditions can lead to wrong results
> 
>
> Key: HIVE-14326
> URL: https://issues.apache.org/jira/browse/HIVE-14326
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14326.patch
>
>
> HIVE-13069 enabled cartesian product merging. However, merge should only be 
> performed between INNER joins.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14303) CommonJoinOperator.checkAndGenObject should return directly at CLOSE state to avoid NPE if ExecReducer.close is called twice.

2016-07-25 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated HIVE-14303:
-
Attachment: (was: HIVE-14303.000.patch)

> CommonJoinOperator.checkAndGenObject should return directly at CLOSE state to 
> avoid NPE if ExecReducer.close is called twice.
> -
>
> Key: HIVE-14303
> URL: https://issues.apache.org/jira/browse/HIVE-14303
> Project: Hive
>  Issue Type: Bug
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.1.0
>
>
> CommonJoinOperator.checkAndGenObject should return directly at CLOSE state to 
> avoid an NPE if ExecReducer.close is called twice. ExecReducer.close implements 
> the Closeable interface and can be called multiple times. We saw 
> the following NPE, which hid the real exception, due to this bug.
> {code}
> Error: java.lang.RuntimeException: Hive Runtime Error while closing 
> operators: null
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296)
> at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
> at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:718)
> at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284)
> ... 8 more
> {code}
> The code from ReduceTask.runOldReducer:
> {code}
>   reducer.close(); //line 453
>   reducer = null;
>   
>   out.close(reporter);
>   out = null;
> } finally {
>   IOUtils.cleanup(LOG, reducer);// line 459
>   closeQuietly(out, reporter);
> }
> {code}
> Based on the above stack trace and code, reducer.close() is called twice: 
> because the exception happened when reducer.close() was called for the first 
> time at line 453, the code exited before reducer was set to null. The 
> NullPointerException is triggered when reducer.close() is called for the 
> second time in IOUtils.cleanup at line 459, and it hides the 
> real exception that happened when reducer.close() was called for the first 
> time at line 453.
> The reason for the NPE is:
> The first reducer.close called CommonJoinOperator.closeOp, which clears 
> {{storage}}
> {code}
> Arrays.fill(storage, null);
> {code}
> The second reducer.close generated the NPE because {{storage[alias]}} had been 
> set to null by the first reducer.close.
> The following reducer log can give more proof:
> {code}
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.SelectOperator: 1 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.SelectOperator: 2 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[4]: records written - 
> 53466
> 2016-07-14 22:25:11,555 ERROR [main] ExecReducer: Hit error while closing 
> operators - failing tree
> 2016-07-14 22:25:11,649 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.RuntimeException: Hive Runtime Error 
> while closing operators: null
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296)
>   at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 

[jira] [Commented] (HIVE-13604) Do not log AlreadyExistsException when "IF NOT EXISTS" is used.

2016-07-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392822#comment-15392822
 ] 

Hive QA commented on HIVE-13604:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12819930/HIVE-13604.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 306 failed/errored test(s), 10337 tests 
executed
*Failed tests:*
{noformat}
TestMsgBusConnection - did not produce a TEST-*.xml file
TestSessionCleanup - did not produce a TEST-*.xml file
TestSessionManagerMetrics - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_exist
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_change_db_location
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_char1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_db_owner
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_rename_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_skewed_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_table_invalidate_column_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_varchar1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_view_as_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_view_rename
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_multi
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_owner_actions_db
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_insert
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_func1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_or_replace_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_with_constraints
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas_uses_database_location
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cteViews
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cte_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cte_mat_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_database
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_database_drop
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_database_location
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_database_properties
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dbtxnmgr_compact3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dbtxnmgr_ddl1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dbtxnmgr_query5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_describe_database
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_describe_database_json
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_describe_syntax
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_display_colstats_tbllvl
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_drop_database_removes_partition_dirs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_drop_multi_partitions
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exchange_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_00_nonpart_empty
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_01_nonpart
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_02_00_part_empty
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_02_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_03_nonpart_over_compat
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_04_all_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_04_evolved_parts
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_05_some_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_06_one_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_07_all_part_over_nonoverlap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_08_nonpart_rename
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_09_part_spec_nonoverlap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_10_external_managed
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_11_managed_external
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_12_external_location

[jira] [Updated] (HIVE-12878) Support Vectorization for TEXTFILE and other formats

2016-07-25 Thread Shannon Ladymon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shannon Ladymon updated HIVE-12878:
---
Labels:   (was: TODOC2.1)

> Support Vectorization for TEXTFILE and other formats
> 
>
> Key: HIVE-12878
> URL: https://issues.apache.org/jira/browse/HIVE-12878
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HIVE-12878.01.patch, HIVE-12878.02.patch, 
> HIVE-12878.03.patch, HIVE-12878.04.patch, HIVE-12878.05.patch, 
> HIVE-12878.06.patch, HIVE-12878.07.patch, HIVE-12878.08.patch, 
> HIVE-12878.09.patch, HIVE-12878.091.patch, HIVE-12878.092.patch, 
> HIVE-12878.093.patch
>
>
> Support vectorizing when the input format is TEXTFILE and other formats for 
> better Map Vertex performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12878) Support Vectorization for TEXTFILE and other formats

2016-07-25 Thread Shannon Ladymon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392820#comment-15392820
 ] 

Shannon Ladymon commented on HIVE-12878:


Doc done.  The new properties (*hive.vectorized.use.vectorized.input.format, 
hive.vectorized.use.vector.serde.deserialize, and 
hive.vectorized.use.row.serde.deserialize*) have been documented as follows:
* [Configuration Properties - hive.vectorized.use.vectorized.input.format | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.vectorized.use.vectorized.input.format]
* [Configuration Properties - hive.vectorized.use.vector.serde.deserialize | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.vectorized.use.vector.serde.deserialize]
* [Configuration Properties - hive.vectorized.use.row.serde.deserialize | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.vectorized.use.row.serde.deserialize]

The TODOC 2.1 label has also been removed.

> Support Vectorization for TEXTFILE and other formats
> 
>
> Key: HIVE-12878
> URL: https://issues.apache.org/jira/browse/HIVE-12878
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HIVE-12878.01.patch, HIVE-12878.02.patch, 
> HIVE-12878.03.patch, HIVE-12878.04.patch, HIVE-12878.05.patch, 
> HIVE-12878.06.patch, HIVE-12878.07.patch, HIVE-12878.08.patch, 
> HIVE-12878.09.patch, HIVE-12878.091.patch, HIVE-12878.092.patch, 
> HIVE-12878.093.patch
>
>
> Support vectorizing when the input format is TEXTFILE and other formats for 
> better Map Vertex performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14205) Hive doesn't support union type with AVRO file format

2016-07-25 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392818#comment-15392818
 ] 

Chaoyu Tang commented on HIVE-14205:


+1

> Hive doesn't support union type with AVRO file format
> -
>
> Key: HIVE-14205
> URL: https://issues.apache.org/jira/browse/HIVE-14205
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Yibing Shi
>Assignee: Yibing Shi
> Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, 
> HIVE-14205.3.patch, HIVE-14205.4.patch, HIVE-14205.5.patch, 
> HIVE-14205.6.patch, HIVE-14205.7.patch
>
>
> Reproduce steps:
> {noformat}
> hive> CREATE TABLE avro_union_test
> > PARTITIONED BY (p int)
> > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> > STORED AS INPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> > OUTPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> > TBLPROPERTIES ('avro.schema.literal'='{
> >"type":"record",
> >"name":"nullUnionTest",
> >"fields":[
> >   {
> >  "name":"value",
> >  "type":[
> > "null",
> > "int",
> > "long"
> >  ],
> >  "default":null
> >   }
> >]
> > }');
> OK
> Time taken: 0.105 seconds
> hive> alter table avro_union_test add partition (p=1);
> OK
> Time taken: 0.093 seconds
> hive> select * from avro_union_test;
> FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: 
> Failed with exception Hive internal error inside 
> isAssignableFromSettablePrimitiveOI void not supported 
> yet.java.lang.RuntimeException: Hive internal error inside 
> isAssignableFromSettablePrimitiveOI void not supported yet.
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettablePrimitiveOI(ObjectInspectorUtils.java:1140)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:1149)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1187)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1220)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1200)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConvertedOI(ObjectInspectorConverters.java:219)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:581)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:140)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:482)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1194)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1289)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1120)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1108)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:218)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:170)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:381)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:773)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:691)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {noformat}
> Another test case to show this problem is:
> {noformat}
> hive> create table avro_union_test2 (value uniontype) stored as 
> avro;
> OK
> Time taken: 0.053 seconds
> hive> show create table avro_union_test2;
> OK
> CREATE TABLE `avro_union_test2`(
>   `value` uniontype COMMENT '')
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> STORED AS INPUTFORMAT
>   

[jira] [Commented] (HIVE-13815) Improve logic to infer false predicates

2016-07-25 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392811#comment-15392811
 ] 

Ashutosh Chauhan commented on HIVE-13815:
-

+1, needs golden file updates.

> Improve logic to infer false predicates
> ---
>
> Key: HIVE-13815
> URL: https://issues.apache.org/jira/browse/HIVE-13815
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13815.patch
>
>
> Follow-up/extension of the work done in HIVE-13068.
> Ex.
> ql/src/test/results/clientpositive/annotate_stats_filter.q.out
> {{predicate: ((year = 2001) and (state = 'OH') and (state = 'FL')) (type: 
> boolean)}} -> {{false}}
> ql/src/test/results/clientpositive/cbo_rp_join1.q.out
> {{predicate: ((_col0 = _col1) and (_col1 = 40) and (_col0 = 40)) (type: 
> boolean)}} -> {{predicate: ((_col1 = 40) and (_col0 = 40)) (type: boolean)}}
> ql/src/test/results/clientpositive/constprog_semijoin.q.out 
> {{predicate: (((id = 100) = true) and (id <> 100)) (type: boolean)}} -> 
> {{false}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14331) Task should set exception for failed map reduce job.

2016-07-25 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated HIVE-14331:
-
Status: Patch Available  (was: Open)

> Task should set exception for failed map reduce job.
> 
>
> Key: HIVE-14331
> URL: https://issues.apache.org/jira/browse/HIVE-14331
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 2.1.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-14331.000.patch
>
>
> Task should set the exception for a failed map-reduce job, so that the exception 
> can be seen in HookContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14225) Llap slider package should support configuring YARN rolling log aggregation

2016-07-25 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392806#comment-15392806
 ] 

Siddharth Seth commented on HIVE-14225:
---

I think we need a full writeup on LLAP logging at some point; and, more 
importantly, on LLAP configuration.

> Llap slider package should support configuring YARN rolling log aggregation
> ---
>
> Key: HIVE-14225
> URL: https://issues.apache.org/jira/browse/HIVE-14225
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-14225.01.patch, HIVE-14225.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14331) Task should set exception for failed map reduce job.

2016-07-25 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392803#comment-15392803
 ] 

zhihai xu commented on HIVE-14331:
--

I attached a patch, HIVE-14331.000.patch, which sets the exception in 
MergeFileTask, PartialScanTask, ColumnTruncateTask and ExecDriver.
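
As a rough sketch of the shape of that change (a hypothetical snippet, not the 
patch itself; the message text is invented):
{code}
// on MR job failure, record a cause on the Task so post-execution hooks
// can read it from HookContext instead of only seeing a return code
if (rj != null && !rj.isSuccessful()) {
  setException(new HiveException("Map-reduce job " + rj.getID() + " failed"));
  return 2;  // keep the existing non-zero exit code
}
{code}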

> Task should set exception for failed map reduce job.
> 
>
> Key: HIVE-14331
> URL: https://issues.apache.org/jira/browse/HIVE-14331
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 2.1.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-14331.000.patch
>
>
> Task should set the exception for a failed map-reduce job, so that the exception 
> can be seen in HookContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14331) Task should set exception for failed map reduce job.

2016-07-25 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated HIVE-14331:
-
Attachment: HIVE-14331.000.patch

> Task should set exception for failed map reduce job.
> 
>
> Key: HIVE-14331
> URL: https://issues.apache.org/jira/browse/HIVE-14331
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 2.1.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: HIVE-14331.000.patch
>
>
> Task should set the exception for a failed map-reduce job, so that the exception 
> can be seen in HookContext.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14330) fix LockHandle TxnHandler.acquireLock(String key) retry logic

2016-07-25 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392795#comment-15392795
 ] 

Wei Zheng commented on HIVE-14330:
--

+1

> fix LockHandle TxnHandler.acquireLock(String key) retry logic
> -
>
> Key: HIVE-14330
> URL: https://issues.apache.org/jira/browse/HIVE-14330
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-14330.patch
>
>
> stupid bug: return statement is missing.  See patch.
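
For readers without the patch open, the bug class looks roughly like this (an 
illustrative sketch with invented names, not the actual TxnHandler code):
{code}
// without the marked return, a successful acquire falls through and the
// loop keeps retrying as if the lock had never been obtained
for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
  LockHandle handle = tryAcquire(key);
  if (handle != null) {
    return handle;  // <-- the missing return statement
  }
  sleepBeforeRetry(attempt);
}
throw new MetaException("Unable to acquire lock on " + key);
{code}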



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14324) ORC PPD for floats is broken

2016-07-25 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392781#comment-15392781
 ] 

Gopal V commented on HIVE-14324:


LGTM  - +1.

The narrowing to Float is necessary before widening it back to Double.
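
To make the narrowing step concrete, a small self-contained demo (not from the 
patch; this is plain Java float/double semantics):
{code}
public class FloatPpdDemo {
  public static void main(String[] args) {
    float stored = 0.22f;                            // value written by the user
    double statsValue = stored;                      // widening, as ORC stats do
    double sargLiteral = Double.parseDouble("0.22"); // literal via string -> double

    System.out.println(statsValue);                  // ~0.2199999988079071
    System.out.println(statsValue == sargLiteral);   // false: row wrongly filtered

    // the fix: narrow the literal to float first, then widen it back
    double narrowed = (double) (float) sargLiteral;
    System.out.println(statsValue == narrowed);      // true: PPD now matches
  }
}
{code}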

> ORC PPD for floats is broken
> 
>
> Key: HIVE-14324
> URL: https://issues.apache.org/jira/browse/HIVE-14324
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 1.3.0, 2.0.0, 2.1.0, 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-14324.1.patch, HIVE-14324.2.patch
>
>
> ORC stores min/max stats and bloom filters by passing floats as doubles using 
> Java's widening conversion. So if we write a float value of 0.22 to an ORC file, 
> the min/max stats and bloom filter will use the double value 0.2199999988079071.
> But when we do PPD, the SARG creates literals by converting float to string and 
> then to double, which compares 0.22 to 0.2199999988079071 and fails PPD 
> evaluation. 
> {code}
> hive> create table orc_float (f float) stored as orc;
> hive> insert into table orc_float values(0.22);
> hive> set hive.optimize.index.filter=true;
> hive> select * from orc_float where f=0.22;
> OK
> hive> set hive.optimize.index.filter=false;
> hive> select * from orc_float where f=0.22;
> OK
> 0.22
> {code}
> This is not a problem for doubles and decimals.
> This issue was introduced in HIVE-8460, but back then there was no strict type 
> check when SARGs were created, and PPD evaluation did not convert to the 
> column type. Now predicate leaf creation in the SARG enforces a strict type 
> check between boxed literals and the predicate type, and PPD evaluation converts 
> stats and constants to the column (predicate) type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12878) Support Vectorization for TEXTFILE and other formats

2016-07-25 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392777#comment-15392777
 ] 

Lefty Leverenz commented on HIVE-12878:
---

The previous comment about maxColumnWidth is on the wrong JIRA -- it belongs on 
HIVE-14135.  Thanks go to [~sladymon] for figuring it out and fixing the doc.

> Support Vectorization for TEXTFILE and other formats
> 
>
> Key: HIVE-12878
> URL: https://issues.apache.org/jira/browse/HIVE-12878
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
>  Labels: TODOC2.1
> Fix For: 2.1.0
>
> Attachments: HIVE-12878.01.patch, HIVE-12878.02.patch, 
> HIVE-12878.03.patch, HIVE-12878.04.patch, HIVE-12878.05.patch, 
> HIVE-12878.06.patch, HIVE-12878.07.patch, HIVE-12878.08.patch, 
> HIVE-12878.09.patch, HIVE-12878.091.patch, HIVE-12878.092.patch, 
> HIVE-12878.093.patch
>
>
> Support vectorizing when the input format is TEXTFILE and other formats for 
> better Map Vertex performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14135) beeline output not formatted correctly for large column widths

2016-07-25 Thread Shannon Ladymon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392773#comment-15392773
 ] 

Shannon Ladymon commented on HIVE-14135:


Thank you, [~vihangk1], for documenting the Beeline Command Option 
*--maxColumnWidth* in the Hive wiki:
* [HiveServer2 Clients - Beeline Command Options | 
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-BeelineCommandOptions]

> beeline output not formatted correctly for large column widths
> --
>
> Key: HIVE-14135
> URL: https://issues.apache.org/jira/browse/HIVE-14135
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Fix For: 2.2.0
>
> Attachments: HIVE-14135.1.patch, HIVE-14135.2.patch, 
> HIVE-14135.3.patch, csv.txt, csv2.txt, dsv.txt, longKeyValues.txt, 
> output_after.txt, output_before.txt, table.txt, tsv.txt, tsv2.txt, 
> vertical.txt
>
>
> If a column width is too large then beeline uses the maximum column width 
> when normalizing all the column widths. In order to reproduce the issue, run 
> set -v; 
> one of the configuration variables is the classpath, which can be extremely 
> wide (41k characters in my environment).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14330) fix LockHandle TxnHandler.acquireLock(String key) retry logic

2016-07-25 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14330:
--
Status: Patch Available  (was: Open)

> fix LockHandle TxnHandler.acquireLock(String key) retry logic
> -
>
> Key: HIVE-14330
> URL: https://issues.apache.org/jira/browse/HIVE-14330
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 2.1.0, 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-14330.patch
>
>
> stupid bug: return statement is missing.  See patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14330) fix LockHandle TxnHandler.acquireLock(String key) retry logic

2016-07-25 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14330:
--
Attachment: HIVE-14330.patch

[~wzheng], could you review please?

> fix LockHandle TxnHandler.acquireLock(String key) retry logic
> -
>
> Key: HIVE-14330
> URL: https://issues.apache.org/jira/browse/HIVE-14330
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-14330.patch
>
>
> stupid bug: return statement is missing.  See patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13872) Vectorization: Fix cross-product reduce sink serialization

2016-07-25 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13872:

Fix Version/s: 2.2.0

> Vectorization: Fix cross-product reduce sink serialization
> --
>
> Key: HIVE-13872
> URL: https://issues.apache.org/jira/browse/HIVE-13872
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 2.1.0
>Reporter: Gopal V
>Assignee: Matt McCline
> Fix For: 2.2.0
>
> Attachments: HIVE-13872.01.patch, HIVE-13872.02.patch, 
> HIVE-13872.03.patch, HIVE-13872.04.patch, HIVE-13872.05.patch, 
> HIVE-13872.WIP.patch, customer_demographics.txt, vector_include_no_sel.q, 
> vector_include_no_sel.q.out
>
>
> TPC-DS Q13 produces a cross-product without CBO simplifying the query
> {code}
> Caused by: java.lang.RuntimeException: null STRING entry: batchIndex 0 
> projection column num 1
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.nullBytesReadError(VectorExtractRow.java:349)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRowColumn(VectorExtractRow.java:267)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRow(VectorExtractRow.java:343)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.process(VectorReduceSinkOperator.java:103)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:762)
> ... 18 more
> {code}
> Simplified query
> {code}
> set hive.cbo.enable=false;
> -- explain
> select count(1)  
>  from store_sales
>  ,customer_demographics
>  where (
> ( 
>   customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
>   and customer_demographics.cd_marital_status = 'M'
>  )or
>  (
>customer_demographics.cd_demo_sk = ss_cdemo_sk
>   and customer_demographics.cd_marital_status = 'U'
>  ))
> ;
> {code}
> {code}
> Map 3 
> Map Operator Tree:
> TableScan
>   alias: customer_demographics
>   Statistics: Num rows: 1920800 Data size: 717255532 Basic 
> stats: COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1920800 Data size: 717255532 Basic 
> stats: COMPLETE Column stats: NONE
> value expressions: cd_demo_sk (type: int), 
> cd_marital_status (type: string)
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14227) Investigate invalid SessionHandle and invalid OperationHandle

2016-07-25 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392741#comment-15392741
 ] 

Aihua Xu commented on HIVE-14227:
-

Thanks [~vgumashta]. What I'm trying to do is to let the session be aware of its 
connections. So if the session is bound to 3 connections, disconnecting 2 of them 
only unbinds those connections from the session, and the disconnection of the 
third connection will close the session.
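
A minimal sketch of that idea (a hypothetical class, not the actual patch): the 
session keeps a count of bound connections and only really closes when the last 
one disconnects.
{code}
import java.util.concurrent.atomic.AtomicInteger;

class ConnectionAwareSession {
  private final AtomicInteger boundConnections = new AtomicInteger();

  void bind()   { boundConnections.incrementAndGet(); }  // new client connection

  void unbind() {
    // disconnecting 2 of 3 connections just decrements the count;
    // the third disconnect drives it to 0 and closes the session
    if (boundConnections.decrementAndGet() == 0) {
      close();
    }
  }

  private void close() { /* release operation handles, scratch dirs, etc. */ }
}
{code}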

> Investigate invalid SessionHandle and invalid OperationHandle
> -
>
> Key: HIVE-14227
> URL: https://issues.apache.org/jira/browse/HIVE-14227
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14227.1.patch
>
>
> There are the following warnings. 
> {noformat}
> WARN  org.apache.hive.service.cli.thrift.ThriftCLIService: 
> [HiveServer2-Handler-Pool: Thread-55]: Error executing statement:
> org.apache.hive.service.cli.HiveSQLException: Invalid SessionHandle: 
> SessionHandle [1bc00251-64e9-4a95-acb7-a7f53f773528]
> at 
> org.apache.hive.service.cli.session.SessionManager.getSession(SessionManager.java:318)
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:258)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:506)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> {noformat}
> {noformat}
> WARN  org.apache.hive.service.cli.thrift.ThriftCLIService: 
> [HiveServer2-Handler-Pool: Thread-1060]: Error closing operation:
> org.apache.hive.service.cli.HiveSQLException: Invalid OperationHandle: 
> OperationHandle [opType=EXECUTE_STATEMENT, 
> getHandleIdentifier()=13d930dd-316c-4c09-9f44-fee5f483e73d]
> at 
> org.apache.hive.service.cli.operation.OperationManager.getOperation(OperationManager.java:185)
> at 
> org.apache.hive.service.cli.CLIService.closeOperation(CLIService.java:408)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.CloseOperation(ThriftCLIService.java:664)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1513)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1498)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14318) Vectorization: LIKE should use matches() instead of find(0)

2016-07-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392738#comment-15392738
 ] 

Sergey Shelukhin commented on HIVE-14318:
-

Never mind, I just looked at the matches() javadoc. It actually is a full-string match.

> Vectorization: LIKE should use matches() instead of find(0)
> ---
>
> Key: HIVE-14318
> URL: https://issues.apache.org/jira/browse/HIVE-14318
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 1.2.1, 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-14318.1.patch
>
>
> Checking for a match with matches() instead of find() would allow the matcher 
> to exit early instead of looking for sub-sequences beyond the first non-match.
> In UDFLike.java, the complex pattern checker uses matches() while the 
> vectorized version uses find(0), which is more expensive.
> {code}
> Benchmark                            Mode  Cnt    Score    Error  Units
> RegexBench.testGreedyRegexHit        avgt    5  379.316 ± 32.444  ns/op
> RegexBench.testGreedyRegexHitCheck   avgt    5  344.895 ± 15.436  ns/op
> RegexBench.testGreedyRegexMiss       avgt    5  497.193 ± 18.168  ns/op
> RegexBench.testGreedyRegexMissCheck  avgt    5  171.872 ±  8.588  ns/op
> {code}
> A miss is nearly 3x more expensive per row with the .find(0) version than with 
> the .matches() check version.
> The hit scenario is nearly the same for both.
> For lazy patterns, matches() is slower when there's a hit (because it runs the 
> check till the end), but ~2x faster when there's a miss.
> {code}
> RegexBench.testLazyRegexHit          avgt    5   78.398 ±  6.007  ns/op
> RegexBench.testLazyRegexHitCheck     avgt    5  120.557 ±  4.396  ns/op
> RegexBench.testLazyRegexMiss         avgt    5  387.594 ± 25.672  ns/op
> RegexBench.testLazyRegexMissCheck    avgt    5  154.489 ± 13.622  ns/op
> {code}
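
To see the difference outside Hive, a small standalone sketch (the pattern is an 
assumed stand-in for what a LIKE such as '%foo%bar%' compiles to):
{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchVsFind {
  public static void main(String[] args) {
    Pattern p = Pattern.compile(".*foo.*bar.*", Pattern.DOTALL);
    Matcher m = p.matcher("a long input with foo but no second token");

    boolean full = m.matches(); // anchored whole-region match; a miss bails early
    m.reset();
    boolean sub = m.find(0);    // retries a match attempt at every start offset

    // with leading/trailing .*, both calls return the same answer, so
    // matches() is the cheaper equivalent, especially on a miss
    System.out.println(full + " " + sub);  // false false
  }
}
{code}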



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14227) Investigate invalid SessionHandle and invalid OperationHandle

2016-07-25 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392735#comment-15392735
 ] 

Vaibhav Gumashta commented on HIVE-14227:
-

[~aihuaxu] Thanks for the patch. Reviewing it today. Please note that we also 
have a way to send Thrift payloads over HTTP (in case that needs special 
consideration).

> Investigate invalid SessionHandle and invalid OperationHandle
> -
>
> Key: HIVE-14227
> URL: https://issues.apache.org/jira/browse/HIVE-14227
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14227.1.patch
>
>
> There are the following warnings. 
> {noformat}
> WARN  org.apache.hive.service.cli.thrift.ThriftCLIService: 
> [HiveServer2-Handler-Pool: Thread-55]: Error executing statement:
> org.apache.hive.service.cli.HiveSQLException: Invalid SessionHandle: 
> SessionHandle [1bc00251-64e9-4a95-acb7-a7f53f773528]
> at 
> org.apache.hive.service.cli.session.SessionManager.getSession(SessionManager.java:318)
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:258)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:506)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> {noformat}
> {noformat}
> WARN  org.apache.hive.service.cli.thrift.ThriftCLIService: 
> [HiveServer2-Handler-Pool: Thread-1060]: Error closing operation:
> org.apache.hive.service.cli.HiveSQLException: Invalid OperationHandle: 
> OperationHandle [opType=EXECUTE_STATEMENT, 
> getHandleIdentifier()=13d930dd-316c-4c09-9f44-fee5f483e73d]
> at 
> org.apache.hive.service.cli.operation.OperationManager.getOperation(OperationManager.java:185)
> at 
> org.apache.hive.service.cli.CLIService.closeOperation(CLIService.java:408)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.CloseOperation(ThriftCLIService.java:664)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1513)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1498)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14227) Investigate invalid SessionHandle and invalid OperationHandle

2016-07-25 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392731#comment-15392731
 ] 

Vaibhav Gumashta commented on HIVE-14227:
-

[~aihuaxu] I think what [~szehon] is referring to here is that, to implement 
session-level failover, it will help if we don't close the session on the 
server when the client TCP socket that opened the session gets closed.

> Investigate invalid SessionHandle and invalid OperationHandle
> -
>
> Key: HIVE-14227
> URL: https://issues.apache.org/jira/browse/HIVE-14227
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14227.1.patch
>
>
> There are the following warnings. 
> {noformat}
> WARN  org.apache.hive.service.cli.thrift.ThriftCLIService: 
> [HiveServer2-Handler-Pool: Thread-55]: Error executing statement:
> org.apache.hive.service.cli.HiveSQLException: Invalid SessionHandle: 
> SessionHandle [1bc00251-64e9-4a95-acb7-a7f53f773528]
> at 
> org.apache.hive.service.cli.session.SessionManager.getSession(SessionManager.java:318)
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:258)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:506)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> {noformat}
> {noformat}
> WARN  org.apache.hive.service.cli.thrift.ThriftCLIService: 
> [HiveServer2-Handler-Pool: Thread-1060]: Error closing operation:
> org.apache.hive.service.cli.HiveSQLException: Invalid OperationHandle: 
> OperationHandle [opType=EXECUTE_STATEMENT, 
> getHandleIdentifier()=13d930dd-316c-4c09-9f44-fee5f483e73d]
> at 
> org.apache.hive.service.cli.operation.OperationManager.getOperation(OperationManager.java:185)
> at 
> org.apache.hive.service.cli.CLIService.closeOperation(CLIService.java:408)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.CloseOperation(ThriftCLIService.java:664)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1513)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseOperation.getResult(TCLIService.java:1498)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14318) Vectorization: LIKE should use matches() instead of find(0)

2016-07-25 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392725#comment-15392725
 ] 

Gopal V commented on HIVE-14318:


This is the difference between LIKE and RLIKE: LIKE is matches() and RLIKE is 
find(0).

The first one stops when the first char isn't 'a'; the latter has to keep 
looking for 'a' at every byte.
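
A minimal sketch of that cost difference on a miss (plain java.util.regex, 
illustrative only):
{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LikeVsRlikeCost {
  public static void main(String[] args) {
    Pattern p = Pattern.compile("a.*");
    // A long input that can never match: no 'a' anywhere.
    String miss = new String(new char[1_000_000]).replace('\0', 'x');
    Matcher m = p.matcher(miss);
    // matches() is anchored at position 0; once 'a' fails there, it's done.
    System.out.println(m.matches()); // false, rejected immediately
    // find(0) resets and retries the pattern at every offset, scanning the
    // whole input for a possible 'a' before giving up.
    System.out.println(m.find(0));   // false, but only after a full scan
  }
}
{code}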

> Vectorization: LIKE should use matches() instead of find(0)
> ---
>
> Key: HIVE-14318
> URL: https://issues.apache.org/jira/browse/HIVE-14318
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 1.2.1, 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-14318.1.patch
>
>
> Checking for a match instead of find() would allow the matcher to exit 
> early, instead of looking for sub-sequences beyond the first non-match.
> In UDFLike.java, the complex pattern checker uses matches() and the 
> vectorized version uses find(0), which is more expensive.
> {code}
> Benchmark                            Mode  Cnt    Score    Error  Units
> RegexBench.testGreedyRegexHit        avgt    5  379.316 ± 32.444  ns/op
> RegexBench.testGreedyRegexHitCheck   avgt    5  344.895 ± 15.436  ns/op
> RegexBench.testGreedyRegexMiss       avgt    5  497.193 ± 18.168  ns/op
> RegexBench.testGreedyRegexMissCheck  avgt    5  171.872 ±  8.588  ns/op
> {code}
> A miss is nearly ~3x more expensive per row with the .find(0) version than 
> with the .matches() check version.
> The hit scenario is nearly the same.
> With a lazy pattern, matches() is slower when there's a hit (because it runs 
> the check to the end), but ~2x faster when there's a miss.
> {code}
> RegexBench.testLazyRegexHit          avgt    5   78.398 ±  6.007  ns/op
> RegexBench.testLazyRegexHitCheck     avgt    5  120.557 ±  4.396  ns/op
> RegexBench.testLazyRegexMiss         avgt    5  387.594 ± 25.672  ns/op
> RegexBench.testLazyRegexMissCheck    avgt    5  154.489 ± 13.622  ns/op
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14318) Vectorization: LIKE should use matches() instead of find(0)

2016-07-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392719#comment-15392719
 ] 

Sergey Shelukhin commented on HIVE-14318:
-

Why wouldn't it? .* matches the empty string.
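
A one-line check of that claim (plain JDK, illustrative only):
{code}
import java.util.regex.Pattern;

public class EmptyStringMatch {
  public static void main(String[] args) {
    // ".*" matches the empty string even under full-string semantics.
    System.out.println(Pattern.matches(".*", "")); // true
  }
}
{code}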

> Vectorization: LIKE should use matches() instead of find(0)
> ---
>
> Key: HIVE-14318
> URL: https://issues.apache.org/jira/browse/HIVE-14318
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 1.2.1, 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-14318.1.patch
>
>
> Checking for a match instead of find() would allow the matcher to exit 
> early, instead of looking for sub-sequences beyond the first non-match.
> In UDFLike.java, the complex pattern checker uses matches() and the 
> vectorized version uses find(0), which is more expensive.
> {code}
> Benchmark                            Mode  Cnt    Score    Error  Units
> RegexBench.testGreedyRegexHit        avgt    5  379.316 ± 32.444  ns/op
> RegexBench.testGreedyRegexHitCheck   avgt    5  344.895 ± 15.436  ns/op
> RegexBench.testGreedyRegexMiss       avgt    5  497.193 ± 18.168  ns/op
> RegexBench.testGreedyRegexMissCheck  avgt    5  171.872 ±  8.588  ns/op
> {code}
> A miss is nearly ~3x more expensive per row with the .find(0) version than 
> with the .matches() check version.
> The hit scenario is nearly the same.
> With a lazy pattern, matches() is slower when there's a hit (because it runs 
> the check to the end), but ~2x faster when there's a miss.
> {code}
> RegexBench.testLazyRegexHit          avgt    5   78.398 ±  6.007  ns/op
> RegexBench.testLazyRegexHitCheck     avgt    5  120.557 ±  4.396  ns/op
> RegexBench.testLazyRegexMiss         avgt    5  387.594 ± 25.672  ns/op
> RegexBench.testLazyRegexMissCheck    avgt    5  154.489 ± 13.622  ns/op
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7239) Fix bug in HiveIndexedInputFormat implementation that causes incorrect query result when input backed by Sequence/RC files

2016-07-25 Thread Illya Yalovyy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Illya Yalovyy updated HIVE-7239:

Status: Patch Available  (was: Open)

> Fix bug in HiveIndexedInputFormat implementation that causes incorrect query 
> result when input backed by Sequence/RC files
> --
>
> Key: HIVE-7239
> URL: https://issues.apache.org/jira/browse/HIVE-7239
> Project: Hive
>  Issue Type: Bug
>  Components: Indexing
>Affects Versions: 2.1.0
>Reporter: Sumit Kumar
>Assignee: Illya Yalovyy
> Attachments: HIVE-7239.2.patch, HIVE-7239.3.patch, HIVE-7239.patch
>
>
> In the case of sequence files, it's crucial that splits are calculated 
> around the boundaries enforced by the input sequence file. However, by 
> default Hadoop creates input splits based on configuration parameters, which 
> may not match the boundaries of the input sequence file. Hive provides 
> HiveIndexedInputFormat, which adds extra logic and recalculates the 
> boundaries for each split based on the sequence file's boundaries.
> However, we noticed this "over"-reporting behavior with data backed by 
> sequence files. We have sample data on which we experimented and fixed this 
> bug, and we have verified the fix by comparing the query output for input in 
> sequence file format, RC file format, and regular format. However, we have 
> not been able to find the right place to include this as a unit test that 
> would execute as part of the Hive tests. We tried writing a "clientpositive" 
> test in the ql module, but the output seems quite verbose and I couldn't 
> interpret it that well. Can someone please review this change and advise on 
> how to write a test that will execute as part of Hive testing?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7239) Fix bug in HiveIndexedInputFormat implementation that causes incorrect query result when input backed by Sequence/RC files

2016-07-25 Thread Illya Yalovyy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Illya Yalovyy updated HIVE-7239:

Status: Open  (was: Patch Available)

> Fix bug in HiveIndexedInputFormat implementation that causes incorrect query 
> result when input backed by Sequence/RC files
> --
>
> Key: HIVE-7239
> URL: https://issues.apache.org/jira/browse/HIVE-7239
> Project: Hive
>  Issue Type: Bug
>  Components: Indexing
>Affects Versions: 2.1.0
>Reporter: Sumit Kumar
>Assignee: Illya Yalovyy
> Attachments: HIVE-7239.2.patch, HIVE-7239.patch
>
>
> In the case of sequence files, it's crucial that splits are calculated 
> around the boundaries enforced by the input sequence file. However, by 
> default Hadoop creates input splits based on configuration parameters, which 
> may not match the boundaries of the input sequence file. Hive provides 
> HiveIndexedInputFormat, which adds extra logic and recalculates the 
> boundaries for each split based on the sequence file's boundaries.
> However, we noticed this "over"-reporting behavior with data backed by 
> sequence files. We have sample data on which we experimented and fixed this 
> bug, and we have verified the fix by comparing the query output for input in 
> sequence file format, RC file format, and regular format. However, we have 
> not been able to find the right place to include this as a unit test that 
> would execute as part of the Hive tests. We tried writing a "clientpositive" 
> test in the ql module, but the output seems quite verbose and I couldn't 
> interpret it that well. Can someone please review this change and advise on 
> how to write a test that will execute as part of Hive testing?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7239) Fix bug in HiveIndexedInputFormat implementation that causes incorrect query result when input backed by Sequence/RC files

2016-07-25 Thread Illya Yalovyy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Illya Yalovyy updated HIVE-7239:

Attachment: HIVE-7239.3.patch

> Fix bug in HiveIndexedInputFormat implementation that causes incorrect query 
> result when input backed by Sequence/RC files
> --
>
> Key: HIVE-7239
> URL: https://issues.apache.org/jira/browse/HIVE-7239
> Project: Hive
>  Issue Type: Bug
>  Components: Indexing
>Affects Versions: 2.1.0
>Reporter: Sumit Kumar
>Assignee: Illya Yalovyy
> Attachments: HIVE-7239.2.patch, HIVE-7239.3.patch, HIVE-7239.patch
>
>
> In the case of sequence files, it's crucial that splits are calculated 
> around the boundaries enforced by the input sequence file. However, by 
> default Hadoop creates input splits based on configuration parameters, which 
> may not match the boundaries of the input sequence file. Hive provides 
> HiveIndexedInputFormat, which adds extra logic and recalculates the 
> boundaries for each split based on the sequence file's boundaries.
> However, we noticed this "over"-reporting behavior with data backed by 
> sequence files. We have sample data on which we experimented and fixed this 
> bug, and we have verified the fix by comparing the query output for input in 
> sequence file format, RC file format, and regular format. However, we have 
> not been able to find the right place to include this as a unit test that 
> would execute as part of the Hive tests. We tried writing a "clientpositive" 
> test in the ql module, but the output seems quite verbose and I couldn't 
> interpret it that well. Can someone please review this change and advise on 
> how to write a test that will execute as part of Hive testing?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13422) Analyse command not working for column having datatype as decimal(38,0)

2016-07-25 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-13422:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Thomas!

> Analyse command not working for column having datatype as decimal(38,0)
> ---
>
> Key: HIVE-13422
> URL: https://issues.apache.org/jira/browse/HIVE-13422
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Statistics
>Affects Versions: 1.1.0
>Reporter: ashim sinha
>Assignee: Thomas Friedrich
> Fix For: 2.2.0
>
> Attachments: HIVE-13422.patch
>
>
> For the repro
> {code}
> drop table sample_test;
> CREATE TABLE IF NOT EXISTS sample_test( key decimal(38,0),b int ) ROW FORMAT 
> DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
> load data local inpath '/home/hive/analyse.txt' into table sample_test;
> ANALYZE TABLE sample_test COMPUTE STATISTICS FOR COLUMNS;
> {code}
> Sample data
> {code}
> 2023456789456749825082498304 0
> 5032080754887849825069508304 0
> 4012080754887849825068718304 0
> 2012080754887849825066778304 0
> 4012080754887849625065678304 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14305) To/From UTC timestamp may return incorrect result because of DST

2016-07-25 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392706#comment-15392706
 ] 

Ryan Blue commented on HIVE-14305:
--

All time calculations should be carried out in UTC, by setting the default 
time zone to UTC when Hive starts. After that, setDefault shouldn't be called 
again. If this conflicts with other Timestamp uses in the JVM, then Hive 
should change so that it doesn't use Timestamp. Spark, for example, represents 
time internally as microseconds from epoch.
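
A minimal sketch of that suggestion (illustrative startup code under the 
stated assumption, not Hive's actual initialization):
{code}
import java.sql.Timestamp;
import java.util.TimeZone;

public class UtcStartupSketch {
  public static void main(String[] args) {
    // Pin the JVM default zone to UTC once, before any java.sql.Timestamp
    // values are materialized; never call setDefault again afterwards.
    TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
    // The wall-clock string is now interpreted in UTC, so DST shifts of
    // the machine's local zone cannot move it.
    Timestamp ts = Timestamp.valueOf("2005-04-03 02:01:00");
    System.out.println(ts); // 2005-04-03 02:01:00.0
  }
}
{code}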

> To/From UTC timestamp may return incorrect result because of DST
> 
>
> Key: HIVE-14305
> URL: https://issues.apache.org/jira/browse/HIVE-14305
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Rui Li
>Assignee: Rui Li
>
> If the machine's local timezone observes DST, the UDFs return incorrect 
> results.
> For example:
> {code}
> select to_utc_timestamp('2005-04-03 02:01:00','UTC');
> {code}
> returns {{2005-04-03 03:01:00}}. Correct result should be {{2005-04-03 
> 02:01:00}}.
> {code}
> select to_utc_timestamp('2005-04-03 10:01:00','Asia/Shanghai');
> {code}
> returns {{2005-04-03 03:01:00}}. Correct result should be {{2005-04-03 
> 02:01:00}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14305) To/From UTC timestamp may return incorrect result because of DST

2016-07-25 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392703#comment-15392703
 ] 

Ryan Blue commented on HIVE-14305:
--

Spark uses microseconds from epoch to represent timestamps, so I think the 
solution to that problem is entirely different from what Hive should do.
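
For illustration, a zone-independent representation along those lines 
(hypothetical helper, not Spark's or Hive's code):
{code}
import java.sql.Timestamp;

public class EpochMicros {
  // Convert a Timestamp to microseconds from epoch; getTime() already
  // carries the integral milliseconds, so only the sub-millisecond part
  // of getNanos() is added.
  static long toEpochMicros(Timestamp ts) {
    return ts.getTime() * 1_000L + (ts.getNanos() % 1_000_000L) / 1_000L;
  }

  public static void main(String[] args) {
    System.out.println(toEpochMicros(new Timestamp(0L))); // 0
  }
}
{code}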

> To/From UTC timestamp may return incorrect result because of DST
> 
>
> Key: HIVE-14305
> URL: https://issues.apache.org/jira/browse/HIVE-14305
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Rui Li
>Assignee: Rui Li
>
> If the machine's local timezone observes DST, the UDFs return incorrect 
> results.
> For example:
> {code}
> select to_utc_timestamp('2005-04-03 02:01:00','UTC');
> {code}
> returns {{2005-04-03 03:01:00}}. Correct result should be {{2005-04-03 
> 02:01:00}}.
> {code}
> select to_utc_timestamp('2005-04-03 10:01:00','Asia/Shanghai');
> {code}
> returns {{2005-04-03 03:01:00}}. Correct result should be {{2005-04-03 
> 02:01:00}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10022) Authorization checks for non existent file/directory should not be recursive

2016-07-25 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10022:

Attachment: HIVE-10022.8.patch

Latest .8.patch attached

> Authorization checks for non existent file/directory should not be recursive
> 
>
> Key: HIVE-10022
> URL: https://issues.apache.org/jira/browse/HIVE-10022
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 0.14.0
>Reporter: Pankit Thapar
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-10022.2.patch, HIVE-10022.3.patch, 
> HIVE-10022.4.patch, HIVE-10022.5.patch, HIVE-10022.6.patch, 
> HIVE-10022.7.patch, HIVE-10022.8.patch, HIVE-10022.patch
>
>
> I am testing a query like:
> set hive.test.authz.sstd.hs2.mode=true;
> set 
> hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest;
> set 
> hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateConfigUserAuthenticator;
> set hive.security.authorization.enabled=true;
> set user.name=user1;
> create table auth_noupd(i int) clustered by (i) into 2 buckets stored as orc 
> location '${OUTPUT}' TBLPROPERTIES ('transactional'='true');
> Now, in the above query, since authorization is enabled, we end up calling 
> doAuthorizationV2(), which ultimately calls 
> SQLAuthorizationUtils.getPrivilegesFromFS(), which calls a recursive method, 
> FileUtils.isActionPermittedForFileHierarchy(), with the object, or with the 
> ancestor of the object we are trying to authorize if the object does not 
> exist.
> The logic in FileUtils.isActionPermittedForFileHierarchy() is a depth-first 
> search (DFS).
> Now assume we are trying to authorize a path a/b/c/d.
> If a/b/c/d does not exist, we would call 
> FileUtils.isActionPermittedForFileHierarchy() with, say, a/b/, assuming 
> a/b/c also does not exist.
> If the subtree under a/b contains millions of files, then 
> FileUtils.isActionPermittedForFileHierarchy() is going to check file 
> permissions on each of those objects.
> I do not completely understand why we have to check file permissions on all 
> the objects in a branch of the tree that we are not trying to read from or 
> write to.
> We could check file permissions on the ancestor that exists and, if they 
> match what we expect, return true.
> Please confirm whether this is a bug so that I can submit a patch; 
> otherwise, let me know what I am missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10022) Authorization checks for non existent file/directory should not be recursive

2016-07-25 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392698#comment-15392698
 ] 

Sushanth Sowmyan commented on HIVE-10022:
-

From the above test failures, there are 3 relevant failures:

 * 
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_disallow_transform
 * 
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_insert
 * 
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_insert_local

Of these, the first one, that of authorization_disallow_transform.q, is a good 
failure to have had, since it also demonstrates the base bug - the previously 
generated .q.out had a URI disallow error because it kept looking for a blank 
parent and then recursed down that, rather than failing because transforms 
were disallowed. Thus, the fix for the first one is to regenerate the .q.out 
file.

The remaining two issues are valid bugs in our current patch, where the 
parent-determination logic is incorrect in our current impl. The way we do this 
now is by looking for the filestatus of a dir, or a filestatus of the first 
parent that exists. Then, we compare that against the provided dir, and decide 
that if they're not identical, then we must be in the parent case. This is 
faulty logic, and we shouldn't be doing string compares if possible, especially 
since we already have FileUtils.getFileStatusOrNull that solves the same issue. 
I've cleaned up that logic to do a better job of determining if we're going 
down the parent case. The new logic is as follows:

 * get the fileStatus corresponding to this path from 
FileUtils.getFileStatusOrNull
 * If we got back null, then this does not exist, and thus, we're going down 
the parent-picking line. Otherwise, we can use it as-is.
 * Also, since our parent-picking utility function is 
FileUtils.getPathOrParentThatExists, passing filePath to it directly is a bit 
of a waste, since we've already determined that it does not exist. So I added 
one tiny optimization to prevent the double call the first time, by calling 
FileUtils.getPathOrParentThatExists(fs, filePath.getParent()) rather than 
FileUtils.getPathOrParentThatExists(fs, filePath), both of which would return 
an identical result in this case.
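
In code, the new flow looks roughly like this (a sketch only; 
resolveStatusForAuth is a made-up wrapper name, while the two FileUtils 
helpers are the ones named above):
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.common.FileUtils;

public class ParentPickingSketch {
  static FileStatus resolveStatusForAuth(FileSystem fs, Path filePath)
      throws IOException {
    // One FS round trip; null means filePath does not exist.
    FileStatus stat = FileUtils.getFileStatusOrNull(fs, filePath);
    if (stat == null) {
      // filePath is already known not to exist, so start the ancestor
      // walk at its parent and skip one redundant existence check.
      Path ancestor =
          FileUtils.getPathOrParentThatExists(fs, filePath.getParent());
      stat = fs.getFileStatus(ancestor);
    }
    return stat;
  }
}
{code}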

See https://gist.github.com/khorgath/9eeec30b0035dfdc70ae24dab2dd9923 for a 
diff between the two patch states (b/w .7.patch and .8.patch)

> Authorization checks for non existent file/directory should not be recursive
> 
>
> Key: HIVE-10022
> URL: https://issues.apache.org/jira/browse/HIVE-10022
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 0.14.0
>Reporter: Pankit Thapar
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-10022.2.patch, HIVE-10022.3.patch, 
> HIVE-10022.4.patch, HIVE-10022.5.patch, HIVE-10022.6.patch, 
> HIVE-10022.7.patch, HIVE-10022.patch
>
>
> I am testing a query like:
> set hive.test.authz.sstd.hs2.mode=true;
> set 
> hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest;
> set 
> hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateConfigUserAuthenticator;
> set hive.security.authorization.enabled=true;
> set user.name=user1;
> create table auth_noupd(i int) clustered by (i) into 2 buckets stored as orc 
> location '${OUTPUT}' TBLPROPERTIES ('transactional'='true');
> Now, in the above query, since authorization is enabled, we end up calling 
> doAuthorizationV2(), which ultimately calls 
> SQLAuthorizationUtils.getPrivilegesFromFS(), which calls a recursive method, 
> FileUtils.isActionPermittedForFileHierarchy(), with the object, or with the 
> ancestor of the object we are trying to authorize if the object does not 
> exist.
> The logic in FileUtils.isActionPermittedForFileHierarchy() is a depth-first 
> search (DFS).
> Now assume we are trying to authorize a path a/b/c/d.
> If a/b/c/d does not exist, we would call 
> FileUtils.isActionPermittedForFileHierarchy() with, say, a/b/, assuming 
> a/b/c also does not exist.
> If the subtree under a/b contains millions of files, then 
> FileUtils.isActionPermittedForFileHierarchy() is going to check file 
> permissions on each of those objects.
> I do not completely understand why we have to check file permissions on all 
> the objects in a branch of the tree that we are not trying to read from or 
> write to.
> We could check file permissions on the ancestor that exists and, if they 
> match what we expect, return true.
> Please confirm whether this is a bug so that I can submit a patch; 
> otherwise, let me know what I am missing.



--

[jira] [Commented] (HIVE-7239) Fix bug in HiveIndexedInputFormat implementation that causes incorrect query result when input backed by Sequence/RC files

2016-07-25 Thread Illya Yalovyy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392697#comment-15392697
 ] 

Illya Yalovyy commented on HIVE-7239:
-

It seems like the PreCommit job ignores patches from this JIRA issue.

[~ashutoshc], could you please help me out? How do I force tests to run on 
this patch?

> Fix bug in HiveIndexedInputFormat implementation that causes incorrect query 
> result when input backed by Sequence/RC files
> --
>
> Key: HIVE-7239
> URL: https://issues.apache.org/jira/browse/HIVE-7239
> Project: Hive
>  Issue Type: Bug
>  Components: Indexing
>Affects Versions: 2.1.0
>Reporter: Sumit Kumar
>Assignee: Illya Yalovyy
> Attachments: HIVE-7239.2.patch, HIVE-7239.patch
>
>
> In the case of sequence files, it's crucial that splits are calculated 
> around the boundaries enforced by the input sequence file. However, by 
> default Hadoop creates input splits based on configuration parameters, which 
> may not match the boundaries of the input sequence file. Hive provides 
> HiveIndexedInputFormat, which adds extra logic and recalculates the 
> boundaries for each split based on the sequence file's boundaries.
> However, we noticed this "over"-reporting behavior with data backed by 
> sequence files. We have sample data on which we experimented and fixed this 
> bug, and we have verified the fix by comparing the query output for input in 
> sequence file format, RC file format, and regular format. However, we have 
> not been able to find the right place to include this as a unit test that 
> would execute as part of the Hive tests. We tried writing a "clientpositive" 
> test in the ql module, but the output seems quite verbose and I couldn't 
> interpret it that well. Can someone please review this change and advise on 
> how to write a test that will execute as part of Hive testing?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14287) Explain output: printed nested mapvalues are dependent on map entry iteration order

2016-07-25 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14287:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Zoltan!

> Explain output: printed nested mapvalues are dependent on map entry iteration 
> order
> ---
>
> Key: HIVE-14287
> URL: https://issues.apache.org/jira/browse/HIVE-14287
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Fix For: 2.2.0
>
> Attachments: HIVE-14287.1.patch
>
>
> The order of map keys is handled by a {{TreeSet}} in 
> {{ExplainTask#outputMap}}, but for map values there is only a {{toString()}}, 
> which implicitly iterates over the map and prints its entries. In the case 
> of a {{HashMap}}, this iteration order may vary between JDK versions and may 
> cause false-positive test failures.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java#L472



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14305) To/From UTC timestamp may return incorrect result because of DST

2016-07-25 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392682#comment-15392682
 ] 

Xuefu Zhang commented on HIVE-14305:


Does SPARK-16078 offer any idea for Hive?

> To/From UTC timestamp may return incorrect result because of DST
> 
>
> Key: HIVE-14305
> URL: https://issues.apache.org/jira/browse/HIVE-14305
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Rui Li
>Assignee: Rui Li
>
> If the machine's local timezone observes DST, the UDFs return incorrect 
> results.
> For example:
> {code}
> select to_utc_timestamp('2005-04-03 02:01:00','UTC');
> {code}
> returns {{2005-04-03 03:01:00}}. Correct result should be {{2005-04-03 
> 02:01:00}}.
> {code}
> select to_utc_timestamp('2005-04-03 10:01:00','Asia/Shanghai');
> {code}
> returns {{2005-04-03 03:01:00}}. Correct result should be {{2005-04-03 
> 02:01:00}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14317) Make the print of COLUMN_STATS_ACCURATE more stable.

2016-07-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392683#comment-15392683
 ] 

Sergey Shelukhin commented on HIVE-14317:
-

+1 pending tests... However, I still think error logs should not be at trace 
level. When would these errors happen? I don't think it would be too often.

> Make the print of COLUMN_STATS_ACCURATE more stable.
> 
>
> Key: HIVE-14317
> URL: https://issues.apache.org/jira/browse/HIVE-14317
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14317.01.patch, HIVE-14317.02.patch
>
>
> Depending on the version, we may have COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}} or 
> COLUMN_STATS_ACCURATE 
> {"COLUMN_STATS":{"key":"true","value":"true"},"BASIC_STATS":"true"}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14320) Fix table_access_key_stats with returnpath feature on

2016-07-25 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14320:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Vineet!

> Fix table_access_key_stats with returnpath feature on
> -
>
> Key: HIVE-14320
> URL: https://issues.apache.org/jira/browse/HIVE-14320
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Fix For: 2.2.0
>
> Attachments: HIVE-14320.1.patch, HIVE-14320.2.patch
>
>
> With the returnpath feature on, this test fails with a NullPointerException.
> This is because TableAccessAnalyzer expects the join operator to have a list 
> of underlying table references (baseSrc), but during conversion of the 
> Calcite plan to the Hive operator tree this information is not propagated 
> and is lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14189) backport HIVE-13945 to branch-1

2016-07-25 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392678#comment-15392678
 ] 

Sergio Peña commented on HIVE-14189:


Fixed. This time the patch was picked up.
https://builds.apache.org/view/H-L/view/Hive/job/PreCommit-HIVE-BRANCH1-Build/

Let me know if the tests execute correctly. I set the Java version to JDK 7.

> backport HIVE-13945 to branch-1
> ---
>
> Key: HIVE-14189
> URL: https://issues.apache.org/jira/browse/HIVE-14189
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC1.3
> Attachments: HIVE-14189-branch-1.patch, HIVE-14189.01-branch-1.patch, 
> HIVE-14189.02-branch-1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14328) Change branch1 to branch-1 for pre-commit tests

2016-07-25 Thread Sergio Peña (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-14328:
---
Attachment: HIVE-14328.1.patch

> Change branch1 to branch-1 for pre-commit tests
> ---
>
> Key: HIVE-14328
> URL: https://issues.apache.org/jira/browse/HIVE-14328
> Project: Hive
>  Issue Type: Task
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-14328.1.patch
>
>
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14204) Optimize loading dynamic partitions

2016-07-25 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392673#comment-15392673
 ] 

Ashutosh Chauhan commented on HIVE-14204:
-

+1 pending tests

> Optimize loading dynamic partitions 
> 
>
> Key: HIVE-14204
> URL: https://issues.apache.org/jira/browse/HIVE-14204
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14204.1.patch, HIVE-14204.3.patch, 
> HIVE-14204.4.patch
>
>
> Lots of time is spent, in sequential fashion, loading a dynamically 
> partitioned dataset on the driver side. E.g., a simple dynamic partition 
> load such as the following takes 300+ seconds:
> {noformat}
> INSERT INTO web_sales_test partition(ws_sold_date_sk) select * from 
> tpcds_bin_partitioned_orc_200.web_sales;
> Time taken to load dynamic partitions: 309.22 seconds
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14328) Change branch1 to branch-1 for pre-commit tests

2016-07-25 Thread Sergio Peña (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-14328:
---
Issue Type: Task  (was: Bug)

> Change branch1 to branch-1 for pre-commit tests
> ---
>
> Key: HIVE-14328
> URL: https://issues.apache.org/jira/browse/HIVE-14328
> Project: Hive
>  Issue Type: Task
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-14328.1.patch
>
>
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14326) Merging outer joins without conditions can lead to wrong results

2016-07-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392665#comment-15392665
 ] 

Sergey Shelukhin commented on HIVE-14326:
-

The timed-out tests seem to have failed due to {noformat}
[INFO] Downloading: 
http://repository.apache.org/snapshots/org/apache/directory/client/ldap/ldap-client-api/0.1-SNAPSHOT/maven-metadata.xml
[WARNING] Could not transfer metadata 
org.apache.directory.client.ldap:ldap-client-api:0.1-SNAPSHOT/maven-metadata.xml
 from/to apache.snapshots (http://repository.apache.org/snapshots): Connect to 
repository.apache.org:80 [repository.apache.org/207.244.88.143] failed: 
Connection timed out
[WARNING] Failure to transfer 
org.apache.directory.client.ldap:ldap-client-api:0.1-SNAPSHOT/maven-metadata.xml
 from http://repository.apache.org/snapshots was cached in the local 
repository, resolution will not be reattempted until the update interval of 
apache.snapshots has elapsed or updates are forced. Original error: Could not 
transfer metadata 
org.apache.directory.client.ldap:ldap-client-api:0.1-SNAPSHOT/maven-metadata.xml
 from/to apache.snapshots (http://repository.apache.org/snapshots): Connect to 
repository.apache.org:80 [repository.apache.org/207.244.88.143] failed: 
Connection timed out
[INFO] Downloading: 
http://repository.apache.org/snapshots/org/apache/directory/client/ldap/ldap-client-api/0.1-SNAPSHOT/ldap-client-api-0.1-SNAPSHOT.pom
{noformat}

> Merging outer joins without conditions can lead to wrong results
> 
>
> Key: HIVE-14326
> URL: https://issues.apache.org/jira/browse/HIVE-14326
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Attachments: HIVE-14326.patch
>
>
> HIVE-13069 enabled cartesian product merging. However, merge should only be 
> performed between INNER joins.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14320) Fix table_access_key_stats with returnpath feature on

2016-07-25 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14320:
---
Attachment: HIVE-14320.2.patch

> Fix table_access_key_stats with returnpath feature on
> -
>
> Key: HIVE-14320
> URL: https://issues.apache.org/jira/browse/HIVE-14320
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14320.1.patch, HIVE-14320.2.patch
>
>
> With the returnpath feature on, this test fails with a NullPointerException.
> This is because TableAccessAnalyzer expects the join operator to have a list 
> of underlying table references (baseSrc), but during conversion of the 
> Calcite plan to the Hive operator tree this information is not propagated 
> and is lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14320) Fix table_access_key_stats with returnpath feature on

2016-07-25 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14320:
---
Status: Patch Available  (was: Open)

Updated one golden file for table_access_keys_stats for TestSparkCliDriver

> Fix table_access_key_stats with returnpath feature on
> -
>
> Key: HIVE-14320
> URL: https://issues.apache.org/jira/browse/HIVE-14320
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14320.1.patch, HIVE-14320.2.patch
>
>
> With the returnpath feature on, this test fails with a NullPointerException.
> This is because TableAccessAnalyzer expects the join operator to have a list 
> of underlying table references (baseSrc), but during conversion of the 
> Calcite plan to the Hive operator tree this information is not propagated 
> and is lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14320) Fix table_access_key_stats with returnpath feature on

2016-07-25 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14320:
---
Status: Open  (was: Patch Available)

> Fix table_access_key_stats with returnpath feature on
> -
>
> Key: HIVE-14320
> URL: https://issues.apache.org/jira/browse/HIVE-14320
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14320.1.patch, HIVE-14320.2.patch
>
>
> With the returnpath feature on, this test fails with a NullPointerException.
> This is because TableAccessAnalyzer expects the join operator to have a list 
> of underlying table references (baseSrc), but during conversion of the 
> Calcite plan to the Hive operator tree this information is not propagated 
> and is lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14189) backport HIVE-13945 to branch-1

2016-07-25 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392639#comment-15392639
 ] 

Sergio Peña commented on HIVE-14189:


It is the correct name, but the branch-1 job is not working, as it was lost 
with our last Jenkins server.
I will recreate the job so we can run tests against branch-1.

> backport HIVE-13945 to branch-1
> ---
>
> Key: HIVE-14189
> URL: https://issues.apache.org/jira/browse/HIVE-14189
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC1.3
> Attachments: HIVE-14189-branch-1.patch, HIVE-14189.01-branch-1.patch, 
> HIVE-14189.02-branch-1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14326) Merging outer joins without conditions can lead to wrong results

2016-07-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392635#comment-15392635
 ] 

Hive QA commented on HIVE-14326:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12819919/HIVE-14326.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 27 failed/errored test(s), 10301 tests 
executed
*Failed tests:*
{noformat}
TestColumn - did not produce a TEST-*.xml file
TestCookieSigner - did not produce a TEST-*.xml file
TestHS2HttpServer - did not produce a TEST-*.xml file
TestHiveSQLException - did not produce a TEST-*.xml file
TestLdapAtnProviderWithMiniDS - did not produce a TEST-*.xml file
TestLdapAuthenticationProviderImpl - did not produce a TEST-*.xml file
TestMsgBusConnection - did not produce a TEST-*.xml file
TestPlainSaslHelper - did not produce a TEST-*.xml file
TestPluggableHiveSessionImpl - did not produce a TEST-*.xml file
TestRetryingThriftCLIServiceClient - did not produce a TEST-*.xml file
TestServerOptionsProcessor - did not produce a TEST-*.xml file
TestSessionCleanup - did not produce a TEST-*.xml file
TestSessionGlobalInitFile - did not produce a TEST-*.xml file
TestSessionHooks - did not produce a TEST-*.xml file
TestSessionManagerMetrics - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_outer_join5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_outer_join5
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken
org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testDelayedLocalityNodeCommErrorImmediateAllocation
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/638/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/638/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-638/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 27 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12819919 - PreCommit-HIVE-MASTER-Build

> Merging outer joins without conditions can lead to wrong results
> 
>
> Key: HIVE-14326
> URL: https://issues.apache.org/jira/browse/HIVE-14326
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Attachments: HIVE-14326.patch
>
>
> HIVE-13069 enabled cartesian product merging. However, merge should only be 
> performed between INNER joins.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14322) Postgres db issues after Datanucleus 4.x upgrade

2016-07-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392624#comment-15392624
 ] 

Sergey Shelukhin edited comment on HIVE-14322 at 7/25/16 8:42 PM:
--

It's suspicious that it would work with auto-create but not with a script, 
which has nothing to do with DN as such.
I wonder if the syntax in the script ("COLUMN_TYPE" character varying(128) 
DEFAULT NULL::character varying) has anything to do with it? Should we change 
these to NULL and would it help?

Can you reference a DN bug if you file one?



was (Author: sershe):
It's suspicious that it would work with auto-create but not with a script, 
which has nothing to do with postgres as such.
I wonder if the syntax in the script ("COLUMN_TYPE" character varying(128) 
DEFAULT NULL::character varying) has anything to do with it? Should we change 
these to NULL and would it help?

Can you reference a DN bug if you file one?


> Postgres db issues after Datanucleus 4.x upgrade
> 
>
> Key: HIVE-14322
> URL: https://issues.apache.org/jira/browse/HIVE-14322
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0, 2.0.1
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>
> With the upgrade to DataNucleus 4.x versions in HIVE-6113, Hive does not 
> work properly with Postgres.
> The nullable fields in the database have the string "NULL::character 
> varying" instead of real NULL values. This causes various issues.
> One example is -
> {code}
> hive> create table t(i int);
> OK
> Time taken: 1.9 seconds
> hive> create view v as select * from t;
> OK
> Time taken: 0.542 seconds
> hive> select * from v;
> FAILED: SemanticException Unable to fetch table v. 
> java.net.URISyntaxException: Relative path in absolute URI: 
> NULL::character%20varying
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14320) Fix table_access_key_stats with returnpath feature on

2016-07-25 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392630#comment-15392630
 ] 

Ashutosh Chauhan commented on HIVE-14320:
-

+1

> Fix table_access_key_stats with returnpath feature on
> -
>
> Key: HIVE-14320
> URL: https://issues.apache.org/jira/browse/HIVE-14320
> Project: Hive
>  Issue Type: Task
>  Components: Hive
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14320.1.patch
>
>
> With the returnpath feature on, this test fails with a NullPointerException.
> This is because TableAccessAnalyzer expects the join operator to have a list 
> of underlying table references (baseSrc), but during conversion of the 
> Calcite plan to the Hive operator tree this information is not propagated 
> and is lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14315) Implement StatsProvidingRecordReader for ParquetRecordReaderWrapper

2016-07-25 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392628#comment-15392628
 ] 

Xuefu Zhang commented on HIVE-14315:


+1

> Implement StatsProvidingRecordReader for ParquetRecordReaderWrapper
> ---
>
> Key: HIVE-14315
> URL: https://issues.apache.org/jira/browse/HIVE-14315
> Project: Hive
>  Issue Type: New Feature
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-14315.0.patch, HIVE-14315.1.patch
>
>
> Currently only ORC supports {{analyze table ... compute statistics noscan}} 
> (via HIVE-6578), where stats such as # of rows, raw data size, etc. can be 
> obtained from the footer. Similar functionality should be implemented for 
> Parquet, since it also has this info in its metadata.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14322) Postgres db issues after Datanucleus 4.x upgrade

2016-07-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392624#comment-15392624
 ] 

Sergey Shelukhin commented on HIVE-14322:
-

It's suspicious that it would work with auto-create but not with a script, 
which has nothing to do with postgres as such.
I wonder if the syntax in the script ("COLUMN_TYPE" character varying(128) 
DEFAULT NULL::character varying) has anything to do with it? Should we change 
these to NULL and would it help?

Can you reference a DN bug if you file one?


> Postgres db issues after Datanucleus 4.x upgrade
> 
>
> Key: HIVE-14322
> URL: https://issues.apache.org/jira/browse/HIVE-14322
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0, 2.0.1
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>
> With the upgrade to DataNucleus 4.x versions in HIVE-6113, Hive does not 
> work properly with Postgres.
> The nullable fields in the database have the string "NULL::character 
> varying" instead of real NULL values. This causes various issues.
> One example is -
> {code}
> hive> create table t(i int);
> OK
> Time taken: 1.9 seconds
> hive> create view v as select * from t;
> OK
> Time taken: 0.542 seconds
> hive> select * from v;
> FAILED: SemanticException Unable to fetch table v. 
> java.net.URISyntaxException: Relative path in absolute URI: 
> NULL::character%20varying
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14316) TestLlapTokenChecker.testCheckPermissions, testGetToken fail

2016-07-25 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14316:

Status: Patch Available  (was: Open)

> TestLlapTokenChecker.testCheckPermissions, testGetToken fail
> 
>
> Key: HIVE-14316
> URL: https://issues.apache.org/jira/browse/HIVE-14316
> Project: Hive
>  Issue Type: Test
>Reporter: Siddharth Seth
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14316.patch
>
>
> cc [~sershe]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14316) TestLlapTokenChecker.testCheckPermissions, testGetToken fail

2016-07-25 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14316:

Attachment: HIVE-14316.patch

Simple patch

> TestLlapTokenChecker.testCheckPermissions, testGetToken fail
> 
>
> Key: HIVE-14316
> URL: https://issues.apache.org/jira/browse/HIVE-14316
> Project: Hive
>  Issue Type: Test
>Reporter: Siddharth Seth
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14316.patch
>
>
> cc [~sershe]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14315) Implement StatsProvidingRecordReader for ParquetRecordReaderWrapper

2016-07-25 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-14315:

Attachment: HIVE-14315.1.patch

Thanks [~xuefuz] for the review. Adding tests.

> Implement StatsProvidingRecordReader for ParquetRecordReaderWrapper
> ---
>
> Key: HIVE-14315
> URL: https://issues.apache.org/jira/browse/HIVE-14315
> Project: Hive
>  Issue Type: New Feature
>  Components: Statistics
>Affects Versions: 2.1.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-14315.0.patch, HIVE-14315.1.patch
>
>
> Currently only ORC supports {{analyze table ... compute statistics noscan}} 
> (via HIVE-6578), where stats such as # of rows, raw data size, etc. can be 
> obtained from the footer. Similar functionality should be implemented for 
> Parquet, since it also has this info in its metadata.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-14316) TestLlapTokenChecker.testCheckPermissions, testGetToken fail

2016-07-25 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-14316:
---

Assignee: Sergey Shelukhin

> TestLlapTokenChecker.testCheckPermissions, testGetToken fail
> 
>
> Key: HIVE-14316
> URL: https://issues.apache.org/jira/browse/HIVE-14316
> Project: Hive
>  Issue Type: Test
>Reporter: Siddharth Seth
>Assignee: Sergey Shelukhin
>
> cc [~sershe]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13930) upgrade Hive to latest Hadoop version

2016-07-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392584#comment-15392584
 ] 

Sergey Shelukhin commented on HIVE-13930:
-

Trying again. Thanks for the update [~stakiar]!

> upgrade Hive to latest Hadoop version
> -
>
> Key: HIVE-13930
> URL: https://issues.apache.org/jira/browse/HIVE-13930
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13930.01.patch, HIVE-13930.02.patch, 
> HIVE-13930.03.patch, HIVE-13930.04.patch, HIVE-13930.05.patch, 
> HIVE-13930.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-13930) upgrade Hive to latest Hadoop version

2016-07-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392584#comment-15392584
 ] 

Sergey Shelukhin edited comment on HIVE-13930 at 7/25/16 8:13 PM:
--

Trying again. Sorry, I was on vacation, so I didn't upload this earlier. 
Thanks for the update [~stakiar]!


was (Author: sershe):
Trying again. Thanks for the update [~stakiar]!

> upgrade Hive to latest Hadoop version
> -
>
> Key: HIVE-13930
> URL: https://issues.apache.org/jira/browse/HIVE-13930
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13930.01.patch, HIVE-13930.02.patch, 
> HIVE-13930.03.patch, HIVE-13930.04.patch, HIVE-13930.05.patch, 
> HIVE-13930.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13930) upgrade Hive to latest Hadoop version

2016-07-25 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-13930:

Attachment: HIVE-13930.05.patch

> upgrade Hive to latest Hadoop version
> -
>
> Key: HIVE-13930
> URL: https://issues.apache.org/jira/browse/HIVE-13930
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13930.01.patch, HIVE-13930.02.patch, 
> HIVE-13930.03.patch, HIVE-13930.04.patch, HIVE-13930.05.patch, 
> HIVE-13930.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14189) backport HIVE-13945 to branch-1

2016-07-25 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14189:

Attachment: HIVE-14189-branch-1.patch

> backport HIVE-13945 to branch-1
> ---
>
> Key: HIVE-14189
> URL: https://issues.apache.org/jira/browse/HIVE-14189
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC1.3
> Attachments: HIVE-14189-branch-1.patch, HIVE-14189.01-branch-1.patch, 
> HIVE-14189.02-branch-1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14189) backport HIVE-13945 to branch-1

2016-07-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392573#comment-15392573
 ] 

Sergey Shelukhin commented on HIVE-14189:
-

[~spena] is this a correct name for a branch-1 patch? 
{{HIVE-<issue>[.<patch number>][-<branch>].patch}} is the pattern on the 
wiki; it seems to match, but the patch doesn't get picked up. I'll reattach 
just in case, since HiveQA was broken repeatedly.
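
As a quick hypothetical check (this regex is my reading of the wiki pattern, 
not the actual HiveQA matcher):

{code}
Pattern p = Pattern.compile("HIVE-\\d+(\\.\\d+)?(-[A-Za-z0-9.-]+)?\\.patch");
// Both attachment names on this issue appear to match:
p.matcher("HIVE-14189-branch-1.patch").matches();    // true
p.matcher("HIVE-14189.01-branch-1.patch").matches(); // true
{code}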

> backport HIVE-13945 to branch-1
> ---
>
> Key: HIVE-14189
> URL: https://issues.apache.org/jira/browse/HIVE-14189
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC1.3
> Attachments: HIVE-14189.01-branch-1.patch, 
> HIVE-14189.02-branch-1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13913) LLAP: introduce backpressure to recordreader

2016-07-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392567#comment-15392567
 ] 

Sergey Shelukhin commented on HIVE-13913:
-

[~gopalv] ping?

> LLAP: introduce backpressure to recordreader
> 
>
> Key: HIVE-13913
> URL: https://issues.apache.org/jira/browse/HIVE-13913
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13913.01.patch, HIVE-13913.02.patch, 
> HIVE-13913.03.patch, HIVE-13913.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14323) Reduce number of FS permissions and redundant FS operations

2016-07-25 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392555#comment-15392555
 ] 

Chris Nauroth commented on HIVE-14323:
--

[~rajesh.balamohan], thank you for the patch.

Is the change in {{FileUtils#mkdir}} required?  It appears that the 
{{inheritPerms}} argument is already intended to capture the setting of 
{{HIVE_WAREHOUSE_SUBDIR_INHERIT_PERMS}}, so looking it up again within the 
method might be confusing.  I see some call sites pass along the value of that 
property and others hard-code it.  I see your patch is also updating some of 
those call sites to respect the configuration.  Do you think this change should 
be handled completely by updating the call sites?
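
A minimal sketch of the concern (the signature is an assumption for 
illustration, not the actual {{FileUtils}} code):

{code}
// If mkdir() already receives inheritPerms from its callers, re-reading
// HIVE_WAREHOUSE_SUBDIR_INHERIT_PERMS inside the method can silently
// override what the call site asked for.
public static boolean mkdir(FileSystem fs, Path f, boolean inheritPerms,
    Configuration conf) throws IOException {
  boolean success = fs.mkdirs(f);
  if (success && inheritPerms) {
    // inherit group and permissions from the parent directory here,
    // trusting the caller-supplied flag rather than re-reading conf
  }
  return success;
}
{code}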

{code}
-if (fs.exists(ptnPath)){
-  fs.delete(ptnPath,true);
+try {
+  fs.delete(ptnPath, true);
+} catch (IOException ioe) {
+  //ignore
 }
{code}

I think the intent here is "try the delete, and if the path doesn't exist, just 
keep going."  Catching every {{IOException}} could mask other I/O errors 
though.  Right now, exceptions would propagate out to a wider {{catch 
(Exception)}} block, where there is additional cleanup logic.  I wonder if 
catching every {{IOException}} would harm this cleanup logic.

According to the [FileSystem 
Specification|http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/filesystem/filesystem.html]
 for delete, if there is a recursive delete attempted on a path that doesn't 
exist, then it fails by returning {{false}}, not throwing an exception.  There 
are contract tests that verify this behavior too.
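
A minimal sketch of that contract, reusing the {{fs}} and {{ptnPath}} from 
the patch hunk above:

{code}
// Rely on delete()'s boolean result instead of swallowing IOException:
// false covers the nonexistent-path case, while genuine I/O failures
// still throw and propagate to the outer catch (Exception) cleanup.
if (!fs.delete(ptnPath, true)) {
  // nothing was deleted; the path most likely did not exist
}
{code}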

{code}
  LOG.info("Patch..checking isEmptyPath for : " + dirPath);
{code}

Is this a leftover log statement from debugging, or is it intentional to 
include it in the patch?

I don't feel confident commenting on the logic in {{Hive#replaceFiles}}, so 
I'll defer to others more familiar with Hive to review that part.

> Reduce number of FS permissions and redundant FS operations
> ---
>
> Key: HIVE-14323
> URL: https://issues.apache.org/jira/browse/HIVE-14323
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14323.1.patch
>
>
> Some examples are given below.
> 1. When creating a stage directory, FileUtils sets the directory permissions 
> by running a set of chgrp and chmod commands. In systems like S3, these would 
> not be relevant.
> 2. In some cases, fs.delete() is followed by fs.exists(). The exists() check 
> might be redundant there (lookup ops are expensive in systems like S3), as 
> sketched below.
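> A minimal sketch of that pattern (variable names assumed):
> {code}
> // Redundant: the result of delete() is ignored and a separate
> // exists() lookup is issued, which is expensive on S3.
> fs.delete(path, true);
> if (fs.exists(path)) {
>   throw new IOException("Failed to delete " + path);
> }
>
> // Leaner: delete() already reports whether anything was removed.
> boolean removed = fs.delete(path, true);
> {code}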



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14251) Union All of different types resolves to incorrect data

2016-07-25 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14251:

Attachment: HIVE-14251.3.patch

Attached patch-3: updates the other affected unit tests. Removed one unit test, 
since union_type_chk.q is a duplicate of union36.q with the current change.

> Union All of different types resolves to incorrect data
> ---
>
> Key: HIVE-14251
> URL: https://issues.apache.org/jira/browse/HIVE-14251
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14251.1.patch, HIVE-14251.2.patch, 
> HIVE-14251.3.patch
>
>
> create table src(c1 date, c2 int, c3 double);
> insert into src values ('2016-01-01',5,1.25);
> select * from 
> (select c1 from src union all
> select c2 from src union all
> select c3 from src) t;
> It will return NULL for the c1 values. It seems the common data type is 
> resolved to that of the last column, c3, which is double.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14302) Tez: Optimized Hashtable can support DECIMAL keys of same precision

2016-07-25 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392522#comment-15392522
 ] 

Gopal V commented on HIVE-14302:


I like the idea of checking for type equality between the SerDes.

If the types aren't the same, the issue should be resolved at the query 
compiler level: we can skip the optimization when the types are mismatched, 
instead of specializing on Decimal.
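
A rough sketch of that fallback (the names here are hypothetical, not the 
actual compiler code):

{code}
// Use the optimized hashtable only when both join key types match
// exactly; otherwise skip the optimization rather than specializing.
boolean sameKeyTypes = leftKeyTypeInfo.equals(rightKeyTypeInfo);
if (!sameKeyTypes) {
  useOptimizedHashTable = false;
}
{code}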

> Tez: Optimized Hashtable can support DECIMAL keys of same precision
> ---
>
> Key: HIVE-14302
> URL: https://issues.apache.org/jira/browse/HIVE-14302
> Project: Hive
>  Issue Type: Improvement
>  Components: Tez
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
>
> Decimal support in the optimized hashtable was decided against on the basis 
> that Decimal(10,1) == Decimal(10,2) when they contain "1.0" and "1.00" 
> respectively. However, joins no longer have any issues with decimal 
> precision, because both sides are cast to a common type.
> {code}
> create temporary table x (a decimal(10,2), b decimal(10,1)) stored as orc;
> insert into x values (1.0, 1.0);
> > explain logical select count(1) from x, x x1 where x.a = x1.b;
> OK  
> LOGICAL PLAN:
> $hdt$_0:$hdt$_0:x
>   TableScan (TS_0)
> alias: x
> filterExpr: (a is not null and true) (type: boolean)
> Filter Operator (FIL_18)
>   predicate: (a is not null and true) (type: boolean)
>   Select Operator (SEL_2)
> expressions: a (type: decimal(10,2))
> outputColumnNames: _col0
> Reduce Output Operator (RS_6)
>   key expressions: _col0 (type: decimal(11,2))
>   sort order: +
>   Map-reduce partition columns: _col0 (type: decimal(11,2))
>   Join Operator (JOIN_8)
> condition map:
>  Inner Join 0 to 1
> keys:
>   0 _col0 (type: decimal(11,2))
>   1 _col0 (type: decimal(11,2))
> Group By Operator (GBY_11)
>   aggregations: count(1)
>   mode: hash
>   outputColumnNames: _col0
> {code}
> Note the cast up to Decimal(11,2) in the plan, which normalizes both sides of 
> the join so that HiveDecimal values can be compared as-is.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14317) Make the print of COLUMN_STATS_ACCURATE more stable.

2016-07-25 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14317:
---
Status: Patch Available  (was: Open)

> Make the print of COLUMN_STATS_ACCURATE more stable.
> 
>
> Key: HIVE-14317
> URL: https://issues.apache.org/jira/browse/HIVE-14317
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14317.01.patch, HIVE-14317.02.patch
>
>
> Depending on the version, we may print either COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}} or 
> COLUMN_STATS_ACCURATE 
> {"COLUMN_STATS":{"key":"true","value":"true"},"BASIC_STATS":"true"}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14317) Make the print of COLUMN_STATS_ACCURATE more stable.

2016-07-25 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14317:
---
Status: Open  (was: Patch Available)

> Make the print of COLUMN_STATS_ACCURATE more stable.
> 
>
> Key: HIVE-14317
> URL: https://issues.apache.org/jira/browse/HIVE-14317
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14317.01.patch, HIVE-14317.02.patch
>
>
> Depending on the version, we may print either COLUMN_STATS_ACCURATE 
> {"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}} or 
> COLUMN_STATS_ACCURATE 
> {"COLUMN_STATS":{"key":"true","value":"true"},"BASIC_STATS":"true"}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

