[jira] [Updated] (HIVE-15143) add logging for HIVE-15024

2016-11-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15143:

Status: Patch Available  (was: Open)

> add logging for HIVE-15024
> --
>
> Key: HIVE-15143
> URL: https://issues.apache.org/jira/browse/HIVE-15143
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
>Priority: Critical
> Attachments: HIVE-15143.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.io.DiskRangeList cannot be cast to 
> org.apache.orc.impl.BufferChunk
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:383)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:338)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:278)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:167)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 23 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.io.DiskRangeList cannot be cast to 
> org.apache.orc.impl.BufferChunk
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.prepareRangesForCompressedRead(EncodedReaderImpl.java:728)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:616)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:397)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:424)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:227)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:224)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:224)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:93)
> ... 6 more
> 2016-10-20T00:48:45,354 WARN  [TezTaskRunner 
> (1475017598908_0410_15_00_20_0)] 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: Ignoring exception when 
> closing input part(cleanup). Exception class
> =java.io.IOException, message=java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.io.DiskRangeList cannot be cast to 
> org.apache.orc.impl.BufferChunk
> 2016-10-20T00:48:45,416 WARN  [TaskHeartbeatThread 
> (1475017598908_0410_15_00_20_0)] 
> org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter: Exiting 
> TaskReporter thread with pending queue size=2
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15093) S3-to-S3 Renames: Files should be moved individually rather than at a directory level

2016-11-07 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-15093:

Attachment: HIVE-15093.8.patch

> S3-to-S3 Renames: Files should be moved individually rather than at a 
> directory level
> -
>
> Key: HIVE-15093
> URL: https://issues.apache.org/jira/browse/HIVE-15093
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.1.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15093.1.patch, HIVE-15093.2.patch, 
> HIVE-15093.3.patch, HIVE-15093.4.patch, HIVE-15093.5.patch, 
> HIVE-15093.6.patch, HIVE-15093.7.patch, HIVE-15093.8.patch
>
>
> Hive's MoveTask uses the Hive.moveFile method to move data within a 
> distributed filesystem as well as blobstore filesystems.
> If the move is done within the same filesystem:
> 1: If the source path is a subdirectory of the destination path, files will 
> be moved one by one using a threadpool of workers
> 2: If the source path is not a subdirectory of the destination path, a single 
> rename operation is used to move the entire directory
> The second option may not work well on blobstores such as S3. Renames are not 
> metadata operations and require copying all the data. Client connectors to 
> blobstores may not efficiently rename directories. Worst case, the connector 
> will copy each file one by one, sequentially rather than using a threadpool 
> of workers to copy the data (e.g. HADOOP-13600).
> Hive already has code to rename files using a threadpool of workers, but this 
> only occurs in case number 1.
> This JIRA aims to modify the code so that case 1 is triggered when copying 
> within a blobstore. The focus is on copies within a blobstore because 
> needToCopy will return true if the src and target filesystems are different, 
> in which case a different code path is triggered.
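For illustration only, here is a minimal sketch (not Hive's actual MoveTask/moveFile code) of the per-file approach the description proposes: each file directly under the source directory is renamed into the destination by a worker from a small thread pool, using the standard Hadoop FileSystem API. The class name, pool size, and argument handling are assumptions.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerFileRenameSketch {

  // Rename every file directly under srcDir into destDir, one task per file.
  static void renameFilesIndividually(FileSystem fs, Path srcDir, Path destDir,
      int numThreads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<Boolean>> results = new ArrayList<>();
      for (FileStatus stat : fs.listStatus(srcDir)) {
        Path src = stat.getPath();
        Path dest = new Path(destDir, src.getName());
        // On S3-style stores each rename is a copy+delete, so parallelism matters.
        results.add(pool.submit(() -> fs.rename(src, dest)));
      }
      for (Future<Boolean> result : results) {
        if (!result.get()) {
          throw new RuntimeException("Rename into " + destDir + " failed");
        }
      }
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    Path src = new Path(args[0]);
    Path dest = new Path(args[1]);
    FileSystem fs = src.getFileSystem(new Configuration());
    renameFilesIndividually(fs, src, dest, 15);
  }
}
{code}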



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15144) JSON.org license is now CatX

2016-11-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645636#comment-15645636
 ] 

Sergey Shelukhin commented on HIVE-15144:
-

cc [~sseth] [~hitesh] FYI, is this also used in Tez?

> JSON.org license is now CatX
> 
>
> Key: HIVE-15144
> URL: https://issues.apache.org/jira/browse/HIVE-15144
> Project: Hive
>  Issue Type: Bug
>Reporter: Robert Kanter
>Priority: Blocker
> Fix For: 2.2.0
>
>
> per [update resolved legal|http://www.apache.org/legal/resolved.html#json]:
> {quote}
> CAN APACHE PRODUCTS INCLUDE WORKS LICENSED UNDER THE JSON LICENSE?
> No. As of 2016-11-03 this has been moved to the 'Category X' license list. 
> Prior to this, use of the JSON Java library was allowed. See Debian's page 
> for a list of alternatives.
> {quote}
> I'm not sure when this dependency was first introduced, but it looks like 
> it's currently used in a few places:
> https://github.com/apache/hive/search?p=1=%22org.json%22=%E2%9C%93



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14089) complex type support in LLAP IO is broken

2016-11-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14089:

Attachment: HIVE-14089.08.patch

A couple more fixes. Debug logging will be removed in due course.

> complex type support in LLAP IO is broken 
> --
>
> Key: HIVE-14089
> URL: https://issues.apache.org/jira/browse/HIVE-14089
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14089.04.patch, HIVE-14089.05.patch, 
> HIVE-14089.06.patch, HIVE-14089.07.patch, HIVE-14089.08.patch, 
> HIVE-14089.WIP.2.patch, HIVE-14089.WIP.3.patch, HIVE-14089.WIP.patch
>
>
> HIVE-13617 is causing MiniLlapCliDriver following test failures
> {code}
> org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_all
> org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vector_complex_join
> {code}
> Note to self - need to add multi-stripe test, and also test complex types 
> with some nulls so that present stream is not suppressed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15143) add logging for HIVE-15024

2016-11-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15143:

Attachment: HIVE-15143.patch

The same patch as in HIVE-15024 (that I'm about to remove from there).
cc [~prasanth_j] [~gopalv]

> add logging for HIVE-15024
> --
>
> Key: HIVE-15143
> URL: https://issues.apache.org/jira/browse/HIVE-15143
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
>Priority: Critical
> Attachments: HIVE-15143.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.io.DiskRangeList cannot be cast to 
> org.apache.orc.impl.BufferChunk
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:383)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:338)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:278)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:167)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 23 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.io.DiskRangeList cannot be cast to 
> org.apache.orc.impl.BufferChunk
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.prepareRangesForCompressedRead(EncodedReaderImpl.java:728)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:616)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:397)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:424)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:227)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:224)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:224)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:93)
> ... 6 more
> 2016-10-20T00:48:45,354 WARN  [TezTaskRunner 
> (1475017598908_0410_15_00_20_0)] 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: Ignoring exception when 
> closing input part(cleanup). Exception class
> =java.io.IOException, message=java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.io.DiskRangeList cannot be cast to 
> org.apache.orc.impl.BufferChunk
> 2016-10-20T00:48:45,416 WARN  [TaskHeartbeatThread 
> (1475017598908_0410_15_00_20_0)] 
> org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter: Exiting 
> TaskReporter thread with pending queue size=2
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15105) Hive shell runs out of memory on Tez

2016-11-07 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645476#comment-15645476
 ] 

Prasanth Jayachandran commented on HIVE-15105:
--

HIVE-11751 should fix this.

> Hive shell runs out of memory on Tez
> 
>
> Key: HIVE-15105
> URL: https://issues.apache.org/jira/browse/HIVE-15105
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.0.1
>Reporter: Premal Shah
>
> Hive 2.0.1
> Hadoop 2.7.2
> Tez 0.8.4
> We have a UDF in Hive which takes in some values and outputs a score. When 
> running a query that calls the score function on every row of a table, it looks 
> like Tez is not running the query on YARN but trying to run it in local 
> mode. It then runs out of memory trying to insert that data into a table.
> Here's the query
> {noformat}
> ADD JAR score.jar;
> CREATE TEMPORARY FUNCTION score AS 'hive.udf.ScoreUDF';
> CREATE TABLE abc AS
> SELECT
> id,
> score(col1, col2) as score
> , '2016-10-11' AS dt
> FROM input_table
> ;
> {noformat}
> Here's the output of the shell
> {noformat}
> Query ID = hadoop_20161028232841_5a06db96-ffaa-4e75-a657-c7cb46ccb3f5
> Total jobs = 1
> Launching Job 1 out of 1
> java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:3332)
> at 
> java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
> at 
> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
> at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:622)
> at java.lang.StringBuilder.append(StringBuilder.java:202)
> at com.google.protobuf.TextFormat.escapeBytes(TextFormat.java:1283)
> at 
> com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:394)
> at 
> com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327)
> at 
> com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:286)
> at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273)
> at 
> com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:404)
> at 
> com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327)
> at 
> com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:286)
> at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273)
> at 
> com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:404)
> at 
> com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327)
> at 
> com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:286)
> at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273)
> at 
> com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:404)
> at 
> com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327)
> at 
> com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:283)
> at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273)
> at 
> com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:404)
> at 
> com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327)
> at 
> com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:283)
> at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273)
> at 
> com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:404)
> at 
> com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327)
> at 
> com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:286)
> at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273)
> at 
> com.google.protobuf.TextFormat$Printer.access$400(TextFormat.java:248)
> at com.google.protobuf.TextFormat.shortDebugString(TextFormat.java:88)
> FAILED: Execution Error, return code -101 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Java heap space
> {noformat}
> It looks like the job is not getting submitted to the cluster, but running 
> locally. We can't get Tez to run the query on the cluster. 
> The Hive shell starts with an -Xmx of 4G. 
> If I set hive.execution.engine = mr, then the query works, because it runs on 
> the Hadoop cluster. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15143) add logging for HIVE-15024

2016-11-07 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645465#comment-15645465
 ] 

Prasanth Jayachandran commented on HIVE-15143:
--

+1

> add logging for HIVE-15024
> --
>
> Key: HIVE-15143
> URL: https://issues.apache.org/jira/browse/HIVE-15143
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
>Priority: Critical
> Attachments: HIVE-15143.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.io.DiskRangeList cannot be cast to 
> org.apache.orc.impl.BufferChunk
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:383)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:338)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:278)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:167)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 23 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.io.DiskRangeList cannot be cast to 
> org.apache.orc.impl.BufferChunk
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.prepareRangesForCompressedRead(EncodedReaderImpl.java:728)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:616)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:397)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:424)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:227)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:224)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:224)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:93)
> ... 6 more
> 2016-10-20T00:48:45,354 WARN  [TezTaskRunner 
> (1475017598908_0410_15_00_20_0)] 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: Ignoring exception when 
> closing input part(cleanup). Exception class
> =java.io.IOException, message=java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.io.DiskRangeList cannot be cast to 
> org.apache.orc.impl.BufferChunk
> 2016-10-20T00:48:45,416 WARN  [TaskHeartbeatThread 
> (1475017598908_0410_15_00_20_0)] 
> org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter: Exiting 
> TaskReporter thread with pending queue size=2
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15023) SimpleFetchOptimizer needs to optimize limit=0

2016-11-07 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645547#comment-15645547
 ] 

Ashutosh Chauhan commented on HIVE-15023:
-

LGTM +1

> SimpleFetchOptimizer needs to optimize limit=0
> --
>
> Key: HIVE-15023
> URL: https://issues.apache.org/jira/browse/HIVE-15023
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15023.01.patch, HIVE-15023.02.patch
>
>
> on current master
> {code}
> hive> explain select key from src limit 0;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: 0
>   Processor Tree:
> TableScan
>   alias: src
>   Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: key (type: string)
> outputColumnNames: _col0
> Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
> Column stats: NONE
> Limit
>   Number of rows: 0
>   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>   ListSink
> Time taken: 7.534 seconds, Fetched: 20 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14924) MSCK REPAIR table with single threaded is throwing null pointer exception

2016-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645545#comment-15645545
 ] 

Hive QA commented on HIVE-14924:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12837818/HIVE-14924.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10629 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=91)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=90)
org.apache.hive.spark.client.TestSparkClient.testJobSubmission (batchId=272)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2008/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2008/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2008/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12837818 - PreCommit-HIVE-Build

> MSCK REPAIR table with single threaded is throwing null pointer exception
> -
>
> Key: HIVE-14924
> URL: https://issues.apache.org/jira/browse/HIVE-14924
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.2.0
>Reporter: Ratheesh Kamoor
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14924.01.patch
>
>
> MSCK REPAIR TABLE is throwing a NullPointerException while running in 
> single-threaded mode (hive.mv.files.thread=0)
> Error:
> 2016-10-10T22:27:13,564 ERROR [e9ce04a8-2a84-426d-8e79-a2d15b8cee09 
> main([])]: exec.DDLTask (DDLTask.java:failed(581)) - 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkPartitionDirs(HiveMetaStoreChecker.java:423)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:315)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:291)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:236)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:113)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1834)
> In order to reproduce:
> set hive.mv.files.thread=0 and run the MSCK REPAIR TABLE command



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14990:

Description: 
Expected failures (lack of support in MM tables for certain commands) 
1) All HCat tests
2) Almost all merge tests (alter .. concat is not supported).
3) Tests that run dfs commands with specific paths.
4) Truncate column
5) Describe formatted will have the new fields in the output before merging 
with ACID
6) Many tests w/explain extended - diff in partition "base file name"

  was:
Expected failures (lack of support in MM tables for certain commands) 
1) All HCat tests
2) Almost all merge tests (alter .. concat is not supported).
3) Tests that run dfs commands with specific paths.
4) Truncate column
5) Describe formatted will have the new fields in the output before merging 
with ACID


> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.patch
>
>
> Expected failures (lack of support in MM tables for certain commands) 
> 1) All HCat tests
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths.
> 4) Truncate column
> 5) Describe formatted will have the new fields in the output before merging 
> with ACID
> 6) Many tests w/explain extended - diff in partition "base file name"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15023) SimpleFetchOptimizer needs to optimize limit=0

2016-11-07 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645537#comment-15645537
 ] 

Pengcheng Xiong commented on HIVE-15023:


all the tests look good to me except 
{code}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4]            40 sec    1
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]      10 sec   58
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]     3 sec   58
{code}
They seem unrelated. [~ashutoshc] or [~jcamachorodriguez], could you take a look? 
Thanks.

> SimpleFetchOptimizer needs to optimize limit=0
> --
>
> Key: HIVE-15023
> URL: https://issues.apache.org/jira/browse/HIVE-15023
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15023.01.patch, HIVE-15023.02.patch
>
>
> on current master
> {code}
> hive> explain select key from src limit 0;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: 0
>   Processor Tree:
> TableScan
>   alias: src
>   Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: key (type: string)
> outputColumnNames: _col0
> Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
> Column stats: NONE
> Limit
>   Number of rows: 0
>   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>   ListSink
> Time taken: 7.534 seconds, Fetched: 20 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15108) allow Hive script to skip hadoop version check and HBase classpath

2016-11-07 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645573#comment-15645573
 ] 

Gopal V commented on HIVE-15108:


LGTM - +1

> allow Hive script to skip hadoop version check and HBase classpath
> --
>
> Key: HIVE-15108
> URL: https://issues.apache.org/jira/browse/HIVE-15108
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15108.patch, HIVE-15108.patch
>
>
> Both will be performed by default, as before



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15024) LLAP: ClassCastException: org.apache.hadoop.hive.common.io.DiskRangeList cannot be cast to org.apache.orc.impl.BufferChunk

2016-11-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15024:

Attachment: (was: HIVE-15024.patch)

> LLAP: ClassCastException: org.apache.hadoop.hive.common.io.DiskRangeList 
> cannot be cast to org.apache.orc.impl.BufferChunk
> --
>
> Key: HIVE-15024
> URL: https://issues.apache.org/jira/browse/HIVE-15024
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
>Priority: Critical
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.io.DiskRangeList cannot be cast to 
> org.apache.orc.impl.BufferChunk
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:383)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:338)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:278)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:167)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 23 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.io.DiskRangeList cannot be cast to 
> org.apache.orc.impl.BufferChunk
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.prepareRangesForCompressedRead(EncodedReaderImpl.java:728)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:616)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:397)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:424)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:227)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:224)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:224)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:93)
> ... 6 more
> 2016-10-20T00:48:45,354 WARN  [TezTaskRunner 
> (1475017598908_0410_15_00_20_0)] 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: Ignoring exception when 
> closing input part(cleanup). Exception class
> =java.io.IOException, message=java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.io.DiskRangeList cannot be cast to 
> org.apache.orc.impl.BufferChunk
> 2016-10-20T00:48:45,416 WARN  [TaskHeartbeatThread 
> (1475017598908_0410_15_00_20_0)] 
> org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter: Exiting 
> TaskReporter thread with pending queue size=2
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15024) LLAP: ClassCastException: org.apache.hadoop.hive.common.io.DiskRangeList cannot be cast to org.apache.orc.impl.BufferChunk

2016-11-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15024:

Status: Open  (was: Patch Available)

> LLAP: ClassCastException: org.apache.hadoop.hive.common.io.DiskRangeList 
> cannot be cast to org.apache.orc.impl.BufferChunk
> --
>
> Key: HIVE-15024
> URL: https://issues.apache.org/jira/browse/HIVE-15024
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
>Priority: Critical
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.io.DiskRangeList cannot be cast to 
> org.apache.orc.impl.BufferChunk
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:383)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:338)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:278)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:167)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 23 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.io.DiskRangeList cannot be cast to 
> org.apache.orc.impl.BufferChunk
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.prepareRangesForCompressedRead(EncodedReaderImpl.java:728)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:616)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:397)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:424)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:227)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:224)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:224)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:93)
> ... 6 more
> 2016-10-20T00:48:45,354 WARN  [TezTaskRunner 
> (1475017598908_0410_15_00_20_0)] 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: Ignoring exception when 
> closing input part(cleanup). Exception class
> =java.io.IOException, message=java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.io.DiskRangeList cannot be cast to 
> org.apache.orc.impl.BufferChunk
> 2016-10-20T00:48:45,416 WARN  [TaskHeartbeatThread 
> (1475017598908_0410_15_00_20_0)] 
> org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter: Exiting 
> TaskReporter thread with pending queue size=2
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15144) JSON.org license is now CatX

2016-11-07 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645496#comment-15645496
 ] 

Robert Kanter commented on HIVE-15144:
--

This is also a problem for the Oozie release we're currently working on.  See 
OOZIE-2723.
Oozie is either going to have to hold the release until Hive has a release with 
this fix or it's going to have to just exclude the dependency and hope things 
mostly work without it.

> JSON.org license is now CatX
> 
>
> Key: HIVE-15144
> URL: https://issues.apache.org/jira/browse/HIVE-15144
> Project: Hive
>  Issue Type: Bug
>Reporter: Robert Kanter
>Priority: Blocker
> Fix For: 2.2.0
>
>
> per [update resolved legal|http://www.apache.org/legal/resolved.html#json]:
> {quote}
> CAN APACHE PRODUCTS INCLUDE WORKS LICENSED UNDER THE JSON LICENSE?
> No. As of 2016-11-03 this has been moved to the 'Category X' license list. 
> Prior to this, use of the JSON Java library was allowed. See Debian's page 
> for a list of alternatives.
> {quote}
> I'm not sure when this dependency was first introduced, but it looks like 
> it's currently used in a few places:
> https://github.com/apache/hive/search?p=1=%22org.json%22=%E2%9C%93



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15144) JSON.org license is now CatX

2016-11-07 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645760#comment-15645760
 ] 

Hitesh Shah commented on HIVE-15144:


Tez uses jersey-json, which is CDDL-licensed and does not have the "do no evil" 
disclaimer. 

> JSON.org license is now CatX
> 
>
> Key: HIVE-15144
> URL: https://issues.apache.org/jira/browse/HIVE-15144
> Project: Hive
>  Issue Type: Bug
>Reporter: Robert Kanter
>Priority: Blocker
> Fix For: 2.2.0
>
>
> per [update resolved legal|http://www.apache.org/legal/resolved.html#json]:
> {quote}
> CAN APACHE PRODUCTS INCLUDE WORKS LICENSED UNDER THE JSON LICENSE?
> No. As of 2016-11-03 this has been moved to the 'Category X' license list. 
> Prior to this, use of the JSON Java library was allowed. See Debian's page 
> for a list of alternatives.
> {quote}
> I'm not sure when this dependency was first introduced, but it looks like 
> it's currently used in a few places:
> https://github.com/apache/hive/search?p=1=%22org.json%22=%E2%9C%93



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13120) propagate doAs when generating ORC splits

2016-11-07 Thread Chris Drome (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645462#comment-15645462
 ] 

Chris Drome commented on HIVE-13120:


Thanks for the confirmation.

> propagate doAs when generating ORC splits
> -
>
> Key: HIVE-13120
> URL: https://issues.apache.org/jira/browse/HIVE-13120
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yi Zhang
>Assignee: Sergey Shelukhin
> Fix For: 2.1.0, 2.0.1
>
> Attachments: HIVE-13120.patch
>
>
> ORC+HS2+doAs+FetchTask conversion = weird permission errors, e.g. 
> {noformat}
> 2016-02-22 17:24:39,005 WARN  [HiveServer2-Handler-Pool: Thread-587]: 
> thrift.ThriftCLIService (ThriftCLIService.java:FetchResults(681)) - Error 
> fetching results:
> org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
> java.lang.RuntimeException: serious problem
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:352)
> [snip]
> Caused by: java.io.IOException: java.lang.RuntimeException: serious problem
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:508)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:415)
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1720)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:347)
> ... 24 more
> Caused by: java.lang.RuntimeException: serious problem
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1059)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1086)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:363)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:295)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:446)
> ... 28 more
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.hadoop.security.AccessControlException: Permission denied: 
> user=[snip], access=READ_EXECUTE, inode=[snip]
> {noformat}
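As a rough illustration of the doAs propagation the summary refers to (not the actual HIVE-13120 patch), split generation can be wrapped in the end user's UserGroupInformation so that filesystem permission checks run as that user rather than as the HiveServer2 principal. The user name and the generateSplits placeholder below are assumptions.

{code}
import org.apache.hadoop.security.UserGroupInformation;

import java.security.PrivilegedExceptionAction;

public class DoAsSplitGenerationSketch {

  // Placeholder for the ORC split-generation work that touches the filesystem.
  static void generateSplits() {
    System.out.println("split generation runs as the proxy user here");
  }

  public static void main(String[] args) throws Exception {
    // "query_user" is a hypothetical end user; the login user stands in for the HS2 principal.
    UserGroupInformation ugi = UserGroupInformation.createProxyUser(
        "query_user", UserGroupInformation.getLoginUser());
    ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
      generateSplits();
      return null;
    });
  }
}
{code}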



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15120) Storage based auth: allow option to enforce write checks for external tables

2016-11-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-15120:
--
Attachment: HIVE-15120.2.patch

Addressed Thejas's comments. Set the default value to true and use 
HiveConf.ConfVars.METASTORE_AUTHORIZATION_EXTERNALTABLE_DROP_CHECK.defaultBoolVal 
in StorageBasedAuthorizationProvider. The test is running. disallowDropOnTable 
is not a test; it is invoked by testSimplePrivileges. However, the test was not 
right in the previous patch; that is corrected in the new one.

> Storage based auth: allow option to enforce write checks for external tables
> 
>
> Key: HIVE-15120
> URL: https://issues.apache.org/jira/browse/HIVE-15120
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Reporter: Thejas M Nair
>Assignee: Daniel Dai
> Attachments: HIVE-15120.1.patch, HIVE-15120.2.patch
>
>
> Under storage-based authorization, we don't require write permissions on the 
> table directory for external table create/drop.
> This is because external table contents are often populated from outside of 
> Hive and are not written to from Hive, so write access is not needed. Also, 
> we can't require write permissions to drop a table if we don't require them 
> for creation (users who created them should be able to drop them).
> However, this difference in behavior of external tables is not well 
> documented, so users are surprised to learn that dropping a table can be done 
> by any user who has read access to the directory. At that point, changing 
> the large number of scripts that use external tables is hard. 
> It would be good to have a config option to treat external tables 
> the same as managed tables.
> The option should be off by default, so that the behavior is backward 
> compatible by default.
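A minimal sketch of the kind of option being described, with placeholder names (this is not the HIVE-15120 patch; the flag, the TableType enum, and the permission probe are invented for illustration): external tables go through the same directory write check as managed tables only when the flag is turned on, and the flag defaults to off so existing behavior is unchanged.

{code}
import java.io.File;

public class ExternalTableWriteCheckSketch {

  enum TableType { MANAGED, EXTERNAL }

  // Hypothetical flag; off by default to stay backward compatible.
  static final boolean ENFORCE_EXTERNAL_WRITE_CHECKS = false;

  // Placeholder for a storage-based permission probe on the table directory.
  static boolean hasWriteAccess(String tableDir) {
    return new File(tableDir).canWrite();
  }

  static void authorizeCreateOrDrop(TableType type, String tableDir) {
    // Managed tables always need write access; external tables only when enabled.
    boolean requireWrite = type == TableType.MANAGED || ENFORCE_EXTERNAL_WRITE_CHECKS;
    if (requireWrite && !hasWriteAccess(tableDir)) {
      throw new SecurityException("No write permission on table directory " + tableDir);
    }
  }

  public static void main(String[] args) {
    // Passes with the default (off) setting, mirroring the current external-table behavior.
    authorizeCreateOrDrop(TableType.EXTERNAL, "/tmp/ext_table_dir");
  }
}
{code}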



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14908) Upgrade ANTLR to 3.5.2

2016-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645671#comment-15645671
 ] 

Hive QA commented on HIVE-14908:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12837820/HIVE-14908.02.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10628 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=90)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[columnstats_partlvl_multiple_part_clause]
 (batchId=83)
org.apache.hadoop.hive.ql.parse.TestParseNegativeDriver.testCliDriver[missing_overwrite]
 (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2009/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2009/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2009/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12837820 - PreCommit-HIVE-Build

> Upgrade ANTLR to 3.5.2
> --
>
> Key: HIVE-14908
> URL: https://issues.apache.org/jira/browse/HIVE-14908
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14908.01.patch, HIVE-14908.02.patch
>
>
> ANTLR v4 is also available, but it does not support "->", which is widely used 
> in our grammar. ANTLR 3.5.2 is the latest v3 version. It will reduce the code 
> size:
> {code}
> Here is a summary of the current parser code size:
>  422345  HiveLexer.java
> 2436601  HiveParser.java
>  814184  HiveParser_FromClauseParser.java
> 2705920  HiveParser_IdentifiersParser.java
>  777665  HiveParser_SelectClauseParser.java
> After the change, it becomes:
>  319589  HiveLexer.java
> 1853104  HiveParser.java
>  574156  HiveParser_FromClauseParser.java
> 1799195  HiveParser_IdentifiersParser.java
>  587305  HiveParser_SelectClauseParser.java
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14990:

Description: 
Expected failures (lack of support in MM tables for certain commands) 
1) All HCat tests
2) Almost all merge tests (alter .. concat is not supported).
3) Tests that run dfs commands with specific paths.
4) Truncate column
5) Describe formatted will have the new fields in the output before merging 
with ACID

  was:
Expected failures (lack of support in MM tables for certain commands) 
1) All HCat tests
2) Almost all merge tests (alter .. concat is not supported).
3) Tests that run dfs commands with specific paths.
4) Truncate column


> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.patch
>
>
> Expected failures (lack of support in MM tables for certain commands) 
> 1) All HCat tests
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths.
> 4) Truncate column
> 5) Describe formatted will have the new fields in the output before merging 
> with ACID



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15143) add logging for HIVE-15024

2016-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645792#comment-15645792
 ] 

Hive QA commented on HIVE-15143:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12837839/HIVE-15143.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10628 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=91)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2010/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2010/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2010/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12837839 - PreCommit-HIVE-Build

> add logging for HIVE-15024
> --
>
> Key: HIVE-15143
> URL: https://issues.apache.org/jira/browse/HIVE-15143
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
>Priority: Critical
> Attachments: HIVE-15143.patch
>
>
> {code}
> Caused by: java.io.IOException: java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.io.DiskRangeList cannot be cast to 
> org.apache.orc.impl.BufferChunk
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:383)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:338)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:278)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:167)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 23 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.io.DiskRangeList cannot be cast to 
> org.apache.orc.impl.BufferChunk
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.prepareRangesForCompressedRead(EncodedReaderImpl.java:728)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:616)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:397)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:424)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:227)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:224)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:224)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:93)
> ... 6 more
> 2016-10-20T00:48:45,354 WARN  [TezTaskRunner 
> (1475017598908_0410_15_00_20_0)] 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask: Ignoring exception when 
> closing input part(cleanup). Exception class
> =java.io.IOException, message=java.lang.ClassCastException: 
> org.apache.hadoop.hive.common.io.DiskRangeList cannot be cast to 
> org.apache.orc.impl.BufferChunk
> 2016-10-20T00:48:45,416 WARN  [TaskHeartbeatThread 
> (1475017598908_0410_15_00_20_0)] 
> org.apache.hadoop.hive.llap.daemon.impl.LlapTaskReporter: Exiting 
> TaskReporter thread with pending queue size=2
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13966) DbNotificationListener: can loose DDL operation notifications

2016-11-07 Thread Mohit Sabharwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohit Sabharwal updated HIVE-13966:
---
Attachment: HIVE-13966.4.patch

> DbNotificationListener: can loose DDL operation notifications
> -
>
> Key: HIVE-13966
> URL: https://issues.apache.org/jira/browse/HIVE-13966
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Nachiket Vaidya
>Assignee: Mohit Sabharwal
>Priority: Critical
> Attachments: HIVE-13966.1.patch, HIVE-13966.2.patch, 
> HIVE-13966.3.patch, HIVE-13966.4.patch, HIVE-13966.pdf
>
>
> The code for each API in HiveMetaStore.java is like this:
> 1. openTransaction()
> 2. -- operation--
> 3. commit() or rollback() based on result of the operation.
> 4. add entry to notification log (unconditionally)
> If the operation fails (in step 2), we still add an entry to the notification 
> log. Found this issue in testing.
> That case is still OK, since it is only a false positive.
> If the operation succeeds but adding to the notification log fails, the 
> user will get a MetaException. It will not roll back the operation, as it is 
> already committed. We need to handle this case so that we do not have false 
> negatives.
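To make the ordering concrete, here is a compact sketch with placeholder types (not HiveMetaStore's actual code): the first method mirrors steps 1-4 above, where the log entry is written even after a rollback; the second shows one possible direction (an assumption, not the committed fix), writing the entry inside the transaction so the event and the operation commit or fail together.

{code}
public class NotificationOrderingSketch {

  // Placeholder for the metastore's transactional store plus notification log.
  interface MetaStoreOps {
    void openTransaction();
    boolean commitTransaction();
    void rollbackTransaction();
    void runDdlOperation();          // step 2: the actual DDL work
    void addNotificationLogEntry();  // step 4: record the event for listeners
  }

  // Steps 1-4 as listed above: the notification entry is added unconditionally,
  // so a rolled-back operation can still produce an event (false positive).
  static void ddlWithUnconditionalLogging(MetaStoreOps ms) {
    ms.openTransaction();
    boolean committed = false;
    try {
      ms.runDdlOperation();
      committed = ms.commitTransaction();
    } finally {
      if (!committed) {
        ms.rollbackTransaction();
      }
      ms.addNotificationLogEntry();
    }
  }

  // Possible direction: write the entry inside the same transaction, so a failed
  // log write rolls the operation back too, and a committed operation can no
  // longer lose its event (no false negatives).
  static void ddlWithTransactionalLogging(MetaStoreOps ms) {
    ms.openTransaction();
    boolean committed = false;
    try {
      ms.runDdlOperation();
      ms.addNotificationLogEntry();
      committed = ms.commitTransaction();
    } finally {
      if (!committed) {
        ms.rollbackTransaction();
      }
    }
  }
}
{code}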



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13120) propagate doAs when generating ORC splits

2016-11-07 Thread Chris Drome (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645802#comment-15645802
 ] 

Chris Drome commented on HIVE-13120:


[~sershe], if you have the branch-1 patch available, could you attach it to this 
JIRA as well for reference? Thanks.

> propagate doAs when generating ORC splits
> -
>
> Key: HIVE-13120
> URL: https://issues.apache.org/jira/browse/HIVE-13120
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yi Zhang
>Assignee: Sergey Shelukhin
> Fix For: 2.1.0, 2.0.1
>
> Attachments: HIVE-13120.patch
>
>
> ORC+HS2+doAs+FetchTask conversion = weird permission errors, e.g. 
> {noformat}
> 2016-02-22 17:24:39,005 WARN  [HiveServer2-Handler-Pool: Thread-587]: 
> thrift.ThriftCLIService (ThriftCLIService.java:FetchResults(681)) - Error 
> fetching results:
> org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
> java.lang.RuntimeException: serious problem
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:352)
> [snip]
> Caused by: java.io.IOException: java.lang.RuntimeException: serious problem
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:508)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:415)
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1720)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:347)
> ... 24 more
> Caused by: java.lang.RuntimeException: serious problem
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1059)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1086)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:363)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:295)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:446)
> ... 28 more
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.hadoop.security.AccessControlException: Permission denied: 
> user=[snip], access=READ_EXECUTE, inode=[snip]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15093) S3-to-S3 Renames: Files should be moved individually rather than at a directory level

2016-11-07 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645575#comment-15645575
 ] 

Sahil Takiar commented on HIVE-15093:
-

[~spena], comments addressed. Added some qtests as well.

> S3-to-S3 Renames: Files should be moved individually rather than at a 
> directory level
> -
>
> Key: HIVE-15093
> URL: https://issues.apache.org/jira/browse/HIVE-15093
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.1.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15093.1.patch, HIVE-15093.2.patch, 
> HIVE-15093.3.patch, HIVE-15093.4.patch, HIVE-15093.5.patch, 
> HIVE-15093.6.patch, HIVE-15093.7.patch, HIVE-15093.8.patch
>
>
> Hive's MoveTask uses the Hive.moveFile method to move data within a 
> distributed filesystem as well as blobstore filesystems.
> If the move is done within the same filesystem:
> 1: If the source path is a subdirectory of the destination path, files will 
> be moved one by one using a threadpool of workers
> 2: If the source path is not a subdirectory of the destination path, a single 
> rename operation is used to move the entire directory
> The second option may not work well on blobstores such as S3. Renames are not 
> metadata operations and require copying all the data. Client connectors to 
> blobstores may not efficiently rename directories. Worst case, the connector 
> will copy each file one by one, sequentially rather than using a threadpool 
> of workers to copy the data (e.g. HADOOP-13600).
> Hive already has code to rename files using a threadpool of workers, but this 
> only occurs in case number 1.
> This JIRA aims to modify the code so that case 1 is triggered when copying 
> within a blobstore. The focus is on copies within a blobstore because 
> needToCopy will return true if the src and target filesystems are different, 
> in which case a different code path is triggered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-15144) JSON.org license is now CatX

2016-11-07 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645496#comment-15645496
 ] 

Robert Kanter edited comment on HIVE-15144 at 11/7/16 9:36 PM:
---

This is also a problem for the Oozie release we're currently working on.  See 
OOZIE-2723.

How important is this dependency?  One of the things we're considering in 
OOZIE-2723 is simply excluding the JSON.org dependency.


was (Author: rkanter):
This is also a problem for the Oozie release we're currently working on.  See 
OOZIE-2723.
Oozie is either going to have to hold the release until Hive has a release with 
this fix or it's going to have to just exclude the dependency and hope things 
mostly work without it.

> JSON.org license is now CatX
> 
>
> Key: HIVE-15144
> URL: https://issues.apache.org/jira/browse/HIVE-15144
> Project: Hive
>  Issue Type: Bug
>Reporter: Robert Kanter
>Priority: Blocker
> Fix For: 2.2.0
>
>
> per [update resolved legal|http://www.apache.org/legal/resolved.html#json]:
> {quote}
> CAN APACHE PRODUCTS INCLUDE WORKS LICENSED UNDER THE JSON LICENSE?
> No. As of 2016-11-03 this has been moved to the 'Category X' license list. 
> Prior to this, use of the JSON Java library was allowed. See Debian's page 
> for a list of alternatives.
> {quote}
> I'm not sure when this dependency was first introduced, but it looks like 
> it's currently used in a few places:
> https://github.com/apache/hive/search?p=1=%22org.json%22=%E2%9C%93



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15120) Storage based auth: allow option to enforce write checks for external tables

2016-11-07 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645716#comment-15645716
 ] 

Thejas M Nair commented on HIVE-15120:
--

+1 Pending tests


> Storage based auth: allow option to enforce write checks for external tables
> 
>
> Key: HIVE-15120
> URL: https://issues.apache.org/jira/browse/HIVE-15120
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Reporter: Thejas M Nair
>Assignee: Daniel Dai
> Attachments: HIVE-15120.1.patch, HIVE-15120.2.patch
>
>
> Under storage based authorization, we don't require write permissions on 
> table directory for external table create/drop.
> This is because external table contents are often populated from outside of 
> Hive and are not written to from Hive. So write access is not needed. Also, 
> we can't require write permissions to drop a table if we don't require them 
> for creation (users who created them should be able to drop them).
> However, this difference in behavior of external tables is not well 
> documented. So users get surprised to learn that drop table can be done by 
> just any user who has read access to the directory. At that point changing 
> the large number of scripts that use external tables is hard. 
> It would be good to have a user config option to have external tables 
> treated the same as managed tables.
> The option should be off by default, so that the behavior is backward 
> compatible by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15023) SimpleFetchOptimizer needs to optimize limit=0

2016-11-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15023:
---
Attachment: HIVE-15023.02.patch

> SimpleFetchOptimizer needs to optimize limit=0
> --
>
> Key: HIVE-15023
> URL: https://issues.apache.org/jira/browse/HIVE-15023
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15023.01.patch, HIVE-15023.02.patch
>
>
> on current master
> {code}
> hive> explain select key from src limit 0;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: 0
>   Processor Tree:
> TableScan
>   alias: src
>   Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: key (type: string)
> outputColumnNames: _col0
> Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
> Column stats: NONE
> Limit
>   Number of rows: 0
>   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>   ListSink
> Time taken: 7.534 seconds, Fetched: 20 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14924) MSCK REPAIR table with single threaded is throwing null pointer exception

2016-11-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14924:
---
Attachment: HIVE-14924.01.patch

> MSCK REPAIR table with single threaded is throwing null pointer exception
> -
>
> Key: HIVE-14924
> URL: https://issues.apache.org/jira/browse/HIVE-14924
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.2.0
>Reporter: Ratheesh Kamoor
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14924.01.patch
>
>
> MSCK REPAIR TABLE is throwing a Null Pointer Exception while running in 
> single-threaded mode (hive.mv.files.thread=0)
> Error:
> 2016-10-10T22:27:13,564 ERROR [e9ce04a8-2a84-426d-8e79-a2d15b8cee09 
> main([])]: exec.DDLTask (DDLTask.java:failed(581)) - 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkPartitionDirs(HiveMetaStoreChecker.java:423)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:315)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:291)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:236)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:113)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1834)
> In order to reproduce:
> set hive.mv.files.thread=0 and run MSCK REPAIR TABLE command
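One common way such an NPE arises when the configured worker count is zero is a thread pool reference that is only created for a positive count. The sketch below is purely illustrative (hypothetical names, not the HiveMetaStoreChecker code) and only shows the guard pattern:

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolGuard {
  // Hypothetical illustration: with hive.mv.files.thread=0 the checker should
  // fall back to a single-threaded path instead of dereferencing a null pool.
  static ExecutorService poolFor(int threads) {
    return threads > 0 ? Executors.newFixedThreadPool(threads) : null;
  }

  static void runCheck(Runnable work, ExecutorService pool) {
    if (pool == null) {
      work.run();          // single-threaded path
    } else {
      pool.submit(work);   // parallel path
    }
  }
}
{code}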



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14924) MSCK REPAIR table with single threaded is throwing null pointer exception

2016-11-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14924:
---
Status: Open  (was: Patch Available)

> MSCK REPAIR table with single threaded is throwing null pointer exception
> -
>
> Key: HIVE-14924
> URL: https://issues.apache.org/jira/browse/HIVE-14924
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.2.0
>Reporter: Ratheesh Kamoor
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14924.01.patch
>
>
> MSCK REPAIR TABLE is throwing a Null Pointer Exception while running in 
> single-threaded mode (hive.mv.files.thread=0)
> Error:
> 2016-10-10T22:27:13,564 ERROR [e9ce04a8-2a84-426d-8e79-a2d15b8cee09 
> main([])]: exec.DDLTask (DDLTask.java:failed(581)) - 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkPartitionDirs(HiveMetaStoreChecker.java:423)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:315)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:291)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:236)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:113)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1834)
> In order to reproduce:
> set hive.mv.files.thread=0 and run MSCK REPAIR TABLE command



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14815) Implement Parquet vectorization reader for Primitive types

2016-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644810#comment-15644810
 ] 

Hive QA commented on HIVE-14815:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12837794/HIVE-14815.6.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10636 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnarserde_create_shortcut]
 (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_parquet_types]
 (batchId=60)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_parquet_types]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=91)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2003/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2003/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2003/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12837794 - PreCommit-HIVE-Build

> Implement Parquet vectorization reader for Primitive types 
> ---
>
> Key: HIVE-14815
> URL: https://issues.apache.org/jira/browse/HIVE-14815
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14815.1.patch, HIVE-14815.2.patch, 
> HIVE-14815.3.patch, HIVE-14815.4.patch, HIVE-14815.5.patch, 
> HIVE-14815.6.patch, HIVE-14815.patch
>
>
> Parquet doesn't provide a vectorized reader that can be used by Hive 
> directly. Also, a Decimal column batch consists of a batch of 
> HiveDecimal, a Hive-specific type that is unknown to Parquet. To support 
> Hive's vectorized execution engine, we have to implement the 
> vectorized Parquet reader on the Hive side. To limit the performance impact, we 
> need to implement a page-level vectorized reader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15119) Support standard syntax for ROLLUP & CUBE

2016-11-07 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644820#comment-15644820
 ] 

Vineet Garg commented on HIVE-15119:


Thanks, Jesus. I'll check and regenerate the q file changes.

> Support standard syntax for ROLLUP & CUBE
> -
>
> Key: HIVE-15119
> URL: https://issues.apache.org/jira/browse/HIVE-15119
> Project: Hive
>  Issue Type: Task
>  Components: Parser, SQL
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-15119.03.patch, HIVE-15119.2.patch, HIVE-15119.patch
>
>
> Standard ROLLUP and CUBE syntax is GROUP BY ROLLUP (expression list)... and 
> GROUP BY CUBE (expression list) respectively. 
> Currently Hive only allows the GROUP BY ... WITH ROLLUP/CUBE syntax.
>  
>  We would like Hive to support standard ROLLUP/CUBE syntax.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15023) SimpleFetchOptimizer needs to optimize limit=0

2016-11-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15023:
---
Status: Patch Available  (was: Open)

> SimpleFetchOptimizer needs to optimize limit=0
> --
>
> Key: HIVE-15023
> URL: https://issues.apache.org/jira/browse/HIVE-15023
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15023.01.patch, HIVE-15023.02.patch
>
>
> on current master
> {code}
> hive> explain select key from src limit 0;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: 0
>   Processor Tree:
> TableScan
>   alias: src
>   Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: key (type: string)
> outputColumnNames: _col0
> Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
> Column stats: NONE
> Limit
>   Number of rows: 0
>   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>   ListSink
> Time taken: 7.534 seconds, Fetched: 20 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14924) MSCK REPAIR table with single threaded is throwing null pointer exception

2016-11-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14924:
---
Attachment: (was: HIVE-14924.01.patch)

> MSCK REPAIR table with single threaded is throwing null pointer exception
> -
>
> Key: HIVE-14924
> URL: https://issues.apache.org/jira/browse/HIVE-14924
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.2.0
>Reporter: Ratheesh Kamoor
>Assignee: Pengcheng Xiong
>
> MSCK REPAIR TABLE is throwing a Null Pointer Exception while running in 
> single-threaded mode (hive.mv.files.thread=0)
> Error:
> 2016-10-10T22:27:13,564 ERROR [e9ce04a8-2a84-426d-8e79-a2d15b8cee09 
> main([])]: exec.DDLTask (DDLTask.java:failed(581)) - 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkPartitionDirs(HiveMetaStoreChecker.java:423)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:315)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:291)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:236)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:113)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1834)
> In order to reproduce:
> set hive.mv.files.thread=0 and run MSCK REPAIR TABLE command



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14735) Build Infra: Spark artifacts download takes a long time

2016-11-07 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644916#comment-15644916
 ] 

Sergio Peña commented on HIVE-14735:


Thanks [~kgyrtkirk]. The patch looks good, but I need to dig a little more and 
test it. It looks promising.
I'll try to review it this week.

> Build Infra: Spark artifacts download takes a long time
> ---
>
> Key: HIVE-14735
> URL: https://issues.apache.org/jira/browse/HIVE-14735
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Reporter: Vaibhav Gumashta
>Assignee: Zoltan Haindrich
> Attachments: HIVE-14735.1.patch, HIVE-14735.1.patch, 
> HIVE-14735.1.patch, HIVE-14735.1.patch, HIVE-14735.2.patch
>
>
> In particular this command:
> {{curl -Sso ./../thirdparty/spark-1.6.0-bin-hadoop2-without-hive.tgz 
> http://d3jw87u4immizc.cloudfront.net/spark-tarball/spark-1.6.0-bin-hadoop2-without-hive.tgz}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14979) Removing stale Zookeeper locks at HiveServer2 initialization

2016-11-07 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644934#comment-15644934
 ] 

Peter Vary commented on HIVE-14979:
---

Thanks [~thejas] for your review!

As for the drawbacks of the current solution, some of them I did think about 
and tried to highlight in the description of the new configuration value, some 
of them I did not. Thanks for pointing the latter ones out.

In both cases we try to provide resilience against temporary GC or network 
issues, and a session loss or an improper shutdown will have different effects:
- GC or network issue will cause:
-- Service discovery - Shutting down of the HiveServer2 instance - please 
correct me if I am wrong
-- Query locks - Possible data corruption
- Improper shutdown will cause:
-- Service discovery - Clients connecting to another server until the timeout 
is reached - please correct me if I am wrong
-- Query locks - Locks persists until the timeout is reached

After this discussion I tend to agree with you that different situations call 
for different configurations, so the best solution would be to provide 
the administrator with the ability to match the specific needs.
What would be the default values of the new configurations?
- 20 mins for the Service discovery timeout
- 3 mins for the Lock timeout

If we agree on this, I would create a patch for it.

Thanks,
Peter


> Removing stale Zookeeper locks at HiveServer2 initialization
> 
>
> Key: HIVE-14979
> URL: https://issues.apache.org/jira/browse/HIVE-14979
> Project: Hive
>  Issue Type: Improvement
>  Components: Locking
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-14979.3.patch, HIVE-14979.4.patch, 
> HIVE-14979.5.patch, HIVE-14979.patch
>
>
> HiveServer2 could use Zookeeper to store tokens that indicate that particular 
> tables are locked, by creating persistent Zookeeper objects. 
> A problem can occur when a HiveServer2 instance creates a lock on a table and 
> the HiveServer2 instance crashes ("Out of Memory" for example) and the locks 
> are not released in Zookeeper. Such a lock will then remain until it is 
> manually cleared by an admin.
> There should be a way to remove stale locks at HiveServer2 initialization, 
> making the admin's life easier.
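A minimal sketch of what clearing stale lock nodes at startup could look like, using the plain ZooKeeper client API; the lock root path and the flat layout are assumptions for illustration, not HiveServer2's actual znode structure:

{code}
import org.apache.zookeeper.ZooKeeper;

public class StaleLockCleaner {
  // Delete all persistent lock znodes under a lock root before the server
  // starts accepting queries. Recursive deletion of child nodes is omitted
  // for brevity; zk.delete() would fail if a node still has children.
  static void clearStaleLocks(ZooKeeper zk, String lockRoot) throws Exception {
    if (zk.exists(lockRoot, false) == null) {
      return; // nothing to clean up
    }
    for (String child : zk.getChildren(lockRoot, false)) {
      zk.delete(lockRoot + "/" + child, -1); // -1 matches any node version
    }
  }
}
{code}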



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12891) Hive fails when java.io.tmpdir is set to a relative location

2016-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644942#comment-15644942
 ] 

Hive QA commented on HIVE-12891:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12784190/HIVE-12891.04.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10629 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2004/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2004/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2004/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12784190 - PreCommit-HIVE-Build

> Hive fails when java.io.tmpdir is set to a relative location
> 
>
> Key: HIVE-12891
> URL: https://issues.apache.org/jira/browse/HIVE-12891
> Project: Hive
>  Issue Type: Bug
>Reporter: Reuben Kuhnert
>Assignee: Reuben Kuhnert
> Attachments: HIVE-12891.01.19.2016.01.patch, HIVE-12891.03.patch, 
> HIVE-12891.04.patch, HIVE-12981.01.22.2016.02.patch
>
>
> The function {{SessionState.createSessionDirs}} fails when trying to create 
> directories where {{java.io.tmpdir}} is set to a relative location.
> {code}
> \[SubtaskRunner] ERROR o.a.h.hive..ql.Driver - FAILED: 
> IllegalArgumentException java.net.URISyntaxException: Relative path in 
> absolute URI: 
> file:./tmp///hive_2015_12_11_09-12-25_352_4325234652356-1
> ...
> Minor variations:
> \[SubtaskRunner] ERROR o.a.h.hive..ql.Driver - FAILED: SemanticException 
> Exception while processing Exception while writing out the local file 
> o.a.h.hive.ql/parse.SemanticException: Exception while processing exception 
> while writing out local file 
> ... 
> caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: 
> Relative path in absolute URI: 
> file:./tmp///hive_2015_12_11_09-12-25_352_4325234652356-1 
> at o.a.h.fs.Path.initialize (206) 
> at o.a.h.fs.Path.(197)... 
> at o.a.h.hive.ql.context.getScratchDir(267) 
> {code}
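The underlying problem is easy to demonstrate outside Hive: a relative java.io.tmpdir does not form a valid absolute file: URI, which is what the Hadoop Path constructor in the trace above ends up rejecting. A small sketch (illustration only, not the patch):

{code}
import java.io.File;

public class TmpDirCheck {
  public static void main(String[] args) {
    String tmp = System.getProperty("java.io.tmpdir");   // e.g. "./tmp" when set relatively
    File resolved = new File(tmp).getAbsoluteFile();      // resolve against the working dir
    System.out.println("raw tmpdir      : " + tmp);
    System.out.println("absolute tmpdir : " + resolved.getPath());
    System.out.println("as URI          : " + resolved.toURI()); // valid file:/... URI
  }
}
{code}

Normalizing the directory to an absolute path before building a Path/URI avoids the "Relative path in absolute URI" failure.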



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15023) SimpleFetchOptimizer needs to optimize limit=0

2016-11-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15023:
---
Status: Open  (was: Patch Available)

> SimpleFetchOptimizer needs to optimize limit=0
> --
>
> Key: HIVE-15023
> URL: https://issues.apache.org/jira/browse/HIVE-15023
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15023.01.patch, HIVE-15023.02.patch
>
>
> on current master
> {code}
> hive> explain select key from src limit 0;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: 0
>   Processor Tree:
> TableScan
>   alias: src
>   Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: key (type: string)
> outputColumnNames: _col0
> Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
> Column stats: NONE
> Limit
>   Number of rows: 0
>   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>   ListSink
> Time taken: 7.534 seconds, Fetched: 20 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15023) SimpleFetchOptimizer needs to optimize limit=0

2016-11-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15023:
---
Attachment: (was: HIVE-15023.02.patch)

> SimpleFetchOptimizer needs to optimize limit=0
> --
>
> Key: HIVE-15023
> URL: https://issues.apache.org/jira/browse/HIVE-15023
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15023.01.patch, HIVE-15023.02.patch
>
>
> on current master
> {code}
> hive> explain select key from src limit 0;
> OK
> STAGE DEPENDENCIES:
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-0
> Fetch Operator
>   limit: 0
>   Processor Tree:
> TableScan
>   alias: src
>   Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
> Column stats: NONE
>   Select Operator
> expressions: key (type: string)
> outputColumnNames: _col0
> Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
> Column stats: NONE
> Limit
>   Number of rows: 0
>   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE
>   ListSink
> Time taken: 7.534 seconds, Fetched: 20 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14908) Upgrade ANTLR to 3.5.2

2016-11-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14908:
---
Status: Patch Available  (was: Open)

> Upgrade ANTLR to 3.5.2
> --
>
> Key: HIVE-14908
> URL: https://issues.apache.org/jira/browse/HIVE-14908
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14908.01.patch, HIVE-14908.02.patch
>
>
> Antlr v4 is also available but it does not support "->" which is widely used 
> in our grammar. Antlr 3.5.2 is the latest v3 version. It will reduce the code 
> size:
> {code}
> Here is summary of current parser code size
> 422345  HiveLexer.java
> 2436601  HiveParser.java
> 814184  HiveParser_FromClauseParser.java
> 2705920  HiveParser_IdentifiersParser.java
> 777665 HiveParser_SelectClauseParser.java
>After change, it will become
> 319589 HiveLexer.java
> 1853104 HiveParser.java
> 574156 HiveParser_FromClauseParser.java
> 1799195 HiveParser_IdentifiersParser.java
> 587305 HiveParser_SelectClauseParser.java
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14908) Upgrade ANTLR to 3.5.2

2016-11-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14908:
---
Attachment: HIVE-14908.02.patch

> Upgrade ANTLR to 3.5.2
> --
>
> Key: HIVE-14908
> URL: https://issues.apache.org/jira/browse/HIVE-14908
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14908.01.patch, HIVE-14908.02.patch
>
>
> Antlr v4 is also available but it does not support "->" which is widely used 
> in our grammar. Antlr 3.5.2 is the latest v3 version. It will reduce the code 
> size:
> {code}
> Here is summary of current parser code size
> 422345  HiveLexer.java
> 2436601  HiveParser.java
> 814184  HiveParser_FromClauseParser.java
> 2705920  HiveParser_IdentifiersParser.java
> 777665 HiveParser_SelectClauseParser.java
>After change, it will become
> 319589 HiveLexer.java
> 1853104 HiveParser.java
> 574156 HiveParser_FromClauseParser.java
> 1799195 HiveParser_IdentifiersParser.java
> 587305 HiveParser_SelectClauseParser.java
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15135) Add an llap mode which fails if queries cannot run in llap

2016-11-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644833#comment-15644833
 ] 

Siddharth Seth commented on HIVE-15135:
---

Doesn't indicate whether queries will fail or not. Thoughts on compile time vs 
runtime? Maybe all_force for runtime failures. Leaning more towards adding both 
modes - a runtime failure mode (I was initially worried about this as a security 
concern, which it isn't since we don't localize jars, and to some extent about 
debuggability and potential hangs in llap).

> Add an llap mode which fails if queries cannot run in llap
> --
>
> Key: HIVE-15135
> URL: https://issues.apache.org/jira/browse/HIVE-15135
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-15135.01.patch
>
>
> ALL currently ends up launching new containers for queries which cannot run 
> in llap.
> There should be a mode where these queries don't run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14924) MSCK REPAIR table with single threaded is throwing null pointer exception

2016-11-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14924:
---
Status: Patch Available  (was: Open)

> MSCK REPAIR table with single threaded is throwing null pointer exception
> -
>
> Key: HIVE-14924
> URL: https://issues.apache.org/jira/browse/HIVE-14924
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.2.0
>Reporter: Ratheesh Kamoor
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14924.01.patch
>
>
> MSCK REPAIR TABLE is throwing a Null Pointer Exception while running in 
> single-threaded mode (hive.mv.files.thread=0)
> Error:
> 2016-10-10T22:27:13,564 ERROR [e9ce04a8-2a84-426d-8e79-a2d15b8cee09 
> main([])]: exec.DDLTask (DDLTask.java:failed(581)) - 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkPartitionDirs(HiveMetaStoreChecker.java:423)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:315)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:291)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:236)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:113)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1834)
> In order to reproduce:
> set hive.mv.files.thread=0 and run MSCK REPAIR TABLE command



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15085) Reduce the memory used by unit tests, MiniCliDriver, MiniLlapLocal, MiniSpark

2016-11-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-15085:
--
Attachment: HIVE-15085.02.patch

Updated.

> Reduce the memory used by unit tests, MiniCliDriver, MiniLlapLocal, MiniSpark
> -
>
> Key: HIVE-15085
> URL: https://issues.apache.org/jira/browse/HIVE-15085
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-15085.01.patch, HIVE-15085.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14908) Upgrade ANTLR to 3.5.2

2016-11-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14908:
---
Status: Open  (was: Patch Available)

> Upgrade ANTLR to 3.5.2
> --
>
> Key: HIVE-14908
> URL: https://issues.apache.org/jira/browse/HIVE-14908
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14908.01.patch, HIVE-14908.02.patch
>
>
> Antlr v4 is also available but it does not support "->" which is widely used 
> in our grammar. Antlr 3.5.2 is the latest v3 version. It will reduce the code 
> size:
> {code}
> Here is summary of current parser code size
> 422345  HiveLexer.java
> 2436601  HiveParser.java
> 814184  HiveParser_FromClauseParser.java
> 2705920  HiveParser_IdentifiersParser.java
> 777665 HiveParser_SelectClauseParser.java
>After change, it will become
> 319589 HiveLexer.java
> 1853104 HiveParser.java
> 574156 HiveParser_FromClauseParser.java
> 1799195 HiveParser_IdentifiersParser.java
> 587305 HiveParser_SelectClauseParser.java
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15129) LLAP : Enhance cache hits for stripe metadata across queries

2016-11-07 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15129:

Attachment: HIVE-15129.2.patch

Footer information can be reused in {{loadMissingIndexes}}, which would be very 
useful for cloud environments. Added it in the .2 patch.

> LLAP : Enhance cache hits for stripe metadata across queries
> 
>
> Key: HIVE-15129
> URL: https://issues.apache.org/jira/browse/HIVE-15129
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15129.1.patch, HIVE-15129.2.patch
>
>
> When multiple queries are run in LLAP, stripe metadata cache misses were 
> observed even though enough memory was available. 
> https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java#L655.
>  Even in cases when data was found in cache, it wasn't getting used as 
> {{globalnc}} changed from query to query.  Creating a superset of existing 
> indexes with {{globalInc}} would be helpful. 
> This would be a lot more beneficial in cloud storage, where opening and reading 
> small amounts of data can be expensive compared to HDFS. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14910) Flaky test: TestSparkClient.testJobSubmission

2016-11-07 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-14910:
---
Attachment: HIVE-14910.3.patch

Reuploading patch nr 3.

> Flaky test: TestSparkClient.testJobSubmission
> -
>
> Key: HIVE-14910
> URL: https://issues.apache.org/jira/browse/HIVE-14910
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-14910.1.patch, HIVE-14910.2.patch, 
> HIVE-14910.3.patch, HIVE-14910.patch
>
>
> Have seen this fail in multiple runs (not consistently)
> e.g. https://builds.apache.org/job/PreCommit-HIVE-Build/1426/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14910) Flaky test: TestSparkClient.testJobSubmission

2016-11-07 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-14910:
---
Attachment: (was: HIVE-14910.3.patch)

> Flaky test: TestSparkClient.testJobSubmission
> -
>
> Key: HIVE-14910
> URL: https://issues.apache.org/jira/browse/HIVE-14910
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-14910.1.patch, HIVE-14910.2.patch, 
> HIVE-14910.3.patch, HIVE-14910.patch
>
>
> Have seen this fail in multiple runs (not consistently)
> e.g. https://builds.apache.org/job/PreCommit-HIVE-Build/1426/testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15138) String + Integer gets converted to UDFToDouble causing number format exceptions

2016-11-07 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15643514#comment-15643514
 ] 

Rajesh Balamohan commented on HIVE-15138:
-

Yeah, HIVE-5021 would be needed for better handling of this. 

It would be helpful if Hive adds WARN messages so that users are aware of such 
conversions in the queries.
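A minimal sketch of the failure mode (plain Java, not Hive's code path): the per-row exceptions come from parsing a date string as a double, which is effectively what the implicit UDFToDouble cast described in this issue does.

{code}
public class ImplicitCastDemo {
  public static void main(String[] args) {
    String dDate = "2002-02-03";   // sample value from the report
    try {
      // This is what UDFToDouble effectively attempts for "d_date + 5".
      double d = Double.parseDouble(dDate);
      System.out.println(d);
    } catch (NumberFormatException e) {
      // The exception that fills the stack trace for every row.
      System.out.println("NumberFormatException: " + e.getMessage());
    }
    // Rewriting the predicate with an explicit date function such as
    // date_add(d_date, 5) avoids the implicit string-to-double conversion.
  }
}
{code}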

> String + Integer gets converted to UDFToDouble causing number format 
> exceptions
> ---
>
> Key: HIVE-15138
> URL: https://issues.apache.org/jira/browse/HIVE-15138
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
>
> TPCDS Query 72 has {{"d3.d_date > d1.d_date + 5"}}, where d_date contains 
> data like {{2002-02-03, 2001-11-07}}. When running this query, the compiler 
> converts this into UDFToDouble and causes a large number of
> {{NumberFormatExceptions}} trying to convert string to double. An example stack 
> trace is given below; filling in the stack for every row can be a good amount 
> of perf hit, depending on the amount of data.
> {noformat}
> "TezTaskRunner" #41340 daemon prio=5 os_prio=0 tid=0x7f7914745000 
> nid=0x9725 runnable [0x7f787ee4a000]
>java.lang.Thread.State: RUNNABLE
> at java.lang.Throwable.fillInStackTrace(Native Method)
> at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
> - locked <0x7f804b125ab0> (a java.lang.NumberFormatException)
> at java.lang.Throwable.(Throwable.java:265)
> at java.lang.Exception.(Exception.java:66)
> at java.lang.RuntimeException.(RuntimeException.java:62)
> at 
> java.lang.IllegalArgumentException.(IllegalArgumentException.java:52)
> at 
> java.lang.NumberFormatException.(NumberFormatException.java:55)
> at 
> sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
> at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
> at java.lang.Double.parseDouble(Double.java:538)
> at 
> org.apache.hadoop.hive.ql.udf.UDFToDouble.evaluate(UDFToDouble.java:172)
> at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:967)
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:194)
> at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.setResult(VectorUDFAdaptor.java:194)
> at 
> org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:150)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:121)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterDoubleColGreaterDoubleColumn.evaluate(FilterDoubleColGreaterDoubleColumn.java:51)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:110)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:144)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
> at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:600)
> at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:386)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
> at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:600)
> at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerLongOperator.process(VectorMapJoinInnerLongOperator.java:386)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
> {noformat}
> A simple query to reproduce this issue is given below. It would be helpful if 
> Hive gave explicit WARN messages so that the end user can add explicit casts to 
> avoid such situations.
> {noformat}
> Latest Hive (master): (Check UDFToDouble for d_date field)
> 
> hive> explain select distinct d_date + 5 from date_dim limit 10;
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: rbalamohan_20161107005816_1cc412bf-c19c-45c4-b468-236e4fc8ae09:8
>   Edges:
> 

[jira] [Updated] (HIVE-14919) Improve the performance of Hive on Spark 2.0.0

2016-11-07 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-14919:

Attachment: (was: benchmark.xlsx)

> Improve the performance of Hive on Spark 2.0.0
> --
>
> Key: HIVE-14919
> URL: https://issues.apache.org/jira/browse/HIVE-14919
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
>
> In HIVE-14029, we have updated the Spark dependency to 2.0.0. We used Intel 
> BigBench[1] to run benchmarks with Spark 2.0 over a 10 GB data set, comparing 
> with Spark 1.6. We can see quite some performance degradation for most of the 
> BigBench queries. Please see the attached file 
> for detailed information. This JIRA is the umbrella ticket addressing those 
> performance issues.
> [1] https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14990:

Description: 
Expected failures (lack of support in MM tables for certain commands) 
1) All HCat tests
2) Almost all merge tests (alter .. concat is not supported).
3) Tests that run dfs commands with specific paths.
4) Truncate column
5) Describe formatted will have the new fields in the output before merging 
with ACID
6) Many tests w/explain extended - diff in partition "base file name"
7) TestTxnCommands - all the conversion tests, as they check for bucket count 
using file status

  was:
Expected failures (lack of support in MM tables for certain commands) 
1) All HCat tests
2) Almost all merge tests (alter .. concat is not supported).
3) Tests that run dfs commands with specific paths.
4) Truncate column
5) Describe formatted will have the new fields in the output before merging 
with ACID
6) Many tests w/explain extended - diff in partition "base file name"


> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.patch
>
>
> Expected failures (lack of support in MM tables for certain commands) 
> 1) All HCat tests
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths.
> 4) Truncate column
> 5) Describe formatted will have the new fields in the output before merging 
> with ACID
> 6) Many tests w/explain extended - diff in partition "base file name"
> 7) TestTxnCommands - all the conversion tests, as they check for bucket count 
> using file status



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2016-11-07 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-15121:

Attachment: HIVE-15121.1.patch

Adding qtest.

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15121.1.patch, HIVE-15121.WIP.1.patch, 
> HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch, HIVE-15121.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, 
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that 
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the 
> final job writes to S3. Writing to HDFS should be faster than writing to S3, 
> so it makes more sense to write intermediate data to HDFS.
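A minimal sketch of the idea, with hypothetical helper and path names (this is not MoveTask/Context code): only the last job in the chain is pointed at the blobstore target, every earlier job gets an HDFS scratch directory.

{code}
import org.apache.hadoop.fs.Path;

public class ScratchDirSelector {
  // Hypothetical helper: intermediate jobs write to HDFS, the final job
  // writes directly to the blobstore destination.
  static Path scratchDirForJob(int jobIndex, int totalJobs,
                               Path hdfsScratch, Path blobstoreTarget) {
    boolean isFinalJob = (jobIndex == totalJobs - 1);
    return isFinalJob ? blobstoreTarget : new Path(hdfsScratch, "job-" + jobIndex);
  }

  public static void main(String[] args) {
    Path hdfs = new Path("hdfs:///tmp/hive/scratch");   // illustrative paths
    Path s3 = new Path("s3a://bucket/warehouse/t1");
    for (int i = 0; i < 3; i++) {
      System.out.println("job " + i + " -> " + scratchDirForJob(i, 3, hdfs, s3));
    }
  }
}
{code}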



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14990:

Description: 
Expected failures 
1) All HCat tests (cannot write MM tables via the HCat writer)
2) Almost all merge tests (alter .. concat is not supported).
3) Tests that run dfs commands with specific paths (path changes).
4) Truncate column (not supported).
5) Describe formatted will have the new table fields in the output (before 
merging MM with ACID).
6) Many tests w/explain extended - diff in partition "base file name" (path 
changes).
7) TestTxnCommands - all the conversion tests, as they check for bucket count 
using file lists (path changes).
8) HBase metastore tests cause methods are not implemented.
9) Some load and ExIm tests that export a table and then rely on specific path 
for load (path changes).

  was:
Expected failures 
1) All HCat tests (cannot write MM tables via the HCat writer)
2) Almost all merge tests (alter .. concat is not supported).
3) Tests that run dfs commands with specific paths.
4) Truncate column (not supported)
5) Describe formatted will have the new fields in the output before merging 
with ACID
6) Many tests w/explain extended - diff in partition "base file name"
7) TestTxnCommands - all the conversion tests, as they check for bucket count 
using file status
8) HBase metastore tests cause methods are not implemented


> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.patch
>
>
> Expected failures 
> 1) All HCat tests (cannot write MM tables via the HCat writer)
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths (path changes).
> 4) Truncate column (not supported).
> 5) Describe formatted will have the new table fields in the output (before 
> merging MM with ACID).
> 6) Many tests w/explain extended - diff in partition "base file name" (path 
> changes).
> 7) TestTxnCommands - all the conversion tests, as they check for bucket count 
> using file lists (path changes).
> 8) HBase metastore tests cause methods are not implemented.
> 9) Some load and ExIm tests that export a table and then rely on specific 
> path for load (path changes).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15147) LLAP: use LLAP cache for non-columnar formats in a somewhat general way

2016-11-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15147:

Description: 
The primary goal for the first pass is caching text formats. Nothing would 
prevent other formats from using the same path, in principle, although, as was 
originally done with ORC, it may be better to have native caching support 
optimized for each particular format.
Given that caching pure text is not smart, and we already have ORC-encoded 
cache that is columnar due to ORC file structure, we will transform data into 
columnar ORC.
The general idea is to treat all the data in the world as merely ORC that was 
compressed with some poor compression codec, such as csv. Using the original IF 
and serde, as well as ORC writer (with some heavyweight optimizations removed, 
potentially), we can "uncompress" the data into "original" ORC, then reuse a 
lot of the existing code.
Various other points:
1) Granularity in the file will have to be somehow determined (horizontal 
slicing of the file, to avoid caching entire columns). We can base it on 
arbitrary disk offsets determined during reading, but they will actually have 
to be propagated to the reader from the original inputformat. Row counts are 
easier to use but there's a problem of how to actually map them to missing 
ranges to read from disk.
2) Obviously for row-based formats, if any one column one needs is evicted, 
"all the columns" have to be read for the corresponding slice. The vague plan 
is to handle this implicitly, similarly to how ORC reader handles CB-RG 
overlaps - it will just so happen that a missing column will expand the 
disk-range-to-read into the whole horizontal slice of the file.
3) Granularity/etc. won't work for gzipped text. If anything at all is evicted, 
the entire file has to be re-read. Gzipped text is a ridiculous feature, so 
this is by design.
4) In future, it would be possible to also build some form of metadata/indexes 
for this cached data to do PPD, etc. This is out of the scope of this stage.

  was:
The primary goal for the first pass is caching text formats. Nothing would 
prevent other formats from using the same path, in principle, although, as was 
originally done with ORC, it may be better to have native caching support 
optimized for each particular format.
Given that caching pure text is not smart, and we already have ORC-encoded 
cache that is columnar due to ORC file structure, we will try to reuse that. 
The general idea is to treat all the data in the world as merely ORC that was 
compressed with some poor compression codec, such as csv. Using the original IF 
and serde, as well as ORC writer (with some heavyweight optimizations removed, 
potentially), we can "uncompress" the data into "original" ORC, then reuse a 
lot of the existing code.
Various other points:
1) Granularity in the file will have to be somehow determined (horizontal 
slicing of the file, to avoid caching entire columns). We can base it on 
arbitrary disk offsets determined during reading, but they will actually have 
to be propagated to the reader from the original inputformat. Row counts are 
easier to use but there's a problem of how to actually map them to missing 
ranges to read from disk.
2) Obviously for row-based formats, if any one column one needs is evicted, 
"all the columns" have to be read for the corresponding slice. The vague plan 
is to handle this implicitly, similarly to how ORC reader handles CB-RG 
overlaps - it will just so happen that a missing column will expand the 
disk-range-to-read into the whole horizontal slice of the file.
3) Granularity/etc. won't work for gzipped text. If anything at all is evicted, 
the entire file has to be re-read. Gzipped text is a ridiculous feature, so 
this is by design.
4) In future, it would be possible to also build some form of metadata/indexes 
for this cached data to do PPD, etc. This is out of the scope of this stage.


> LLAP: use LLAP cache for non-columnar formats in a somewhat general way
> ---
>
> Key: HIVE-15147
> URL: https://issues.apache.org/jira/browse/HIVE-15147
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> The primary goal for the first pass is caching text formats. Nothing would 
> prevent other formats from using the same path, in principle, although, as 
> was originally done with ORC, it may be better to have native caching support 
> optimized for each particular format.
> Given that caching pure text is not smart, and we already have ORC-encoded 
> cache that is columnar due to ORC file structure, we will transform data into 
> columnar ORC.
> The general idea is to treat all the data in the world as merely ORC that was 
> compressed with some poor 

[jira] [Updated] (HIVE-15120) Storage based auth: allow option to enforce write checks for external tables

2016-11-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-15120:
--
Attachment: HIVE-15120.3.patch

Fix UT failures.

> Storage based auth: allow option to enforce write checks for external tables
> 
>
> Key: HIVE-15120
> URL: https://issues.apache.org/jira/browse/HIVE-15120
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Reporter: Thejas M Nair
>Assignee: Daniel Dai
> Attachments: HIVE-15120.1.patch, HIVE-15120.2.patch, 
> HIVE-15120.3.patch
>
>
> Under storage based authorization, we don't require write permissions on 
> table directory for external table create/drop.
> This is because external table contents are often populated from outside of 
> Hive and are not written to from Hive. So write access is not needed. Also, 
> we can't require write permissions to drop a table if we don't require them 
> for creation (users who created them should be able to drop them).
> However, this difference in behavior of external tables is not well 
> documented. So users get surprised to learn that drop table can be done by 
> just any user who has read access to the directory. At that point changing 
> the large number of scripts that use external tables is hard. 
> It would be good to have a user config option to have external tables 
> treated the same as managed tables.
> The option should be off by default, so that the behavior is backward 
> compatible by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15139) HoS local mode fails with NumberFormatException

2016-11-07 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646423#comment-15646423
 ] 

Chaoyu Tang commented on HIVE-15139:


Great, many thanks.

> HoS local mode fails with NumberFormatException
> ---
>
> Key: HIVE-15139
> URL: https://issues.apache.org/jira/browse/HIVE-15139
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-15139.1.patch
>
>
> It's because we store {{stageId_attemptNum}} in JobMetricsListener but expect 
> only {{stageId}} in LocalSparkJobStatus.
> {noformat}
> java.lang.NumberFormatException: For input string: "0_0"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:580)
>   at java.lang.Integer.parseInt(Integer.java:615)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.status.impl.LocalSparkJobStatus.getSparkStatistics(LocalSparkJobStatus.java:146)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:104)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
> {noformat}
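A minimal sketch of a tolerant parse (illustration only, not necessarily the attached patch): strip the attempt suffix before converting the stage key back to an int.

{code}
public class StageIdParser {
  // JobMetricsListener stores keys like "0_0" (stageId_attemptNumber) while
  // LocalSparkJobStatus expects a plain stage id; dropping the suffix avoids
  // the NumberFormatException shown above.
  static int parseStageId(String key) {
    int sep = key.indexOf('_');
    return Integer.parseInt(sep >= 0 ? key.substring(0, sep) : key);
  }

  public static void main(String[] args) {
    System.out.println(parseStageId("0_0")); // 0
    System.out.println(parseStageId("7"));   // 7
  }
}
{code}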



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15120) Storage based auth: allow option to enforce write checks for external tables

2016-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646460#comment-15646460
 ] 

Hive QA commented on HIVE-15120:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12837885/HIVE-15120.3.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10628 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2016/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2016/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2016/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12837885 - PreCommit-HIVE-Build

> Storage based auth: allow option to enforce write checks for external tables
> 
>
> Key: HIVE-15120
> URL: https://issues.apache.org/jira/browse/HIVE-15120
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Reporter: Thejas M Nair
>Assignee: Daniel Dai
> Attachments: HIVE-15120.1.patch, HIVE-15120.2.patch, 
> HIVE-15120.3.patch
>
>
> Under storage based authorization, we don't require write permissions on 
> table directory for external table create/drop.
> This is because external table contents are often populated from outside of 
> Hive and are not written to from Hive. So write access is not needed. Also, 
> we can't require write permissions to drop a table if we don't require them 
> for creation (users who created them should be able to drop them).
> However, this difference in behavior of external tables is not well 
> documented. So users get surprised to learn that drop table can be done by 
> just any user who has read access to the directory. At that point changing 
> the large number of scripts that use external tables is hard. 
> It would be good to have a user config option to have external tables to be 
> treated same as managed tables.
> The option should be off by default, so that the behavior is backward 
> compatible by default.
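
As a rough sketch of the direction described above (everything below is hypothetical and invented for illustration; it is not the attached patch):

{code}
// Hypothetical sketch only: the config key, class and helper names are made up
// to show the shape of an opt-in write check for external tables.
public class ExternalTableWriteCheckSketch {
  static final String FLAG = "hive.sba.external.table.write.checks"; // made-up key

  static void checkCreateOrDrop(java.util.Properties conf,
                                boolean isExternalTable,
                                Runnable requireWriteAccessOnTableDir) {
    boolean enforce = Boolean.parseBoolean(conf.getProperty(FLAG, "false"));
    if (isExternalTable && !enforce) {
      return; // current behavior: external tables skip the directory write check
    }
    // opt-in behavior: external tables are treated the same as managed tables
    requireWriteAccessOnTableDir.run();
  }
}
{code}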



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14825) Figure out the minimum set of required jars for Hive on Spark after bumping up to Spark 2.0.0

2016-11-07 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-14825:
--
Fix Version/s: 2.2.0

> Figure out the minimum set of required jars for Hive on Spark after bumping 
> up to Spark 2.0.0
> -
>
> Key: HIVE-14825
> URL: https://issues.apache.org/jira/browse/HIVE-14825
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Ferdinand Xu
>Assignee: Rui Li
> Fix For: 2.2.0
>
>
> Considering that there's no assembly jar for Spark since 2.0.0, we should 
> figure out the minimum set of required jars for HoS to work after bumping up 
> to Spark 2.0.0. This way, users can decide whether they want to add just 
> the required jars, or all the jars under spark's dir for convenience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-14825) Figure out the minimum set of required jars for Hive on Spark after bumping up to Spark 2.0.0

2016-11-07 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li resolved HIVE-14825.
---
Resolution: Resolved
  Assignee: Rui Li

Updated the wiki.

> Figure out the minimum set of required jars for Hive on Spark after bumping 
> up to Spark 2.0.0
> -
>
> Key: HIVE-14825
> URL: https://issues.apache.org/jira/browse/HIVE-14825
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Ferdinand Xu
>Assignee: Rui Li
>
> Considering that there's no assembly jar for Spark since 2.0.0, we should 
> figure out the minimum set of required jars for HoS to work after bumping up 
> to Spark 2.0.0. This way, users can decide whether they want to add just 
> the required jars, or all the jars under spark's dir for convenience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14825) Figure out the minimum set of required jars for Hive on Spark after bumping up to Spark 2.0.0

2016-11-07 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-14825:
--
Component/s: Documentation

> Figure out the minimum set of required jars for Hive on Spark after bumping 
> up to Spark 2.0.0
> -
>
> Key: HIVE-14825
> URL: https://issues.apache.org/jira/browse/HIVE-14825
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Ferdinand Xu
>
> Considering that there's no assembly jar for Spark since 2.0.0, we should 
> figure out the minimum set of required jars for HoS to work after bumping up 
> to Spark 2.0.0. This way, users can decide whether they want to add just 
> the required jars, or all the jars under spark's dir for convenience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14825) Figure out the minimum set of required jars for Hive on Spark after bumping up to Spark 2.0.0

2016-11-07 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-14825:
--
Issue Type: Task  (was: Bug)

> Figure out the minimum set of required jars for Hive on Spark after bumping 
> up to Spark 2.0.0
> -
>
> Key: HIVE-14825
> URL: https://issues.apache.org/jira/browse/HIVE-14825
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Ferdinand Xu
>
> Considering that there's no assembly jar for Spark since 2.0.0, we should 
> figure out the minimum set of required jars for HoS to work after bumping up 
> to Spark 2.0.0. This way, users can decide whether they want to add just 
> the required jars, or all the jars under spark's dir for convenience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15057) Support other types of operators (other than SELECT)

2016-11-07 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15057:

Attachment: (was: HIVE-15057.wip.patch)

> Support other types of operators (other than SELECT)
> 
>
> Key: HIVE-15057
> URL: https://issues.apache.org/jira/browse/HIVE-15057
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Physical Optimizer
>Reporter: Chao Sun
>Assignee: Chao Sun
>
> Currently only SELECT operators are supported for nested column pruning. We 
> should add support for other types of operators so the optimization can work 
> for complex queries.
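
To illustrate the idea behind the optimization, a minimal sketch of nested column pruning in general (not the HIVE-15057 implementation):

{code}
// Illustrative only: keep just the nested field paths an operator references
// (e.g. "s.address.zip") instead of reading the whole struct.
import java.util.Set;
import java.util.TreeSet;

public class NestedPruningSketch {
  /** True if the path itself, or one of its ancestors, is referenced. */
  static boolean isNeeded(Set<String> referencedPaths, String path) {
    for (String p = path; !p.isEmpty(); ) {
      if (referencedPaths.contains(p)) {
        return true;
      }
      int dot = p.lastIndexOf('.');
      p = dot < 0 ? "" : p.substring(0, dot);
    }
    return false;
  }

  public static void main(String[] args) {
    Set<String> referenced = new TreeSet<>();
    referenced.add("s.address.zip");
    System.out.println(isNeeded(referenced, "s.address.zip")); // true
    System.out.println(isNeeded(referenced, "s.name"));        // false
  }
}
{code}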



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15057) Support other types of operators (other than SELECT)

2016-11-07 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15057:

Attachment: HIVE-15057.wip.patch

> Support other types of operators (other than SELECT)
> 
>
> Key: HIVE-15057
> URL: https://issues.apache.org/jira/browse/HIVE-15057
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Physical Optimizer
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15057.wip.patch
>
>
> Currently only SELECT operators are supported for nested column pruning. We 
> should add support for other types of operators so the optimization can work 
> for complex queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15119) Support standard syntax for ROLLUP & CUBE

2016-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646825#comment-15646825
 ] 

Hive QA commented on HIVE-15119:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12837912/HIVE-15119.4.patch

{color:green}SUCCESS:{color} +1 due to 16 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 10628 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_groupby] 
(batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_annotate_stats_groupby]
 (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_cube_multi_gby] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_grouping_id3] 
(batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_grouping_sets1] 
(batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_grouping_sets6] 
(batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_grouping_sets_limit]
 (batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_grouping_window] 
(batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[infer_bucket_sort_grouping_operators]
 (batchId=49)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_bulk] 
(batchId=89)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=91)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[groupby_grouping_sets1]
 (batchId=83)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[groupby_grouping_sets2]
 (batchId=83)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[groupby_grouping_sets3]
 (batchId=83)
org.apache.hive.hcatalog.pig.TestAvroHCatStorer.testWriteDate2 (batchId=170)
org.apache.hive.hcatalog.pig.TestSequenceFileHCatStorer.testWriteTinyint 
(batchId=170)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2020/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2020/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2020/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12837912 - PreCommit-HIVE-Build

> Support standard syntax for ROLLUP & CUBE
> -
>
> Key: HIVE-15119
> URL: https://issues.apache.org/jira/browse/HIVE-15119
> Project: Hive
>  Issue Type: Task
>  Components: Parser, SQL
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-15119.03.patch, HIVE-15119.2.patch, 
> HIVE-15119.4.patch, HIVE-15119.patch
>
>
> Standard ROLLUP and CUBE syntax is GROUP BY ROLLUP (expression list)... and 
> GROUP BY CUBE (expression list) respectively. 
> Currently HIVE only allows GROUP BY expression list WITH ROLLUP/CUBE syntax.
>  
>  We would like HIVE to support standard ROLLUP/CUBE syntax.
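
As a concrete illustration of the two forms (table and column names are made up; this is not taken from the attached patch):

{noformat}
-- syntax Hive accepts today
SELECT a, b, count(*) FROM t GROUP BY a, b WITH ROLLUP;

-- standard syntax this ticket proposes to also accept
SELECT a, b, count(*) FROM t GROUP BY ROLLUP (a, b);
{noformat}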



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15139) HoS local mode fails with NumberFormatException

2016-11-07 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646692#comment-15646692
 ] 

Rui Li commented on HIVE-15139:
---

[~ctang.ma] - no problem.
[~xuefuz], I'll commit this if you have no further comments.

> HoS local mode fails with NumberFormatException
> ---
>
> Key: HIVE-15139
> URL: https://issues.apache.org/jira/browse/HIVE-15139
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-15139.1.patch
>
>
> It's because we store {{stageId_attemptNum}} in JobMetricsListener but expect 
> only {{stageId}} in LocalSparkJobStatus.
> {noformat}
> java.lang.NumberFormatException: For input string: "0_0"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:580)
>   at java.lang.Integer.parseInt(Integer.java:615)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.status.impl.LocalSparkJobStatus.getSparkStatistics(LocalSparkJobStatus.java:146)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:104)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15093) S3-to-S3 Renames: Files should be moved individually rather than at a directory level

2016-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646545#comment-15646545
 ] 

Hive QA commented on HIVE-15093:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12837895/HIVE-15093.9.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10628 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService.testTaskStatus 
(batchId=207)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2017/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2017/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2017/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12837895 - PreCommit-HIVE-Build

> S3-to-S3 Renames: Files should be moved individually rather than at a 
> directory level
> -
>
> Key: HIVE-15093
> URL: https://issues.apache.org/jira/browse/HIVE-15093
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.1.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15093.1.patch, HIVE-15093.2.patch, 
> HIVE-15093.3.patch, HIVE-15093.4.patch, HIVE-15093.5.patch, 
> HIVE-15093.6.patch, HIVE-15093.7.patch, HIVE-15093.8.patch, HIVE-15093.9.patch
>
>
> Hive's MoveTask uses the Hive.moveFile method to move data within a 
> distributed filesystem as well as blobstore filesystems.
> If the move is done within the same filesystem:
> 1: If the source path is a subdirectory of the destination path, files will 
> be moved one by one using a threadpool of workers
> 2: If the source path is not a subdirectory of the destination path, a single 
> rename operation is used to move the entire directory
> The second option may not work well on blobstores such as S3. Renames are not 
> metadata operations and require copying all the data. Client connectors to 
> blobstores may not efficiently rename directories. Worst case, the connector 
> will copy each file one by one, sequentially rather than using a threadpool 
> of workers to copy the data (e.g. HADOOP-13600).
> Hive already has code to rename files using a threadpool of workers, but this 
> only occurs in case number 1.
> This JIRA aims to modify the code so that case 1 is triggered when copying 
> within a blobstore. The focus is on copies within a blobstore because 
> needToCopy will return true if the src and target filesystems are different, 
> in which case a different code path is triggered.
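
A minimal sketch of the per-file approach, assuming src and dst are on the same filesystem (illustrative only, not Hive's Hive.moveFile):

{code}
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelMoveSketch {
  static void moveFiles(FileSystem fs, Path srcDir, Path dstDir, int threads)
      throws IOException, InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    List<Future<Boolean>> results = new ArrayList<>();
    for (FileStatus file : fs.listStatus(srcDir)) {
      final Path src = file.getPath();
      final Path dst = new Path(dstDir, src.getName());
      // each file is renamed individually instead of one directory-level rename
      results.add(pool.submit(() -> fs.rename(src, dst)));
    }
    pool.shutdown();
    for (Future<Boolean> moved : results) {
      if (!moved.get()) {
        throw new IOException("Failed to move a file into " + dstDir);
      }
    }
  }
}
{code}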



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646733#comment-15646733
 ] 

Hive QA commented on HIVE-14990:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12837903/HIVE-14990.08.patch

{color:green}SUCCESS:{color} +1 due to 17 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 749 failed/errored test(s), 9996 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_concatenate_indexed_table]
 (batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_merge] (batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_merge_2] 
(batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_merge_2_orc] 
(batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_merge_3] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_merge_stats] 
(batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[authorization_insert] 
(batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[authorization_load] 
(batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_8] 
(batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_9] 
(batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join32] (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_11] 
(batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_13] 
(batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_14] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_15] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_1] 
(batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_3] 
(batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_4] 
(batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_5] 
(batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_7] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[binary_output_format] 
(batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark1] 
(batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark2] 
(batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark3] 
(batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_1] 
(batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_2] 
(batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_3] 
(batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_4] 
(batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_5] 
(batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_6] 
(batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_7] 
(batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_8] 
(batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin11] 
(batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin12] 
(batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin13] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin5] 
(batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin8] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative2] 
(batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative3] 
(batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative] 
(batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_1]
 (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_3]
 (batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_4]
 (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_5]
 (batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_8]
 (batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] 
(batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[concatenate_inherit_table_location]
 (batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_default_prop] 
(batchId=29)

[jira] [Commented] (HIVE-14924) MSCK REPAIR table with single threaded is throwing null pointer exception

2016-11-07 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646753#comment-15646753
 ] 

Pengcheng Xiong commented on HIVE-14924:


Pushed to master. Thanks [~ashutoshc] for the review.

> MSCK REPAIR table with single threaded is throwing null pointer exception
> -
>
> Key: HIVE-14924
> URL: https://issues.apache.org/jira/browse/HIVE-14924
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.2.0
>Reporter: Ratheesh Kamoor
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14924.01.patch
>
>
> MSCK REPAIR TABLE is throwing Null Pointer Exception while running on single 
> threaded mode (hive.mv.files.thread=0)
> Error:
> 2016-10-10T22:27:13,564 ERROR [e9ce04a8-2a84-426d-8e79-a2d15b8cee09 
> main([])]: exec.DDLTask (DDLTask.java:failed(581)) - 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkPartitionDirs(HiveMetaStoreChecker.java:423)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:315)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:291)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:236)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:113)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1834)
> In order to reproduce:
> set hive.mv.files.thread=0 and run MSCK REPAIR TABLE command
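
For readers hitting this, a minimal sketch of the single-threaded fallback the fix implies (names are illustrative, not the actual HiveMetaStoreChecker code):

{code}
import java.util.List;
import java.util.concurrent.ExecutorService;

public class PartitionCheckSketch {
  static void checkPartitionDirs(List<Runnable> dirChecks, ExecutorService pool) {
    if (pool == null) {                 // hive.mv.files.thread = 0 -> no pool at all
      dirChecks.forEach(Runnable::run); // walk directories in the calling thread
      return;
    }
    dirChecks.forEach(pool::submit);    // threaded path, as before
  }
}
{code}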



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14815) Implement Parquet vectorization reader for Primitive types

2016-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1564#comment-1564
 ] 

Hive QA commented on HIVE-14815:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12837897/HIVE-14815.7.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10636 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_1] 
(batchId=90)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2018/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2018/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2018/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12837897 - PreCommit-HIVE-Build

> Implement Parquet vectorization reader for Primitive types 
> ---
>
> Key: HIVE-14815
> URL: https://issues.apache.org/jira/browse/HIVE-14815
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
> Attachments: HIVE-14815.1.patch, HIVE-14815.2.patch, 
> HIVE-14815.3.patch, HIVE-14815.4.patch, HIVE-14815.5.patch, 
> HIVE-14815.6.patch, HIVE-14815.7.patch, HIVE-14815.patch
>
>
> Parquet doesn't provide a vectorized reader which can be used by Hive 
> directly. Also, a decimal column batch consists of a batch of 
> HiveDecimal, a Hive type that is unknown to Parquet. To support 
> Hive's vectorized execution engine, we have to implement the 
> vectorized Parquet reader on the Hive side. To limit the performance impact, we 
> need to implement a page-level vectorized reader.
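
As background, the general shape of what a page-level reader has to produce for each primitive column is a filled Hive column vector; a minimal illustration (not the HIVE-14815 implementation):

{code}
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

public class VectorFillSketch {
  // Copy decoded page values into a single-column vectorized batch.
  static VectorizedRowBatch fill(long[] decodedPageValues) {
    VectorizedRowBatch batch = new VectorizedRowBatch(1);
    LongColumnVector col = new LongColumnVector();
    batch.cols[0] = col;
    int n = Math.min(decodedPageValues.length, VectorizedRowBatch.DEFAULT_SIZE);
    for (int i = 0; i < n; i++) {
      col.vector[i] = decodedPageValues[i];   // one value per row in the batch
    }
    col.noNulls = true;
    batch.size = n;
    return batch;
  }
}
{code}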



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-15144) JSON.org license is now CatX

2016-11-07 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-15144:
---

Assignee: Zoltan Haindrich

> JSON.org license is now CatX
> 
>
> Key: HIVE-15144
> URL: https://issues.apache.org/jira/browse/HIVE-15144
> Project: Hive
>  Issue Type: Bug
>Reporter: Robert Kanter
>Assignee: Zoltan Haindrich
>Priority: Blocker
> Fix For: 2.2.0
>
>
> per [update resolved legal|http://www.apache.org/legal/resolved.html#json]:
> {quote}
> CAN APACHE PRODUCTS INCLUDE WORKS LICENSED UNDER THE JSON LICENSE?
> No. As of 2016-11-03 this has been moved to the 'Category X' license list. 
> Prior to this, use of the JSON Java library was allowed. See Debian's page 
> for a list of alternatives.
> {quote}
> I'm not sure when this dependency was first introduced, but it looks like 
> it's currently used in a few places:
> https://github.com/apache/hive/search?p=1=%22org.json%22=%E2%9C%93
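
For illustration, one common replacement pattern is to build JSON with Jackson instead of org.json (this is just a sketch of the idea, not necessarily the route this ticket will take):

{code}
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class JsonReplacementSketch {
  static String describe(String table, long rowCount) throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    ObjectNode node = mapper.createObjectNode();
    node.put("table", table);   // previously: new JSONObject().put("table", table)
    node.put("rows", rowCount);
    return mapper.writeValueAsString(node);
  }
}
{code}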



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14908) Upgrade ANTLR to 3.5.2

2016-11-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14908:
---
Affects Version/s: 2.1.0

> Upgrade ANTLR to 3.5.2
> --
>
> Key: HIVE-14908
> URL: https://issues.apache.org/jira/browse/HIVE-14908
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0
>
> Attachments: HIVE-14908.01.patch, HIVE-14908.02.patch
>
>
> Antlr v4 is also available but it does not support "->" which is widely used 
> in our grammar. Antlr 3.5.2 is the latest v3 version. It will reduce the code 
> size:
> {code}
> Here is summary of current parser code size
> 422345  HiveLexer.java
> 2436601  HiveParser.java
> 814184  HiveParser_FromClauseParser.java
> 2705920  HiveParser_IdentifiersParser.java
> 777665 HiveParser_SelectClauseParser.java
>After change, it will become
> 319589 HiveLexer.java
> 1853104 HiveParser.java
> 574156 HiveParser_FromClauseParser.java
> 1799195 HiveParser_IdentifiersParser.java
> 587305 HiveParser_SelectClauseParser.java
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14908) Upgrade ANTLR to 3.5.2

2016-11-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14908:
---
Fix Version/s: 2.2.0

> Upgrade ANTLR to 3.5.2
> --
>
> Key: HIVE-14908
> URL: https://issues.apache.org/jira/browse/HIVE-14908
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0
>
> Attachments: HIVE-14908.01.patch, HIVE-14908.02.patch
>
>
> Antlr v4 is also available but it does not support "->" which is widely used 
> in our grammar. Antlr 3.5.2 is the latest v3 version. It will reduce the code 
> size:
> {code}
> Here is summary of current parser code size
> 422345  HiveLexer.java
> 2436601  HiveParser.java
> 814184  HiveParser_FromClauseParser.java
> 2705920  HiveParser_IdentifiersParser.java
> 777665 HiveParser_SelectClauseParser.java
>After change, it will become
> 319589 HiveLexer.java
> 1853104 HiveParser.java
> 574156 HiveParser_FromClauseParser.java
> 1799195 HiveParser_IdentifiersParser.java
> 587305 HiveParser_SelectClauseParser.java
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14908) Upgrade ANTLR to 3.5.2

2016-11-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14908:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Some of the test case failures are unrelated. Pushed to master. Thanks 
[~ashutoshc] for the review! 

> Upgrade ANTLR to 3.5.2
> --
>
> Key: HIVE-14908
> URL: https://issues.apache.org/jira/browse/HIVE-14908
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14908.01.patch, HIVE-14908.02.patch
>
>
> Antlr v4 is also available but it does not support "->" which is widely used 
> in our grammar. Antlr 3.5.2 is the latest v3 version. It will reduce the code 
> size:
> {code}
> Here is summary of current parser code size
> 422345  HiveLexer.java
> 2436601  HiveParser.java
> 814184  HiveParser_FromClauseParser.java
> 2705920  HiveParser_IdentifiersParser.java
> 777665 HiveParser_SelectClauseParser.java
>After change, it will become
> 319589 HiveLexer.java
> 1853104 HiveParser.java
> 574156 HiveParser_FromClauseParser.java
> 1799195 HiveParser_IdentifiersParser.java
> 587305 HiveParser_SelectClauseParser.java
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14924) MSCK REPAIR table with single threaded is throwing null pointer exception

2016-11-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14924:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> MSCK REPAIR table with single threaded is throwing null pointer exception
> -
>
> Key: HIVE-14924
> URL: https://issues.apache.org/jira/browse/HIVE-14924
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Ratheesh Kamoor
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14924.01.patch
>
>
> MSCK REPAIR TABLE is throwing Null Pointer Exception while running on single 
> threaded mode (hive.mv.files.thread=0)
> Error:
> 2016-10-10T22:27:13,564 ERROR [e9ce04a8-2a84-426d-8e79-a2d15b8cee09 
> main([])]: exec.DDLTask (DDLTask.java:failed(581)) - 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkPartitionDirs(HiveMetaStoreChecker.java:423)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:315)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:291)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:236)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:113)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1834)
> In order to reproduce:
> set hive.mv.files.thread=0 and run MSCK REPAIR TABLE command



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14924) MSCK REPAIR table with single threaded is throwing null pointer exception

2016-11-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14924:
---
Affects Version/s: 2.1.0

> MSCK REPAIR table with single threaded is throwing null pointer exception
> -
>
> Key: HIVE-14924
> URL: https://issues.apache.org/jira/browse/HIVE-14924
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Ratheesh Kamoor
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0
>
> Attachments: HIVE-14924.01.patch
>
>
> MSCK REPAIR TABLE is throwing Null Pointer Exception while running on single 
> threaded mode (hive.mv.files.thread=0)
> Error:
> 2016-10-10T22:27:13,564 ERROR [e9ce04a8-2a84-426d-8e79-a2d15b8cee09 
> main([])]: exec.DDLTask (DDLTask.java:failed(581)) - 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkPartitionDirs(HiveMetaStoreChecker.java:423)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:315)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:291)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:236)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:113)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1834)
> In order to reproduce:
> set hive.mv.files.thread=0 and run MSCK REPAIR TABLE command



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14924) MSCK REPAIR table with single threaded is throwing null pointer exception

2016-11-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14924:
---
Fix Version/s: 2.2.0

> MSCK REPAIR table with single threaded is throwing null pointer exception
> -
>
> Key: HIVE-14924
> URL: https://issues.apache.org/jira/browse/HIVE-14924
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Ratheesh Kamoor
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0
>
> Attachments: HIVE-14924.01.patch
>
>
> MSCK REPAIR TABLE is throwing Null Pointer Exception while running on single 
> threaded mode (hive.mv.files.thread=0)
> Error:
> 2016-10-10T22:27:13,564 ERROR [e9ce04a8-2a84-426d-8e79-a2d15b8cee09 
> main([])]: exec.DDLTask (DDLTask.java:failed(581)) - 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkPartitionDirs(HiveMetaStoreChecker.java:423)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:315)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:291)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:236)
>   at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:113)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1834)
> In order to reproduce:
> set hive.mv.files.thread=0 and run MSCK REPAIR TABLE command



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15139) HoS local mode fails with NumberFormatException

2016-11-07 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644452#comment-15644452
 ] 

Chaoyu Tang commented on HIVE-15139:


Thanks [~lirui] for the patch, it looks good to me. I ran into exactly the same 
issue when playing with the HoS statistics, and worked around it by passing 0 
instead of Integer.parseInt(stageId) to metricsCollection.addMetrics, just like 
the taskId. I thought stageId would not be used anyway when we get all metrics 
from metricsCollection (metricsCollection.getAllMetrics()) in 
getSparkStatistics. My change is like the following:
{code}
@@ -143,7 +143,7 @@ public SparkStatistics getSparkStatistics() {
       List<TaskMetrics> taskMetrics = jobMetric.get(stageId);
       for (TaskMetrics taskMetric : taskMetrics) {
         Metrics metrics = new Metrics(taskMetric);
-        metricsCollection.addMetrics(jobId, Integer.parseInt(stageId), 0, metrics);
+        metricsCollection.addMetrics(jobId, 0, 0, metrics);
       }
     }
     SparkJobUtils sparkJobUtils = new SparkJobUtils();
{code}
Your patch removes the stageAttemptId from jobMetrics; I wonder if it might 
still be useful for some other metrics (e.g. the average number of times a stage 
is attempted)? This is my first time looking into HoS, so I wonder if my 
question makes sense. Thanks.
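
For comparison, another minimal approach would be to keep the composite key but extract the numeric stage id before parsing (illustrative only, not the attached patch):

{code}
public class StageIdParseSketch {
  static int stageIdOf(String key) {   // e.g. "0_0" -> 0, "3" -> 3
    int sep = key.indexOf('_');
    return Integer.parseInt(sep < 0 ? key : key.substring(0, sep));
  }
}
{code}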

> HoS local mode fails with NumberFormatException
> ---
>
> Key: HIVE-15139
> URL: https://issues.apache.org/jira/browse/HIVE-15139
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-15139.1.patch
>
>
> It's because we store {{stageId_attemptNum}} in JobMetricsListener but expect 
> only {{stageId}} in LocalSparkJobStatus.
> {noformat}
> java.lang.NumberFormatException: For input string: "0_0"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:580)
>   at java.lang.Integer.parseInt(Integer.java:615)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.status.impl.LocalSparkJobStatus.getSparkStatistics(LocalSparkJobStatus.java:146)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:104)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15129) LLAP : Enhance cache hits for stripe metadata across queries

2016-11-07 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-15129:

Attachment: HIVE-15129.3.patch

Removing the last change w.r.t. the footer cache. Will create a separate ticket 
after this gets in.

> LLAP : Enhance cache hits for stripe metadata across queries
> 
>
> Key: HIVE-15129
> URL: https://issues.apache.org/jira/browse/HIVE-15129
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-15129.1.patch, HIVE-15129.2.patch, 
> HIVE-15129.3.patch
>
>
> When multiple queries are run in LLAP, stripe metadata cache misses were 
> observed even though enough memory was available. 
> https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java#L655.
>  Even in cases where data was found in the cache, it wasn't getting used, as 
> {{globalInc}} changed from query to query.  Creating a superset of the existing 
> indexes with {{globalInc}} would be helpful. 
> This would be a lot more beneficial on cloud storage, where opening files and 
> reading small amounts of data can be expensive compared to HDFS. 
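
A minimal sketch of what a superset of the existing indexes could mean in practice, assuming the include flags are per-column booleans (illustrative only, not the attached patch):

{code}
public class IncludeMergeSketch {
  // A cached entry can be reused if it includes every requested column.
  static boolean covers(boolean[] cached, boolean[] requested) {
    for (int i = 0; i < requested.length; i++) {
      if (requested[i] && (i >= cached.length || !cached[i])) {
        return false;
      }
    }
    return true;
  }

  // OR the flags so one cached entry can serve later queries needing a subset.
  static boolean[] merge(boolean[] cached, boolean[] requested) {
    boolean[] merged = new boolean[Math.max(cached.length, requested.length)];
    for (int i = 0; i < merged.length; i++) {
      merged[i] = (i < cached.length && cached[i])
          || (i < requested.length && requested[i]);
    }
    return merged;
  }
}
{code}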



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13947) HoS print wrong number for hash table size in map join scenario

2016-11-07 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-13947:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks Xuefu for reviewing.

> HoS print wrong number for hash table size in map join scenario
> ---
>
> Key: HIVE-13947
> URL: https://issues.apache.org/jira/browse/HIVE-13947
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.2.1
>Reporter: wangwenli
>Assignee: Aihua Xu
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-13947.1.patch
>
>
> In *sparkHashTableSinkOperator*, *flushToFile* tries to get the file length 
> before closing the output stream and therefore gets 0. Taking 
> *hashTableSinkOperator* as a reference, it should get the length after the 
> output stream is closed.
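
A minimal sketch of the fix direction, i.e. close the stream before asking the filesystem for the length (illustrative only, not the SparkHashTableSinkOperator code):

{code}
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;

public class FlushToFileSketch {
  static long writeAndMeasure(FileSystem fs, Path file, byte[] payload) throws IOException {
    FSDataOutputStream out = fs.create(file);
    try {
      out.write(payload);
    } finally {
      out.close();                            // length is only reliable after close
    }
    return fs.getFileStatus(file).getLen();   // querying before close can report 0
  }
}
{code}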



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-10701) Escape apostrophe not work properly

2016-11-07 Thread Miklos Csanady (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Csanady reassigned HIVE-10701:
-

Assignee: (was: Miklos Csanady)

Can be closed.

> Escape apostrophe not work properly
> ---
>
> Key: HIVE-10701
> URL: https://issues.apache.org/jira/browse/HIVE-10701
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.12.0, 0.13.0, 0.14.0
>Reporter: Tracy Y
>Priority: Minor
>
> SELECT 'S''2' FROM table returns S2 instead of S'2.
> The apostrophe is supposed to be escaped by the single quote in front.  
> Thanks.
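
For context (an observation, not from the ticket): Hive's lexer appears to treat adjacent quoted segments as one concatenated literal, which would explain {{'S''2'}} yielding {{S2}}; the backslash escape does produce the apostrophe. Illustrative example:

{noformat}
SELECT 'S\'2' FROM t;   -- returns S'2
SELECT 'S''2' FROM t;   -- parsed as 'S' followed by '2', i.e. S2
{noformat}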



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15100) HiveSQLException: Invalid OperationHandle: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier

2016-11-07 Thread Andy Braslavskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644462#comment-15644462
 ] 

Andy Braslavskiy commented on HIVE-15100:
-

I have the same exceptions when doing any interaction from Hue to Hive.
I see this problem only on a kerberised cluster, with Hadoop in Classic mode, 
and Hue 3.10.
But everything works fine with Hue 3.9 / a non-kerberised cluster / Hadoop in 
YARN mode.

[~WillCup], can you provide any details of your configuration?

> HiveSQLException: Invalid OperationHandle: OperationHandle 
> [opType=EXECUTE_STATEMENT, getHandleIdentifier
> -
>
> Key: HIVE-15100
> URL: https://issues.apache.org/jira/browse/HIVE-15100
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.2.1
>Reporter: Will Chen
>
> I am using hue + hiveserver2. 
> First I execute sql through Hue, this can be done correctly. But when I 
> export the result as csv file or excel file, an exception will appear in 
> hiveserver2.log:
> 2016-11-01 14:54:11,622 WARN  [HiveServer2-Handler-Pool: Thread-47]: 
> thrift.ThriftCLIService (ThriftCLIService.java:FetchResults(681)) - Error 
> fetching results: 
> org.apache.hive.service.cli.HiveSQLException: Invalid OperationHandle: 
> OperationHandle [opType=EXECUTE_STATEMENT, 
> getHandleIdentifier()=feb55386-72f0-445c-9774-dd8780acf442]
>   at 
> org.apache.hive.service.cli.operation.OperationManager.getOperation(OperationManager.java:154)
>   at 
> org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:456)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:672)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1557)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1542)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> I add a few session related configurations:
> hive.server2.idle.operatoin.timeout = 1 day
> hive.server2.idle.session.timeout = 3 days
> hive.server2.session.check.interval = 1 hour
> And my operations log config like this:
> hive.server2.loggging.operation.enabled=true
> hive.server2.loggging.operation.log.location=/tmp/hive/operation_logs
> I try to remove all files in my operation_logs dir(/tmp/hive/operation_logs), 
> and clean all session data in hue tables(beeswax_session, django_session), 
> then restart hiveserver2 and hue. But nothing getting better..
> Begging for ur help~
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15119) Support standard syntax for ROLLUP & CUBE

2016-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644207#comment-15644207
 ] 

Hive QA commented on HIVE-15119:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12837733/HIVE-15119.03.patch

{color:green}SUCCESS:{color} +1 due to 16 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10628 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] 
(batchId=90)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=91)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_cube1] 
(batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_grouping_id2]
 (batchId=107)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_rollup1] 
(batchId=106)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2001/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2001/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2001/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12837733 - PreCommit-HIVE-Build

> Support standard syntax for ROLLUP & CUBE
> -
>
> Key: HIVE-15119
> URL: https://issues.apache.org/jira/browse/HIVE-15119
> Project: Hive
>  Issue Type: Task
>  Components: Parser, SQL
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-15119.03.patch, HIVE-15119.2.patch, HIVE-15119.patch
>
>
> Standard ROLLUP and CUBE syntax is GROUP BY ROLLUP (expression list)... and 
> GROUP BY CUBE (expression list) respectively. 
> Currently HIVE only allows GROUP BY expression list WITH ROLLUP/CUBE syntax.
>  
>  We would like HIVE to support standard ROLLUP/CUBE syntax.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15148) disallow loading data into bucketed tables (by default?)

2016-11-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15148:

Description: 
A few q file tests still use the following, allowed, pattern:
{noformat}
CREATE TABLE bucket_small (key string, value string) partitioned by (ds string) 
CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
load data local inpath '../../data/files/smallsrcsortbucket1outof4.txt' INTO 
TABLE bucket_small partition(ds='2008-04-08');
load data local inpath '../../data/files/smallsrcsortbucket2outof4.txt' INTO 
TABLE bucket_small partition(ds='2008-04-08');
{noformat}

This relies on the user to load the correct number of files with correctly 
hashed data and the correct order of file names; if there's some discrepancy in 
any of the above, the queries will fail or may produce incorrect results if 
some bucket-based optimizations kick in.
Additionally, even if the user does everything correctly, as far as I know some 
code derives bucket number from file name, which won't work in this case (as 
opposed to getting buckets based on the order of files, which will work here 
but won't work as per  HIVE-14970... sigh).

Hive enforces bucketing in other cases (the check cannot even be disabled these 
days), so I suggest that we either prohibit the above outright, or at least add 
a safety config setting that would disallow it by default.

  was:
A few q file tests still use the following, allowed, pattern:
{noformat}
CREATE TABLE bucket_small (key string, value string) partitioned by (ds string) 
CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
load data local inpath '../../data/files/smallsrcsortbucket1outof4.txt' INTO 
TABLE bucket_small partition(ds='2008-04-08');
load data local inpath '../../data/files/smallsrcsortbucket2outof4.txt' INTO 
TABLE bucket_small partition(ds='2008-04-08');
{noformat}

This relies on the user to load the correct number of files with correctly 
hashed data with correct names; if the user doesn't do that the queries will 
fail or may produce incorrect results if some bucket-based optimizations kick 
in.
Additionally, even if the user does everything correctly, as far as I know some 
code derives bucket number from file name, which won't work in this case (as 
opposed to getting buckets based on the order of files, which will work here 
but won't work as per  HIVE-14970... sigh).

Hive enforces bucketing in other cases (the check cannot even be disabled these 
days), so I suggest that we either prohibit the above outright, or at least add 
a safety config setting that would disallow it by default.


> disallow loading data into bucketed tables (by default?)
> 
>
> Key: HIVE-15148
> URL: https://issues.apache.org/jira/browse/HIVE-15148
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> A few q file tests still use the following, allowed, pattern:
> {noformat}
> CREATE TABLE bucket_small (key string, value string) partitioned by (ds 
> string) CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> load data local inpath '../../data/files/smallsrcsortbucket1outof4.txt' INTO 
> TABLE bucket_small partition(ds='2008-04-08');
> load data local inpath '../../data/files/smallsrcsortbucket2outof4.txt' INTO 
> TABLE bucket_small partition(ds='2008-04-08');
> {noformat}
> This relies on the user to load the correct number of files with correctly 
> hashed data and the correct order of file names; if there's some discrepancy 
> in any of the above, the queries will fail or may produce incorrect results 
> if some bucket-based optimizations kick in.
> Additionally, even if the user does everything correctly, as far as I know 
> some code derives bucket number from file name, which won't work in this case 
> (as opposed to getting buckets based on the order of files, which will work 
> here but won't work as per  HIVE-14970... sigh).
> Hive enforces bucketing in other cases (the check cannot even be disabled 
> these days), so I suggest that we either prohibit the above outright, or at 
> least add a safety config setting that would disallow it by default.
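
A rough sketch of the kind of guard being suggested, gated by a made-up config flag so existing scripts could opt back in (illustrative only, not an actual Hive patch):

{code}
public class LoadIntoBucketedTableCheckSketch {
  static final String ALLOW_FLAG = "hive.allow.load.into.bucketed.tables"; // hypothetical key

  static void validateLoad(boolean targetIsBucketed, java.util.Properties conf) {
    boolean allow = Boolean.parseBoolean(conf.getProperty(ALLOW_FLAG, "false"));
    if (targetIsBucketed && !allow) {
      throw new IllegalArgumentException(
          "LOAD DATA into a bucketed table is disallowed; insert through a query so "
          + "Hive can enforce bucketing, or set " + ALLOW_FLAG + "=true");
    }
  }
}
{code}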



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15148) disallow loading data into bucketed tables (by default?)

2016-11-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646162#comment-15646162
 ] 

Sergey Shelukhin commented on HIVE-15148:
-

[~ashutoshc] [~jdere] do you have any input? or do you know who would be the 
bucketing expert?

> disallow loading data into bucketed tables (by default?)
> 
>
> Key: HIVE-15148
> URL: https://issues.apache.org/jira/browse/HIVE-15148
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> A few q file tests still use the following, allowed, pattern:
> {noformat}
> CREATE TABLE bucket_small (key string, value string) partitioned by (ds 
> string) CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> load data local inpath '../../data/files/smallsrcsortbucket1outof4.txt' INTO 
> TABLE bucket_small partition(ds='2008-04-08');
> load data local inpath '../../data/files/smallsrcsortbucket2outof4.txt' INTO 
> TABLE bucket_small partition(ds='2008-04-08');
> {noformat}
> This relies on the user to load the correct number of files with correctly 
> hashed data and the correct order of file names; if there's some discrepancy 
> in any of the above, the queries will fail or may produce incorrect results 
> if some bucket-based optimizations kick in.
> Additionally, even if the user does everything correctly, as far as I know 
> some code derives bucket number from file name, which won't work in this case 
> (as opposed to getting buckets based on the order of files, which will work 
> here but won't work as per  HIVE-14970... sigh).
> Hive enforces bucketing in other cases (the check cannot even be disabled 
> these days), so I suggest that we either prohibit the above outright, or at 
> least add a safety config setting that would disallow it by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-15148) disallow loading data into bucketed tables (by default?)

2016-11-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646162#comment-15646162
 ] 

Sergey Shelukhin edited comment on HIVE-15148 at 11/8/16 2:00 AM:
--

[~ashutoshc] [~jdere] do you have any input? or do you know who would be the 
bucketing expert?
I can make a patch if there's consensus.


was (Author: sershe):
[~ashutoshc] [~jdere] do you have any input? or do you know who would be the 
bucketing expert?

> disallow loading data into bucketed tables (by default?)
> 
>
> Key: HIVE-15148
> URL: https://issues.apache.org/jira/browse/HIVE-15148
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> A few q file tests still use the following, allowed, pattern:
> {noformat}
> CREATE TABLE bucket_small (key string, value string) partitioned by (ds 
> string) CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> load data local inpath '../../data/files/smallsrcsortbucket1outof4.txt' INTO 
> TABLE bucket_small partition(ds='2008-04-08');
> load data local inpath '../../data/files/smallsrcsortbucket2outof4.txt' INTO 
> TABLE bucket_small partition(ds='2008-04-08');
> {noformat}
> This relies on the user to load the correct number of files with correctly 
> hashed data and the correct order of file names; if there's some discrepancy 
> in any of the above, the queries will fail or may produce incorrect results 
> if some bucket-based optimizations kick in.
> Additionally, even if the user does everything correctly, as far as I know 
> some code derives bucket number from file name, which won't work in this case 
> (as opposed to getting buckets based on the order of files, which will work 
> here but won't work as per  HIVE-14970... sigh).
> Hive enforces bucketing in other cases (the check cannot even be disabled 
> these days), so I suggest that we either prohibit the above outright, or at 
> least add a safety config setting that would disallow it by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15139) HoS local mode fails with NumberFormatException

2016-11-07 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646385#comment-15646385
 ] 

Rui Li commented on HIVE-15139:
---

[~xuefuz], no, it's actually related to HIVE-12205. It only happens with local 
mode, so I guess it's not reported because local mode is rarely used. And I 
happened to hit it while working on HIVE-14825 :)

> HoS local mode fails with NumberFormatException
> ---
>
> Key: HIVE-15139
> URL: https://issues.apache.org/jira/browse/HIVE-15139
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-15139.1.patch
>
>
> It's because we store {{stageId_attemptNum}} in JobMetricsListener but expect 
> only {{stageId}} in LocalSparkJobStatus.
> {noformat}
> java.lang.NumberFormatException: For input string: "0_0"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>   at java.lang.Integer.parseInt(Integer.java:580)
>   at java.lang.Integer.parseInt(Integer.java:615)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.status.impl.LocalSparkJobStatus.getSparkStatistics(LocalSparkJobStatus.java:146)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:104)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13966) DbNotificationListener: can lose DDL operation notifications

2016-11-07 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645839#comment-15645839
 ] 

Mohit Sabharwal commented on HIVE-13966:


[~alangates], [~sushanth], [~ctang.ma], could you please take a look at the 
latest patch?

I updated RB at https://reviews.apache.org/r/52800/

Filed HIVE-15145 for a related issue that I saw while I was looking at 
HiveAlterTable.

> DbNotificationListener: can lose DDL operation notifications
> -
>
> Key: HIVE-13966
> URL: https://issues.apache.org/jira/browse/HIVE-13966
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Nachiket Vaidya
>Assignee: Mohit Sabharwal
>Priority: Critical
> Attachments: HIVE-13966.1.patch, HIVE-13966.2.patch, 
> HIVE-13966.3.patch, HIVE-13966.4.patch, HIVE-13966.pdf
>
>
> The code for each API in HiveMetaStore.java is like this:
> 1. openTransaction()
> 2. -- operation--
> 3. commit() or rollback() based on result of the operation.
> 4. add entry to notification log (unconditionally)
> If the operation fails (in step 2), we still add an entry to the notification 
> log. Found this issue in testing.
> That is still acceptable, as it is only a false positive.
> If the operation succeeds but adding to the notification log fails, the user 
> will get a MetaException. It will not roll back the operation, as it is 
> already committed. We need to handle this case so that we do not have false 
> negatives.
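A hedged sketch of one way to avoid both cases: write the notification entry 
inside the same transaction as the operation, so both commit or roll back 
together. All names below are illustrative stubs, not the actual 
HiveMetaStore/DbNotificationListener API.

{code}
public class NotificationTxnSketch {
  void openTransaction()         { /* begin metastore transaction */ }
  void doOperation()             { /* e.g. create/alter/drop a metastore object */ }
  void addNotificationLogEntry() { /* write the notification log row */ }
  boolean commitTransaction()    { return true; }
  void rollbackTransaction()     { }

  // The notification entry is written before the commit, inside the same
  // transaction, so the operation and its log entry succeed or fail together.
  public void runOperationWithNotification() {
    boolean success = false;
    try {
      openTransaction();
      doOperation();
      addNotificationLogEntry();
      success = commitTransaction();
    } finally {
      if (!success) {
        rollbackTransaction();
      }
    }
  }
}
{code}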



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14990:

Description: 
Expected failures (lack of support in MM tables for certain commands) 
1) All HCat tests
2) Almost all merge tests (alter .. concat is not supported).
3) Tests that run dfs commands with specific paths.
4) Truncate column
5) Describe formatted will have the new fields in the output before merging 
with ACID
6) Many tests w/explain extended - diff in partition "base file name"
7) TestTxnCommands - all the conversion tests, as they check for bucket count 
using file status
8) HBase metastore tests, because the methods are not implemented

  was:
Expected failures (lack of support in MM tables for certain commands) 
1) All HCat tests
2) Almost all merge tests (alter .. concat is not supported).
3) Tests that run dfs commands with specific paths.
4) Truncate column
5) Describe formatted will have the new fields in the output before merging 
with ACID
6) Many tests w/explain extended - diff in partition "base file name"
7) TestTxnCommands - all the conversion tests, as they check for bucket count 
using file status


> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.patch
>
>
> Expected failures (lack of support in MM tables for certain commands) 
> 1) All HCat tests
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths.
> 4) Truncate column
> 5) Describe formatted will have the new fields in the output before merging 
> with ACID
> 6) Many tests w/explain extended - diff in partition "base file name"
> 7) TestTxnCommands - all the conversion tests, as they check for bucket count 
> using file status
> 8) HBase metastore tests, because the methods are not implemented



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14990:

Description: 
Expected failures 
1) All HCat tests (cannot write MM tables via the HCat writer)
2) Almost all merge tests (alter .. concat is not supported).
3) Tests that run dfs commands with specific paths.
4) Truncate column (not supported)
5) Describe formatted will have the new fields in the output before merging 
with ACID
6) Many tests w/explain extended - diff in partition "base file name"
7) TestTxnCommands - all the conversion tests, as they check for bucket count 
using file status
8) HBase metastore tests, because the methods are not implemented

  was:
Expected failures (lack of support in MM tables for certain commands) 
1) All HCat tests
2) Almost all merge tests (alter .. concat is not supported).
3) Tests that run dfs commands with specific paths.
4) Truncate column
5) Describe formatted will have the new fields in the output before merging 
with ACID
6) Many tests w/explain extended - diff in partition "base file name"
7) TestTxnCommands - all the conversion tests, as they check for bucket count 
using file status
8) HBase metastore tests, because the methods are not implemented


> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.patch
>
>
> Expected failures 
> 1) All HCat tests (cannot write MM tables via the HCat writer)
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths.
> 4) Truncate column (not supported)
> 5) Describe formatted will have the new fields in the output before merging 
> with ACID
> 6) Many tests w/explain extended - diff in partition "base file name"
> 7) TestTxnCommands - all the conversion tests, as they check for bucket count 
> using file status
> 8) HBase metastore tests, because the methods are not implemented



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14943) Base Implementation

2016-11-07 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646013#comment-15646013
 ] 

Eugene Koifman commented on HIVE-14943:
---

yes, there is a doc jira HIVE-15132

> Base Implementation
> ---
>
> Key: HIVE-14943
> URL: https://issues.apache.org/jira/browse/HIVE-14943
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-14943.2.patch, HIVE-14943.3.patch, 
> HIVE-14943.4.patch, HIVE-14943.5.patch, HIVE-14943.6.patch, 
> HIVE-14943.7.patch, HIVE-14943.8.patch, HIVE-14943.9.patch, HIVE-14943.patch
>
>
> Create the 1st pass functional implementation of MERGE
> This should run e2e and produce correct results.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15147) LLAP: use LLAP cache for non-columnar formats in a somewhat general way

2016-11-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15147:

Description: 
The primary goal for the first pass is caching text formats. Nothing would 
prevent other formats from using the same path, in principle, although, as was 
originally done with ORC, it may be better to have native caching support 
optimized for each particular format.
Given that caching pure text is not smart, and we already have ORC-encoded 
cache that is columnar due to ORC file structure, we will transform data into 
columnar ORC.
The general idea is to treat all the data in the world as merely ORC that was 
compressed with some poor compression codec, such as csv. Using the original IF 
and serde, as well as an ORC writer (with some heavyweight optimizations 
disabled, potentially), we can "uncompress" the csv/whatever data into its 
"original" ORC representation, then cache it efficiently, by column, and also 
reuse a lot of the existing code.

Various other points:
1) Caching granularity will have to be somehow determined (i.e. how do we slice 
the file horizontally, to avoid caching entire columns). As with ORC 
uncompressed files, the specific offsets don't really matter as long as they 
are consistent between reads. The problem is that the file offsets will 
actually need to be propagated to the new reader from the original inputformat. 
Row counts are easier to use but there's a problem of how to actually map them 
to missing ranges to read from disk.
2) Obviously, for row-based formats, if any one column that is to be read has 
been evicted or is otherwise missing, "all the columns" have to be read for the 
corresponding slice to cache and read that one column. The vague plan is to 
handle this implicitly, similarly to how ORC reader handles CB-RG overlaps - it 
will just so happen that a missing column in disk range list to retrieve will 
expand the disk-range-to-read into the whole horizontal slice of the file.
3) Granularity/etc. won't work for gzipped text. If anything at all is evicted, 
the entire file has to be re-read. Gzipped text is a ridiculous feature, so 
this is by design.
4) In future, it would be possible to also build some form of metadata/indexes 
for this cached data to do PPD, etc. This is out of the scope of this stage.

  was:
The primary goal for the first pass is caching text formats. Nothing would 
prevent other formats from using the same path, in principle, although, as was 
originally done with ORC, it may be better to have native caching support 
optimized for each particular format.
Given that caching pure text is not smart, and we already have ORC-encoded 
cache that is columnar due to ORC file structure, we will transform data into 
columnar ORC.
The general idea is to treat all the data in the world as merely ORC that was 
compressed with some poor compression codec, such as csv. Using the original IF 
and serde, as well as ORC writer (with some heavyweight optimizations removed, 
potentially), we can "uncompress" the data into "original" ORC, then reuse a 
lot of the existing code.
Various other points:
1) Granularity in the file will have to be somehow determined (horizontal 
slicing of the file, to avoid caching entire columns). We can base it on 
arbitrary disk offsets determined during reading, but they will actually have 
to be propagated to the reader from the original inputformat. Row counts are 
easier to use but there's a problem of how to actually map them to missing 
ranges to read from disk.
2) Obviously for row-based formats, if any one column one needs is evicted, 
"all the columns" have to be read for the corresponding slice. The vague plan 
is to handle this implicitly, similarly to how ORC reader handles CB-RG 
overlaps - it will just so happen that a missing column will expand the 
disk-range-to-read into the whole horizontal slice of the file.
3) Granularity/etc. won't work for gzipped text. If anything at all is evicted, 
the entire file has to be re-read. Gzipped text is a ridiculous feature, so 
this is by design.
4) In future, it would be possible to also build some form of metadata/indexes 
for this cached data to do PPD, etc. This is out of the scope of this stage.


> LLAP: use LLAP cache for non-columnar formats in a somewhat general way
> ---
>
> Key: HIVE-15147
> URL: https://issues.apache.org/jira/browse/HIVE-15147
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> The primary goal for the first pass is caching text formats. Nothing would 
> prevent other formats from using the same path, in principle, although, as 
> was originally done with ORC, it may be better to have native caching support 
> optimized for each particular 

[jira] [Commented] (HIVE-15147) LLAP: use LLAP cache for non-columnar formats in a somewhat general way

2016-11-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646095#comment-15646095
 ] 

Sergey Shelukhin commented on HIVE-15147:
-

[~gopalv] [~cartershanklin] fyi

> LLAP: use LLAP cache for non-columnar formats in a somewhat general way
> ---
>
> Key: HIVE-15147
> URL: https://issues.apache.org/jira/browse/HIVE-15147
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> The primary goal for the first pass is caching text formats. Nothing would 
> prevent other formats from using the same path, in principle, although, as 
> was originally done with ORC, it may be better to have native caching support 
> optimized for each particular format.
> Given that caching pure text is not smart, and we already have ORC-encoded 
> cache that is columnar due to ORC file structure, we will transform data into 
> columnar ORC.
> The general idea is to treat all the data in the world as merely ORC that was 
> compressed with some poor compression codec, such as csv. Using the original 
> IF and serde, as well as an ORC writer (with some heavyweight optimizations 
> disabled, potentially), we can "uncompress" the csv/whatever data into its 
> "original" ORC representation, then cache it efficiently, by column, and also 
> reuse a lot of the existing code.
> Various other points:
> 1) Caching granularity will have to be somehow determined (i.e. how do we 
> slice the file horizontally, to avoid caching entire columns). As with ORC 
> uncompressed files, the specific offsets don't really matter as long as they 
> are consistent between reads. The problem is that the file offsets will 
> actually need to be propagated to the new reader from the original 
> inputformat. Row counts are easier to use but there's a problem of how to 
> actually map them to missing ranges to read from disk.
> 2) Obviously, for row-based formats, if any one column that is to be read has 
> been evicted or is otherwise missing, "all the columns" have to be read for 
> the corresponding slice to cache and read that one column. The vague plan is 
> to handle this implicitly, similarly to how ORC reader handles CB-RG overlaps 
> - it will just so happen that a missing column in disk range list to retrieve 
> will expand the disk-range-to-read into the whole horizontal slice of the 
> file.
> 3) Granularity/etc. won't work for gzipped text. If anything at all is 
> evicted, the entire file has to be re-read. Gzipped text is a ridiculous 
> feature, so this is by design.
> 4) In future, it would be possible to also build some form of 
> metadata/indexes for this cached data to do PPD, etc. This is out of the 
> scope for now.
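A heavily hedged sketch of the idea, assuming rows read through the table's 
own InputFormat can be re-encoded into ORC in memory and cached per horizontal 
slice; the SliceCache and RowToOrcEncoder interfaces are placeholders invented 
for this sketch, not the actual LLAP IO classes.

{code}
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Placeholder interfaces for this sketch only.
interface SliceCache { void put(Object sliceKey, byte[] orcEncodedColumns); }
interface RowToOrcEncoder {
  void addRow(Object row);              // stands in for serde + in-memory ORC writer
  byte[] closeAndGetEncodedColumns();   // columnar, ORC-encoded data for the slice
}

public class TextAsOrcCacheSketch {
  // "Uncompresses" one slice of a non-columnar file into ORC and caches it:
  // read rows with the original InputFormat, re-encode them as ORC in memory,
  // then store the encoded columns under the slice key for later columnar reads.
  static <K, V extends Writable> void cacheSlice(
      InputFormat<K, V> sourceFormat, InputSplit slice, JobConf conf,
      RowToOrcEncoder encoder, SliceCache cache, Object sliceKey) throws Exception {
    RecordReader<K, V> reader = sourceFormat.getRecordReader(slice, conf, Reporter.NULL);
    K key = reader.createKey();
    V value = reader.createValue();
    while (reader.next(key, value)) {
      encoder.addRow(value);            // serde deserialization elided here
    }
    reader.close();
    cache.put(sliceKey, encoder.closeAndGetEncodedColumns());
  }
}
{code}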



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15147) LLAP: use LLAP cache for non-columnar formats in a somewhat general way

2016-11-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15147:

Description: 
The primary goal for the first pass is caching text formats. Nothing would 
prevent other formats from using the same path, in principle, although, as was 
originally done with ORC, it may be better to have native caching support 
optimized for each particular format.
Given that caching pure text is not smart, and we already have ORC-encoded 
cache that is columnar due to ORC file structure, we will transform data into 
columnar ORC.
The general idea is to treat all the data in the world as merely ORC that was 
compressed with some poor compression codec, such as csv. Using the original IF 
and serde, as well as an ORC writer (with some heavyweight optimizations 
disabled, potentially), we can "uncompress" the csv/whatever data into its 
"original" ORC representation, then cache it efficiently, by column, and also 
reuse a lot of the existing code.

Various other points:
1) Caching granularity will have to be somehow determined (i.e. how do we slice 
the file horizontally, to avoid caching entire columns). As with ORC 
uncompressed files, the specific offsets don't really matter as long as they 
are consistent between reads. The problem is that the file offsets will 
actually need to be propagated to the new reader from the original inputformat. 
Row counts are easier to use but there's a problem of how to actually map them 
to missing ranges to read from disk.
2) Obviously, for row-based formats, if any one column that is to be read has 
been evicted or is otherwise missing, "all the columns" have to be read for the 
corresponding slice to cache and read that one column. The vague plan is to 
handle this implicitly, similarly to how ORC reader handles CB-RG overlaps - it 
will just so happen that a missing column in disk range list to retrieve will 
expand the disk-range-to-read into the whole horizontal slice of the file.
3) Granularity/etc. won't work for gzipped text. If anything at all is evicted, 
the entire file has to be re-read. Gzipped text is a ridiculous feature, so 
this is by design.
4) In future, it would be possible to also build some form of metadata/indexes 
for this cached data to do PPD, etc. This is out of the scope for now.

  was:
The primary goal for the first pass is caching text formats. Nothing would 
prevent other formats from using the same path, in principle, although, as was 
originally done with ORC, it may be better to have native caching support 
optimized for each particular format.
Given that caching pure text is not smart, and we already have ORC-encoded 
cache that is columnar due to ORC file structure, we will transform data into 
columnar ORC.
The general idea is to treat all the data in the world as merely ORC that was 
compressed with some poor compression codec, such as csv. Using the original IF 
and serde, as well as an ORC writer (with some heavyweight optimizations 
disabled, potentially), we can "uncompress" the csv/whatever data into its 
"original" ORC representation, then cache it efficiently, by column, and also 
reuse a lot of the existing code.

Various other points:
1) Caching granularity will have to be somehow determined (i.e. how do we slice 
the file horizontally, to avoid caching entire columns). As with ORC 
uncompressed files, the specific offsets don't really matter as long as they 
are consistent between reads. The problem is that the file offsets will 
actually need to be propagated to the new reader from the original inputformat. 
Row counts are easier to use but there's a problem of how to actually map them 
to missing ranges to read from disk.
2) Obviously, for row-based formats, if any one column that is to be read has 
been evicted or is otherwise missing, "all the columns" have to be read for the 
corresponding slice to cache and read that one column. The vague plan is to 
handle this implicitly, similarly to how ORC reader handles CB-RG overlaps - it 
will just so happen that a missing column in disk range list to retrieve will 
expand the disk-range-to-read into the whole horizontal slice of the file.
3) Granularity/etc. won't work for gzipped text. If anything at all is evicted, 
the entire file has to be re-read. Gzipped text is a ridiculous feature, so 
this is by design.
4) In future, it would be possible to also build some form of metadata/indexes 
for this cached data to do PPD, etc. This is out of the scope of this stage.


> LLAP: use LLAP cache for non-columnar formats in a somewhat general way
> ---
>
> Key: HIVE-15147
> URL: https://issues.apache.org/jira/browse/HIVE-15147
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> 

[jira] [Updated] (HIVE-15093) S3-to-S3 Renames: Files should be moved individually rather than at a directory level

2016-11-07 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-15093:

Attachment: HIVE-15093.9.patch

> S3-to-S3 Renames: Files should be moved individually rather than at a 
> directory level
> -
>
> Key: HIVE-15093
> URL: https://issues.apache.org/jira/browse/HIVE-15093
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.1.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15093.1.patch, HIVE-15093.2.patch, 
> HIVE-15093.3.patch, HIVE-15093.4.patch, HIVE-15093.5.patch, 
> HIVE-15093.6.patch, HIVE-15093.7.patch, HIVE-15093.8.patch, HIVE-15093.9.patch
>
>
> Hive's MoveTask uses the Hive.moveFile method to move data within a 
> distributed filesystem as well as blobstore filesystems.
> If the move is done within the same filesystem:
> 1: If the source path is a subdirectory of the destination path, files will 
> be moved one by one using a threadpool of workers
> 2: If the source path is not a subdirectory of the destination path, a single 
> rename operation is used to move the entire directory
> The second option may not work well on blobstores such as S3. Renames are not 
> metadata operations and require copying all the data. Client connectors to 
> blobstores may not efficiently rename directories. Worst case, the connector 
> will copy each file one by one, sequentially rather than using a threadpool 
> of workers to copy the data (e.g. HADOOP-13600).
> Hive already has code to rename files using a threadpool of workers, but this 
> only occurs in case number 1.
> This JIRA aims to modify the code so that case 1 is triggered when copying 
> within a blobstore. The focus is on copies within a blobstore because 
> needToCopy will return true if the src and target filesystems are different, 
> in which case a different code path is triggered.
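A hedged sketch of applying the per-file path (case 1) to a blobstore, using 
Hadoop FileSystem calls and a plain thread pool; this is not the actual 
Hive.moveFile code, just an illustration of the approach.

{code}
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelBlobstoreMoveSketch {
  // Moves each file under srcDir into dstDir with a pool of workers, so the
  // copy behind each S3 "rename" runs in parallel instead of as one
  // sequential directory-level rename.
  static void moveFilesIndividually(FileSystem fs, Path srcDir, Path dstDir,
      int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Boolean>> results = new ArrayList<>();
      for (FileStatus file : fs.listStatus(srcDir)) {
        Path src = file.getPath();
        Path dst = new Path(dstDir, src.getName());
        results.add(pool.submit(() -> fs.rename(src, dst)));
      }
      for (Future<Boolean> result : results) {
        if (!result.get()) {
          throw new IOException("Failed to move a file into " + dstDir);
        }
      }
    } finally {
      pool.shutdown();
    }
  }
}
{code}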



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14919) Improve the performance of Hive on Spark 2.0.0

2016-11-07 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-14919:

Description: 
In HIVE-14029, we have updated the Spark dependency to 2.0.0. We use Intel 
BigBench[1] to run benchmarks with Spark 2.0 over a 1 TB data set, comparing 
with Spark 1.6. We see performance improvements of about 5.4% in general and 
45% in the best case. However, some queries don't show significant performance 
improvements. This JIRA is the umbrella ticket addressing those performance 
issues.

[1] https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench

  was:
In HIVE-14029, we have updated Spark dependency to 2.0.0. We use Intel 
BigBench[1] to run benchmark with Spark 2.0 over 10 GB data set comparing with 
Spark 1.6. We can see quite some performance degradation for most of the 
queries for BigBench. For detailed information, please see the attached file. 
This JIRA is the umbrella ticket addressing those 
performance issues.

[1] https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench


> Improve the performance of Hive on Spark 2.0.0
> --
>
> Key: HIVE-14919
> URL: https://issues.apache.org/jira/browse/HIVE-14919
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
>
> In HIVE-14029, we have updated the Spark dependency to 2.0.0. We use Intel 
> BigBench[1] to run benchmarks with Spark 2.0 over a 1 TB data set, comparing 
> with Spark 1.6. We see performance improvements of about 5.4% in general and 
> 45% in the best case. However, some queries don't show significant 
> performance improvements. This JIRA is the umbrella ticket addressing those 
> performance issues.
> [1] https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14990:

Attachment: HIVE-14990.08.patch

Some more test-specific and non-test-specific issues, adding missing out files. 
Identified many more "by design" failures and diffs with the bogus isMm 
function. Less than half of the last failed batch is actually relevant.

> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, 
> HIVE-14990.patch
>
>
> Expected failures 
> 1) All HCat tests (cannot write MM tables via the HCat writer)
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths (path changes).
> 4) Truncate column (not supported).
> 5) Describe formatted will have the new table fields in the output (before 
> merging MM with ACID).
> 6) Many tests w/explain extended - diff in partition "base file name" (path 
> changes).
> 7) TestTxnCommands - all the conversion tests, as they check for bucket count 
> using file lists (path changes).
> 8) HBase metastore tests, because the methods are not implemented.
> 9) Some load and ExIm tests that export a table and then rely on specific 
> path for load (path changes).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2016-11-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14990:

Description: 
Expected failures 
1) All HCat tests (cannot write MM tables via the HCat writer)
2) Almost all merge tests (alter .. concat is not supported).
3) Tests that run dfs commands with specific paths (path changes).
4) Truncate column (not supported).
5) Describe formatted will have the new table fields in the output (before 
merging MM with ACID).
6) Many tests w/explain extended - diff in partition "base file name" (path 
changes).
7) TestTxnCommands - all the conversion tests, as they check for bucket count 
using file lists (path changes).
8) HBase metastore tests, because the methods are not implemented.
9) Some load and ExIm tests that export a table and then rely on specific path 
for load (path changes).
10) Bucket map join/etc. - disabled the optimization for MM tables due to how 
it accounts for buckets

  was:
Expected failures 
1) All HCat tests (cannot write MM tables via the HCat writer)
2) Almost all merge tests (alter .. concat is not supported).
3) Tests that run dfs commands with specific paths (path changes).
4) Truncate column (not supported).
5) Describe formatted will have the new table fields in the output (before 
merging MM with ACID).
6) Many tests w/explain extended - diff in partition "base file name" (path 
changes).
7) TestTxnCommands - all the conversion tests, as they check for bucket count 
using file lists (path changes).
8) HBase metastore tests, because the methods are not implemented.
9) Some load and ExIm tests that export a table and then rely on specific path 
for load (path changes).


> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, 
> HIVE-14990.patch
>
>
> Expected failures 
> 1) All HCat tests (cannot write MM tables via the HCat writer)
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths (path changes).
> 4) Truncate column (not supported).
> 5) Describe formatted will have the new table fields in the output (before 
> merging MM with ACID).
> 6) Many tests w/explain extended - diff in partition "base file name" (path 
> changes).
> 7) TestTxnCommands - all the conversion tests, as they check for bucket count 
> using file lists (path changes).
> 8) HBase metastore tests, because the methods are not implemented.
> 9) Some load and ExIm tests that export a table and then rely on specific 
> path for load (path changes).
> 10) Bucket map join/etc. - disabled the optimization for MM tables due to how 
> it accounts for buckets



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-15120) Storage based auth: allow option to enforce write checks for external tables

2016-11-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645892#comment-15645892
 ] 

Hive QA commented on HIVE-15120:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12837842/HIVE-15120.2.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10628 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[join_acid_non_acid]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=91)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_5] 
(batchId=90)
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationProviderWithACL.testSimplePrivileges
 (batchId=216)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2011/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2011/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2011/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12837842 - PreCommit-HIVE-Build

> Storage based auth: allow option to enforce write checks for external tables
> 
>
> Key: HIVE-15120
> URL: https://issues.apache.org/jira/browse/HIVE-15120
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Reporter: Thejas M Nair
>Assignee: Daniel Dai
> Attachments: HIVE-15120.1.patch, HIVE-15120.2.patch
>
>
> Under storage-based authorization, we don't require write permissions on the 
> table directory for external table create/drop.
> This is because external table contents are often populated from outside of 
> Hive and are not written to from Hive, so write access is not needed. Also, 
> we can't require write permissions to drop a table if we don't require them 
> for creation (users who created a table should be able to drop it).
> However, this difference in behavior for external tables is not well 
> documented, so users are surprised to learn that drop table can be done by 
> any user who has read access to the directory. At that point, changing the 
> large number of scripts that use external tables is hard. 
> It would be good to have a config option to have external tables treated the 
> same as managed tables.
> The option should be off by default, so that the behavior is backward 
> compatible by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15147) LLAP: use LLAP cache for non-columnar formats in a somewhat general way

2016-11-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15147:

Summary: LLAP: use LLAP cache for non-columnar formats in a somewhat 
general way  (was: LLAP: support cache for non-columnar formats in a somewhat 
general way)

> LLAP: use LLAP cache for non-columnar formats in a somewhat general way
> ---
>
> Key: HIVE-15147
> URL: https://issues.apache.org/jira/browse/HIVE-15147
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> The primary target for the first pass is caching text formats. Nothing would 
> prevent other formats from using the same path, in principle, although, as 
> was originally done with ORC, it may be better to have native caching support 
> optimized for each particular format.
> Given that caching pure text is not smart, and we already have ORC-encoded 
> cache that is columnar due to ORC file structure, we will try to reuse that. 
> The general idea is to treat all the data in the world as merely ORC that was 
> compressed with some poor compression codec, such as csv. Using the original 
> IF and serde, as well as ORC writer (with some heavyweight optimizations 
> removed, potentially), we can "uncompress" the data into "original" ORC, then 
> reuse a lot of the existing code.
> Various other points:
> 1) Granularity in the file will have to be somehow determined (horizontal 
> slicing of the file, to avoid caching entire columns). We can base it on 
> arbitrary disk offsets determined during reading, but they will actually have 
> to be propagated to the reader from the original inputformat. Row counts are 
> easier to use but there's a problem of how to actually map them to missing 
> ranges to read from disk.
> 2) Obviously for row-based formats, if any one column one needs is evicted, 
> "all the columns" have to be read for the corresponding slice. The vague plan 
> is to handle this implicitly, similarly to how ORC reader handles CB-RG 
> overlaps - it will just so happen that a missing column will expand the 
> disk-range-to-read into the whole horizontal slice of the file.
> 3) Granularity/etc. won't work for gzipped text. If anything at all is 
> evicted, the entire file has to be re-read. Gzipped text is a ridiculous 
> feature, so this is by design.
> 4) In future, it would be possible to also build some form of 
> metadata/indexes for this cached data to do PPD, etc. This is out of the 
> scope of this stage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15147) LLAP: use LLAP cache for non-columnar formats in a somewhat general way

2016-11-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15147:

Description: 
The primary goal for the first pass is caching text formats. Nothing would 
prevent other formats from using the same path, in principle, although, as was 
originally done with ORC, it may be better to have native caching support 
optimized for each particular format.
Given that caching pure text is not smart, and we already have ORC-encoded 
cache that is columnar due to ORC file structure, we will try to reuse that. 
The general idea is to treat all the data in the world as merely ORC that was 
compressed with some poor compression codec, such as csv. Using the original IF 
and serde, as well as ORC writer (with some heavyweight optimizations removed, 
potentially), we can "uncompress" the data into "original" ORC, then reuse a 
lot of the existing code.
Various other points:
1) Granularity in the file will have to be somehow determined (horizontal 
slicing of the file, to avoid caching entire columns). We can base it on 
arbitrary disk offsets determined during reading, but they will actually have 
to be propagated to the reader from the original inputformat. Row counts are 
easier to use but there's a problem of how to actually map them to missing 
ranges to read from disk.
2) Obviously for row-based formats, if any one column one needs is evicted, 
"all the columns" have to be read for the corresponding slice. The vague plan 
is to handle this implicitly, similarly to how ORC reader handles CB-RG 
overlaps - it will just so happen that a missing column will expand the 
disk-range-to-read into the whole horizontal slice of the file.
3) Granularity/etc. won't work for gzipped text. If anything at all is evicted, 
the entire file has to be re-read. Gzipped text is a ridiculous feature, so 
this is by design.
4) In future, it would be possible to also build some form of metadata/indexes 
for this cached data to do PPD, etc. This is out of the scope of this stage.

  was:
The primary target for the first pass is caching text formats. Nothing would 
prevent other formats from using the same path, in principle, although, as was 
originally done with ORC, it may be better to have native caching support 
optimized for each particular format.
Given that caching pure text is not smart, and we already have ORC-encoded 
cache that is columnar due to ORC file structure, we will try to reuse that. 
The general idea is to treat all the data in the world as merely ORC that was 
compressed with some poor compression codec, such as csv. Using the original IF 
and serde, as well as ORC writer (with some heavyweight optimizations removed, 
potentially), we can "uncompress" the data into "original" ORC, then reuse a 
lot of the existing code.
Various other points:
1) Granularity in the file will have to be somehow determined (horizontal 
slicing of the file, to avoid caching entire columns). We can base it on 
arbitrary disk offsets determined during reading, but they will actually have 
to be propagated to the reader from the original inputformat. Row counts are 
easier to use but there's a problem of how to actually map them to missing 
ranges to read from disk.
2) Obviously for row-based formats, if any one column one needs is evicted, 
"all the columns" have to be read for the corresponding slice. The vague plan 
is to handle this implicitly, similarly to how ORC reader handles CB-RG 
overlaps - it will just so happen that a missing column will expand the 
disk-range-to-read into the whole horizontal slice of the file.
3) Granularity/etc. won't work for gzipped text. If anything at all is evicted, 
the entire file has to be re-read. Gzipped text is a ridiculous feature, so 
this is by design.
4) In future, it would be possible to also build some form of metadata/indexes 
for this cached data to do PPD, etc. This is out of the scope of this stage.


> LLAP: use LLAP cache for non-columnar formats in a somewhat general way
> ---
>
> Key: HIVE-15147
> URL: https://issues.apache.org/jira/browse/HIVE-15147
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> The primary goal for the first pass is caching text formats. Nothing would 
> prevent other formats from using the same path, in principle, although, as 
> was originally done with ORC, it may be better to have native caching support 
> optimized for each particular format.
> Given that caching pure text is not smart, and we already have ORC-encoded 
> cache that is columnar due to ORC file structure, we will try to reuse that. 
> The general idea is to treat all the data in the world as merely ORC that was 
> compressed with some poor compression codec, such as csv. 

[jira] [Updated] (HIVE-15147) LLAP: use LLAP cache for non-columnar formats in a somewhat general way

2016-11-07 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-15147:
---
Issue Type: New Feature  (was: Bug)

> LLAP: use LLAP cache for non-columnar formats in a somewhat general way
> ---
>
> Key: HIVE-15147
> URL: https://issues.apache.org/jira/browse/HIVE-15147
> Project: Hive
>  Issue Type: New Feature
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> The primary goal for the first pass is caching text files. Nothing would 
> prevent other formats from using the same path, in principle, although, as 
> was originally done with ORC, it may be better to have native caching support 
> optimized for each particular format.
> Given that caching pure text is not smart, and we already have ORC-encoded 
> cache that is columnar due to ORC file structure, we will transform data into 
> columnar ORC.
> The general idea is to treat all the data in the world as merely ORC that was 
> compressed with some poor compression codec, such as csv. Using the original 
> IF and serde, as well as an ORC writer (with some heavyweight optimizations 
> disabled, potentially), we can "uncompress" the csv/whatever data into its 
> "original" ORC representation, then cache it efficiently, by column, and also 
> reuse a lot of the existing code.
> Various other points:
> 1) Caching granularity will have to be somehow determined (i.e. how do we 
> slice the file horizontally, to avoid caching entire columns). As with ORC 
> uncompressed files, the specific offsets don't really matter as long as they 
> are consistent between reads. The problem is that the file offsets will 
> actually need to be propagated to the new reader from the original 
> inputformat. Row counts are easier to use but there's a problem of how to 
> actually map them to missing ranges to read from disk.
> 2) Obviously, for row-based formats, if any one column that is to be read has 
> been evicted or is otherwise missing, "all the columns" have to be read for 
> the corresponding slice to cache and read that one column. The vague plan is 
> to handle this implicitly, similarly to how ORC reader handles CB-RG overlaps 
> - it will just so happen that a missing column in disk range list to retrieve 
> will expand the disk-range-to-read into the whole horizontal slice of the 
> file.
> 3) Granularity/etc. won't work for gzipped text. If anything at all is 
> evicted, the entire file has to be re-read. Gzipped text is a ridiculous 
> feature, so this is by design.
> 4) In future, it would be possible to also build some form of 
> metadata/indexes for this cached data to do PPD, etc. This is out of the 
> scope for now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

