[jira] [Updated] (HIVE-12464) Inconsistent behavior between MapReduce and Spark engine on bucket mapjoin

2015-11-19 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-12464:
-
Summary: Inconsistent behavior between MapReduce and Spark engine on bucket 
mapjoin  (was: Inconsistent behavior between MapReduce and Spark engine on 
bucketed mapjoin)

> Inconsistent behavior between MapReduce and Spark engine on bucket mapjoin
> --
>
> Key: HIVE-12464
> URL: https://issues.apache.org/jira/browse/HIVE-12464
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Spark
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
>
> Steps to reproduce:
> 1. Prepare the table and data:
> {noformat}
> create table if not exists lxw_test(imei string,sndaid string,data_time 
> string)
> CLUSTERED BY(imei) SORTED BY(imei) INTO 10 BUCKETS;
> create table if not exists lxw_test1(imei string,sndaid string,data_time 
> string)
> CLUSTERED BY(imei) SORTED BY(imei) INTO 5 BUCKETS;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table lxw_test
> values(1,1,1),(2,2,2),(3,3,3),(4,4,4),(5,5,5),(6,6,6),(7,7,7),(8,8,8),(9,9,9),(10,10,10);
> insert overwrite table lxw_test1
> values 
> (1,1,1),(2,2,2),(3,3,3),(4,4,4),(5,5,5),(6,6,6),(7,7,7),(8,8,8),(9,9,9),(10,10,10);
> set hive.enforce.bucketing;
> insert into table lxw_test1 select * from lxw_test;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> {noformat}
> 2. The following SQL succeeds:
> {noformat}
> set hive.execution.engine=mr;
> explain select  count(1) 
> from lxw_test1 a 
> join lxw_test b 
> on a.imei = b.imei ;
> {noformat}
> 3. This one fails:
> {noformat}
> set hive.execution.engine=spark;
> explain select  count(1) 
> from lxw_test1 a 
> join lxw_test b 
> on a.imei = b.imei ;
> {noformat}
> On Spark, the query returns this error:
> {noformat}
> Error: Error while compiling statement: FAILED: SemanticException [Error 
> 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use 
> bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number 
> of buckets for table lxw_test1 is 5, whereas the number of files is 10 
> (state=42000,code=10141)
> {noformat}
> After setting hive.ignore.mapjoin.hint=false and using a mapjoin hint, the 
> MapReduce engine returns the same error:
> {noformat}
> set hive.execution.engine=mr;
> set hive.ignore.mapjoin.hint=false;
> explain
> select /*+ mapjoin(b) */ count(1) 
> from lxw_test1 a 
> join lxw_test b 
> on a.imei = b.imei ;
> {noformat}
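The failing check is the compiler's bucket-metadata sanity test: the declared bucket count must line up with the number of files on disk (here lxw_test1 declares 5 buckets, but the second insert doubled the file count to 10). A minimal sketch of that kind of check, with purely illustrative names rather than Hive's actual implementation:

```java
import java.util.List;

// Hypothetical sketch of a bucket-metadata sanity check of the kind the
// SemanticException above reports; names are illustrative, not Hive's code.
public class BucketCheck {
    // Returns true when the on-disk file count is consistent with the
    // declared bucket count (here: must match exactly, as in the error).
    static boolean bucketMetadataConsistent(int declaredBuckets, List<String> files) {
        return files.size() == declaredBuckets;
    }

    public static void main(String[] args) {
        // lxw_test1 declares 5 buckets, but 10 files exist after two inserts.
        List<String> files = List.of("000000_0", "000001_0", "000002_0",
                "000003_0", "000004_0", "000000_0_copy_1", "000001_0_copy_1",
                "000002_0_copy_1", "000003_0_copy_1", "000004_0_copy_1");
        System.out.println(bucketMetadataConsistent(5, files)); // prints false
    }
}
```

When the check fails, the bucket mapjoin cannot safely assume file N holds bucket N, which is why the compiler asks the user to fix the metadata or disable the optimization.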



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11981) ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)

2015-11-19 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013322#comment-15013322
 ] 

Matt McCline commented on HIVE-11981:
-

The new parameter hive.exec.schema.evolution is intended to be general and not 
ORC-only. Other file formats that contain schema metadata (e.g. Parquet) 
could add the marker interface SelfDescribingInputFormatInterface and provide 
schema evolution functionality.
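The marker-interface pattern described above can be sketched as follows; SelfDescribingInputFormatInterface is the interface name from the comment, while the Parquet input format class here is a hypothetical illustration, not Hive's actual code:

```java
// Sketch of the marker-interface pattern described above. The interface name
// matches the one mentioned in the comment; the Parquet class is hypothetical.
interface SelfDescribingInputFormatInterface { /* marker only, no methods */ }

class HypotheticalParquetInputFormat implements SelfDescribingInputFormatInterface { }

public class MarkerDemo {
    // A planner could gate schema-evolution handling on the marker like this:
    static boolean supportsSchemaEvolution(Object inputFormat) {
        return inputFormat instanceof SelfDescribingInputFormatInterface;
    }

    public static void main(String[] args) {
        System.out.println(supportsSchemaEvolution(new HypotheticalParquetInputFormat())); // true
        System.out.println(supportsSchemaEvolution(new Object())); // false
    }
}
```

A marker interface keeps the contract format-agnostic: any input format that carries its own schema metadata can opt in without changes to the planner.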

> ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)
> --
>
> Key: HIVE-11981
> URL: https://issues.apache.org/jira/browse/HIVE-11981
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11981.01.patch, HIVE-11981.02.patch, 
> HIVE-11981.03.patch, HIVE-11981.05.patch, HIVE-11981.06.patch, 
> HIVE-11981.07.patch, HIVE-11981.08.patch, HIVE-11981.09.patch, 
> HIVE-11981.091.patch, HIVE-11981.092.patch, HIVE-11981.093.patch, 
> HIVE-11981.094.patch, HIVE-11981.095.patch, HIVE-11981.096.patch, 
> HIVE-11981.097.patch, HIVE-11981.098.patch, HIVE-11981.099.patch, 
> HIVE-11981.0991.patch, HIVE-11981.0992.patch, ORC Schema Evolution Issues.docx
>
>
> High priority issues with schema evolution for the ORC file format.
> Schema evolution here is limited to adding new columns and a few cases of 
> column type-widening (e.g. int to bigint).
> Renaming columns, deleting columns, moving columns, and other forms of schema 
> evolution were not pursued due to lack of importance and lack of time.  Also, it 
> appears much more sophisticated metadata would be needed to support them.
> The biggest issues for users have been adding new columns for ACID tables 
> (HIVE-11421 Support Schema evolution for ACID tables) and vectorization 
> (HIVE-10598 Vectorization borks when column is added to table).





[jira] [Updated] (HIVE-12463) VectorMapJoinFastKeyStore has Array OOB errors

2015-11-19 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-12463:
---
Affects Version/s: 1.2.1

> VectorMapJoinFastKeyStore has Array OOB errors
> --
>
> Key: HIVE-12463
> URL: https://issues.apache.org/jira/browse/HIVE-12463
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 1.2.1, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>
> When combining different-sized keys, an occasional error is observed in 
> hashtable probes.
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 162046429
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastKeyStore.equalKey(VectorMapJoinFastKeyStore.java:150)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashTable.findReadSlot(VectorMapJoinFastBytesHashTable.java:191)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashMap.lookup(VectorMapJoinFastBytesHashMap.java:76)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerMultiKeyOperator.process(VectorMapJoinInnerMultiKeyOperator.java:300)
>   ... 26 more
> {code}
> {code}
> // Our reading is positioned to the key.
> writeBuffers.getByteSegmentRefToCurrent(byteSegmentRef, keyLength, 
> readPos);
> byte[] currentBytes = byteSegmentRef.getBytes();
> int currentStart = (int) byteSegmentRef.getOffset();
> for (int i = 0; i < keyLength; i++) {
>   if (currentBytes[currentStart + i] != keyBytes[keyStart + i]) {
> // LOG.debug("VectorMapJoinFastKeyStore equalKey no match on bytes");
> return false;
>   }
> }
> {code}
> This needs an identical fix to match 
> {code}
> // Rare case of buffer boundary. Unfortunately we'd have to copy some 
> bytes.
> byte[] bytes = new byte[length];
> int destOffset = 0;
> while (destOffset < length) {
>   ponderNextBufferToRead(readPos);
> {code}
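A hedged sketch of what a boundary-aware version of the comparison could look like; the method shape and the two-buffer bookkeeping are assumptions modeled on the snippets above, not the actual HIVE-12463 patch:

```java
// Hypothetical sketch of a boundary-aware key comparison, modeled on the
// snippets above; not the actual HIVE-12463 patch.
public class BoundaryAwareCompare {
    // Compare keyBytes[keyStart..keyStart+keyLength) against a stored key
    // that may be split across two buffers: 'first' holds the first firstLen
    // bytes, 'second' holds the remainder.
    static boolean equalKey(byte[] keyBytes, int keyStart, int keyLength,
                            byte[] first, int firstOffset, int firstLen,
                            byte[] second, int secondOffset) {
        for (int i = 0; i < keyLength; i++) {
            byte stored = (i < firstLen)
                ? first[firstOffset + i]                 // still in the first buffer
                : second[secondOffset + (i - firstLen)]; // spilled into the next buffer
            if (stored != keyBytes[keyStart + i]) {
                return false;
            }
        }
        return true;
    }
}
```

The original loop indexes a single currentBytes array for the whole key, which is exactly what overruns the buffer when a key straddles a WriteBuffers boundary.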





[jira] [Commented] (HIVE-11981) ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)

2015-11-19 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013146#comment-15013146
 ] 

Carl Steinbach commented on HIVE-11981:
---

bq. This adds hive.exec.schema.evolution to HiveConf.java ... Although it seems 
to be a general parameter, this JIRA issue is ORC-specific...

If this property is ORC-specific, then it seems like a mistake not to name it 
hive.exec.orc.schema.evolution. Is there a good reason why "orc" doesn't appear 
in the property name or property description?

> ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)
> --
>
> Key: HIVE-11981
> URL: https://issues.apache.org/jira/browse/HIVE-11981
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11981.01.patch, HIVE-11981.02.patch, 
> HIVE-11981.03.patch, HIVE-11981.05.patch, HIVE-11981.06.patch, 
> HIVE-11981.07.patch, HIVE-11981.08.patch, HIVE-11981.09.patch, 
> HIVE-11981.091.patch, HIVE-11981.092.patch, HIVE-11981.093.patch, 
> HIVE-11981.094.patch, HIVE-11981.095.patch, HIVE-11981.096.patch, 
> HIVE-11981.097.patch, HIVE-11981.098.patch, HIVE-11981.099.patch, 
> HIVE-11981.0991.patch, HIVE-11981.0992.patch, ORC Schema Evolution Issues.docx
>
>
> High priority issues with schema evolution for the ORC file format.
> Schema evolution here is limited to adding new columns and a few cases of 
> column type-widening (e.g. int to bigint).
> Renaming columns, deleting columns, moving columns, and other forms of schema 
> evolution were not pursued due to lack of importance and lack of time.  Also, it 
> appears much more sophisticated metadata would be needed to support them.
> The biggest issues for users have been adding new columns for ACID tables 
> (HIVE-11421 Support Schema evolution for ACID tables) and vectorization 
> (HIVE-10598 Vectorization borks when column is added to table).





[jira] [Updated] (HIVE-12463) VectorMapJoinFastKeyStore has Array OOB errors

2015-11-19 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-12463:
---
Affects Version/s: 1.3.0

> VectorMapJoinFastKeyStore has Array OOB errors
> --
>
> Key: HIVE-12463
> URL: https://issues.apache.org/jira/browse/HIVE-12463
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>
> When combining different-sized keys, an occasional error is observed in 
> hashtable probes.
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 162046429
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastKeyStore.equalKey(VectorMapJoinFastKeyStore.java:150)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashTable.findReadSlot(VectorMapJoinFastBytesHashTable.java:191)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashMap.lookup(VectorMapJoinFastBytesHashMap.java:76)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerMultiKeyOperator.process(VectorMapJoinInnerMultiKeyOperator.java:300)
>   ... 26 more
> {code}
> {code}
> // Our reading is positioned to the key.
> writeBuffers.getByteSegmentRefToCurrent(byteSegmentRef, keyLength, 
> readPos);
> byte[] currentBytes = byteSegmentRef.getBytes();
> int currentStart = (int) byteSegmentRef.getOffset();
> for (int i = 0; i < keyLength; i++) {
>   if (currentBytes[currentStart + i] != keyBytes[keyStart + i]) {
> // LOG.debug("VectorMapJoinFastKeyStore equalKey no match on bytes");
> return false;
>   }
> }
> {code}
> This needs an identical fix to match 
> {code}
> // Rare case of buffer boundary. Unfortunately we'd have to copy some 
> bytes.
> byte[] bytes = new byte[length];
> int destOffset = 0;
> while (destOffset < length) {
>   ponderNextBufferToRead(readPos);
> {code}





[jira] [Commented] (HIVE-12463) VectorMapJoinFastKeyStore has Array OOB errors

2015-11-19 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013207#comment-15013207
 ] 

Gopal V commented on HIVE-12463:


[~sershe]: this one was missing the rare case of a single key being split 
internally inside WriteBuffers.

> VectorMapJoinFastKeyStore has Array OOB errors
> --
>
> Key: HIVE-12463
> URL: https://issues.apache.org/jira/browse/HIVE-12463
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 1.2.1, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-12463.1.patch
>
>
> When combining different-sized keys, an occasional error is observed in 
> hashtable probes.
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 162046429
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastKeyStore.equalKey(VectorMapJoinFastKeyStore.java:150)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashTable.findReadSlot(VectorMapJoinFastBytesHashTable.java:191)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashMap.lookup(VectorMapJoinFastBytesHashMap.java:76)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerMultiKeyOperator.process(VectorMapJoinInnerMultiKeyOperator.java:300)
>   ... 26 more
> {code}
> {code}
> // Our reading is positioned to the key.
> writeBuffers.getByteSegmentRefToCurrent(byteSegmentRef, keyLength, 
> readPos);
> byte[] currentBytes = byteSegmentRef.getBytes();
> int currentStart = (int) byteSegmentRef.getOffset();
> for (int i = 0; i < keyLength; i++) {
>   if (currentBytes[currentStart + i] != keyBytes[keyStart + i]) {
> // LOG.debug("VectorMapJoinFastKeyStore equalKey no match on bytes");
> return false;
>   }
> }
> {code}
> This needs an identical fix to match 
> {code}
> // Rare case of buffer boundary. Unfortunately we'd have to copy some 
> bytes.
> byte[] bytes = new byte[length];
> int destOffset = 0;
> while (destOffset < length) {
>   ponderNextBufferToRead(readPos);
> {code}





[jira] [Updated] (HIVE-12257) Enhance ORC FileDump utility to handle flush_length files

2015-11-19 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-12257:
-
Attachment: HIVE-12257.9.patch

Addressed review comments.

> Enhance ORC FileDump utility to handle flush_length files
> -
>
> Key: HIVE-12257
> URL: https://issues.apache.org/jira/browse/HIVE-12257
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12257-branch-1.patch, HIVE-12257.1.patch, 
> HIVE-12257.2.patch, HIVE-12257.3.patch, HIVE-12257.4.patch, 
> HIVE-12257.6.patch, HIVE-12257.7.patch, HIVE-12257.8.patch, HIVE-12257.9.patch
>
>
> ORC file dump utility currently does not handle delta directories that 
> contain *_flush_length files. These files contain offsets to the footer in the 
> corresponding delta file.





[jira] [Updated] (HIVE-12257) Enhance ORC FileDump utility to handle flush_length files and recovery

2015-11-19 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-12257:
-
Summary: Enhance ORC FileDump utility to handle flush_length files and 
recovery  (was: Enhance ORC FileDump utility to handle flush_length files)

> Enhance ORC FileDump utility to handle flush_length files and recovery
> --
>
> Key: HIVE-12257
> URL: https://issues.apache.org/jira/browse/HIVE-12257
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12257-branch-1.patch, HIVE-12257.1.patch, 
> HIVE-12257.2.patch, HIVE-12257.3.patch, HIVE-12257.4.patch, 
> HIVE-12257.6.patch, HIVE-12257.7.patch, HIVE-12257.8.patch, HIVE-12257.9.patch
>
>
> ORC file dump utility currently does not handle delta directories that 
> contain *_flush_length files. These files contain offsets to the footer in the 
> corresponding delta file.





[jira] [Updated] (HIVE-12463) VectorMapJoinFastKeyStore has Array OOB errors

2015-11-19 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-12463:
---
Attachment: HIVE-12463.1.patch

> VectorMapJoinFastKeyStore has Array OOB errors
> --
>
> Key: HIVE-12463
> URL: https://issues.apache.org/jira/browse/HIVE-12463
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 1.2.1, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-12463.1.patch
>
>
> When combining different-sized keys, an occasional error is observed in 
> hashtable probes.
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 162046429
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastKeyStore.equalKey(VectorMapJoinFastKeyStore.java:150)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashTable.findReadSlot(VectorMapJoinFastBytesHashTable.java:191)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashMap.lookup(VectorMapJoinFastBytesHashMap.java:76)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerMultiKeyOperator.process(VectorMapJoinInnerMultiKeyOperator.java:300)
>   ... 26 more
> {code}
> {code}
> // Our reading is positioned to the key.
> writeBuffers.getByteSegmentRefToCurrent(byteSegmentRef, keyLength, 
> readPos);
> byte[] currentBytes = byteSegmentRef.getBytes();
> int currentStart = (int) byteSegmentRef.getOffset();
> for (int i = 0; i < keyLength; i++) {
>   if (currentBytes[currentStart + i] != keyBytes[keyStart + i]) {
> // LOG.debug("VectorMapJoinFastKeyStore equalKey no match on bytes");
> return false;
>   }
> }
> {code}
> This needs an identical fix to match 
> {code}
> // Rare case of buffer boundary. Unfortunately we'd have to copy some 
> bytes.
> byte[] bytes = new byte[length];
> int destOffset = 0;
> while (destOffset < length) {
>   ponderNextBufferToRead(readPos);
> {code}





[jira] [Updated] (HIVE-12464) Inconsistent behavior between MapReduce and Spark engine on bucketed mapjoin

2015-11-19 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-12464:
-
Description: 
Steps to reproduce:
1. Prepare the table and data:
{noformat}
create table if not exists lxw_test(imei string,sndaid string,data_time string)
CLUSTERED BY(imei) SORTED BY(imei) INTO 10 BUCKETS;
create table if not exists lxw_test1(imei string,sndaid string,data_time string)
CLUSTERED BY(imei) SORTED BY(imei) INTO 5 BUCKETS;
set hive.enforce.bucketing = true;
set hive.enforce.sorting = true;
insert overwrite table lxw_test
values(1,1,1),(2,2,2),(3,3,3),(4,4,4),(5,5,5),(6,6,6),(7,7,7),(8,8,8),(9,9,9),(10,10,10);
insert overwrite table lxw_test1
values 
(1,1,1),(2,2,2),(3,3,3),(4,4,4),(5,5,5),(6,6,6),(7,7,7),(8,8,8),(9,9,9),(10,10,10);
set hive.enforce.bucketing;
insert into table lxw_test1 select * from lxw_test;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
{noformat}
2. The following SQL succeeds:
{noformat}
set hive.execution.engine=mr;
explain select  count(1) 
from lxw_test1 a 
join lxw_test b 
on a.imei = b.imei ;
{noformat}
3. This one fails:
{noformat}
set hive.execution.engine=spark;
explain select  count(1) 
from lxw_test1 a 
join lxw_test b 
on a.imei = b.imei ;
{noformat}
On Spark, the query returns this error:
{noformat}
Error: Error while compiling statement: FAILED: SemanticException [Error 
10141]: Bucketed table metadata is not correct. Fix the metadata or don't use 
bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of 
buckets for table lxw_test1 is 5, whereas the number of files is 10 
(state=42000,code=10141)
{noformat}
After setting hive.ignore.mapjoin.hint=false and using a mapjoin hint, the 
MapReduce engine returns the same error:
{noformat}
set hive.execution.engine=mr;
set hive.ignore.mapjoin.hint=false;
explain
select /*+ mapjoin(b) */ count(1) 
from lxw_test1 a 
join lxw_test b 
on a.imei = b.imei ;
{noformat}


  was:
Steps to reproduce:
1. Prepare the table and data:
{noformat}
create table if not exists lxw_test(imei string,sndaid string,data_time string)
CLUSTERED BY(imei) SORTED BY(imei) INTO 10 BUCKETS;
create table if not exists lxw_test1(imei string,sndaid string,data_time string)
CLUSTERED BY(imei) SORTED BY(imei) INTO 5 BUCKETS;
set hive.enforce.bucketing = true;
set hive.enforce.sorting = true;
insert overwrite table lxw_test
values(1,1,1),(2,2,2),(3,3,3),(4,4,4),(5,5,5),(6,6,6),(7,7,7),(8,8,8),(9,9,9),(10,10,10);
insert overwrite table lxw_test1
values 
(1,1,1),(2,2,2),(3,3,3),(4,4,4),(5,5,5),(6,6,6),(7,7,7),(8,8,8),(9,9,9),(10,10,10);
set hive.enforce.bucketing;
insert into table lxw_test1 select * from lxw_test;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
{noformat}
2. The following SQL succeeds:
{noformat}
set hive.execution.engine=mr;
select  count(1) 
from lxw_test1 a 
join lxw_test b 
on a.imei = b.imei ;
{noformat}
3. This one fails:
{noformat}
set hive.execution.engine=spark;
select  count(1) 
from lxw_test1 a 
join lxw_test b 
on a.imei = b.imei ;
{noformat}
On Spark, the query returns this error:
{noformat}
Error: Error while compiling statement: FAILED: SemanticException [Error 
10141]: Bucketed table metadata is not correct. Fix the metadata or don't use 
bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of 
buckets for table lxw_test1 is 5, whereas the number of files is 10 
(state=42000,code=10141)
{noformat}
After setting hive.ignore.mapjoin.hint=false and using a mapjoin hint, the 
MapReduce engine returns the same error:
{noformat}
set hive.execution.engine=mr;
set hive.ignore.mapjoin.hint=false;
explain
select /*+ mapjoin(b) */ count(1) 
from lxw_test1 a 
join lxw_test b 
on a.imei = b.imei ;
{noformat}



> Inconsistent behavior between MapReduce and Spark engine on bucketed mapjoin
> 
>
> Key: HIVE-12464
> URL: https://issues.apache.org/jira/browse/HIVE-12464
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Spark
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
>
> Steps to reproduce:
> 1. Prepare the table and data:
> {noformat}
> create table if not exists lxw_test(imei string,sndaid string,data_time 
> string)
> CLUSTERED BY(imei) SORTED BY(imei) INTO 10 BUCKETS;
> create table if not exists lxw_test1(imei string,sndaid string,data_time 
> string)
> CLUSTERED BY(imei) SORTED BY(imei) INTO 5 BUCKETS;
> set hive.enforce.bucketing = true;
> set hive.enforce.sorting = true;
> insert overwrite table lxw_test
> values(1,1,1),(2,2,2),(3,3,3),(4,4,4),(5,5,5),(6,6,6),(7,7,7),(8,8,8),(9,9,9),(10,10,10);

[jira] [Commented] (HIVE-12008) Make last two tests added by HIVE-11384 pass when hive.in.test is false

2015-11-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013243#comment-15013243
 ] 

Hive QA commented on HIVE-12008:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12772864/HIVE-12008.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 9865 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union16
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_null_projection
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_null_projection
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_view
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6073/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6073/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6073/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12772864 - PreCommit-HIVE-TRUNK-Build

> Make last two tests added by HIVE-11384 pass when hive.in.test is false
> ---
>
> Key: HIVE-12008
> URL: https://issues.apache.org/jira/browse/HIVE-12008
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-12008.1.patch, HIVE-12008.2.patch, 
> HIVE-12008.3.patch
>
>
> The last two qfile unit tests fail when hive.in.test is false. It may relate 
> to how we handle the prune list for select. When a select includes every 
> column in a table, the prune list for the select is empty, which may cause 
> issues when calculating its parent's prune list.
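A toy illustration of the failure mode described above (purely illustrative; not Hive's optimizer code): if an empty prune list is overloaded to mean both "nothing referenced" and "everything referenced", a parent operator that naively intersects with it computes the wrong set.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of prune-list propagation; illustrative only, not Hive's code.
public class PruneListDemo {
    // A prune list is the set of columns a child SELECT keeps. Here the empty
    // set means "all columns kept" (select *), so a parent must special-case
    // it: blindly calling retainAll on it would empty the result.
    static Set<String> parentNeeds(Set<String> childPruneList, Set<String> parentRefs) {
        Set<String> needed = new HashSet<>(parentRefs);
        if (!childPruneList.isEmpty()) {
            needed.retainAll(childPruneList); // keep only columns the child provides
        }
        return needed;
    }

    public static void main(String[] args) {
        Set<String> all = new HashSet<>(); // empty == "select *" kept everything
        System.out.println(parentNeeds(all, Set.of("imei"))); // prints [imei]
    }
}
```

The hive.in.test-dependent behavior suggests the two code paths disagree on exactly this kind of convention.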





[jira] [Commented] (HIVE-12469) Bump Commons-Collections dependency from 3.2.1 to 3.2.2. to address vulnerability

2015-11-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014084#comment-15014084
 ] 

Ashutosh Chauhan commented on HIVE-12469:
-

Thanks, [~sircodesalot], for the heads up. Is there a changelog for 3.2.1 to 3.2.2? 
I want to make sure there are no gotchas in the migration.

> Bump Commons-Collections dependency from 3.2.1 to 3.2.2. to address 
> vulnerability
> -
>
> Key: HIVE-12469
> URL: https://issues.apache.org/jira/browse/HIVE-12469
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Reuben Kuhnert
>Assignee: Reuben Kuhnert
>Priority: Blocker
>
> Currently the commons-collections (3.2.1) library allows for invocation of 
> arbitrary code through {{InvokerTransformer}}; we need to bump the version of 
> commons-collections from 3.2.1 to 3.2.2 to resolve this issue.
> Results of {{mvn dependency:tree}}:
> {code}
> [INFO] 
> 
> [INFO] Building Hive HPL/SQL 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] 
> [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-hplsql ---
> [INFO] org.apache.hive:hive-hplsql:jar:2.0.0-SNAPSHOT
> [INFO] +- com.google.guava:guava:jar:14.0.1:compile
> [INFO] +- commons-collections:commons-collections:jar:3.2.1:compile
> {code}
> {code}
> [INFO] 
> 
> [INFO] Building Hive Packaging 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] +- org.apache.hive:hive-hbase-handler:jar:2.0.0-SNAPSHOT:compile
> [INFO] |  +- org.apache.hbase:hbase-server:jar:1.1.1:compile
> [INFO] |  |  +- commons-collections:commons-collections:jar:3.2.1:compile
> {code}
> {code}
> [INFO] 
> 
> [INFO] Building Hive Common 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] 
> [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-common ---
> [INFO] +- org.apache.hadoop:hadoop-common:jar:2.6.0:compile
> [INFO] |  +- commons-collections:commons-collections:jar:3.2.1:compile
> {code}
> The {{hadoop-common}} dependency is also found in: LLAP, Serde, Storage, Shims, 
> Shims Common, and Shims Scheduler.
> {code}
> [INFO] 
> 
> [INFO] Building Hive Ant Utilities 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] 
> [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-ant ---
> [INFO] |  +- commons-collections:commons-collections:jar:3.1:compile
> {code}
> {code}
> [INFO]
>  
> [INFO] 
> 
> [INFO] Building Hive Accumulo Handler 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] +- org.apache.accumulo:accumulo-core:jar:1.6.0:compile
> [INFO] |  +- commons-collections:commons-collections:jar:3.2.1:compile
> {code}
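Since the vulnerable jar arrives through several transitive paths (hadoop-common, hbase-server, accumulo-core), one common way to force the newer version across modules is a dependencyManagement override in the parent pom. This is a sketch of the generic Maven mechanism, not the actual HIVE-12469 patch:

```xml
<!-- Sketch: pin commons-collections for all modules via the parent pom's
     dependencyManagement; generic Maven pattern, not the actual patch. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>commons-collections</groupId>
      <artifactId>commons-collections</artifactId>
      <version>3.2.2</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Note this only covers modules inheriting from that pom; shaded or separately managed artifacts would need their own bump.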





[jira] [Commented] (HIVE-12045) ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)

2015-11-19 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014165#comment-15014165
 ] 

Xuefu Zhang commented on HIVE-12045:


Patch looks good. I left a minor comment on RB, but that can be handled as a 
separate JIRA. I'm not sure if the two test failures in 
TestSparkSessionManagerImpl are related. Otherwise, +1.

> ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)
> --
>
> Key: HIVE-12045
> URL: https://issues.apache.org/jira/browse/HIVE-12045
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
> Environment: Cloudera QuickStart VM - CDH5.4.2
> beeline
>Reporter: Zsolt Tóth
>Assignee: Rui Li
> Attachments: HIVE-12045.1-spark.patch, HIVE-12045.2-spark.patch, 
> HIVE-12045.2-spark.patch, HIVE-12045.3-spark.patch, HIVE-12045.patch, 
> example.jar, genUDF.patch, hive.log.gz
>
>
> If I execute the following query in beeline, I get ClassNotFoundException for 
> the UDF class.
> {code}
> drop function myGenericUdf;
> create function myGenericUdf as 'org.example.myGenericUdf' using jar 
> 'hdfs:///tmp/myudf.jar';
> select distinct myGenericUdf(1,2,1) from mytable;
> {code}
> In my example, myGenericUdf just looks for the 1st argument's value in the 
> others and returns the index. I don't think this is related to the actual 
> GenericUDF function.
> Note that:
> "select myGenericUdf(1,2,1) from mytable;" succeeds
> If I use the non-generic implementation of the same UDF, the select distinct 
> call succeeds.
> StackTrace:
> {code}
> 15/10/06 05:20:25 ERROR exec.Utilities: Failed to load plan: 
> hdfs://quickstart.cloudera:8020/tmp/hive/hive/f9de3f09-c12d-4528-9ee6-1f12932a14ae/hive_2015-10-06_05-20-07_438_6519207588897968406-20/-mr-10003/27cd7226-3e22-46f4-bddd-fb8fd4aa4b8d/map.xml:
>  org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> 
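For reference, the lookup behavior the reporter describes can be sketched in plain Java. This is an assumption based on the description only (the real UDF source is inside example.jar and not shown here), and it deliberately omits the Hive GenericUDF API; the actual failure is Kryo being unable to resolve the class while deserializing map.xml, not the lookup logic itself.

```java
public class ArgIndexLookup {
    /**
     * Hypothetical core logic of the reporter's myGenericUdf (plain Java, no
     * Hive API): return the 1-based position of the first argument's value
     * among the remaining arguments, or 0 when it is not found.
     */
    static int indexOf(Object needle, Object... haystack) {
        for (int i = 0; i < haystack.length; i++) {
            if (needle.equals(haystack[i])) {
                return i + 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // myGenericUdf(1, 2, 1): the value 1 is the 2nd remaining argument.
        System.out.println(indexOf(1, 2, 1)); // 2
        System.out.println(indexOf(3, 2, 1)); // 0 (not found)
    }
}
```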

[jira] [Commented] (HIVE-12469) Bump Commons-Collections dependency from 3.2.1 to 3.2.2. to address vulnerability

2015-11-19 Thread Dmitry Tolpeko (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014202#comment-15014202
 ] 

Dmitry Tolpeko commented on HIVE-12469:
---

I will remove the dependency and try to compile. The exact version was 
specified in hplsql/pom.xml because there was no ${commons-collections.version} 
property in the top-level pom.

> Bump Commons-Collections dependency from 3.2.1 to 3.2.2. to address 
> vulnerability
> -
>
> Key: HIVE-12469
> URL: https://issues.apache.org/jira/browse/HIVE-12469
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Reuben Kuhnert
>Assignee: Reuben Kuhnert
>Priority: Blocker
>
> Currently the commons-collections (3.2.1) library allows invocation of 
> arbitrary code through {{InvokerTransformer}}; we need to bump the version of 
> commons-collections from 3.2.1 to 3.2.2 to resolve this issue.
> Results of {{mvn dependency:tree}}:
> {code}
> [INFO] 
> 
> [INFO] Building Hive HPL/SQL 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] 
> [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-hplsql ---
> [INFO] org.apache.hive:hive-hplsql:jar:2.0.0-SNAPSHOT
> [INFO] +- com.google.guava:guava:jar:14.0.1:compile
> [INFO] +- commons-collections:commons-collections:jar:3.2.1:compile
> {code}
> {code}
> [INFO] 
> 
> [INFO] Building Hive Packaging 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] +- org.apache.hive:hive-hbase-handler:jar:2.0.0-SNAPSHOT:compile
> [INFO] |  +- org.apache.hbase:hbase-server:jar:1.1.1:compile
> [INFO] |  |  +- commons-collections:commons-collections:jar:3.2.1:compile
> {code}
> {code}
> [INFO] 
> 
> [INFO] Building Hive Common 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] 
> [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-common ---
> [INFO] +- org.apache.hadoop:hadoop-common:jar:2.6.0:compile
> [INFO] |  +- commons-collections:commons-collections:jar:3.2.1:compile
> {code}
> The {{Hadoop-Common}} dependency is also found in: LLAP, Serde, Storage, Shims, 
> Shims Common, Shims Scheduler.
> {code}
> [INFO] 
> 
> [INFO] Building Hive Ant Utilities 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] 
> [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-ant ---
> [INFO] |  +- commons-collections:commons-collections:jar:3.1:compile
> {code}
> {code}
> [INFO]
>  
> [INFO] 
> 
> [INFO] Building Hive Accumulo Handler 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] +- org.apache.accumulo:accumulo-core:jar:1.6.0:compile
> [INFO] |  +- commons-collections:commons-collections:jar:3.2.1:compile
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12459) Tez startup dislikes spaces in classpath on Windows

2015-11-19 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014254#comment-15014254
 ] 

Vikram Dixit K commented on HIVE-12459:
---

Hive derives the location of the hive-exec jar from its classpath much like MR. 
Not sure why this works in MR and not in Tez. Will investigate and get back to 
you on this.

> Tez startup dislikes spaces in classpath on Windows
> ---
>
> Key: HIVE-12459
> URL: https://issues.apache.org/jira/browse/HIVE-12459
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.2.1
>Reporter: john lilley
>Assignee: Vikram Dixit K
>
> We are seeing an issue that I summarize as “Tez doesn’t like spaces in the 
> classpath”, but I wanted to check with the group before submitting a JIRA.  
> This is showing when we try to access Hive on HDP 2.3 from a Windows client, 
> where we’ve put the client jars in a classpath that contains spaces.  
> The causing line in our code is:
>   state = SessionState.start(hiveConf);
> where SessionState is in org.apache.hadoop.hive.ql.session
> The exception stack is:
> net/redpoint/hiveclient/DMHCatClient.newInstance:java.lang.RuntimeException: 
> java.io.FileNotFoundException: File 
> file:/C:/Program%20Files/RedPointDM7/hadoop/clusters/hds-cent6/lib/hive-exec-1.2.1.2.3.0.0-2557.jar
>  does not exist
> Additional message: 
> 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:535)
> 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:466)
> net.redpoint.hiveclient.DMHCatClient.<init>(DMHCatClient.java:255)
> net.redpoint.hiveclient.DMHCatClient.newInstance(DMHCatClient.java:59)
> Caused by: java.io.FileNotFoundException: File 
> file:/C:/Program%20Files/RedPointDM7/hadoop/clusters/hds-cent6/lib/hive-exec-1.2.1.2.3.0.0-2557.jar
>  does not exist
> 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:606)
> 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:819)
> 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:596)
> 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
> 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
> org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767)
> 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.getSha(TezSessionState.java:356)
> 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.createJarLocalResource(TezSessionState.java:332)
> 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:151)
> 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:116)
> 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:532)
> 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:466)
> net.redpoint.hiveclient.DMHCatClient.<init>(DMHCatClient.java:255)
> net.redpoint.hiveclient.DMHCatClient.newInstance(DMHCatClient.java:59)
> It sure looks like something in the client code is turning “C:/Program Files” 
> into “C:/Program%20Files”.  I am certain that it is not our code, because we 
> otherwise access all of the jars and Java classes just fine.
> Furthermore, disabling Tez for client-side Hive query in the configuration 
> seems to fix or avoid the issue:
> theConfiguration.set("hive.execution.engine", "mr");
> The stack trace doesn’t make sense to me, because we use FileSystem all over 
> the place and it doesn’t run into this problem when accessing HDFS, only when 
> connecting to Hive.
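The "%20" in the stack trace is consistent with a File-to-URI round trip somewhere on the client path: File.toURI() percent-encodes spaces, and feeding the URI's *string* form back into a plain file API then looks for a literal "%20" directory, which fails exactly as shown above. A self-contained sketch (hypothetical Unix path standing in for "C:/Program Files"; this is not Hive code):

```java
import java.io.File;
import java.net.URI;

public class SpaceInPath {
    public static void main(String[] args) {
        // Hypothetical path containing a space.
        File jar = new File("/opt/program files/hive-exec.jar");

        // File -> URI percent-encodes the space.
        URI uri = jar.toURI();
        System.out.println(uri); // file:/opt/program%20files/hive-exec.jar

        // Using the encoded string as a path keeps the literal "%20",
        // producing a FileNotFoundException-style lookup of a bogus directory.
        File wrong = new File(uri.getRawPath());
        System.out.println(wrong); // /opt/program%20files/hive-exec.jar

        // Converting via the URI object (not its string) restores the real path.
        File right = new File(uri);
        System.out.println(right); // /opt/program files/hive-exec.jar
    }
}
```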



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12017) Do not disable CBO by default when number of joins in a query is equal or less than 1

2015-11-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014074#comment-15014074
 ] 

Ashutosh Chauhan commented on HIVE-12017:
-

HIVE-12465 is an important one. Let's continue it there. Good that after this 
patch goes in, we will at least be doing the right thing in the default config. 
We need to fix it regardless, though.
Joining on different key types is rare enough that we can take that up later. 
We might get that for free in the return path anyway.
Rest of the changes LGTM. +1 pending QA run.

> Do not disable CBO by default when number of joins in a query is equal or 
> less than 1
> -
>
> Key: HIVE-12017
> URL: https://issues.apache.org/jira/browse/HIVE-12017
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-12017.01.patch, HIVE-12017.02.patch, 
> HIVE-12017.03.patch, HIVE-12017.04.patch, HIVE-12017.05.patch, 
> HIVE-12017.06.patch, HIVE-12017.07.patch, HIVE-12017.08.patch, 
> HIVE-12017.09.patch, HIVE-12017.10.patch, HIVE-12017.11.patch, 
> HIVE-12017.12.patch, HIVE-12017.13.patch
>
>
> Instead, we could disable some parts of CBO that are not relevant if the 
> query contains 1 or 0 joins. Implementation should be able to define easily 
> other query patterns for which we might disable some parts of CBO (in case we 
> want to do it in the future).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-2750) Hive multi group by single reducer optimization causes invalid column reference error

2015-11-19 Thread John P. Petrakis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014110#comment-15014110
 ] 

John P. Petrakis commented on HIVE-2750:


Unless we turn off the group-by optimization, the query described in HIVE-12412, 
which worked fine in Hive 1.0, fails in Hive 1.1 and later.

> Hive multi group by single reducer optimization causes invalid column 
> reference error
> -
>
> Key: HIVE-2750
> URL: https://issues.apache.org/jira/browse/HIVE-2750
> Project: Hive
>  Issue Type: Bug
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Fix For: 0.9.0
>
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2750.D1455.1.patch
>
>
> After the optimization, if two query blocks have the same distinct clause and 
> the same group by keys, but the first query block does not reference all the 
> rows the second query block does, an invalid column reference error is raised 
> for the columns unreferenced in the first query block.
> E.g.
> FROM src
> INSERT OVERWRITE TABLE dest_g2 SELECT substr(src.key,1,1), count(DISTINCT 
> src.key) WHERE substr(src.key,1,1) >= 5 GROUP BY substr(src.key,1,1)
> INSERT OVERWRITE TABLE dest_g3 SELECT substr(src.key,1,1), count(DISTINCT 
> src.key), count(src.value) WHERE substr(src.key,1,1) < 5 GROUP BY 
> substr(src.key,1,1);
> This results in an invalid column reference error on src.value



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10937) LLAP: make ObjectCache for plans work properly in the daemon

2015-11-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014153#comment-15014153
 ] 

Sergey Shelukhin commented on HIVE-10937:
-

Test failures not related.

> LLAP: make ObjectCache for plans work properly in the daemon
> 
>
> Key: HIVE-10937
> URL: https://issues.apache.org/jira/browse/HIVE-10937
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-10937.01.patch, HIVE-10937.02.patch, 
> HIVE-10937.03.patch, HIVE-10937.04.patch, HIVE-10937.05.patch, 
> HIVE-10937.patch
>
>
> There's a perf hit otherwise, esp. when the planner creates 1009 reducers of 
> 4Mb each.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12469) Bump Commons-Collections dependency from 3.2.1 to 3.2.2. to address vulnerability

2015-11-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014176#comment-15014176
 ] 

Ashutosh Chauhan commented on HIVE-12469:
-

Actually, the direct dependency declared in hplsql is redundant and is not used 
anywhere. We can simply remove it. [~dmtolpeko], can you confirm?

> Bump Commons-Collections dependency from 3.2.1 to 3.2.2. to address 
> vulnerability
> -
>
> Key: HIVE-12469
> URL: https://issues.apache.org/jira/browse/HIVE-12469
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Reuben Kuhnert
>Assignee: Reuben Kuhnert
>Priority: Blocker
>
> Currently the commons-collections (3.2.1) library allows invocation of 
> arbitrary code through {{InvokerTransformer}}; we need to bump the version of 
> commons-collections from 3.2.1 to 3.2.2 to resolve this issue.
> Results of {{mvn dependency:tree}}:
> {code}
> [INFO] 
> 
> [INFO] Building Hive HPL/SQL 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] 
> [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-hplsql ---
> [INFO] org.apache.hive:hive-hplsql:jar:2.0.0-SNAPSHOT
> [INFO] +- com.google.guava:guava:jar:14.0.1:compile
> [INFO] +- commons-collections:commons-collections:jar:3.2.1:compile
> {code}
> {code}
> [INFO] 
> 
> [INFO] Building Hive Packaging 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] +- org.apache.hive:hive-hbase-handler:jar:2.0.0-SNAPSHOT:compile
> [INFO] |  +- org.apache.hbase:hbase-server:jar:1.1.1:compile
> [INFO] |  |  +- commons-collections:commons-collections:jar:3.2.1:compile
> {code}
> {code}
> [INFO] 
> 
> [INFO] Building Hive Common 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] 
> [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-common ---
> [INFO] +- org.apache.hadoop:hadoop-common:jar:2.6.0:compile
> [INFO] |  +- commons-collections:commons-collections:jar:3.2.1:compile
> {code}
> The {{Hadoop-Common}} dependency is also found in: LLAP, Serde, Storage, Shims, 
> Shims Common, Shims Scheduler.
> {code}
> [INFO] 
> 
> [INFO] Building Hive Ant Utilities 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] 
> [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-ant ---
> [INFO] |  +- commons-collections:commons-collections:jar:3.1:compile
> {code}
> {code}
> [INFO]
>  
> [INFO] 
> 
> [INFO] Building Hive Accumulo Handler 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] +- org.apache.accumulo:accumulo-core:jar:1.6.0:compile
> [INFO] |  +- commons-collections:commons-collections:jar:3.2.1:compile
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12462) DPP: DPP optimizers need to run on the TS predicate not FIL

2015-11-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014205#comment-15014205
 ] 

Sergey Shelukhin commented on HIVE-12462:
-

+1 pending tests

> DPP: DPP optimizers need to run on the TS predicate not FIL 
> 
>
> Key: HIVE-12462
> URL: https://issues.apache.org/jira/browse/HIVE-12462
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Critical
> Attachments: HIVE-12462.1.patch
>
>
> HIVE-11398 + HIVE-11791, the partition-condition-remover became more 
> effective.
> This removes predicates from the FilterExpression which involve partition 
> columns, causing a miss for dynamic-partition pruning if the DPP relies on 
> FilterDesc.
> The TS desc will have the correct predicate in that condition.
> {code}
> $hdt$_0:$hdt$_1:a
>   TableScan (TS_2)
> alias: a
> filterExpr: (((account_id = 22) and year(dt) is not null) and (year(dt)) 
> IN (RS[6])) (type: boolean)
> Filter Operator (FIL_20)
>   predicate: ((account_id = 22) and year(dt) is not null) (type: boolean)
>   Select Operator (SEL_4)
> expressions: dt (type: date)
> outputColumnNames: _col1
> Reduce Output Operator (RS_8)
>   key expressions: year(_col1) (type: int)
>   sort order: +
>   Map-reduce partition columns: year(_col1) (type: int)
>   Join Operator (JOIN_9)
> condition map:
>  Inner Join 0 to 1
> keys:
>   0 year(_col1) (type: int)
>   1 year(_col1) (type: int)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12469) Bump Commons-Collections dependency from 3.2.1 to 3.2.2. to address vulnerability

2015-11-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014133#comment-15014133
 ] 

Ashutosh Chauhan commented on HIVE-12469:
-

I think we can upgrade the direct dependency. For hadoop-common, we can simply 
exclude it, since at run time the Hadoop jars and their dependencies are on the 
classpath by default, given the way Hive works.
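A minimal sketch of what that could look like in a pom (hypothetical coordinates and property names; the actual Hive pom layout may differ): exclude the vulnerable transitive commons-collections from hadoop-common, and pin 3.2.2 where a direct dependency is still needed.

```xml
<!-- Hypothetical sketch, not the actual hive pom: -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>${hadoop.version}</version>
  <exclusions>
    <exclusion>
      <groupId>commons-collections</groupId>
      <artifactId>commons-collections</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>commons-collections</groupId>
  <artifactId>commons-collections</artifactId>
  <version>3.2.2</version>
</dependency>
```

Running {{mvn dependency:tree}} again after such a change is the quickest way to confirm no 3.2.1 copy remains on the compile classpath.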

> Bump Commons-Collections dependency from 3.2.1 to 3.2.2. to address 
> vulnerability
> -
>
> Key: HIVE-12469
> URL: https://issues.apache.org/jira/browse/HIVE-12469
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Reuben Kuhnert
>Assignee: Reuben Kuhnert
>Priority: Blocker
>
> Currently the commons-collections (3.2.1) library allows invocation of 
> arbitrary code through {{InvokerTransformer}}; we need to bump the version of 
> commons-collections from 3.2.1 to 3.2.2 to resolve this issue.
> Results of {{mvn dependency:tree}}:
> {code}
> [INFO] 
> 
> [INFO] Building Hive HPL/SQL 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] 
> [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-hplsql ---
> [INFO] org.apache.hive:hive-hplsql:jar:2.0.0-SNAPSHOT
> [INFO] +- com.google.guava:guava:jar:14.0.1:compile
> [INFO] +- commons-collections:commons-collections:jar:3.2.1:compile
> {code}
> {code}
> [INFO] 
> 
> [INFO] Building Hive Packaging 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] +- org.apache.hive:hive-hbase-handler:jar:2.0.0-SNAPSHOT:compile
> [INFO] |  +- org.apache.hbase:hbase-server:jar:1.1.1:compile
> [INFO] |  |  +- commons-collections:commons-collections:jar:3.2.1:compile
> {code}
> {code}
> [INFO] 
> 
> [INFO] Building Hive Common 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] 
> [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-common ---
> [INFO] +- org.apache.hadoop:hadoop-common:jar:2.6.0:compile
> [INFO] |  +- commons-collections:commons-collections:jar:3.2.1:compile
> {code}
> The {{Hadoop-Common}} dependency is also found in: LLAP, Serde, Storage, Shims, 
> Shims Common, Shims Scheduler.
> {code}
> [INFO] 
> 
> [INFO] Building Hive Ant Utilities 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] 
> [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-ant ---
> [INFO] |  +- commons-collections:commons-collections:jar:3.1:compile
> {code}
> {code}
> [INFO]
>  
> [INFO] 
> 
> [INFO] Building Hive Accumulo Handler 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] +- org.apache.accumulo:accumulo-core:jar:1.6.0:compile
> [INFO] |  +- commons-collections:commons-collections:jar:3.2.1:compile
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12461) Branch-1 -Phadoop-1 build is broken

2015-11-19 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014168#comment-15014168
 ] 

Xuefu Zhang commented on HIVE-12461:


+1

> Branch-1 -Phadoop-1 build is broken
> ---
>
> Key: HIVE-12461
> URL: https://issues.apache.org/jira/browse/HIVE-12461
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0
>Reporter: Xuefu Zhang
>Assignee: Aleksei Statkevich
> Attachments: HIVE-12461-branch-1.patch
>
>
> {code}
> [INFO] Executed tasks
> [INFO] 
> [INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ hive-exec 
> ---
> [INFO] Compiling 2423 source files to 
> /Users/xzhang/apache/hive-git-commit/ql/target/classes
> [INFO] -
> [ERROR] COMPILATION ERROR : 
> [INFO] -
> [ERROR] 
> /Users/xzhang/apache/hive-git-commit/ql/src/java/org/apache/hadoop/hive/ql/Context.java:[352,10]
>  error: cannot find symbol
> [INFO] 1 error
> [INFO] -
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Hive ... SUCCESS [  2.636 
> s]
> [INFO] Hive Shims Common .. SUCCESS [  3.270 
> s]
> [INFO] Hive Shims 0.20S ... SUCCESS [  1.052 
> s]
> [INFO] Hive Shims 0.23  SUCCESS [  3.550 
> s]
> [INFO] Hive Shims Scheduler ... SUCCESS [  1.076 
> s]
> [INFO] Hive Shims . SUCCESS [  1.472 
> s]
> [INFO] Hive Common  SUCCESS [  5.989 
> s]
> [INFO] Hive Serde . SUCCESS [  6.923 
> s]
> [INFO] Hive Metastore . SUCCESS [ 19.424 
> s]
> [INFO] Hive Ant Utilities . SUCCESS [  0.516 
> s]
> [INFO] Spark Remote Client  SUCCESS [  3.305 
> s]
> [INFO] Hive Query Language  FAILURE [ 34.276 
> s]
> [INFO] Hive Service ... SKIPPED
> {code}
> Part of the code that's being complained:
> {code}
> 343   /**
> 344* Remove any created scratch directories.
> 345*/
> 346   public void removeScratchDir() {
> 347 for (Map.Entry<String, Path> entry : fsScratchDirs.entrySet()) {
> 348   try {
> 349 Path p = entry.getValue();
> 350 FileSystem fs = p.getFileSystem(conf);
> 351 fs.delete(p, true);
> 352 fs.cancelDeleteOnExit(p);
> 353   } catch (Exception e) {
> 354 LOG.warn("Error Removing Scratch: "
> 355 + StringUtils.stringifyException(e));
> 356   }
> {code}
> might be related to HIVE-12268.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12463) VectorMapJoinFastKeyStore has Array OOB errors

2015-11-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014174#comment-15014174
 ] 

Sergey Shelukhin commented on HIVE-12463:
-

+1 pending test run

> VectorMapJoinFastKeyStore has Array OOB errors
> --
>
> Key: HIVE-12463
> URL: https://issues.apache.org/jira/browse/HIVE-12463
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 1.2.1, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-12463.1.patch, HIVE-12463.2.patch
>
>
> When combining different sized keys, observing an occasional error in 
> hashtable probes.
> {code}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 162046429
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastKeyStore.equalKey(VectorMapJoinFastKeyStore.java:150)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashTable.findReadSlot(VectorMapJoinFastBytesHashTable.java:191)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashMap.lookup(VectorMapJoinFastBytesHashMap.java:76)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerMultiKeyOperator.process(VectorMapJoinInnerMultiKeyOperator.java:300)
>   ... 26 more
> {code}
> {code}
> // Our reading is positioned to the key.
> writeBuffers.getByteSegmentRefToCurrent(byteSegmentRef, keyLength, 
> readPos);
> byte[] currentBytes = byteSegmentRef.getBytes();
> int currentStart = (int) byteSegmentRef.getOffset();
> for (int i = 0; i < keyLength; i++) {
>   if (currentBytes[currentStart + i] != keyBytes[keyStart + i]) {
> // LOG.debug("VectorMapJoinFastKeyStore equalKey no match on bytes");
> return false;
>   }
> }
> {code}
> This needs an identical fix to match 
> {code}
> // Rare case of buffer boundary. Unfortunately we'd have to copy some 
> bytes.
> byte[] bytes = new byte[length];
> int destOffset = 0;
> while (destOffset < length) {
>   ponderNextBufferToRead(readPos);
> {code}
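The fix the description points at can be illustrated with a self-contained sketch (a simplified stand-in for WriteBuffers; the names and buffer layout are assumptions, not the actual Hive code): compare in place only when the key is contiguous within one buffer, and copy the bytes out first when the key spans a buffer boundary, instead of indexing past the end of a buffer.

```java
import java.util.Arrays;

public class BoundaryCompare {
    /**
     * Compare a key of `length` bytes starting at logical `offset` in a list
     * of fixed-size buffers against `key`. Hypothetical stand-in for the
     * VectorMapJoinFastKeyStore/WriteBuffers logic.
     */
    static boolean keyEquals(byte[][] buffers, int bufSize,
                             long offset, int length, byte[] key) {
        int buf = (int) (offset / bufSize);
        int start = (int) (offset % bufSize);
        if (start + length <= bufSize) {
            // Common case: key is contiguous within one buffer.
            for (int i = 0; i < length; i++) {
                if (buffers[buf][start + i] != key[i]) {
                    return false;
                }
            }
            return true;
        }
        // Rare case of a buffer boundary: copy the bytes, then compare.
        byte[] copy = new byte[length];
        int dest = 0;
        while (dest < length) {
            int n = Math.min(length - dest, bufSize - start);
            System.arraycopy(buffers[buf], start, copy, dest, n);
            dest += n;
            buf++;
            start = 0;
        }
        return Arrays.equals(copy, key);
    }

    public static void main(String[] args) {
        byte[][] bufs = { {1, 2, 3, 4}, {5, 6, 7, 8} };
        // Key {3,4,5,6} starts at offset 2 and crosses the 4-byte boundary.
        System.out.println(keyEquals(bufs, 4, 2, 4, new byte[]{3, 4, 5, 6})); // true
        System.out.println(keyEquals(bufs, 4, 0, 2, new byte[]{1, 2}));       // true
        System.out.println(keyEquals(bufs, 4, 2, 4, new byte[]{3, 4, 5, 9})); // false
    }
}
```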



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-2750) Hive multi group by single reducer optimization causes invalid column reference error

2015-11-19 Thread John P. Petrakis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014115#comment-15014115
 ] 

John P. Petrakis commented on HIVE-2750:


Sorry... wrong link.

> Hive multi group by single reducer optimization causes invalid column 
> reference error
> -
>
> Key: HIVE-2750
> URL: https://issues.apache.org/jira/browse/HIVE-2750
> Project: Hive
>  Issue Type: Bug
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Fix For: 0.9.0
>
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2750.D1455.1.patch
>
>
> After the optimization, if two query blocks have the same distinct clause and 
> the same group by keys, but the first query block does not reference all the 
> rows the second query block does, an invalid column reference error is raised 
> for the columns unreferenced in the first query block.
> E.g.
> FROM src
> INSERT OVERWRITE TABLE dest_g2 SELECT substr(src.key,1,1), count(DISTINCT 
> src.key) WHERE substr(src.key,1,1) >= 5 GROUP BY substr(src.key,1,1)
> INSERT OVERWRITE TABLE dest_g3 SELECT substr(src.key,1,1), count(DISTINCT 
> src.key), count(src.value) WHERE substr(src.key,1,1) < 5 GROUP BY 
> substr(src.key,1,1);
> This results in an invalid column reference error on src.value



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12466) SparkCounter not initialized error

2015-11-19 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12466:
---
Attachment: HIVE-12466-spark.patch

> SparkCounter not initialized error
> --
>
> Key: HIVE-12466
> URL: https://issues.apache.org/jira/browse/HIVE-12466
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Xuefu Zhang
> Attachments: HIVE-12466-spark.patch
>
>
> During a query, lots of the following error found in executor's log:
> {noformat}
> 03:47:28.759 [Executor task launch worker-0] ERROR 
> org.apache.hive.spark.counter.SparkCounters - counter[HIVE, RECORDS_OUT_0] 
> has not initialized before.
> 03:47:28.762 [Executor task launch worker-1] ERROR 
> org.apache.hive.spark.counter.SparkCounters - counter[HIVE, RECORDS_OUT_0] 
> has not initialized before.
> 03:47:30.707 [Executor task launch worker-1] ERROR 
> org.apache.hive.spark.counter.SparkCounters - counter[HIVE, 
> RECORDS_OUT_1_default.tmp_tmp] has not initialized before.
> 03:47:33.385 [Executor task launch worker-1] ERROR 
> org.apache.hive.spark.counter.SparkCounters - counter[HIVE, 
> RECORDS_OUT_1_default.test_table] has not initialized before.
> 03:47:33.388 [Executor task launch worker-0] ERROR 
> org.apache.hive.spark.counter.SparkCounters - counter[HIVE, 
> RECORDS_OUT_1_default.test_table] has not initialized before.
> 03:47:33.495 [Executor task launch worker-0] ERROR 
> org.apache.hive.spark.counter.SparkCounters - counter[HIVE, 
> RECORDS_OUT_1_default.test_table] has not initialized before.
> 03:47:35.141 [Executor task launch worker-1] ERROR 
> org.apache.hive.spark.counter.SparkCounters - counter[HIVE, 
> RECORDS_OUT_1_default.test_table] has not initialized before.
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12469) Bump Commons-Collections dependency from 3.2.1 to 3.2.2. to address vulnerability

2015-11-19 Thread Reuben Kuhnert (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014124#comment-15014124
 ] 

Reuben Kuhnert commented on HIVE-12469:
---

Looks like there is only one direct dependency, but numerous downstream 
references (a number of them to {{hadoop-common}}). Any suggestions on how we 
want to fix this?

> Bump Commons-Collections dependency from 3.2.1 to 3.2.2. to address 
> vulnerability
> -
>
> Key: HIVE-12469
> URL: https://issues.apache.org/jira/browse/HIVE-12469
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Reuben Kuhnert
>Assignee: Reuben Kuhnert
>Priority: Blocker
>
> Currently the commons-collections (3.2.1) library allows invocation of 
> arbitrary code through {{InvokerTransformer}}; we need to bump the version of 
> commons-collections from 3.2.1 to 3.2.2 to resolve this issue.
> Results of {{mvn dependency:tree}}:
> {code}
> [INFO] 
> 
> [INFO] Building Hive HPL/SQL 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] 
> [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-hplsql ---
> [INFO] org.apache.hive:hive-hplsql:jar:2.0.0-SNAPSHOT
> [INFO] +- com.google.guava:guava:jar:14.0.1:compile
> [INFO] +- commons-collections:commons-collections:jar:3.2.1:compile
> {code}
> {code}
> [INFO] 
> 
> [INFO] Building Hive Packaging 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] +- org.apache.hive:hive-hbase-handler:jar:2.0.0-SNAPSHOT:compile
> [INFO] |  +- org.apache.hbase:hbase-server:jar:1.1.1:compile
> [INFO] |  |  +- commons-collections:commons-collections:jar:3.2.1:compile
> {code}
> {code}
> [INFO] 
> 
> [INFO] Building Hive Common 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] 
> [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-common ---
> [INFO] +- org.apache.hadoop:hadoop-common:jar:2.6.0:compile
> [INFO] |  +- commons-collections:commons-collections:jar:3.2.1:compile
> {code}
> The {{Hadoop-Common}} dependency is also found in: LLAP, Serde, Storage, Shims, 
> Shims Common, Shims Scheduler.
> {code}
> [INFO] 
> 
> [INFO] Building Hive Ant Utilities 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] 
> [INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ hive-ant ---
> [INFO] |  +- commons-collections:commons-collections:jar:3.1:compile
> {code}
> {code}
> [INFO]
>  
> [INFO] 
> 
> [INFO] Building Hive Accumulo Handler 2.0.0-SNAPSHOT
> [INFO] 
> 
> [INFO] +- org.apache.accumulo:accumulo-core:jar:1.6.0:compile
> [INFO] |  +- commons-collections:commons-collections:jar:3.2.1:compile
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8396) Hive CliDriver command splitting can be broken when comments are present

2015-11-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014166#comment-15014166
 ] 

Sergey Shelukhin commented on HIVE-8396:


+1 pending test run (just in case CliDriver tests are affected)

> Hive CliDriver command splitting can be broken when comments are present
> 
>
> Key: HIVE-8396
> URL: https://issues.apache.org/jira/browse/HIVE-8396
> Project: Hive
>  Issue Type: Bug
>  Components: Parser, Query Processor
>Affects Versions: 0.14.0
>Reporter: Sergey Shelukhin
>Assignee: Elliot West
> Attachments: HIVE-8396.0.patch
>
>
> {noformat}
> -- SORT_QUERY_RESULTS
> set hive.cbo.enable=true;
> ... commands ...
> {noformat}
> causes
> {noformat}
> 2014-10-07 18:55:57,193 ERROR ql.Driver (SessionState.java:printError(825)) - 
> FAILED: ParseException line 2:4 missing KW_ROLE at 'hive' near 'hive'
> {noformat}
> If the comment is moved after the command it works.
> I noticed this earlier when I comment out parts of some random q file for 
> debugging purposes, and it starts failing. This is annoying.
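The failure mode can be illustrated with a minimal comment-aware splitter (an illustrative sketch, not the actual CliDriver code): dropping whole-line "--" comments *before* splitting on ';' prevents a leading comment from being fused onto the first command of the script.

```java
import java.util.ArrayList;
import java.util.List;

public class CommandSplitter {
    /**
     * Illustrative sketch: strip whole-line "--" comments first, then split
     * the remainder on ';'. (Naively splitting first leaves the comment
     * glued to the following command, e.g. "-- SORT_QUERY_RESULTS\nset ...",
     * which the parser then rejects.)
     */
    static List<String> split(String script) {
        StringBuilder stripped = new StringBuilder();
        for (String line : script.split("\n")) {
            if (!line.trim().startsWith("--")) {
                stripped.append(line).append('\n');
            }
        }
        List<String> commands = new ArrayList<>();
        for (String cmd : stripped.toString().split(";")) {
            if (!cmd.trim().isEmpty()) {
                commands.add(cmd.trim());
            }
        }
        return commands;
    }

    public static void main(String[] args) {
        String script = "-- SORT_QUERY_RESULTS\nset hive.cbo.enable=true;\nselect 1;";
        System.out.println(split(script)); // [set hive.cbo.enable=true, select 1]
    }
}
```

A real implementation would also have to leave "--" inside string literals alone; the sketch ignores that case for brevity.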



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12466) SparkCounter not initialized error

2015-11-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014419#comment-15014419
 ] 

Hive QA commented on HIVE-12466:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12773297/HIVE-12466-spark.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 9786 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority2
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1006/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1006/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-1006/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12773297 - PreCommit-HIVE-SPARK-Build

> SparkCounter not initialized error
> --
>
> Key: HIVE-12466
> URL: https://issues.apache.org/jira/browse/HIVE-12466
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Xuefu Zhang
> Attachments: HIVE-12466-spark.patch
>
>
> During a query, many occurrences of the following error are found in the executor's log:
> {noformat}
> 03:47:28.759 [Executor task launch worker-0] ERROR 
> org.apache.hive.spark.counter.SparkCounters - counter[HIVE, RECORDS_OUT_0] 
> has not initialized before.
> 03:47:28.762 [Executor task launch worker-1] ERROR 
> org.apache.hive.spark.counter.SparkCounters - counter[HIVE, RECORDS_OUT_0] 
> has not initialized before.
> 03:47:30.707 [Executor task launch worker-1] ERROR 
> org.apache.hive.spark.counter.SparkCounters - counter[HIVE, 
> RECORDS_OUT_1_default.tmp_tmp] has not initialized before.
> 03:47:33.385 [Executor task launch worker-1] ERROR 
> org.apache.hive.spark.counter.SparkCounters - counter[HIVE, 
> RECORDS_OUT_1_default.test_table] has not initialized before.
> 03:47:33.388 [Executor task launch worker-0] ERROR 
> org.apache.hive.spark.counter.SparkCounters - counter[HIVE, 
> RECORDS_OUT_1_default.test_table] has not initialized before.
> 03:47:33.495 [Executor task launch worker-0] ERROR 
> org.apache.hive.spark.counter.SparkCounters - counter[HIVE, 
> RECORDS_OUT_1_default.test_table] has not initialized before.
> 03:47:35.141 [Executor task launch worker-1] ERROR 
> org.apache.hive.spark.counter.SparkCounters - counter[HIVE, 
> RECORDS_OUT_1_default.test_table] has not initialized before.
> ...
> {noformat}
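The error path above fires when a counter is incremented before it was registered. A minimal sketch of that pattern (plain Java, not Hive's actual SparkCounters API; class and method names are illustrative) shows why registering every counter up front avoids the logged error:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class CounterGroup {
  private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

  // Counters must be registered before executors start incrementing them.
  public void register(String name) {
    counters.putIfAbsent(name, new AtomicLong());
  }

  // Returns false for an unregistered counter; the real code logs
  // "counter[...] has not initialized before" on this path.
  public boolean increment(String name, long delta) {
    AtomicLong c = counters.get(name);
    if (c == null) {
      return false;
    }
    c.addAndGet(delta);
    return true;
  }

  public long value(String name) {
    AtomicLong c = counters.get(name);
    return c == null ? 0L : c.get();
  }
}
```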



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12420) WebHCat server throws NPE when you run command with -d user.name.

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12420:

Assignee: Thejas M Nair

> WebHCat server throws NPE when you run command with -d user.name.
> -
>
> Key: HIVE-12420
> URL: https://issues.apache.org/jira/browse/HIVE-12420
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 2.0.0
>Reporter: Takahiko Saito
>Assignee: Thejas M Nair
>
> When you run with '-d user.name', it fails with:
> {noformat}
> [hrt_qa@os-r6-bccslu-hive-1-r-5 ~]$ curl -s -d user.name=hrt_qa -d 
> execute="drop table if exists templetontest_tab2;" 
> http://os-r6-bccslu-hive-1-r-3.novalocal:20111/templeton/v1/ddl
> 
> 
> 
> Error 500 Server Error
> 
> 
> HTTP ERROR: 500
> Problem accessing /templeton/v1/ddl. Reason:
> Server Error
> Powered by Jetty://
> 
> 
> {noformat}
> server log shows:
> {noformat}
> WARN  | 16 Nov 2015 19:48:22,738 | org.eclipse.jetty.servlet.ServletHandler | 
> /templeton/v1/ddl
> java.lang.NullPointerException
>   at 
> org.apache.http.client.utils.URLEncodedUtils.parse(URLEncodedUtils.java:235) 
> ~[hive-jdbc-1.2.1.2.3.5.0-13-standalone.jar:1.2.1.2.3.5.0-13]
>   at 
> org.apache.hadoop.security.authentication.server.PseudoAuthenticationHandler.getUserName(PseudoAuthenticationHandler.java:143)
>  ~[hadoop-auth-2.6.0.jar:?]
>   at 
> org.apache.hadoop.security.authentication.server.PseudoAuthenticationHandler.authenticate(PseudoAuthenticationHandler.java:179)
>  ~[hadoop-auth-2.6.0.jar:?]
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:507)
>  ~[hadoop-auth-2.6.0.jar:?]
>   at org.apache.hadoop.hdfs.web.AuthFilter.doFilter(AuthFilter.java:88) 
> ~[hadoop-hdfs-2.7.1.2.3.5.0-13.jar:?]
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1331)
>  ~[jetty-all-7.6.0.v20120127.jar:7.6.0.v20120127]
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:477) 
> [jetty-all-7.6.0.v20120127.jar:7.6.0.v20120127]
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1031)
>  [jetty-all-7.6.0.v20120127.jar:7.6.0.v20120127]
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:406) 
> [jetty-all-7.6.0.v20120127.jar:7.6.0.v20120127]
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:965)
>  [jetty-all-7.6.0.v20120127.jar:7.6.0.v20120127]
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117) 
> [jetty-all-7.6.0.v20120127.jar:7.6.0.v20120127]
>   at 
> org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:47) 
> [jetty-all-7.6.0.v20120127.jar:7.6.0.v20120127]
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
>  [jetty-all-7.6.0.v20120127.jar:7.6.0.v20120127]
>   at org.eclipse.jetty.server.Server.handle(Server.java:349) 
> [jetty-all-7.6.0.v20120127.jar:7.6.0.v20120127]
>   at 
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:449)
>  [jetty-all-7.6.0.v20120127.jar:7.6.0.v20120127]
>   at 
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:925)
>  [jetty-all-7.6.0.v20120127.jar:7.6.0.v20120127]
>   at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857) 
> [jetty-all-7.6.0.v20120127.jar:7.6.0.v20120127]
>   at 
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) 
> [jetty-all-7.6.0.v20120127.jar:7.6.0.v20120127]
>   at 
> org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:76)
>  [jetty-all-7.6.0.v20120127.jar:7.6.0.v20120127]
>   at 
> org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:609)
>  [jetty-all-7.6.0.v20120127.jar:7.6.0.v20120127]
>   at 
> org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:45)
>  [jetty-all-7.6.0.v20120127.jar:7.6.0.v20120127]
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
>  [jetty-all-7.6.0.v20120127.jar:7.6.0.v20120127]
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
>  [jetty-all-7.6.0.v20120127.jar:7.6.0.v20120127]
>   at java.lang.Thread.run(Thread.java:745) [?:1.8.0_45]
> WARN  | 16 Nov 2015 19:48:22,738 | org.eclipse.jetty.servlet.ServletHandler | 
> /templeton/v1/ddl
> java.lang.NullPointerException
>   at 
> org.apache.http.client.utils.URLEncodedUtils.parse(URLEncodedUtils.java:235) 
> ~[hive-jdbc-1.2.1.2.3.5.0-13-standalone.jar:1.2.1.2.3.5.0-13]
>   at 
> 
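The stack trace points at `URLEncodedUtils.parse` blowing up on a null query string: the `user.name` parameter was sent in the POST body, so the URL carries no query at all. A hedged sketch of the kind of null guard that avoids the NPE (the method name mirrors `getUserName` from the trace, but this is plain Java, not the actual hadoop-auth code):

```java
public class UserNameParser {
  // Extract "user.name" from a raw query string, tolerating requests
  // that have no query string at all (the case that NPEs in the trace).
  public static String getUserName(String queryString) {
    if (queryString == null || queryString.isEmpty()) {
      return null; // nothing to parse; caller treats this as "no user supplied"
    }
    for (String pair : queryString.split("&")) {
      int eq = pair.indexOf('=');
      if (eq > 0 && pair.substring(0, eq).equals("user.name")) {
        return pair.substring(eq + 1);
      }
    }
    return null;
  }
}
```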

[jira] [Updated] (HIVE-12473) DPP: UDFs on the partition column side does not evaluate correctly

2015-11-19 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-12473:
---
Description: 
Related to HIVE-12462

{code}

select count(1) from accounts a, transactions t where year(a.dt) = year(t.dt) 
and account_id = 22;

$hdt$_0:$hdt$_1:a
  TableScan (TS_2)
alias: a
filterExpr: (((account_id = 22) and year(dt) is not null) and (year(dt)) IN 
(RS[6])) (type: boolean)
{code}

Ends up being evaluated as {{year(cast(dt as int))}} because the pruner only 
checks for final type, not the column type.

{code}
ObjectInspector oi =

PrimitiveObjectInspectorFactory.getPrimitiveWritableObjectInspector(TypeInfoFactory
.getPrimitiveTypeInfo(si.fieldInspector.getTypeName()));

Converter converter =
ObjectInspectorConverters.getConverter(
PrimitiveObjectInspectorFactory.javaStringObjectInspector, oi);
{code}

  was:
Related to HIVE-12462

{code}
$hdt$_0:$hdt$_1:a
  TableScan (TS_2)
alias: a
filterExpr: (((account_id = 22) and year(dt) is not null) and (year(dt)) IN 
(RS[6])) (type: boolean)
{code}

Ends up being evaluated as {{year(cast(dt as int))}} because the pruner only 
checks for final type, not the column type.

{code}
ObjectInspector oi =

PrimitiveObjectInspectorFactory.getPrimitiveWritableObjectInspector(TypeInfoFactory
.getPrimitiveTypeInfo(si.fieldInspector.getTypeName()));

Converter converter =
ObjectInspectorConverters.getConverter(
PrimitiveObjectInspectorFactory.javaStringObjectInspector, oi);
{code}


> DPP: UDFs on the partition column side does not evaluate correctly
> --
>
> Key: HIVE-12473
> URL: https://issues.apache.org/jira/browse/HIVE-12473
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.3.0, 1.2.1, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>
> Related to HIVE-12462
> {code}
> select count(1) from accounts a, transactions t where year(a.dt) = year(t.dt) 
> and account_id = 22;
> $hdt$_0:$hdt$_1:a
>   TableScan (TS_2)
> alias: a
> filterExpr: (((account_id = 22) and year(dt) is not null) and (year(dt)) 
> IN (RS[6])) (type: boolean)
> {code}
> Ends up being evaluated as {{year(cast(dt as int))}} because the pruner only 
> checks for final type, not the column type.
> {code}
> ObjectInspector oi =
> 
> PrimitiveObjectInspectorFactory.getPrimitiveWritableObjectInspector(TypeInfoFactory
> .getPrimitiveTypeInfo(si.fieldInspector.getTypeName()));
> Converter converter =
> ObjectInspectorConverters.getConverter(
> PrimitiveObjectInspectorFactory.javaStringObjectInspector, oi);
> {code}
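The type mix-up can be demonstrated without Hive's ObjectInspector machinery. In this plain-Java sketch (all names illustrative), converting the partition value to the expression's final type first amounts to `year(cast(dt as int))`, which fails on a date-formatted string and wrongly prunes the partition; converting with the column type keeps the UDF meaningful:

```java
public class PartitionPruneDemo {
  // Stand-in for year(dt) applied to a string partition column like "2015-11-19".
  static int yearOfString(String dt) {
    return Integer.parseInt(dt.substring(0, 4));
  }

  // Buggy path: coerce the value to the UDF's *result* type (int) first,
  // i.e. year(cast(dt as int)); the cast fails and the value becomes NULL.
  static Integer buggyEval(String partitionValue) {
    try {
      return Integer.parseInt(partitionValue); // cast(dt as int)
    } catch (NumberFormatException e) {
      return null; // NULL => partition is dropped from the IN() set
    }
  }

  // Correct path: keep the *column* type (string) and then apply the UDF.
  static int correctEval(String partitionValue) {
    return yearOfString(partitionValue);
  }
}
```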



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-12473) DPP: UDFs on the partition column side does not evaluate correctly

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-12473:
---

Assignee: Sergey Shelukhin  (was: Gopal V)

> DPP: UDFs on the partition column side does not evaluate correctly
> --
>
> Key: HIVE-12473
> URL: https://issues.apache.org/jira/browse/HIVE-12473
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.3.0, 1.2.1, 2.0.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
>
> Related to HIVE-12462
> {code}
> select count(1) from accounts a, transactions t where year(a.dt) = year(t.dt) 
> and account_id = 22;
> $hdt$_0:$hdt$_1:a
>   TableScan (TS_2)
> alias: a
> filterExpr: (((account_id = 22) and year(dt) is not null) and (year(dt)) 
> IN (RS[6])) (type: boolean)
> {code}
> Ends up being evaluated as {{year(cast(dt as int))}} because the pruner only 
> checks for final type, not the column type.
> {code}
> ObjectInspector oi =
> 
> PrimitiveObjectInspectorFactory.getPrimitiveWritableObjectInspector(TypeInfoFactory
> .getPrimitiveTypeInfo(si.fieldInspector.getTypeName()));
> Converter converter =
> ObjectInspectorConverters.getConverter(
> PrimitiveObjectInspectorFactory.javaStringObjectInspector, oi);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12075) add analyze command to explictly cache file metadata in HBase metastore

2015-11-19 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014785#comment-15014785
 ] 

Alan Gates commented on HIVE-12075:
---

Left some comments on RB.

> add analyze command to explictly cache file metadata in HBase metastore
> ---
>
> Key: HIVE-12075
> URL: https://issues.apache.org/jira/browse/HIVE-12075
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12075.01.nogen.patch, HIVE-12075.01.patch, 
> HIVE-12075.02.patch, HIVE-12075.nogen.patch, HIVE-12075.patch
>
>
> ANALYZE TABLE (spec as usual) CACHE METADATA



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12039) Fix TestSSL#testSSLVersion

2015-11-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-12039:

Target Version/s: 2.0.0

> Fix TestSSL#testSSLVersion 
> ---
>
> Key: HIVE-12039
> URL: https://issues.apache.org/jira/browse/HIVE-12039
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-12039.1.patch
>
>
> Looks like it's only run on Linux and failing after HIVE-11720.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12474) ORDER BY should handle column refs in parantheses

2015-11-19 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-12474:
---
Description: 
CREATE TABLE test(a INT, b INT, c INT)
COMMENT 'This is a test table';

hive>
select lead(c) over (order by (a,b)) from test limit 10;
FAILED: ParseException line 1:31 missing ) at ',' near ')'
line 1:34 missing EOF at ')' near ')'

hive>
select lead(c) over (order by a,b) from test limit 10;

- Works as expected.

It appears that 'cluster by'/'sort by'/'distribute by'/'partition by' allows 
this:
https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L129

For example, this syntax is still valid:
select lead(c) over (sort by (a,b)) from test limit 10;


  was:
CREATE TABLE test(a INT, b INT, c INT)
COMMENT 'This is a test table';

hive>
select lead(c) over (order by (a,b)) from test limit 10;
FAILED: ParseException line 1:31 missing ) at ',' near ')'
line 1:34 missing EOF at ')' near ')'

hive>
select lead(c) over (order by a,b) from test limit 10;

- Works as expected.

It appears that 'cluster by'/'sort by'/'distribute by'/'partition by' allows 
this:
https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L129

For example, this syntax is still valid:
select lead(c) over (sort by (a,b)) from test limit 10;

This is related to changes that were made as a part of HIVE-6617


> ORDER BY should handle column refs in parantheses
> -
>
> Key: HIVE-12474
> URL: https://issues.apache.org/jira/browse/HIVE-12474
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.0.0, 1.2.1
>Reporter: Aaron Tokhy
>Assignee: Pengcheng Xiong
>Priority: Minor
>
> CREATE TABLE test(a INT, b INT, c INT)
> COMMENT 'This is a test table';
> hive>
> select lead(c) over (order by (a,b)) from test limit 10;
> FAILED: ParseException line 1:31 missing ) at ',' near ')'
> line 1:34 missing EOF at ')' near ')'
> hive>
> select lead(c) over (order by a,b) from test limit 10;
> - Works as expected.
> It appears that 'cluster by'/'sort by'/'distribute by'/'partition by' allows 
> this:
> https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L129
> For example, this syntax is still valid:
> select lead(c) over (sort by (a,b)) from test limit 10;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12474) ORDER BY should handle column refs in parantheses

2015-11-19 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-12474:
---
Description: 
CREATE TABLE test(a INT, b INT, c INT)
COMMENT 'This is a test table';

hive>
select lead(c) over (order by (a,b)) from test limit 10;
FAILED: ParseException line 1:31 missing ) at ',' near ')'
line 1:34 missing EOF at ')' near ')'

hive>
select lead(c) over (order by a,b) from test limit 10;

- Works as expected.

It appears that 'cluster by'/'sort by'/'distribute by'/'partition by' allows 
this:
https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L129

For example, this syntax is still valid:
select lead(c) over (sort by (a,b)) from test limit 10;

This is related to changes that were made as a part of HIVE-6617

  was:
CREATE TABLE test(a INT, b INT, c INT)
COMMENT 'This is a test table';

hive>
select lead(c) over (order by (a,b)) from test limit 10;
FAILED: ParseException line 1:31 missing ) at ',' near ')'
line 1:34 missing EOF at ')' near ')'

hive>
select lead(c) over (order by a,b) from test limit 10;

- Works as expected.

It appears that 'cluster by'/'sort by'/'distribute by'/'partition by' allows 
this:
https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L129

For example, this syntax is still valid:
select lead(c) over (sort by (a,b)) from test limit 10;


> ORDER BY should handle column refs in parantheses
> -
>
> Key: HIVE-12474
> URL: https://issues.apache.org/jira/browse/HIVE-12474
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.0.0, 1.2.1
>Reporter: Aaron Tokhy
>Assignee: Pengcheng Xiong
>Priority: Minor
>
> CREATE TABLE test(a INT, b INT, c INT)
> COMMENT 'This is a test table';
> hive>
> select lead(c) over (order by (a,b)) from test limit 10;
> FAILED: ParseException line 1:31 missing ) at ',' near ')'
> line 1:34 missing EOF at ')' near ')'
> hive>
> select lead(c) over (order by a,b) from test limit 10;
> - Works as expected.
> It appears that 'cluster by'/'sort by'/'distribute by'/'partition by' allows 
> this:
> https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L129
> For example, this syntax is still valid:
> select lead(c) over (sort by (a,b)) from test limit 10;
> This is related to changes that were made as a part of HIVE-6617



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12474) ORDER BY should handle column refs in parantheses

2015-11-19 Thread Dongwook Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014793#comment-15014793
 ] 

Dongwook Kwon commented on HIVE-12474:
--

https://issues.apache.org/jira/browse/HIVE-5607

It seems this was changed by design.
Release note from HIVE-5607

From 0.10.0 to 0.13.0, the following syntax for order by is allowed (though 
the doc doesn't specify):

select * from table order by (expr1, exp2);

From 0.14, the above syntax is illegal. Instead, the following should be used:

select * from table order by expr1, exp2; 

No?

> ORDER BY should handle column refs in parantheses
> -
>
> Key: HIVE-12474
> URL: https://issues.apache.org/jira/browse/HIVE-12474
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.0.0, 1.2.1
>Reporter: Aaron Tokhy
>Assignee: Pengcheng Xiong
>Priority: Minor
>
> CREATE TABLE test(a INT, b INT, c INT)
> COMMENT 'This is a test table';
> hive>
> select lead(c) over (order by (a,b)) from test limit 10;
> FAILED: ParseException line 1:31 missing ) at ',' near ')'
> line 1:34 missing EOF at ')' near ')'
> hive>
> select lead(c) over (order by a,b) from test limit 10;
> - Works as expected.
> It appears that 'cluster by'/'sort by'/'distribute by'/'partition by' allows 
> this:
> https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L129
> For example, this syntax is still valid:
> select lead(c) over (sort by (a,b)) from test limit 10;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12475) Parquet schema evolution within array<struct<>> doesn't work

2015-11-19 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-12475:
-
Attachment: HIVE-12475.1.patch

Uploading the patch. Thanks [~spena] for the help.


> Parquet schema evolution within array<struct<>> doesn't work
> 
>
> Key: HIVE-12475
> URL: https://issues.apache.org/jira/browse/HIVE-12475
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 1.1.0
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-12475.1.patch
>
>
> If we create a table with type array<struct<>>, and later add a field in 
> the struct, we get the following exception.
> The following SQL statements would recreate the error:
> {quote}
> CREATE TABLE pq_test (f1 array<struct<c1:int,c2:int>>) STORED AS PARQUET;
> INSERT INTO TABLE pq_test select array(named_struct("c1",1,"c2",2)) FROM tmp 
> LIMIT 2;
> SELECT * from pq_test;
> ALTER TABLE pq_test REPLACE COLUMNS (f1 array<struct<...>>);
> SELECT * from pq_test;
> {quote}
> Exception:
> {quote}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> at 
> org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldData(ArrayWritableObjectInspector.java:142)
> at 
> org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:363)
> at 
> org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:316)
> at 
> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:199)
> at 
> org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:61)
> at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:236)
> at 
> org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
> at 
> org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:71)
> at 
> org.apache.hadoop.hive.ql.exec.DefaultFetchFormatter.convert(DefaultFetchFormatter.java:40)
> at 
> org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:89)
> {quote}
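The `ArrayIndexOutOfBoundsException` comes from reading a field index that lies past the end of the data actually stored in old rows. A minimal sketch of the tolerant lookup (plain arrays stand in for Hive's `ArrayWritable`; this is not the actual `ArrayWritableObjectInspector` code) returns null for the field added by REPLACE COLUMNS instead of throwing:

```java
public class StructFieldReader {
  // Rows written before the schema change store fewer fields than the
  // current struct declares; a missing field should read as null.
  public static Object getStructFieldData(Object[] storedFields, int fieldIndex) {
    if (storedFields == null || fieldIndex >= storedFields.length) {
      return null; // field added after the file was written
    }
    return storedFields[fieldIndex];
  }
}
```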



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12450) OrcFileMergeOperator does not use correct compression buffer size

2015-11-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15015022#comment-15015022
 ] 

Hive QA commented on HIVE-12450:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12773101/HIVE-12450.4.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9869 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6077/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6077/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6077/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12773101 - PreCommit-HIVE-TRUNK-Build

> OrcFileMergeOperator does not use correct compression buffer size
> -
>
> Key: HIVE-12450
> URL: https://issues.apache.org/jira/browse/HIVE-12450
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 1.2.0, 1.3.0, 1.2.1, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-12450.1.patch, HIVE-12450.2.patch, 
> HIVE-12450.3.patch, HIVE-12450.4.patch, zlib-hang.png
>
>
> OrcFileMergeOperator checks for compatibility before merging orc files. This 
> compatibility check include checking compression buffer size. But the output 
> file that is created does not honor the compression buffer size and always 
> defaults to 256KB. This will not be a problem when reading the orc file but 
> can create unwanted memory pressure because of wasted space within 
> compression buffer.
> This issue also can make the merged file unreadable under certain cases. For 
> example, if the original compression buffer size is 8KB and if  
> hive.exec.orc.default.buffer.size is set to 4KB. The merge file operator will 
> use 4KB instead of actual 8KB which can result in hanging of ORC reader (more 
> specifically ZlibCodec will wait for more compression buffers). 
> {code:title=jstack output for hanging issue}
> "main" prio=5 tid=0x7fc07300 nid=0x1703 runnable [0x70218000]
>java.lang.Thread.State: RUNNABLE
>   at java.util.zip.Inflater.inflateBytes(Native Method)
>   at java.util.zip.Inflater.inflate(Inflater.java:259)
>   - locked <0x0007f5d5fdc8> (a java.util.zip.ZStreamRef)
>   at 
> org.apache.hadoop.hive.ql.io.orc.ZlibCodec.decompress(ZlibCodec.java:94)
>   at 
> org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:238)
>   at 
> org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:262)
>   at java.io.InputStream.read(InputStream.java:101)
>   at 
> com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:737)
>   at 
> com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
>   at 
> com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.<init>(OrcProto.java:10661)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.<init>(OrcProto.java:10625)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10730)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10725)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:89)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:95)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.parseFrom(OrcProto.java:10958)
>   at 
> org.apache.hadoop.hive.ql.io.orc.MetadataReaderImpl.readStripeFooter(MetadataReaderImpl.java:114)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:240)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.beginReadStripe(RecordReaderImpl.java:847)
> 
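The fix direction described above reduces to one decision: the merge writer's compression buffer size must come from the input files' metadata, not from the `hive.exec.orc.default.buffer.size` config. A hedged sketch of that choice (illustrative names, not the actual `OrcFileMergeOperator` code):

```java
public class MergeBufferSize {
  // Honor the buffer size recorded in the files being merged; fall back to
  // the configured default only when the input size is unknown. Using the
  // config unconditionally is the bug: an 8KB-buffered file rewritten with
  // a 4KB buffer can hang the ZlibCodec reader.
  public static int pickBufferSize(Integer firstInputFileBufferSize, int configuredDefault) {
    return firstInputFileBufferSize != null ? firstInputFileBufferSize : configuredDefault;
  }
}
```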

[jira] [Updated] (HIVE-12465) Hive might produce wrong results when (outer) joins are merged

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12465:

Priority: Blocker  (was: Major)

> Hive might produce wrong results when (outer) joins are merged
> --
>
> Key: HIVE-12465
> URL: https://issues.apache.org/jira/browse/HIVE-12465
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
>
> Consider the following query:
> {noformat}
> select * from
>   (select * from tab where tab.key = 0)a
> full outer join
>   (select * from tab_part where tab_part.key = 98)b
> join
>   tab_part c
> on a.key = b.key and b.key = c.key;
> {noformat}
> Hive should execute the full outer join operation (without ON clause) and 
> then the join operation (ON a.key = b.key and b.key = c.key). Instead, it 
> merges both joins, generating the following plan:
> {noformat}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Map Reduce
>   Map Operator Tree:
>   TableScan
> alias: tab
> filterExpr: (key = 0) (type: boolean)
> Statistics: Num rows: 242 Data size: 22748 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: (key = 0) (type: boolean)
>   Statistics: Num rows: 121 Data size: 11374 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: 0 (type: int), value (type: string), ds (type: 
> string)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 121 Data size: 11374 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   key expressions: _col0 (type: int)
>   sort order: +
>   Map-reduce partition columns: _col0 (type: int)
>   Statistics: Num rows: 121 Data size: 11374 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: _col1 (type: string), _col2 (type: 
> string)
>   TableScan
> alias: tab_part
> filterExpr: (key = 98) (type: boolean)
> Statistics: Num rows: 500 Data size: 47000 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: (key = 98) (type: boolean)
>   Statistics: Num rows: 250 Data size: 23500 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: 98 (type: int), value (type: string), ds (type: 
> string)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 250 Data size: 23500 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   key expressions: _col0 (type: int)
>   sort order: +
>   Map-reduce partition columns: _col0 (type: int)
>   Statistics: Num rows: 250 Data size: 23500 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: _col1 (type: string), _col2 (type: 
> string)
>   TableScan
> alias: c
> Statistics: Num rows: 500 Data size: 47000 Basic stats: COMPLETE 
> Column stats: NONE
> Reduce Output Operator
>   key expressions: key (type: int)
>   sort order: +
>   Map-reduce partition columns: key (type: int)
>   Statistics: Num rows: 500 Data size: 47000 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: value (type: string), ds (type: string)
>   Reduce Operator Tree:
> Join Operator
>   condition map:
>Outer Join 0 to 1
>Inner Join 1 to 2
>   keys:
> 0 _col0 (type: int)
> 1 _col0 (type: int)
> 2 key (type: int)
>   outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, 
> _col7, _col8
>   Statistics: Num rows: 1100 Data size: 103400 Basic stats: COMPLETE 
> Column stats: NONE
>   File Output Operator
> compressed: false
> Statistics: Num rows: 1100 Data size: 103400 Basic stats: 
> COMPLETE Column stats: NONE
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> ListSink
> {noformat}
> That plan is equivalent 

[jira] [Commented] (HIVE-12474) ORDER BY should handle column refs in parantheses

2015-11-19 Thread Dongwook Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014794#comment-15014794
 ] 

Dongwook Kwon commented on HIVE-12474:
--

As Prasad Mujumdar pointed out, this change should be documented if it was 
intentional. 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy

> ORDER BY should handle column refs in parantheses
> -
>
> Key: HIVE-12474
> URL: https://issues.apache.org/jira/browse/HIVE-12474
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.0.0, 1.2.1
>Reporter: Aaron Tokhy
>Assignee: Pengcheng Xiong
>Priority: Minor
>
> CREATE TABLE test(a INT, b INT, c INT)
> COMMENT 'This is a test table';
> hive>
> select lead(c) over (order by (a,b)) from test limit 10;
> FAILED: ParseException line 1:31 missing ) at ',' near ')'
> line 1:34 missing EOF at ')' near ')'
> hive>
> select lead(c) over (order by a,b) from test limit 10;
> - Works as expected.
> It appears that 'cluster by'/'sort by'/'distribute by'/'partition by' allows 
> this:
> https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g#L129
> For example, this syntax is still valid:
> select lead(c) over (sort by (a,b)) from test limit 10;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12367) Lock/unlock database should add current database to inputs and outputs of authz hook

2015-11-19 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014838#comment-15014838
 ] 

Alan Gates commented on HIVE-12367:
---

The test failures definitely look relevant and need to be fixed or regenerated. 
 

The patch looks fine.  A comment on why you are selecting DDL_NO_LOCK in the 
lock function would be helpful, since it's confusing.  Also, analyzeLockTable 
and analyzeUnlockTable suffer from the same problem.  I don't know if you want 
to fix those as well.

> Lock/unlock database should add current database to inputs and outputs of 
> authz hook
> 
>
> Key: HIVE-12367
> URL: https://issues.apache.org/jira/browse/HIVE-12367
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.2.1
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
> Attachments: HIVE-12367.001.patch, HIVE-12367.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12338) Add webui to HiveServer2

2015-11-19 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-12338:
---
Attachment: HIVE-12338.1.patch

> Add webui to HiveServer2
> 
>
> Key: HIVE-12338
> URL: https://issues.apache.org/jira/browse/HIVE-12338
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Attachments: HIVE-12338.1.patch, hs2-conf.png, hs2-logs.png, 
> hs2-metrics.png, hs2-webui.png
>
>
> A web ui for HiveServer2 can show some useful information such as:
>  
> 1. Sessions,
> 2. Queries that are executing on the HS2, their states, starting time, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12338) Add webui to HiveServer2

2015-11-19 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014339#comment-15014339
 ] 

Jimmy Xiang commented on HIVE-12338:


I attached some screen shots and the first patch, which is also on RB: 
https://reviews.apache.org/r/40500/

> Add webui to HiveServer2
> 
>
> Key: HIVE-12338
> URL: https://issues.apache.org/jira/browse/HIVE-12338
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Attachments: HIVE-12338.1.patch, hs2-conf.png, hs2-logs.png, 
> hs2-metrics.png, hs2-webui.png
>
>
> A web ui for HiveServer2 can show some useful information such as:
>  
> 1. Sessions,
> 2. Queries that are executing on the HS2, their states, starting time, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12465) Hive might produce wrong results when (outer) joins are merged

2015-11-19 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-12465:
---
Description: 
Consider the following query:

{noformat}
select * from
  (select * from tab where tab.key = 0)a
full outer join
  (select * from tab_part where tab_part.key = 98)b
join
  tab_part c
on a.key = b.key and b.key = c.key;
{noformat}

Hive should execute the full outer join operation (without ON clause) and then 
the join operation (ON a.key = b.key and b.key = c.key). Instead, it merges 
both joins, generating the following plan:

{noformat}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: tab
filterExpr: (key = 0) (type: boolean)
Statistics: Num rows: 242 Data size: 22748 Basic stats: COMPLETE 
Column stats: NONE
Filter Operator
  predicate: (key = 0) (type: boolean)
  Statistics: Num rows: 121 Data size: 11374 Basic stats: COMPLETE 
Column stats: NONE
  Select Operator
expressions: 0 (type: int), value (type: string), ds (type: 
string)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 121 Data size: 11374 Basic stats: 
COMPLETE Column stats: NONE
Reduce Output Operator
  key expressions: _col0 (type: int)
  sort order: +
  Map-reduce partition columns: _col0 (type: int)
  Statistics: Num rows: 121 Data size: 11374 Basic stats: 
COMPLETE Column stats: NONE
  value expressions: _col1 (type: string), _col2 (type: string)
  TableScan
alias: tab_part
filterExpr: (key = 98) (type: boolean)
Statistics: Num rows: 500 Data size: 47000 Basic stats: COMPLETE 
Column stats: NONE
Filter Operator
  predicate: (key = 98) (type: boolean)
  Statistics: Num rows: 250 Data size: 23500 Basic stats: COMPLETE 
Column stats: NONE
  Select Operator
expressions: 98 (type: int), value (type: string), ds (type: 
string)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 250 Data size: 23500 Basic stats: 
COMPLETE Column stats: NONE
Reduce Output Operator
  key expressions: _col0 (type: int)
  sort order: +
  Map-reduce partition columns: _col0 (type: int)
  Statistics: Num rows: 250 Data size: 23500 Basic stats: 
COMPLETE Column stats: NONE
  value expressions: _col1 (type: string), _col2 (type: string)
  TableScan
alias: c
Statistics: Num rows: 500 Data size: 47000 Basic stats: COMPLETE 
Column stats: NONE
Reduce Output Operator
  key expressions: key (type: int)
  sort order: +
  Map-reduce partition columns: key (type: int)
  Statistics: Num rows: 500 Data size: 47000 Basic stats: COMPLETE 
Column stats: NONE
  value expressions: value (type: string), ds (type: string)
  Reduce Operator Tree:
Join Operator
  condition map:
        Outer Join 0 to 1
        Inner Join 1 to 2
  keys:
0 _col0 (type: int)
1 _col0 (type: int)
2 key (type: int)
  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, 
_col7, _col8
  Statistics: Num rows: 1100 Data size: 103400 Basic stats: COMPLETE 
Column stats: NONE
  File Output Operator
compressed: false
Statistics: Num rows: 1100 Data size: 103400 Basic stats: COMPLETE 
Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
ListSink
{noformat}

That plan is equivalent to the following query, which is different than the 
original one:
{noformat}
select * from
  (select * from tab where tab.key = 0)a
full outer join
  (select * from tab_part where tab_part.key = 98)b
on a.key = b.key
join
  tab_part c
on b.key = c.key;
{noformat}

It seems to be a problem in the recognition of join operations that can be 
merged into a single multijoin operator.
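The semantic difference can be reproduced outside Hive. Below is a minimal Python sketch (toy rows standing in for the filtered `tab`/`tab_part` inputs, not Hive's join executor) showing that pushing `a.key = b.key` into the full outer join changes which NULL-extended rows survive:

```python
def full_outer(left, right, cond):
    """Full outer join of two row lists on cond(l, r); None marks a non-match."""
    out, matched = [], set()
    for l in left:
        hits = [r for r in right if cond(l, r)]
        for r in hits:
            matched.add(id(r))
            out.append((l, r))
        if not hits:
            out.append((l, None))
    for r in right:
        if id(r) not in matched:
            out.append((None, r))
    return out

a = [{"key": 0}]                    # rows passing tab.key = 0
b = [{"key": 98}]                   # rows passing tab_part.key = 98
c = [{"key": 0}, {"key": 98}]       # tab_part c

# Intended plan: FULL OUTER JOIN with no ON clause (every pair matches), then
# an inner join applying BOTH predicates a.key = b.key AND b.key = c.key.
step1 = full_outer(a, b, lambda l, r: True)
intended = [(l, r, x) for (l, r) in step1 for x in c
            if l and r and l["key"] == r["key"] and r["key"] == x["key"]]

# Merged plan: a.key = b.key is evaluated inside the full outer join, and the
# second join only checks b.key = c.key.
step1m = full_outer(a, b, lambda l, r: l["key"] == r["key"])
merged = [(l, r, x) for (l, r) in step1m for x in c
          if r is not None and r["key"] == x["key"]]

print(len(intended), len(merged))   # the merged plan keeps a row the intended plan drops
```

With one row on each side, the merged plan's outer join NULL-extends the `b` row, which then survives the `b.key = c.key` join, so the two shapes return different row counts.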

  was:
Consider the following query:

{noformat}
select * from
  (select * from tab where tab.key = 0)a
full outer join
  (select * from tab_part where tab_part.key = 98)b
join
  tab_part c
on a.key = b.key and b.key = c.key;
{noformat}

Hive should execute the full outer join operation 

[jira] [Commented] (HIVE-12313) Turn hive.optimize.union.remove on by default

2015-11-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014362#comment-15014362
 ] 

Hive QA commented on HIVE-12313:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12772900/HIVE-12313.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 9865 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt21
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union33
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6076/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6076/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6076/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12772900 - PreCommit-HIVE-TRUNK-Build

> Turn hive.optimize.union.remove on by default
> -
>
> Key: HIVE-12313
> URL: https://issues.apache.org/jira/browse/HIVE-12313
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Affects Versions: 2.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-12313.1.patch, HIVE-12313.2.patch, HIVE-12313.patch
>
>
> This optimization always helps. It should be on by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12338) Add webui to HiveServer2

2015-11-19 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-12338:
---
Attachment: hs2-webui.png
hs2-metrics.png
hs2-logs.png
hs2-conf.png

> Add webui to HiveServer2
> 
>
> Key: HIVE-12338
> URL: https://issues.apache.org/jira/browse/HIVE-12338
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Attachments: HIVE-12338.1.patch, hs2-conf.png, hs2-logs.png, 
> hs2-metrics.png, hs2-webui.png
>
>
> A web ui for HiveServer2 can show some useful information such as:
>  
> 1. Sessions,
> 2. Queries that are executing on the HS2, their states, starting time, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12460) Fix branch-1 build

2015-11-19 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014448#comment-15014448
 ] 

Jimmy Xiang commented on HIVE-12460:


+1

> Fix branch-1 build
> --
>
> Key: HIVE-12460
> URL: https://issues.apache.org/jira/browse/HIVE-12460
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 1.3.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-12460-branch-1.patch
>
>
> Caused by a merge.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12341) LLAP: add security to daemon protocol endpoint (excluding shuffle)

2015-11-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014616#comment-15014616
 ] 

Sergey Shelukhin commented on HIVE-12341:
-

[~vikram.dixit] maybe you can review?

> LLAP: add security to daemon protocol endpoint (excluding shuffle)
> --
>
> Key: HIVE-12341
> URL: https://issues.apache.org/jira/browse/HIVE-12341
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12341.01.patch, HIVE-12341.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12437) SMB join in tez fails when one of the tables is empty

2015-11-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014679#comment-15014679
 ] 

Sergey Shelukhin commented on HIVE-12437:
-

Should I commit this?

> SMB join in tez fails when one of the tables is empty
> -
>
> Key: HIVE-12437
> URL: https://issues.apache.org/jira/browse/HIVE-12437
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.0.1, 1.2.1
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
>Priority: Critical
> Attachments: HIVE-12437.1.patch, HIVE-12437.2.patch
>
>
> It looks like a better check for empty tables is to depend on the existence 
> of the record reader for the input from tez. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12472) Add test case for HIVE-10592

2015-11-19 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014687#comment-15014687
 ] 

Prasanth Jayachandran commented on HIVE-12472:
--

I don't think we need a full precommit test run for this.

> Add test case for HIVE-10592
> 
>
> Key: HIVE-12472
> URL: https://issues.apache.org/jira/browse/HIVE-12472
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12472.patch
>
>
> HIVE-10592 has a fix for the following NPE issue (table should have all 
> columns values as null for timestamp and date columns)
> {code:title=query}
> set hive.optimize.index.filter=true;
> select count(*) from orctable where timestamp_col is null;
> select count(*) from orctable where date_col is null;
> {code}
> {code:title=exception}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.io.orc.ColumnStatisticsImpl$TimestampStatisticsImpl.getMinimum(ColumnStatisticsImpl.java:845)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.getMin(RecordReaderImpl.java:308)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.evaluatePredicateProto(RecordReaderImpl.java:332)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:710)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.pickRowGroups(RecordReaderImpl.java:751)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:777)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:205)
>   at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:598)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:183)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.<init>(OrcRawRecordMerger.java:226)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:437)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1235)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1117)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
>   ... 26 more
> ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 
> killedTasks:1, Vertex vertex_1446768202865_0008_5_00 [Map 1] killed/failed 
> due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. 
> failedVertices:1 killedVertices:0
> {code}
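The guard HIVE-10592 adds can be sketched in a few lines (illustrative names, not the ORC reader's actual API): when a row group's column is entirely NULL, its statistics carry no minimum, and evaluating a value predicate against that missing minimum is the NullPointerException in the trace above.

```python
def pick_row_groups(row_group_stats, pred):
    """Keep indexes of row groups whose column stats say pred may match."""
    picked = []
    for i, rg in enumerate(row_group_stats):
        if pred == ("is null",):
            keep = rg["has_null"]
        elif rg["min"] is None:
            # All-null group: no non-null values exist, so a value predicate
            # cannot match. Dereferencing rg["min"] without this check fails
            # just like getMinimum() in the stack trace.
            keep = False
        else:
            _, val = pred
            keep = rg["min"] <= val <= rg["max"]
        if keep:
            picked.append(i)
    return picked

stats = [
    {"min": 1, "max": 9, "has_null": False},
    {"min": None, "max": None, "has_null": True},  # all-null timestamp column
]
print(pick_row_groups(stats, ("is null",)))  # only the all-null group
print(pick_row_groups(stats, ("=", 5)))      # only the populated group
```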



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11603) IndexOutOfBoundsException thrown when accessing a union all subquery and filtering on a column which does not exist in all underlying tables

2015-11-19 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014688#comment-15014688
 ] 

Laljo John Pullokkaran commented on HIVE-11603:
---

The patch needs to be revised and is blocked by HIVE-12355.
I hope to get to it in the next week or so.

> IndexOutOfBoundsException thrown when accessing a union all subquery and 
> filtering on a column which does not exist in all underlying tables
> 
>
> Key: HIVE-11603
> URL: https://issues.apache.org/jira/browse/HIVE-11603
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.3.0, 1.2.1
> Environment: Hadoop 2.6
>Reporter: Nicholas Brenwald
>Assignee: Laljo John Pullokkaran
>Priority: Minor
> Attachments: HIVE-11603.1.patch, HIVE-11603.2.patch
>
>
> Create two empty tables t1 and t2
> {code}
> CREATE TABLE t1(c1 STRING);
> CREATE TABLE t2(c1 STRING, c2 INT);
> {code}
> Create a view on these two tables
> {code}
> CREATE VIEW v1 AS 
> SELECT c1, c2 
> FROM (
> SELECT c1, CAST(NULL AS INT) AS c2 FROM t1
> UNION ALL
> SELECT c1, c2 FROM t2
> ) x;
> {code}
> Then run
> {code}
> SELECT COUNT(*) from v1 
> WHERE c2 = 0;
> {code}
> We expect to get a result of zero, but instead the query fails with stack 
> trace:
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>   at java.util.ArrayList.get(ArrayList.java:411)
>   at 
> org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:119)
>   ... 22 more
> {code}
> Workarounds include disabling ppd,
> {code}
> set hive.optimize.ppd=false;
> {code}
> Or changing the view so that column c2 is null cast to double:
> {code}
> CREATE VIEW v1_workaround AS 
> SELECT c1, c2 
> FROM (
> SELECT c1, CAST(NULL AS DOUBLE) AS c2 FROM t1
> UNION ALL
> SELECT c1, c2 FROM t2
> ) x;
> {code}
> The problem seems to occur in branch-1.1, branch-1.2, branch-1 but seems to 
> be resolved in master (2.0.0)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12356) Capture if a rule mutated the plan.

2015-11-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014706#comment-15014706
 ] 

Sergey Shelukhin commented on HIVE-12356:
-

Needed for 2.0?

> Capture if a rule mutated the plan.
> ---
>
> Key: HIVE-12356
> URL: https://issues.apache.org/jira/browse/HIVE-12356
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
>
> Currently Hive Optimizer doesn't capture if a rule mutated the plan.
> This info could be useful in:
> 1. determining if a subsequent optimization rule needs to be run or not
>(Ex if Constant propagation didn't mutate plan, then don't run subsequent 
> PPD)
> 2. Explain can contain info about which optimizations are applied effectively 
> on the query.
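A hedged sketch of that idea (hypothetical API, not Hive's Optimizer): run each rule, record whether it mutated the plan, and let a rule declare a dependency so it is skipped when its upstream rule made no change.

```python
def run_rules(plan, rules):
    """rules: (name, fn, requires) triples; returns the final plan and the
    names of rules that actually mutated it."""
    applied = []
    for name, rule, requires in rules:
        if requires and requires not in applied:
            continue  # upstream rule didn't mutate the plan, so skip this one
        new_plan = rule(plan)
        if new_plan != plan:
            applied.append(name)
            plan = new_plan
    return plan, applied

# Toy operator tree as a list of operator names.
const_prop = lambda p: p                          # found nothing to fold
ppd = lambda p: [op for op in p if op != "FIL"]   # would push/remove the filter

plan, applied = run_rules(["TS", "FIL", "SEL"],
                          [("ConstantPropagate", const_prop, None),
                           ("PPD", ppd, "ConstantPropagate")])
print(applied)  # PPD was skipped because ConstantPropagate did not mutate the plan
```

`applied` is also exactly what EXPLAIN would need in order to report which optimizations were applied effectively, the second use case above.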



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12437) SMB join in tez fails when one of the tables is empty

2015-11-19 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014707#comment-15014707
 ] 

Vikram Dixit K commented on HIVE-12437:
---

Unit test results haven't come back yet. We should wait for that.

> SMB join in tez fails when one of the tables is empty
> -
>
> Key: HIVE-12437
> URL: https://issues.apache.org/jira/browse/HIVE-12437
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.0.1, 1.2.1
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
>Priority: Critical
> Attachments: HIVE-12437.1.patch, HIVE-12437.2.patch
>
>
> It looks like a better check for empty tables is to depend on the existence 
> of the record reader for the input from tez. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12424) Make use of Kryo's Object-to-Object deep copy

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12424:

Target Version/s:   (was: 2.0.0)

> Make use of Kryo's Object-to-Object deep copy
> -
>
> Key: HIVE-12424
> URL: https://issues.apache.org/jira/browse/HIVE-12424
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> Currently, plan serialization and operator tree serialization uses Object -> 
> bytes -> Object approach for deep copy. It also uses ByteArrayOutputStream as 
> intermediate buffer whose write method is synchronized. Similarly read from 
> ByteArrayInputStream is also synchronized. Also Utilities.clonePlan() creates 
> a new HiveConf object that scans through conf directories and adds site.xml 
> which is an expensive operation. All these can be avoided using Kryo's Object 
> -> Object deep copy. 
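The two copy strategies can be contrasted with a Python analogy (a pickle round trip standing in for the current Object -> bytes -> Object path, `copy.deepcopy` standing in for Kryo's Object -> Object copy; this is an illustration, not Hive's clonePlan code):

```python
import copy
import pickle

plan = {"ops": [{"id": i, "children": [i * 10 + j for j in range(3)]}
                for i in range(4)]}

def copy_via_bytes(obj):
    # Object -> bytes -> Object, analogous to today's clonePlan() path,
    # which also pays for intermediate (synchronized) byte buffers.
    return pickle.loads(pickle.dumps(obj))

def copy_direct(obj):
    # Direct object-graph walk, analogous to Kryo's kryo.copy(obj).
    return copy.deepcopy(obj)

for clone in (copy_via_bytes(plan), copy_direct(plan)):
    assert clone == plan and clone is not plan  # equal but independent
    clone["ops"][0]["id"] = -1
    assert plan["ops"][0]["id"] == 0            # mutating the clone leaves the original intact
```

Both approaches yield an equal-but-independent copy; the JIRA's point is that the byte-based route pays serialization and buffer-synchronization costs the direct copy avoids.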



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12020) Revert log4j2 xml configuration to properties based configuration

2015-11-19 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014611#comment-15014611
 ] 

Prasanth Jayachandran commented on HIVE-12020:
--

Upgrade pains will still be there, as the old properties-based config and the 
new properties config are not compatible. But with this it will be relatively 
easier to write a migration tool.

> Revert log4j2 xml configuration to properties based configuration
> -
>
> Key: HIVE-12020
> URL: https://issues.apache.org/jira/browse/HIVE-12020
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> Log4j 2.4 release brought back properties based configuration. We should 
> revert XML based configuration and use properties based configuration instead 
> (less verbose and will be similar to old log4j properties). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12369) Faster Vector GroupBy

2015-11-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014612#comment-15014612
 ] 

Sergey Shelukhin commented on HIVE-12369:
-

Should this target 2.0? The tests have failed... 
I can continue the review today/tomorrow.

> Faster Vector GroupBy
> -
>
> Key: HIVE-12369
> URL: https://issues.apache.org/jira/browse/HIVE-12369
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12369.01.patch
>
>
> Implement fast Vector GroupBy using fast hash table technology developed for 
> Native Vector MapJoin and vector key handling developed for recent HIVE-12290 
> Native Vector ReduceSink JIRA.
> (Patch also includes making Native Vector MapJoin use Hybrid Grace -- but 
> that can be separated out)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12369) Faster Vector GroupBy

2015-11-19 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014659#comment-15014659
 ] 

Matt McCline commented on HIVE-12369:
-

[~sershe] I have a new version coming -- so hold on for a moment on the review 
-- thanks.

> Faster Vector GroupBy
> -
>
> Key: HIVE-12369
> URL: https://issues.apache.org/jira/browse/HIVE-12369
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12369.01.patch
>
>
> Implement fast Vector GroupBy using fast hash table technology developed for 
> Native Vector MapJoin and vector key handling developed for recent HIVE-12290 
> Native Vector ReduceSink JIRA.
> (Patch also includes making Native Vector MapJoin use Hybrid Grace -- but 
> that can be separated out)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11481) hive incorrectly set extended ACLs for unnamed group for new databases/tables with inheritPerms enabled

2015-11-19 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014672#comment-15014672
 ] 

Szehon Ho commented on HIVE-11481:
--

Hi Carita, I spent some time reading up on default ACLs and taking a deeper 
look, and have some review questions.

1.  Shouldn't we also set default ACLs on the child, if it is a directory?  
This code may be called in situations where the input is a nested directory 
(like multi-column partition tables).  "When a directory is created inside a 
directory that has a default ACL, the new directory inherits the parent 
directory's default ACL both as its access ACL and default ACL."


2.  Do we still need to remove the base ACLs regardless of whether there are 
any defaults?  If I recall correctly, it was to prevent some duplicates (as 
you are again setting USER and OTHER). 

3.  Can you write a test case that uses DEFAULT ACLs?  The test you added 
seems to use AclEntryScope.ACCESS but not DEFAULT.

> hive incorrectly set extended ACLs for unnamed group for new databases/tables 
> with inheritPerms enabled
> ---
>
> Key: HIVE-11481
> URL: https://issues.apache.org/jira/browse/HIVE-11481
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.2.1
>Reporter: Carita Ou
>Assignee: Carita Ou
>Priority: Minor
> Attachments: HIVE-11481.1.patch, HIVE-11481.2.patch
>
>
> $ hadoop fs -chmod 700 /user/hive/warehouse
> $ hadoop fs -setfacl -m user:user1:rwx /user/hive/warehouse
> $ hadoop fs -setfacl -m default:user::rwx /user/hive/warehouse
> $ hadoop fs -ls /user/hive
> Found 1 items
> drwxrwx---+  - hive hadoop  0 2015-08-05 10:29 /user/hive/warehouse
> $ hadoop fs -getfacl /user/hive/warehouse
> # file: /user/hive/warehouse
> # owner: hive
> # group: hadoop
> user::rwx
> user:user1:rwx
> group::---
> mask::rwx
> other::---
> default:user::rwx
> default:group::---
> default:other::---
> In hive cli> create database testing;
> $ hadoop fs -ls /user/hive/warehouse
> Found 1 items
> drwxrwx---+  - hive hadoop  0 2015-08-05 10:44 
> /user/hive/warehouse/testing.db
> $hadoop fs -getfacl /user/hive/warehouse/testing.db
> # file: /user/hive/warehouse/testing.db
> # owner: hive
> # group: hadoop
> user::rwx
> user:user1:rwx
> group::rwx
> mask::rwx
> other::---
> default:user::rwx
> default:group::---
> default:other::---
> Since the warehouse directory has default group permission set to ---, the 
> group permissions for testing.db should also be ---
> The warehouse directory permissions show drwxrwx---+ which corresponds to 
> user:mask:other. The subdirectory group ACL is set by calling 
> FsPermission.getGroupAction() from Hadoop, which retrieves the file status 
> permission rwx instead of the actual ACL permission, which is ---. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6113) Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

2015-11-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014691#comment-15014691
 ] 

Sergey Shelukhin commented on HIVE-6113:


Wrt the patch, it looks like all the tests have failed above.
Also given the code changes, I assume there's no way to downgrade to old DN 
jars on a running Hive build if some issue is discovered? I love it when people 
change public APIs to beautify some package name or whatever. Maybe we can try 
to find the classes by string name to be able to use both jars?

> Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
> --
>
> Key: HIVE-6113
> URL: https://issues.apache.org/jira/browse/HIVE-6113
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 0.12.0, 0.13.0, 0.14.0, 1.0.0, 1.2.1
> Environment: hadoop-0.20.2-cdh3u3,hive-0.12.0
>Reporter: William Stone
>Assignee: Oleksiy Sayankin
>Priority: Critical
>  Labels: HiveMetaStoreClient, metastore, unable_instantiate
> Attachments: HIVE-6113-2.patch, HIVE-6113.patch
>
>
> When I execute the SQL "use fdm; desc formatted fdm.tableName;" from Python, 
> it throws the error below.
> But when I try it again, it succeeds.
> 2013-12-25 03:01:32,290 ERROR exec.DDLTask (DDLTask.java:execute(435)) - 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: 
> Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
>   at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1143)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1128)
>   at 
> org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:3479)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:237)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:260)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:217)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:507)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:875)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:769)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:708)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> Caused by: java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1217)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.&lt;init&gt;(RetryingMetaStoreClient.java:62)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2372)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2383)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1139)
>   ... 20 more
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1210)
>   ... 25 more
> Caused by: javax.jdo.JDODataStoreException: Exception thrown flushing changes 
> to datastore
> NestedThrowables:
> java.sql.BatchUpdateException: Duplicate entry 'default' for key 
> 'UNIQUE_DATABASE'
>   at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
>   at 
> org.datanucleus.api.jdo.JDOTransaction.commit(JDOTransaction.java:165)

[jira] [Commented] (HIVE-12257) Enhance ORC FileDump utility to handle flush_length files and recovery

2015-11-19 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014690#comment-15014690
 ] 

Prasanth Jayachandran commented on HIVE-12257:
--

[~ekoifman] Could you take another look at the patch? Addressed all your review 
comments.

> Enhance ORC FileDump utility to handle flush_length files and recovery
> --
>
> Key: HIVE-12257
> URL: https://issues.apache.org/jira/browse/HIVE-12257
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12257-branch-1.patch, HIVE-12257.1.patch, 
> HIVE-12257.2.patch, HIVE-12257.3.patch, HIVE-12257.4.patch, 
> HIVE-12257.6.patch, HIVE-12257.7.patch, HIVE-12257.8.patch, HIVE-12257.9.patch
>
>
> The ORC file dump utility currently does not handle delta directories that 
> contain *_flush_length files. These files contain offsets to the footer in the 
> corresponding delta file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12356) Capture if a rule mutated the plan.

2015-11-19 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014712#comment-15014712
 ] 

Laljo John Pullokkaran commented on HIVE-12356:
---

Yes, this is to reduce Hive Compilation cost.

> Capture if a rule mutated the plan.
> ---
>
> Key: HIVE-12356
> URL: https://issues.apache.org/jira/browse/HIVE-12356
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
>
> Currently Hive Optimizer doesn't capture if a rule mutated the plan.
> This info could be useful in:
> 1. determining if a subsequent optimization rule needs to be run or not
>(Ex if Constant propagation didn't mutate plan, then don't run subsequent 
> PPD)
> 2. Explain can contain info about which optimizations are applied effectively 
> on the query.





[jira] [Commented] (HIVE-11358) LLAP: move LlapConfiguration into HiveConf and document the settings

2015-11-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014656#comment-15014656
 ] 

Sergey Shelukhin commented on HIVE-11358:
-

[~leftylev] Can you look at the latest patch? Thanks!

> LLAP: move LlapConfiguration into HiveConf and document the settings
> 
>
> Key: HIVE-11358
> URL: https://issues.apache.org/jira/browse/HIVE-11358
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11358.01.patch, HIVE-11358.02.patch, 
> HIVE-11358.patch
>
>
> Hive uses HiveConf for configuration. LlapConfiguration should be replaced 
> with parameters in HiveConf





[jira] [Commented] (HIVE-12472) Add test case for HIVE-10592

2015-11-19 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014686#comment-15014686
 ] 

Prasanth Jayachandran commented on HIVE-12472:
--

[~ashutoshc] Can you take a look at this one? It just adds a test case for an 
already fixed bug.

> Add test case for HIVE-10592
> 
>
> Key: HIVE-12472
> URL: https://issues.apache.org/jira/browse/HIVE-12472
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12472.patch
>
>
> HIVE-10592 has a fix for the following NPE issue (the table should have all 
> values null for its timestamp and date columns)
> {code:title=query}
> set hive.optimize.index.filter=true;
> select count(*) from orctable where timestamp_col is null;
> select count(*) from orctable where date_col is null;
> {code}
> {code:title=exception}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.io.orc.ColumnStatisticsImpl$TimestampStatisticsImpl.getMinimum(ColumnStatisticsImpl.java:845)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.getMin(RecordReaderImpl.java:308)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.evaluatePredicateProto(RecordReaderImpl.java:332)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:710)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.pickRowGroups(RecordReaderImpl.java:751)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:777)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.&lt;init&gt;(RecordReaderImpl.java:205)
>   at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:598)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.&lt;init&gt;(OrcRawRecordMerger.java:183)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$OriginalReaderPair.&lt;init&gt;(OrcRawRecordMerger.java:226)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.&lt;init&gt;(OrcRawRecordMerger.java:437)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1235)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1117)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:249)
>   ... 26 more
> ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 
> killedTasks:1, Vertex vertex_1446768202865_0008_5_00 [Map 1] killed/failed 
> due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. 
> failedVertices:1 killedVertices:0
> {code}





[jira] [Commented] (HIVE-12450) OrcFileMergeOperator does not use correct compression buffer size

2015-11-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014699#comment-15014699
 ] 

Sergey Shelukhin commented on HIVE-12450:
-

+1 pending tests

> OrcFileMergeOperator does not use correct compression buffer size
> -
>
> Key: HIVE-12450
> URL: https://issues.apache.org/jira/browse/HIVE-12450
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 1.2.0, 1.3.0, 1.2.1, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: HIVE-12450.1.patch, HIVE-12450.2.patch, 
> HIVE-12450.3.patch, HIVE-12450.4.patch, zlib-hang.png
>
>
> OrcFileMergeOperator checks for compatibility before merging ORC files. This 
> compatibility check includes the compression buffer size. But the output 
> file that is created does not honor the compression buffer size and always 
> defaults to 256KB. This will not be a problem when reading the ORC file, but 
> it can create unwanted memory pressure because of wasted space within the 
> compression buffer.
> This issue can also make the merged file unreadable in certain cases. For 
> example, suppose the original compression buffer size is 8KB and 
> hive.exec.orc.default.buffer.size is set to 4KB. The merge operator will use 
> 4KB instead of the actual 8KB, which can result in the ORC reader hanging (more 
> specifically, ZlibCodec will wait for more compression buffers). 
> {code:title=jstack output for hanging issue}
> "main" prio=5 tid=0x7fc07300 nid=0x1703 runnable [0x70218000]
>java.lang.Thread.State: RUNNABLE
>   at java.util.zip.Inflater.inflateBytes(Native Method)
>   at java.util.zip.Inflater.inflate(Inflater.java:259)
>   - locked <0x0007f5d5fdc8> (a java.util.zip.ZStreamRef)
>   at 
> org.apache.hadoop.hive.ql.io.orc.ZlibCodec.decompress(ZlibCodec.java:94)
>   at 
> org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:238)
>   at 
> org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:262)
>   at java.io.InputStream.read(InputStream.java:101)
>   at 
> com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:737)
>   at 
> com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
>   at 
> com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.&lt;init&gt;(OrcProto.java:10661)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.&lt;init&gt;(OrcProto.java:10625)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10730)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10725)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:89)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:95)
>   at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.parseFrom(OrcProto.java:10958)
>   at 
> org.apache.hadoop.hive.ql.io.orc.MetadataReaderImpl.readStripeFooter(MetadataReaderImpl.java:114)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:240)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.beginReadStripe(RecordReaderImpl.java:847)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:818)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:1033)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1068)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.&lt;init&gt;(RecordReaderImpl.java:217)
>   at 
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:638)
>   at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rows(ReaderImpl.java:625)
>   at 
> org.apache.hadoop.hive.ql.io.orc.FileDump.printMetaData(FileDump.java:162)
>   at org.apache.hadoop.hive.ql.io.orc.FileDump.main(FileDump.java:110)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {code}





[jira] [Commented] (HIVE-12393) Simplify ColumnPruner when CBO optimizes the query

2015-11-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014703#comment-15014703
 ] 

Sergey Shelukhin commented on HIVE-12393:
-

Is it a big perf gain?

> Simplify ColumnPruner when CBO optimizes the query
> --
>
> Key: HIVE-12393
> URL: https://issues.apache.org/jira/browse/HIVE-12393
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> The plan for any given query optimized by CBO will always contain a Project 
> operator on top of the TS that prunes the columns that are not needed.
> Thus, there is no need for the Hive optimizer to traverse the whole plan to 
> check which columns can be pruned. In fact, the Hive ColumnPruner optimizer 
> only needs to match TS operators when CBO has optimized the plan.





[jira] [Updated] (HIVE-12393) Simplify ColumnPruner when CBO optimizes the query

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12393:

Target Version/s:   (was: 2.0.0)

> Simplify ColumnPruner when CBO optimizes the query
> --
>
> Key: HIVE-12393
> URL: https://issues.apache.org/jira/browse/HIVE-12393
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> The plan for any given query optimized by CBO will always contain a Project 
> operator on top of the TS that prunes the columns that are not needed.
> Thus, there is no need for the Hive optimizer to traverse the whole plan to 
> check which columns can be pruned. In fact, the Hive ColumnPruner optimizer 
> only needs to match TS operators when CBO has optimized the plan.





[jira] [Updated] (HIVE-12387) Bug with logging improvements in ATS

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12387:

Target Version/s:   (was: 2.0.0)

> Bug with logging improvements in ATS
> 
>
> Key: HIVE-12387
> URL: https://issues.apache.org/jira/browse/HIVE-12387
> Project: Hive
>  Issue Type: Bug
>  Components: Diagnosability
>Affects Versions: 1.2.1
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
>
> When indexing in ATS, the space in the value is not useful. We need to change 
> to use the Hive query id throughout the logging phase, and also add 
> information about which configs the user passed in.





[jira] [Updated] (HIVE-12410) HiveConf should make use of Validator.validate()

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12410:

Target Version/s:   (was: 2.0.0)

> HiveConf should make use of Validator.validate()
> 
>
> Key: HIVE-12410
> URL: https://issues.apache.org/jira/browse/HIVE-12410
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration
>Affects Versions: 0.10.0
>Reporter: Eugene Koifman
>
> Hive has a Validator class which allows you to specify valid ranges for 
> various configuration variables.
> Validator.validate() checks that the value specified is actually within the 
> constraints. Unfortunately, validate() is not called in general.
> We should make validate() throw IllegalArgumentException and override 
> Configuration.set() to make sure validate() is always called.
> This would not be backwards compatible, but we should do what we can to ensure 
> that Hive is configured properly before starting a service.
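A minimal sketch of the proposed override, with simplified stand-ins for Hive's actual classes: `ValidatingConf`, `RangeValidator`, and the config key below are all hypothetical, not Hive's HiveConf/Validator API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposal: a configuration object whose set() always runs the
// registered validator and throws IllegalArgumentException on bad values.
// ValidatingConf and RangeValidator are illustrative stand-ins only.
public class ValidatingConf {
    interface Validator {
        // Returns an error message, or null if the value is acceptable.
        String validate(String value);
    }

    static class RangeValidator implements Validator {
        private final long min, max;
        RangeValidator(long min, long max) { this.min = min; this.max = max; }
        public String validate(String value) {
            long v = Long.parseLong(value);
            return (v < min || v > max)
                ? "value " + v + " is outside [" + min + ", " + max + "]" : null;
        }
    }

    private final Map<String, String> vals = new HashMap<>();
    private final Map<String, Validator> validators = new HashMap<>();

    void register(String key, Validator v) { validators.put(key, v); }

    // The overridden set(): validate before storing, throw on failure.
    void set(String key, String value) {
        Validator v = validators.get(key);
        if (v != null) {
            String err = v.validate(value);
            if (err != null) {
                throw new IllegalArgumentException(key + ": " + err);
            }
        }
        vals.put(key, value);
    }

    String get(String key) { return vals.get(key); }

    public static void main(String[] args) {
        ValidatingConf conf = new ValidatingConf();
        conf.register("hive.example.buffer.kb", new RangeValidator(4, 256));
        conf.set("hive.example.buffer.kb", "64");        // accepted
        try {
            conf.set("hive.example.buffer.kb", "1024");  // rejected at set() time
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("kept: " + conf.get("hive.example.buffer.kb"));
    }
}
```

Failing fast at set() time, rather than silently storing an out-of-range value, is the backward-incompatible part the description mentions.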





[jira] [Commented] (HIVE-12424) Make use of Kryo's Object-to-Object deep copy

2015-11-19 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014748#comment-15014748
 ] 

Prasanth Jayachandran commented on HIVE-12424:
--

Yes, it is blocked on Kryo. So far there has been no response from the Kryo 
community. I don't think we will be able to target 2.0.0; if anything changes, 
I will update this bug.

> Make use of Kryo's Object-to-Object deep copy
> -
>
> Key: HIVE-12424
> URL: https://issues.apache.org/jira/browse/HIVE-12424
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> Currently, plan serialization and operator tree serialization use an Object -> 
> bytes -> Object approach for deep copy. It also uses ByteArrayOutputStream as 
> an intermediate buffer, whose write method is synchronized; similarly, reads 
> from ByteArrayInputStream are synchronized. Also, Utilities.clonePlan() creates 
> a new HiveConf object that scans through conf directories and adds site.xml, 
> which is an expensive operation. All of this can be avoided using Kryo's Object 
> -> Object deep copy. 
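The contrast between the two strategies can be sketched as follows. Kryo itself is a third-party dependency and isn't shown; java.io serialization stands in for the Object -> bytes -> Object round trip, and `directCopy` stands in for a Kryo-style object-to-object copy (`kryo.copy()`), which skips the intermediate byte buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

// roundTripCopy mirrors the current approach: serialize to a byte buffer
// (with synchronized writes) and deserialize back. directCopy illustrates
// the object-to-object alternative with no serialization step at all.
public class DeepCopyDemo {
    static List<String> roundTripCopy(List<String> plan) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream(); // synchronized write()
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(new ArrayList<>(plan));
            oos.flush();
            ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
            @SuppressWarnings("unchecked")
            List<String> copy = (List<String>) in.readObject();
            return copy;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static List<String> directCopy(List<String> plan) {
        return new ArrayList<>(plan); // no bytes, no intermediate buffer
    }

    public static void main(String[] args) {
        List<String> plan = List.of("TS", "FIL", "SEL");
        List<String> a = roundTripCopy(plan);
        List<String> b = directCopy(plan);
        System.out.println(a.equals(plan) && a != plan); // deep and distinct
        System.out.println(b.equals(plan) && b != plan); // same result, no serialization
    }
}
```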





[jira] [Updated] (HIVE-12405) Comparison bug in HiveSplitGenerator.InputSplitComparator#compare()

2015-11-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-12405:

Component/s: Tez

> Comparison bug in HiveSplitGenerator.InputSplitComparator#compare()
> ---
>
> Key: HIVE-12405
> URL: https://issues.apache.org/jira/browse/HIVE-12405
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.0.0
>Reporter: Aleksei Statkevich
>Assignee: Aleksei Statkevich
> Attachments: HIVE-12405.patch
>
>
> "compare()" method in HiveSplitGenerator.InputSplitComparator has the 
> following condition on line 281 which is always false and is most likely a 
> typo:
> {code}
> if (startPos1 > startPos1) {
> {code}
> As a result, in certain conditions splits might be sorted in incorrect order.
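A sketch of the corrected comparison described above. The `Split` type, field names, and the tie-break on length are illustrative assumptions, not the actual InputSplit API; the presumed intent is ascending order by start position.

```java
import java.util.Arrays;
import java.util.Comparator;

// The reported bug compares startPos1 with itself (always false), so start
// positions never affect the ordering. The fix compares the two splits.
public class SplitOrderDemo {
    static final class Split {
        final long startPos, length;
        Split(long startPos, long length) { this.startPos = startPos; this.length = length; }
    }

    static final Comparator<Split> BY_START = (s1, s2) -> {
        if (s1.startPos > s2.startPos) return 1;   // was: if (startPos1 > startPos1)
        if (s1.startPos < s2.startPos) return -1;
        return Long.compare(s1.length, s2.length); // assumed tie-break
    };

    public static void main(String[] args) {
        Split[] splits = { new Split(200, 50), new Split(0, 50), new Split(100, 50) };
        Arrays.sort(splits, BY_START);
        for (Split s : splits) {
            System.out.println(s.startPos); // 0, 100, 200
        }
    }
}
```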





[jira] [Updated] (HIVE-12405) Comparison bug in HiveSplitGenerator.InputSplitComparator#compare()

2015-11-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-12405:

Release Note:   (was: Pushed to master. Thanks, Aleksei!)

> Comparison bug in HiveSplitGenerator.InputSplitComparator#compare()
> ---
>
> Key: HIVE-12405
> URL: https://issues.apache.org/jira/browse/HIVE-12405
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.0.0
>Reporter: Aleksei Statkevich
>Assignee: Aleksei Statkevich
> Attachments: HIVE-12405.patch
>
>
> "compare()" method in HiveSplitGenerator.InputSplitComparator has the 
> following condition on line 281 which is always false and is most likely a 
> typo:
> {code}
> if (startPos1 > startPos1) {
> {code}
> As a result, in certain conditions splits might be sorted in incorrect order.





[jira] [Commented] (HIVE-12405) Comparison bug in HiveSplitGenerator.InputSplitComparator#compare()

2015-11-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014523#comment-15014523
 ] 

Ashutosh Chauhan commented on HIVE-12405:
-

Pushed to master. Thanks, Aleksei!

> Comparison bug in HiveSplitGenerator.InputSplitComparator#compare()
> ---
>
> Key: HIVE-12405
> URL: https://issues.apache.org/jira/browse/HIVE-12405
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.0.0
>Reporter: Aleksei Statkevich
>Assignee: Aleksei Statkevich
> Attachments: HIVE-12405.patch
>
>
> "compare()" method in HiveSplitGenerator.InputSplitComparator has the 
> following condition on line 281 which is always false and is most likely a 
> typo:
> {code}
> if (startPos1 > startPos1) {
> {code}
> As a result, in certain conditions splits might be sorted in incorrect order.





[jira] [Commented] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-11-19 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014538#comment-15014538
 ] 

Yin Huai commented on HIVE-3454:


It seems this issue has not been fixed completely. I still see
{code}
hive> SELECT CAST(CAST(-1200.0 AS TIMESTAMP) AS DOUBLE);
OK
-1200.0
Time taken: 0.047 seconds, Fetched: 1 row(s)
hive> SELECT CAST(CAST(-1200 AS TIMESTAMP) AS INT);
OK
-2
Time taken: 0.044 seconds, Fetched: 1 row(s)
{code}

> Problem with CAST(BIGINT as TIMESTAMP)
> --
>
> Key: HIVE-3454
> URL: https://issues.apache.org/jira/browse/HIVE-3454
> Project: Hive
>  Issue Type: Bug
>  Components: Types, UDF
>Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
> 0.13.1
>Reporter: Ryan Harris
>Assignee: Aihua Xu
>  Labels: newbie, newdev, patch
> Fix For: 1.2.0
>
> Attachments: HIVE-3454.1.patch.txt, HIVE-3454.3.patch, HIVE-3454.patch
>
>
> Ran into an issue while working with timestamp conversion.
> CAST(unix_timestamp() AS TIMESTAMP) should create a timestamp for the current 
> time from the BIGINT returned by unix_timestamp().
> Instead, however, a 1970-01-16 timestamp is returned.
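The 1970-01-16 result is consistent with a seconds-vs-milliseconds mix-up: a present-day epoch value in seconds, reinterpreted as milliseconds, lands about two weeks after the epoch. A self-contained sketch of the symptom (the sample epoch value is arbitrary; this is not Hive's actual cast code):

```java
import java.time.Instant;

// An epoch value in SECONDS, reinterpreted as MILLISECONDS, shrinks
// ~1.3 billion seconds (decades) into ~1.3 billion ms (about 15 days).
public class EpochUnitsDemo {
    public static void main(String[] args) {
        long unixSeconds = 1346867555L;                      // a moment in Sep 2012
        Instant correct = Instant.ofEpochSecond(unixSeconds);
        Instant wrong   = Instant.ofEpochMilli(unixSeconds); // unit mix-up
        System.out.println(correct); // 2012-09-05T...
        System.out.println(wrong);   // 1970-01-16T...
    }
}
```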





[jira] [Commented] (HIVE-12367) Lock/unlock database should add current database to inputs and outputs of authz hook

2015-11-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014641#comment-15014641
 ] 

Sergey Shelukhin commented on HIVE-12367:
-

Also, tests failed. Do the tests need an out file update?

> Lock/unlock database should add current database to inputs and outputs of 
> authz hook
> 
>
> Key: HIVE-12367
> URL: https://issues.apache.org/jira/browse/HIVE-12367
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.2.1
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
> Attachments: HIVE-12367.001.patch, HIVE-12367.002.patch
>
>






[jira] [Updated] (HIVE-12438) Separate LLAP client side and server side config parameters

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12438:

Target Version/s:   (was: 2.0.0)

> Separate LLAP client side and server side config parameters
> ---
>
> Key: HIVE-12438
> URL: https://issues.apache.org/jira/browse/HIVE-12438
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Siddharth Seth
>
> Potentially separate out the files used as well: llap-daemon-site vs 
> llap-client-site.
> Most LLAP parameters are server-side only. For the ones which are required in 
> clients / the AM, add an equivalent client-side parameter.
> Also, parameters which enable the LLAP cache could be renamed.
> cc [~sershe]





[jira] [Updated] (HIVE-12236) LLAP: Prevent metadata queries from thrashing LLAP cache

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12236:

Target Version/s: 2.0.0

> LLAP: Prevent metadata queries from thrashing LLAP cache
> 
>
> Key: HIVE-12236
> URL: https://issues.apache.org/jira/browse/HIVE-12236
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12236.WIP.patch
>
>
> Currently, metadata queries fired by BI tools tend to thrash LLAP's cache.
> Bypass the cache and process metadata queries directly from HiveServer2





[jira] [Commented] (HIVE-11603) IndexOutOfBoundsException thrown when accessing a union all subquery and filtering on a column which does not exist in all underlying tables

2015-11-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014585#comment-15014585
 ] 

Sergey Shelukhin commented on HIVE-11603:
-

What is the status of this?

> IndexOutOfBoundsException thrown when accessing a union all subquery and 
> filtering on a column which does not exist in all underlying tables
> 
>
> Key: HIVE-11603
> URL: https://issues.apache.org/jira/browse/HIVE-11603
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.3.0, 1.2.1
> Environment: Hadoop 2.6
>Reporter: Nicholas Brenwald
>Assignee: Laljo John Pullokkaran
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HIVE-11603.1.patch, HIVE-11603.2.patch
>
>
> Create two empty tables t1 and t2
> {code}
> CREATE TABLE t1(c1 STRING);
> CREATE TABLE t2(c1 STRING, c2 INT);
> {code}
> Create a view on these two tables
> {code}
> CREATE VIEW v1 AS 
> SELECT c1, c2 
> FROM (
> SELECT c1, CAST(NULL AS INT) AS c2 FROM t1
> UNION ALL
> SELECT c1, c2 FROM t2
> ) x;
> {code}
> Then run
> {code}
> SELECT COUNT(*) from v1 
> WHERE c2 = 0;
> {code}
> We expect to get a result of zero, but instead the query fails with stack 
> trace:
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>   at java.util.ArrayList.get(ArrayList.java:411)
>   at 
> org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:119)
>   ... 22 more
> {code}
> Workarounds include disabling ppd,
> {code}
> set hive.optimize.ppd=false;
> {code}
> Or changing the view so that column c2 is null cast to double:
> {code}
> CREATE VIEW v1_workaround AS 
> SELECT c1, c2 
> FROM (
> SELECT c1, CAST(NULL AS DOUBLE) AS c2 FROM t1
> UNION ALL
> SELECT c1, c2 FROM t2
> ) x;
> {code}
> The problem seems to occur in branch-1.1, branch-1.2, and branch-1, but seems 
> to be resolved in master (2.0.0)





[jira] [Updated] (HIVE-12300) deprecate MR in Hive 2.0

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12300:

Target Version/s: 2.0.0
   Fix Version/s: (was: 2.0.0)

> deprecate MR in Hive 2.0
> 
>
> Key: HIVE-12300
> URL: https://issues.apache.org/jira/browse/HIVE-12300
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, Configuration, Documentation
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12300.01.patch, HIVE-12300.02.patch, 
> HIVE-12300.patch
>
>
> As suggested in the thread on dev alias





[jira] [Updated] (HIVE-12020) Revert log4j2 xml configuration to properties based configuration

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12020:

Target Version/s: 2.0.0
Priority: Blocker  (was: Major)

> Revert log4j2 xml configuration to properties based configuration
> -
>
> Key: HIVE-12020
> URL: https://issues.apache.org/jira/browse/HIVE-12020
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Blocker
> Fix For: 2.0.0
>
>
> The Log4j 2.4 release brought back properties-based configuration. We should 
> revert the XML-based configuration and use properties-based configuration 
> instead (it is less verbose and similar to the old log4j properties). 





[jira] [Commented] (HIVE-12020) Revert log4j2 xml configuration to properties based configuration

2015-11-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014586#comment-15014586
 ] 

Sergey Shelukhin commented on HIVE-12020:
-

I think we should do this before releasing to avoid upgrade pains

> Revert log4j2 xml configuration to properties based configuration
> -
>
> Key: HIVE-12020
> URL: https://issues.apache.org/jira/browse/HIVE-12020
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 2.0.0
>
>
> The Log4j 2.4 release brought back properties-based configuration. We should 
> revert the XML-based configuration and use properties-based configuration 
> instead (it is less verbose and similar to the old log4j properties). 





[jira] [Updated] (HIVE-11866) Add framework to enable testing using LDAPServer using LDAP protocol

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11866:

Fix Version/s: (was: 2.0.0)

Removing fixed version (2.0) from Unresolved JIRA in preparation for the 
release. Please use target version field instead (if not already set) if you 
think this should be shipped as part of 2.0

> Add framework to enable testing using LDAPServer using LDAP protocol
> 
>
> Key: HIVE-11866
> URL: https://issues.apache.org/jira/browse/HIVE-11866
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.3.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-11866.2.patch, HIVE-11866.patch
>
>
> Currently there is no unit test coverage for HS2's LDAP Atn provider using an 
> LDAP server on the backend. This prevents testing of the LDAPAtnProvider with 
> some realistic use cases.





[jira] [Updated] (HIVE-12300) deprecate MR in Hive 2.0

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12300:


Removing fixed version (2.0) from Unresolved JIRA in preparation for the 
release. Please use target version field instead (if not already set) if you 
think this should be shipped as part of 2.0

> deprecate MR in Hive 2.0
> 
>
> Key: HIVE-12300
> URL: https://issues.apache.org/jira/browse/HIVE-12300
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI, Configuration, Documentation
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12300.01.patch, HIVE-12300.02.patch, 
> HIVE-12300.patch
>
>
> As suggested in the thread on dev alias





[jira] [Updated] (HIVE-11417) Create shims for the row by row read path that is backed by VectorizedRowBatch

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11417:

Fix Version/s: (was: 2.0.0)

Removing fixed version (2.0) from Unresolved JIRA in preparation for the 
release. Please use target version field instead (if not already set) if you 
think this should be shipped as part of 2.0

> Create shims for the row by row read path that is backed by VectorizedRowBatch
> --
>
> Key: HIVE-11417
> URL: https://issues.apache.org/jira/browse/HIVE-11417
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> I'd like to make the default path for reading and writing ORC files to be 
> vectorized. To ensure that Hive can still read row by row, we'll need shims 
> to support the old API.





[jira] [Updated] (HIVE-6712) HS2 JDBC driver is inconsistent w.r.t. auto commit

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6712:
---
Fix Version/s: (was: 2.0.0)

Removing fixed version (2.0) from Unresolved JIRA in preparation for the 
release. Please use target version field instead (if not already set) if you 
think this should be shipped as part of 2.0

> HS2 JDBC driver is inconsistent w.r.t. auto commit
> --
>
> Key: HIVE-6712
> URL: https://issues.apache.org/jira/browse/HIVE-6712
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Xuefu Zhang
>Assignee: David McWhorter
>  Labels: jdbc
> Attachments: HIVE-6712.patch
>
>
> I see an inconsistency in HS2 JDBC driver code:
> {code}
>   @Override
>   public void setAutoCommit(boolean autoCommit) throws SQLException {
> if (autoCommit) {
>   throw new SQLException("enabling autocommit is not supported");
> }
>   }
> {code}
> From above, it seems that auto commit is not supported. However, 
> {code}
>   @Override
>   public boolean getAutoCommit() throws SQLException {
> return true;
>   }
> {code}
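A consistent pairing could look like the following sketch (the class name is hypothetical and this is not the actual Hive fix): report exactly the state that `setAutoCommit` permits, and reject only the transition away from it.

```java
// Sketch (hypothetical class, not Hive's actual driver code): a connection
// that only supports auto-commit "on", with both methods kept consistent --
// the unsupported transition is *disabling* auto-commit.
public class AutoCommitSketch {
    private static final boolean AUTO_COMMIT = true;

    public static void setAutoCommit(boolean autoCommit) {
        if (!autoCommit) {
            // Reject only the state that getAutoCommit() cannot report.
            throw new UnsupportedOperationException(
                "disabling autocommit is not supported");
        }
    }

    public static boolean getAutoCommit() {
        return AUTO_COMMIT; // agrees with the only state setAutoCommit accepts
    }

    public static void main(String[] args) {
        setAutoCommit(true); // accepted: already the reported state
        System.out.println(getAutoCommit()); // prints "true"
    }
}
```

With this pairing, a caller that reads `getAutoCommit()` and passes the value back to `setAutoCommit()` never hits an exception, which is the invariant the quoted code breaks.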





[jira] [Updated] (HIVE-12279) Testcase to verify session temporary files are removed after HIVE-11768

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12279:

Fix Version/s: (was: 2.0.0)

Removing fixed version (2.0) from Unresolved JIRA in preparation for the 
release. Please use target version field instead (if not already set) if you 
think this should be shipped as part of 2.0

> Testcase to verify session temporary files are removed after HIVE-11768
> ---
>
> Key: HIVE-12279
> URL: https://issues.apache.org/jira/browse/HIVE-12279
> Project: Hive
>  Issue Type: Test
>  Components: HiveServer2, Test
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-12279.1.patch
>
>
> We need to make sure HS2 session temporary files are removed after session 
> ends.





[jira] [Updated] (HIVE-11603) IndexOutOfBoundsException thrown when accessing a union all subquery and filtering on a column which does not exist in all underlying tables

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11603:

Fix Version/s: (was: 2.0.0)

Removing fixed version (2.0) from Unresolved JIRA in preparation for the 
release. Please use target version field instead (if not already set) if you 
think this should be shipped as part of 2.0

> IndexOutOfBoundsException thrown when accessing a union all subquery and 
> filtering on a column which does not exist in all underlying tables
> 
>
> Key: HIVE-11603
> URL: https://issues.apache.org/jira/browse/HIVE-11603
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.3.0, 1.2.1
> Environment: Hadoop 2.6
>Reporter: Nicholas Brenwald
>Assignee: Laljo John Pullokkaran
>Priority: Minor
> Attachments: HIVE-11603.1.patch, HIVE-11603.2.patch
>
>
> Create two empty tables t1 and t2
> {code}
> CREATE TABLE t1(c1 STRING);
> CREATE TABLE t2(c1 STRING, c2 INT);
> {code}
> Create a view on these two tables
> {code}
> CREATE VIEW v1 AS 
> SELECT c1, c2 
> FROM (
> SELECT c1, CAST(NULL AS INT) AS c2 FROM t1
> UNION ALL
> SELECT c1, c2 FROM t2
> ) x;
> {code}
> Then run
> {code}
> SELECT COUNT(*) from v1 
> WHERE c2 = 0;
> {code}
> We expect to get a result of zero, but instead the query fails with stack 
> trace:
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>   at java.util.ArrayList.get(ArrayList.java:411)
>   at 
> org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:119)
>   ... 22 more
> {code}
> Workarounds include disabling ppd,
> {code}
> set hive.optimize.ppd=false;
> {code}
> Or changing the view so that column c2 is null cast to double:
> {code}
> CREATE VIEW v1_workaround AS 
> SELECT c1, c2 
> FROM (
> SELECT c1, CAST(NULL AS DOUBLE) AS c2 FROM t1
> UNION ALL
> SELECT c1, c2 FROM t2
> ) x;
> {code}
> The problem seems to occur in branch-1.1, branch-1.2, branch-1 but seems to 
> be resolved in master (2.0.0)





[jira] [Updated] (HIVE-12236) LLAP: Prevent metadata queries from thrashing LLAP cache

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12236:

Fix Version/s: (was: 2.0.0)
   (was: 1.3.0)

Removing fixed version (2.0) from Unresolved JIRA in preparation for the 
release. Please use target version field instead (if not already set) if you 
think this should be shipped as part of 2.0

> LLAP: Prevent metadata queries from thrashing LLAP cache
> 
>
> Key: HIVE-12236
> URL: https://issues.apache.org/jira/browse/HIVE-12236
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-12236.WIP.patch
>
>
> Currently, metadata queries fired by BI tools tend to thrash LLAP's cache.
> Bypass the cache and process metadata queries directly from HiveServer2





[jira] [Updated] (HIVE-12020) Revert log4j2 xml configuration to properties based configuration

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12020:

Fix Version/s: (was: 2.0.0)

Removing fixed version (2.0) from Unresolved JIRA in preparation for the 
release. Please use target version field instead (if not already set) if you 
think this should be shipped as part of 2.0

> Revert log4j2 xml configuration to properties based configuration
> -
>
> Key: HIVE-12020
> URL: https://issues.apache.org/jira/browse/HIVE-12020
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> Log4j 2.4 release brought back properties based configuration. We should 
> revert XML based configuration and use properties based configuration instead 
> (less verbose and will be similar to old log4j properties). 





[jira] [Updated] (HIVE-11800) GenMapRedWalker in MapReduceCompiler.java seems not efficient

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11800:

Fix Version/s: (was: 2.0.0)

Removing fixed version (2.0) from Unresolved JIRA in preparation for the 
release. Please use target version field instead (if not already set) if you 
think this should be shipped as part of 2.0

> GenMapRedWalker in MapReduceCompiler.java seems not efficient
> -
>
> Key: HIVE-11800
> URL: https://issues.apache.org/jira/browse/HIVE-11800
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Aihua Xu
>
> Investigate the implementation of GenMapRedWalker in MapReduceCompiler.java 
> to see if the performance can be improved. 





[jira] [Updated] (HIVE-11935) Access HiveMetaStoreClient.currentMetaVars should be synchronized

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11935:

Fix Version/s: (was: 2.0.0)
   (was: 1.3.0)

Removing fixed version (2.0) from Unresolved JIRA in preparation for the 
release. Please use target version field instead (if not already set) if you 
think this should be shipped as part of 2.0

> Access HiveMetaStoreClient.currentMetaVars should be synchronized
> -
>
> Key: HIVE-11935
> URL: https://issues.apache.org/jira/browse/HIVE-11935
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-11935.1.patch, HIVE-11935.2.patch
>
>
> We saw intermittent failure of the following stack:
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.isCompatibleWith(HiveMetaStoreClient.java:287)
> at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
> at com.sun.proxy.$Proxy9.isCompatibleWith(Unknown Source)
> at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:206)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.createHiveDB(BaseSemanticAnalyzer.java:205)
> at 
> org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.<init>(DDLSemanticAnalyzer.java:223)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.get(SemanticAnalyzerFactory.java:259)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:409)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1116)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:110)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:181)
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:257)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:388)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:375)
> at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
> at com.sun.proxy.$Proxy20.executeStatementAsync(Unknown Source)
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:274)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:486)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at org.apache.thrift.server.TServlet.doPost(TServlet.java:83)
> at 
> org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:171)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> 

[jira] [Updated] (HIVE-11799) The output of explain query for multiple lateral views is huge

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11799:

Fix Version/s: (was: 2.0.0)

Removing fixed version (2.0) from Unresolved JIRA in preparation for the 
release. Please use target version field instead (if not already set) if you 
think this should be shipped as part of 2.0

> The output of explain query for multiple lateral views is huge
> --
>
> Key: HIVE-11799
> URL: https://issues.apache.org/jira/browse/HIVE-11799
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11799.patch
>
>
> Execute the following query
> {noformat}
> CREATE TABLE `t1`(`pattern` array<string>);
>   
> explain select * from t1 
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1
> lateral view explode(pattern) tbl1 as col1;
> {noformat}
> Even after HIVE-11617 is fixed, the explain output still takes forever, since 
> we recursively print operator info. This is an issue when operators can have 
> multiple children and parents, as in the lateral view case. Right now, if a 
> node has multiple parents, it and its descendants are printed multiple times. 
> We should print each node once and print just a reference afterwards. 
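The blow-up described above can be seen with a toy DAG line counter (a hypothetical `Node` class, not Hive's actual operator tree or explain code): re-walking every parent's subtree revisits shared nodes exponentially, while memoizing by identity prints each node once and a one-line reference thereafter.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy illustration: count the lines a printer would emit for a DAG where
// every level has two edges into the next node, as stacked lateral views do.
public class ExplainSketch {
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Naive printer: re-walks the whole subtree each time a node is reached.
    static long naiveLines(Node n) {
        long count = 1; // one line for this node
        for (Node c : n.children) count += naiveLines(c);
        return count;
    }

    // Memoized printer: full detail once, a one-line reference afterwards.
    static long memoLines(Node n, Set<Node> seen) {
        if (!seen.add(n)) return 1; // just "see <name> above"
        long count = 1;
        for (Node c : n.children) count += memoLines(c, seen);
        return count;
    }

    // Chain of "diamonds": each level has two edges into the next node.
    static Node buildChain(int depth) {
        Node top = new Node("TS");
        Node cur = top;
        for (int i = 0; i < depth; i++) {
            Node next = new Node("LVJ" + i);
            cur.children.add(next);
            cur.children.add(next); // second path to the same node
            cur = next;
        }
        return top;
    }

    public static void main(String[] args) {
        Node top = buildChain(20);
        System.out.println(naiveLines(top));                 // 2097151 lines
        System.out.println(memoLines(top, new HashSet<>())); // 41 lines
    }
}
```

At 20 levels the naive walk emits 2^21 - 1 lines versus 41 for the memoized one, which matches the "print once, then a reference" suggestion in the description.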





[jira] [Updated] (HIVE-12055) Create row-by-row shims for the write path

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12055:

Fix Version/s: (was: 2.0.0)

Removing fixed version (2.0) from Unresolved JIRA in preparation for the 
release. Please use target version field instead (if not already set) if you 
think this should be shipped as part of 2.0

> Create row-by-row shims for the write path 
> ---
>
> Key: HIVE-12055
> URL: https://issues.apache.org/jira/browse/HIVE-12055
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-12055.patch, HIVE-12055.patch
>
>
> As part of removing the row-by-row writer, we'll need to shim out the higher 
> level API (OrcSerde and OrcOutputFormat) so that we maintain backwards 
> compatibility.





[jira] [Updated] (HIVE-7693) Invalid column ref error in order by when using column alias in select clause and using having

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-7693:
---
Fix Version/s: (was: 2.0.0)

Removing fixed version (2.0) from Unresolved JIRA in preparation for the 
release. Please use target version field instead (if not already set) if you 
think this should be shipped as part of 2.0

> Invalid column ref error in order by when using column alias in select clause 
> and using having
> --
>
> Key: HIVE-7693
> URL: https://issues.apache.org/jira/browse/HIVE-7693
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Deepesh Khandelwal
>Assignee: Pengcheng Xiong
> Attachments: HIVE-7693.01.patch, HIVE-7693.02.patch, 
> HIVE-7693.03.patch, HIVE-7693.04.patch, HIVE-7693.05.patch, 
> HIVE-7693.06.patch, HIVE-7693.07.patch
>
>
> Hive CLI session:
> {noformat}
> hive> create table abc(foo int, bar string);
> OK
> Time taken: 0.633 seconds
> hive> select foo as c0, count(*) as c1 from abc group by foo, bar having bar 
> like '%abc%' order by foo;
> FAILED: SemanticException [Error 10004]: Line 1:93 Invalid table alias or 
> column reference 'foo': (possible column names are: c0, c1)
> {noformat}
> Without having clause, the query runs fine, example:
> {code}
> select foo as c0, count(*) as c1 from abc group by foo, bar order by foo;
> {code}





[jira] [Updated] (HIVE-10171) Create a storage-api module

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10171:

Fix Version/s: (was: 2.0.0)

Removing fixed version (2.0) from Unresolved JIRA in preparation for the 
release. Please use target version field instead (if not already set) if you 
think this should be shipped as part of 2.0

> Create a storage-api module
> ---
>
> Key: HIVE-10171
> URL: https://issues.apache.org/jira/browse/HIVE-10171
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> To support high performance file formats, I'd like to propose that we move 
> the minimal set of classes that are required to integrate with Hive into a 
> new module named "storage-api". This module will include VectorizedRowBatch, 
> the various ColumnVector classes, and the SARG classes. It will form the 
> start of an API that high performance storage formats can use to integrate 
> with Hive. Both ORC and Parquet can use the new API to support vectorization 
> and SARGs without performance destroying shims.





[jira] [Updated] (HIVE-12159) Create vectorized readers for the complex types

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12159:

Fix Version/s: (was: 2.0.0)

Removing fixed version (2.0) from Unresolved JIRA in preparation for the 
release. Please use target version field instead (if not already set) if you 
think this should be shipped as part of 2.0

> Create vectorized readers for the complex types
> ---
>
> Key: HIVE-12159
> URL: https://issues.apache.org/jira/browse/HIVE-12159
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Oleg Zhurakousky
>
> We need vectorized readers for the complex types.





[jira] [Updated] (HIVE-12369) Faster Vector GroupBy

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12369:

Fix Version/s: (was: 2.0.0)

Removing fixed version (2.0) from Unresolved JIRA in preparation for the 
release. Please use target version field instead (if not already set) if you 
think this should be shipped as part of 2.0

> Faster Vector GroupBy
> -
>
> Key: HIVE-12369
> URL: https://issues.apache.org/jira/browse/HIVE-12369
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12369.01.patch
>
>
> Implement fast Vector GroupBy using fast hash table technology developed for 
> Native Vector MapJoin and vector key handling developed for recent HIVE-12290 
> Native Vector ReduceSink JIRA.
> (Patch also includes making Native Vector MapJoin use Hybrid Grace -- but 
> that can be separated out)





[jira] [Updated] (HIVE-11175) create function using jar does not work with sql std authorization

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11175:

Fix Version/s: (was: 2.0.0)

Removing fixed version (2.0) from Unresolved JIRA in preparation for the 
release. Please use target version field instead (if not already set) if you 
think this should be shipped as part of 2.0

> create function using jar does not work with sql std authorization
> --
>
> Key: HIVE-11175
> URL: https://issues.apache.org/jira/browse/HIVE-11175
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.2.0
>Reporter: Olaf Flebbe
>Assignee: Olaf Flebbe
> Attachments: HIVE-11175.1.patch
>
>
> {code:sql}create function xxx as 'xxx' using jar 'file://foo.jar' {code} 
> fails with an error saying ADMIN privileges are needed to access the local 
> foo.jar resource. The same happens for HDFS (DFS_URI).
> The problem is that the semantic analysis enforces the ADMIN privilege for a 
> write, but the jar is clearly an input, not an output. 
> Patch and testcase appended.





[jira] [Updated] (HIVE-12255) Estimated task count should make use of SplitSizeEstimator

2015-11-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12255:

Fix Version/s: (was: 2.0.0)

Removing fixed version (2.0) from Unresolved JIRA in preparation for the 
release. Please use target version field instead (if not already set) if you 
think this should be shipped as part of 2.0

> Estimated task count should make use of SplitSizeEstimator
> --
>
> Key: HIVE-12255
> URL: https://issues.apache.org/jira/browse/HIVE-12255
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>
> Estimating the number of initial tasks for a query makes use of the size of 
> the splits being processed. This should consider the estimatedSize for 
> columnar files when possible, instead of the fixed size of the split.
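The idea can be sketched as follows (the interface and method names are hypothetical, not Hive's actual SplitSizeEstimator API): when a format can report an estimated effective size, use it for the task count; otherwise fall back to the raw split length.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: task-count estimation that prefers a format-provided
// size estimate (e.g. columnar bytes after projection) over raw split bytes.
public class TaskCountSketch {
    interface Split { long rawLength(); }

    // A columnar format can report effective bytes for the columns read.
    interface SizeEstimator { long estimate(Split s); }

    static long effectiveSize(Split s, SizeEstimator est) {
        return est != null ? est.estimate(s) : s.rawLength();
    }

    static int estimateTasks(List<Split> splits, SizeEstimator est,
                             long bytesPerTask) {
        long total = 0;
        for (Split s : splits) total += effectiveSize(s, est);
        // Round up so a small remainder still gets a task.
        return (int) ((total + bytesPerTask - 1) / bytesPerTask);
    }

    public static void main(String[] args) {
        Split gib = () -> 1024L * 1024 * 1024; // 1 GiB raw split
        List<Split> splits = Arrays.asList(gib, gib);
        long perTask = 256L * 1024 * 1024; // 256 MiB per task
        // Raw sizes: 2 GiB / 256 MiB = 8 tasks.
        System.out.println(estimateTasks(splits, null, perTask));
        // Estimator reports only ~10% of each split is actually read: 1 task.
        System.out.println(estimateTasks(splits, s -> s.rawLength() / 10, perTask));
    }
}
```

The point of the issue is the gap between the two numbers: sizing parallelism on raw split bytes over-allocates tasks when a columnar reader touches a fraction of the data.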





[jira] [Commented] (HIVE-12367) Lock/unlock database should add current database to inputs and outputs of authz hook

2015-11-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014639#comment-15014639
 ] 

Sergey Shelukhin commented on HIVE-12367:
-

+0.9 from me. The patch makes sense, [~alangates] can you double check?

> Lock/unlock database should add current database to inputs and outputs of 
> authz hook
> 
>
> Key: HIVE-12367
> URL: https://issues.apache.org/jira/browse/HIVE-12367
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.2.1
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
> Attachments: HIVE-12367.001.patch, HIVE-12367.002.patch
>
>






[jira] [Updated] (HIVE-12316) Improved integration test for Hive

2015-11-19 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-12316:
--
Attachment: HIVE-12316.2.patch

A second version of the patch.  This cleans up a few things and fixes some bugs 
I've found since the last patch.  It also adds support for testing Hive 
streaming.

> Improved integration test for Hive
> --
>
> Key: HIVE-12316
> URL: https://issues.apache.org/jira/browse/HIVE-12316
> Project: Hive
>  Issue Type: New Feature
>  Components: Testing Infrastructure
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-12316.2.patch, HIVE-12316.patch
>
>
> In working with Hive testing I have found there are several issues that are 
> causing problems for developers, testers, and users:
> * Because Hive has many tunable knobs (file format, security, etc.) we end up 
> with tests that cover the same functionality with different permutations of 
> these features.
> * The Hive integration tests (ie qfiles) cannot be run on a cluster.  This 
> means we cannot run any of those tests at scale.  The HBase community by 
> contrast uses the same test suite locally and on a cluster, and has found 
> that this helps them greatly in testing.
> * Golden files are a grievous evil.  Test writers are forced to eyeball 
> results the first time they run a test and decide whether they look 
> reasonable, which is error prone and makes testing at scale impossible.  And 
> changes to one part of Hive often end up changing the plan (and the output of 
> explain) thus breaking many tests that are not related.  This is particularly 
> an issue for people working on the optimizer.  
> * The lack of ability to run on a cluster means that when people test Hive at 
> scale, they are forced to develop custom frameworks which can't then benefit 
> the community.
> * There is no easy mechanism to bring user queries into the test suite.
> I propose we build a new testing capability with the following requirements:
> * One test should be able to run all reasonable permutations (mr/tez/spark, 
> orc/parquet/text/rcfile, secure/non-secure etc.)  This doesn't mean it would 
> run every permutation every time, but that the tester could choose which 
> permutation to run.
> * The same tests should run locally and on a cluster.  The tests should 
> support scaling of input data from Ks to Ts.
> * Expected results should be auto-generated whenever possible, and this 
> should work with the scaling of inputs.  The dev should be able to provide 
> expected results or custom expected result generation in cases where 
> auto-generation doesn't make sense.
> * Access to the query plan should be available as an API in the tests so that 
> golden files of explain output are not required.
> * This should run in maven, junit, and java so that developers do not need to 
> manage yet another framework.
> * It should be possible to simulate user data (based on schema and 
> statistics) and quickly incorporate user queries so that tests from user 
> scenarios can be quickly incorporated.
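The first requirement — one logical test running against chosen permutations — can be sketched without any framework (all names here are hypothetical and this is not the patch's actual design):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: enumerate the engine x file-format permutations once, so a single
// logical test body can run against whichever subset the tester selects.
public class PermutationSketch {
    static List<String[]> permutations(List<String> engines,
                                       List<String> formats) {
        List<String[]> combos = new ArrayList<>();
        for (String engine : engines)
            for (String format : formats)
                combos.add(new String[] { engine, format });
        return combos;
    }

    public static void main(String[] args) {
        List<String[]> combos = permutations(
            Arrays.asList("mr", "tez", "spark"),
            Arrays.asList("orc", "parquet", "text", "rcfile"));
        for (String[] c : combos) {
            // A real runner would set hive.execution.engine and the file
            // format here, then invoke the shared test body once per combo.
            System.out.println("engine=" + c[0] + " format=" + c[1]);
        }
    }
}
```

Filtering the returned list (rather than generating a subset) keeps the permutation space defined in one place, which is what lets the same test run every combination on demand without running all of them every time.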




