date:20161216

[jira] [Updated] (HIVE-14053) Hive should report that primary keys can't be null.

2016-12-16 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14053:
---
Status: Patch Available  (was: Open)

> Hive should report that primary keys can't be null.
> ---
>
> Key: HIVE-14053
> URL: https://issues.apache.org/jira/browse/HIVE-14053
> Project: Hive
> Issue Type: Bug
> Reporter: Carter Shanklin
> Assignee: Pengcheng Xiong
> Priority: Minor
> Attachments: HIVE-14053.01.patch
>
>
> HIVE-13076 introduces "rely novalidate" primary and foreign keys to Hive. 
> With the right driver in place, tools like Tableau can do join elimination 
> and queries can run much faster.
> Some gaps remain: currently getAttributes() in HiveDatabaseMetaData doesn't 
> work quite right for keys. In particular, primary keys are by definition not 
> null, and the metadata should reflect this for improved join elimination.
> In this example that uses the TPC-H schema and its constraints, we sum 
> l_extendedprice and group by l_shipmode. This query should not use more than 
> just the lineitem table.
> With all the constraints in place, Tableau generates this query:
> {code}
> SELECT `lineitem`.`l_shipmode` AS `l_shipmode`,
>   SUM(`lineitem`.`l_extendedprice`) AS `sum_l_extendedprice_ok`
> FROM `tpch_bin_flat_orc_2`.`lineitem` `lineitem`
>   JOIN `tpch_bin_flat_orc_2`.`orders` `orders` ON (`lineitem`.`l_orderkey` = 
> `orders`.`o_orderkey`)
>   JOIN `tpch_bin_flat_orc_2`.`customer` `customer` ON (`orders`.`o_custkey` = 
> `customer`.`c_custkey`)
>   JOIN `tpch_bin_flat_orc_2`.`nation` `nation` ON (`customer`.`c_nationkey` = 
> `nation`.`n_nationkey`)
> WHERE ((((NOT (`lineitem`.`l_partkey` IS NULL)) AND (NOT 
> (`lineitem`.`l_suppkey` IS NULL))) AND ((NOT (`lineitem`.`l_partkey` IS 
> NULL)) AND (NOT (`lineitem`.`l_suppkey` IS NULL)))) AND (NOT 
> (`nation`.`n_regionkey` IS NULL)))
> {code}
> Since these are primary keys, the denormalization and the WHERE condition 
> are unnecessary, and this sort of query can be a lot faster by accessing 
> only the lineitem table.
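
The join elimination described in the issue can be illustrated with a minimal sketch. It is not Tableau's or Hive's actual logic; the names `ForeignKey`, `removable_joins`, and `TPCH_FKS` are hypothetical, and the sketch assumes that a join may be dropped only when the joined table contributes no columns and the driver metadata reports both key columns as not nullable. Unknown columns are treated as nullable, which is the situation the issue complains about.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignKey:
    child: str       # referencing table
    child_col: str   # foreign key column in the child
    parent: str      # referenced table
    parent_col: str  # primary key column in the parent


def removable_joins(needed_tables, foreign_keys, nullable):
    """Return the parent tables whose inner join can be dropped.

    needed_tables: tables whose columns the query really uses.
    nullable: (table, column) -> bool, as reported by driver metadata.
              Missing entries are treated as nullable (the pessimistic case).
    """
    remaining = list(foreign_keys)
    removed = []
    progress = True
    while progress:
        progress = False
        for fk in remaining:
            # A parent that is still the child of another join must stay.
            feeds_other_join = any(
                other.child == fk.parent for other in remaining if other is not fk
            )
            if fk.parent in needed_tables or feeds_other_join:
                continue
            # A nullable key on either side means rows could be lost by the join.
            if nullable.get((fk.child, fk.child_col), True):
                continue
            if nullable.get((fk.parent, fk.parent_col), True):
                continue
            remaining.remove(fk)
            removed.append(fk.parent)
            progress = True
            break
    return removed


# The join chain from the query above (constraint names are illustrative).
TPCH_FKS = [
    ForeignKey("lineitem", "l_orderkey", "orders", "o_orderkey"),
    ForeignKey("orders", "o_custkey", "customer", "c_custkey"),
    ForeignKey("customer", "c_nationkey", "nation", "n_nationkey"),
]
```

With every key column reported as not nullable, the sketch peels the chain from the leaf inward (nation, then customer, then orders) and leaves only lineitem. If a single primary key is reported as nullable, nothing can be removed, which mirrors why the metadata matters.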



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-14053) Hive should report that primary keys can't be null.

2016-12-16 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14053:
---
Attachment: HIVE-14053.01.patch


[jira] [Commented] (HIVE-14053) Hive should report that primary keys can't be null.

2016-12-16 Thread Pengcheng Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756576#comment-15756576
 ] 

Pengcheng Xiong commented on HIVE-14053:


[~ashutoshc], could you take a look? Thanks.


[jira] [Assigned] (HIVE-14053) Hive should report that primary keys can't be null.

2016-12-16 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-14053:
--

Assignee: Pengcheng Xiong  (was: Hari Sankar Sivarama Subramaniyan)


[jira] [Issue Comment Deleted] (HIVE-15335) Fast Decimal

2016-12-16 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15335:

Comment: was deleted

(was: Given that the ColumnVector family exposes public members (i.e. vector) 
of its classes, clients are coupled to it at the compilation level.  They need 
to recompile for each release.

If a client uses ORC to read vectorized data (VectorizedRowBatch), then what 
used to be an internal non-shared data structure is now public.  What a mess.

I think the answer very well may be that Hive and ORC are going to have to stay 
linked together.  We release them together.  The new feature is that you can 
use ORC to read ORC files and don’t need to invoke Hive.   But ORC isn’t a fully 
separate project that can release separately.  It always has to release with 
its parent.

I suspect that linking ORC and Hive releases and acknowledging client recompiles 
for vectorized data will greatly reduce some incompatible API issues.  We should 
also look at somehow narrowing the storage vectorization data produced by 
readers like ORC.  That is, what narrower part of HiveDecimal needs to stay the 
same so clients can recompile each time?  I'm still thinking about all of this...
)

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Matt McCline
> Assignee: Matt McCline
> Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch, 
> HIVE-15335.093.patch, HIVE-15335.094.patch, HIVE-15335.095.patch, 
> HIVE-15335.096.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal 
> internally as a BigDecimal, with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.
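
To make the shape of the mutable* calls concrete, here is a small sketch of a decimal that is updated in place. It is not the code from the patch: the class `MutableDecimal`, its fields, and the round-half-up rule are assumptions chosen for illustration, and Python integers do not reproduce the allocation savings the issue is after. It only shows what an in-place add and an in-place precision/scale check might look like.

```python
class MutableDecimal:
    """A decimal stored as an unscaled integer plus a scale (value = unscaled / 10**scale)."""

    def __init__(self, unscaled=0, scale=0):
        self.unscaled = unscaled
        self.scale = scale

    def mutable_add(self, other):
        """Add `other` into this object instead of returning a new one."""
        if self.scale < other.scale:
            # Widen our own scale so both operands line up.
            self.unscaled *= 10 ** (other.scale - self.scale)
            self.scale = other.scale
            self.unscaled += other.unscaled
        else:
            self.unscaled += other.unscaled * 10 ** (self.scale - other.scale)

    def mutable_enforce_precision_scale(self, max_precision, max_scale):
        """Round in place to `max_scale` (half up, an assumption) and
        report whether the result fits in `max_precision` digits."""
        if self.scale > max_scale:
            divisor = 10 ** (self.scale - max_scale)
            sign = -1 if self.unscaled < 0 else 1
            quotient, remainder = divmod(abs(self.unscaled), divisor)
            if 2 * remainder >= divisor:
                quotient += 1
            self.unscaled = sign * quotient
            self.scale = max_scale
        return len(str(abs(self.unscaled))) <= max_precision
```

The design point is that the caller keeps one object and mutates it across many rows, rather than creating a fresh immutable value for every intermediate result.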




[jira] [Commented] (HIVE-15335) Fast Decimal

2016-12-16 Thread Matt McCline (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756221#comment-15756221
 ] 

Matt McCline commented on HIVE-15335:
-

A good test run (all failures are known and TestVectorizedColumnReaderBase 
doesn't produce an output because it has no tests).


[jira] [Commented] (HIVE-15460) Fix ptest2 test failures

2016-12-16 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756211#comment-15756211
 ] 

Hive QA commented on HIVE-15460:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843689/HIVE-15460.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 10806 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=233)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=117)
    [timestamp_lazy.q,union29.q,runtime_skewjoin_mapjoin_spark.q,auto_join22.q,union8.q,groupby5_map.q,stats0.q,auto_join29.q,groupby6.q,merge1.q,mapjoin_distinct.q,vector_decimal_mapjoin.q,sample5.q,multi_insert_move_tasks_share_dependencies.q,join_array.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=250)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] (batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=93)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_nested_subquery] (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_shared_alias] (batchId=84)
org.apache.hadoop.hive.ql.security.TestStorageBasedMetastoreAuthorizationDrops.testDropPartition (batchId=206)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2624/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2624/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2624/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843689 - PreCommit-HIVE-Build

> Fix ptest2 test failures
> 
>
> Key: HIVE-15460
> URL: https://issues.apache.org/jira/browse/HIVE-15460
> Project: Hive
> Issue Type: Sub-task
> Reporter: Vihang Karajgaonkar
> Assignee: Vihang Karajgaonkar
> Priority: Minor
> Attachments: HIVE-15460.01.patch
>
>
> I see these failures when I try to run tests on ptest2
> {noformat}
> Failed tests:   testBatch(org.apache.hive.ptest.execution.TestScripts): 
> expected:<...yPort=3128"(..)
>   testAlternativeTestJVM(org.apache.hive.ptest.execution.TestScripts): 
> expected:<...yPort=3128"(..)
>   testPrepNone(org.apache.hive.ptest.execution.TestScripts): 
> expected:<...yPort=3128"(..)
>   testPrepGit(org.apache.hive.ptest.execution.TestScripts): 
> expected:<...yPort=3128"(..)
>   testPrepHadoop1(org.apache.hive.ptest.execution.TestScripts): 
> expected:<...yPort=3128"(..)
>   testPrepSvn(org.apache.hive.ptest.execution.TestScripts): 
> expected:<...yPort=3128"(..)
> {noformat}




[jira] [Comment Edited] (HIVE-15147) LLAP: use LLAP cache for non-columnar formats in a somewhat general way

2016-12-16 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756154#comment-15756154
 ] 

Sergey Shelukhin edited comment on HIVE-15147 at 12/17/16 3:00 AM:
---

The proper split/torn row handling (at least with LineRecordReader semantics). 
What remains is applicability - the above mentioned vectorization point, as 
well as better handling for what formats are allowed to be sliced into 
sub-split parts, and how to get file positions for that; and cleanup.

I will also push a branch master-15147 shortly, since I will not be working on 
this for some time... contributions welcome ;)


was (Author: sershe):
The proper split/torn row handling (at least with LineRecordReader semantics). 
What remains is applicability - the above mentioned vectorization point, as 
well as better handling for what formats are allowed to be slices and how to 
get file positions); and cleanup.

I will also push a branch master-15147 shortly, since I will not be working on 
this for some time... contributions welcome ;)
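
For readers unfamiliar with the split/torn row handling mentioned in the comment, the sketch below shows the usual line-ownership convention for byte-range splits, which is how Hadoop's LineRecordReader is generally understood to behave. It is an illustration only, not code from the patch; `read_split` is a hypothetical helper, and it assumes `\n` line endings and an in-memory byte string. A split that does not start at offset 0 discards everything up to and including the first newline, and every split keeps reading while the line start is at or before its end offset, so a line torn across a boundary is read exactly once by the earlier split.

```python
def read_split(data: bytes, start: int, end: int):
    """Return the lines owned by the byte range [start, end) of `data`."""
    pos = start
    if start != 0:
        # The first (possibly partial) line belongs to the previous split.
        newline = data.find(b"\n", start)
        if newline == -1:
            return []
        pos = newline + 1
    lines = []
    # A line is owned by this split if it starts at or before `end`,
    # even when its bytes run past `end`.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        lines.append(data[pos:stop])
        pos = stop + 1
    return lines
```

Under this convention the boundary can fall anywhere, including exactly on a line start, and concatenating the output of consecutive splits still yields each line once.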

> LLAP: use LLAP cache for non-columnar formats in a somewhat general way
> ---
>
> Key: HIVE-15147
> URL: https://issues.apache.org/jira/browse/HIVE-15147
> Project: Hive
> Issue Type: New Feature
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HIVE-15147.01.WIP.noout.patch, 
> HIVE-15147.02.WIP.noout.patch, HIVE-15147.04.WIP.noout.patch, 
> HIVE-15147.05.WIP.noout.patch, HIVE-15147.WIP.noout.patch
>
>
> The primary goal for the first pass is caching text files. Nothing would 
> prevent other formats from using the same path, in principle, although, as 
> was originally done with ORC, it may be better to have native caching support 
> optimized for each particular format.
> Given that caching pure text is not smart, and we already have ORC-encoded 
> cache that is columnar due to ORC file structure, we will transform data into 
> columnar ORC.
> The general idea is to treat all the data in the world as merely ORC that was 
> compressed with some poor compression codec, such as csv. Using the original 
> IF and serde, as well as an ORC writer (with some heavyweight optimizations 
> disabled, potentially), we can "uncompress" the csv/whatever data into its 
> "original" ORC representation, then cache it efficiently, by column, and also 
> reuse a lot of the existing code.
> Various other points:
> 1) Caching granularity will have to be somehow determined (i.e. how do we 
> slice the file horizontally, to avoid caching entire columns). As with ORC 
> uncompressed files, the specific offsets don't really matter as long as they 
> are consistent between reads. The problem is that the file offsets will 
> actually need to be propagated to the new reader from the original 
> inputformat. Row counts are easier to use but there's a problem of how to 
> actually map them to missing ranges to read from disk.
> 2) Obviously, for row-based formats, if any one column that is to be read has 
> been evicted or is otherwise missing, "all the columns" have to be read for 
> the corresponding slice to cache and read that one column. The vague plan is 
> to handle this implicitly, similarly to how ORC reader handles CB-RG overlaps 
> - it will just so happen that a missing column in disk range list to retrieve 
> will expand the disk-range-to-read into the whole horizontal slice of the 
> file.
> 3) Granularity/etc. won't work for gzipped text. If anything at all is 
> evicted, the entire file has to be re-read. Gzipped text is a ridiculous 
> feature, so this is by design.
> 4) In future, it would be possible to also build some form of 
> metadata/indexes for this cached data to do PPD, etc. This is out of 
> scope for now.
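
Point 2 of the description can be sketched as follows. This is a toy model under stated assumptions, not the LLAP cache: `read_columns`, the `(start, end, column)` cache key, and the `read_slice` callback are all hypothetical. It shows only the behavior described above, namely that for a row-based source a single missing column forces the whole horizontal slice to be re-read, after which every column of that slice is cached again.

```python
def read_columns(file_slices, wanted, num_columns, cache, read_slice):
    """Return {column index: values} for the wanted columns.

    file_slices: list of (start, end) offsets, consistent between reads.
    cache: dict keyed by (start, end, column index) holding column values.
    read_slice: callback that reads one slice from disk and returns a
                list of row tuples containing all columns.
    """
    result = {column: [] for column in wanted}
    for start, end in file_slices:
        missing = [c for c in wanted if (start, end, c) not in cache]
        if missing:
            # Row-based data cannot be read one column at a time, so the
            # whole slice is read and every column is cached by column.
            rows = read_slice(start, end)
            for column in range(num_columns):
                cache[(start, end, column)] = [row[column] for row in rows]
        for column in wanted:
            result[column].extend(cache[(start, end, column)])
    return result
```

A cache miss on one column therefore costs a full slice read, while slices whose wanted columns are all present are served without touching the disk.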




[jira] [Comment Edited] (HIVE-15147) LLAP: use LLAP cache for non-columnar formats in a somewhat general way

2016-12-16 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756154#comment-15756154
 ] 

Sergey Shelukhin edited comment on HIVE-15147 at 12/17/16 2:57 AM:
---

The proper split/torn row handling (at least with LineRecordReader semantics). 
What remains is applicability - the above mentioned vectorization point, as 
well as better handling for what formats are allowed to be slices and how to 
get file positions); and cleanup.

I will also push a branch master-15147 shortly, since I will not be working on 
this for some time... contributions welcome ;)


was (Author: sershe):
The proper split/torn row handling (at least with LineRecordReader semantics). 
What remains is applicability - the above mentioned vectorization point, as 
well as better handling for what formats are allowed to be slices and how to 
get file positions); and cleanup.

I will also push a branch master-15147 shortly, since I will be away for a 
while... contributions welcome ;)


[jira] [Updated] (HIVE-15147) LLAP: use LLAP cache for non-columnar formats in a somewhat general way

2016-12-16 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15147:

Attachment: HIVE-15147.05.WIP.noout.patch

The proper split/torn row handling (at least with LineRecordReader semantics). 
What remains is applicability - the above mentioned vectorization point, as 
well as better handling for what formats are allowed to be slices and how to 
get file positions); and cleanup.

I will also push a branch master-15147 shortly, since I will be away for a 
while... contributions welcome ;)


[jira] [Commented] (HIVE-15419) Separate out storage-api to be released independently

2016-12-16 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756121#comment-15756121
 ] 

Hive QA commented on HIVE-15419:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843675/HIVE-15419.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2623/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2623/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2623/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2016-12-17 02:29:22.278
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-2623/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2016-12-17 02:29:22.281
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 7f46c8d HIVE-15420: LLAP UI: Relativize resources to allow 
proxied/secured views (Gopal V, reviewed by Rajesh Balamohan)
+ git clean -f -d
Removing ql/src/test/queries/clientpositive/subquery_nested_subquery.q
Removing ql/src/test/queries/clientpositive/subquery_shared_alias.q
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 7f46c8d HIVE-15420: LLAP UI: Relativize resources to allow 
proxied/secured views (Gopal V, reviewed by Rajesh Balamohan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2016-12-17 02:29:23.243
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: common/pom.xml:42
error: common/pom.xml: patch does not apply
error: patch failed: pom.xml:507
error: pom.xml: patch does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843675 - PreCommit-HIVE-Build

> Separate out storage-api to be released independently
> -
>
> Key: HIVE-15419
> URL: https://issues.apache.org/jira/browse/HIVE-15419
> Project: Hive
> Issue Type: Task
> Components: storage-api
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Attachments: HIVE-15419.patch
>
>
> Currently, the Hive project ships a single monolithic release, but this 
> creates a circular dependency for file formats that read directly into Hive's 
> vector row batches. Storage-api is a small module with the vectorized row 
> batches and SearchArgument that are necessary for efficient vectorized read 
> and write. By releasing storage-api independently, we can provide an interface 
> that the file formats can read from and write to.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-15459) Fix unit test failures on master

2016-12-16 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756118#comment-15756118
 ] 

Hive QA commented on HIVE-15459:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843672/HIVE-15459.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10821 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_1] 
(batchId=92)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=93)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2622/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2622/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2622/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843672 - PreCommit-HIVE-Build

> Fix unit test failures on master
> 
>
> Key: HIVE-15459
> URL: https://issues.apache.org/jira/browse/HIVE-15459
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 2.2.0
>
> Attachments: HIVE-15459.patch
>
>
> Golden file updates missed in HIVE-15397 and HIVE-15192




[jira] [Updated] (HIVE-15460) Fix ptest2 test failures

2016-12-16 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-15460:
---
Status: Patch Available  (was: Open)

> Fix ptest2 test failures
> 
>
> Key: HIVE-15460
> URL: https://issues.apache.org/jira/browse/HIVE-15460
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-15460.01.patch
>
>
> I see these failures when I try to run tests on ptest2
> {noformat}
> Failed tests:   testBatch(org.apache.hive.ptest.execution.TestScripts): 
> expected:<...yPort=3128"(..)
>   testAlternativeTestJVM(org.apache.hive.ptest.execution.TestScripts): 
> expected:<...yPort=3128"(..)
>   testPrepNone(org.apache.hive.ptest.execution.TestScripts): 
> expected:<...yPort=3128"(..)
>   testPrepGit(org.apache.hive.ptest.execution.TestScripts): 
> expected:<...yPort=3128"(..)
>   testPrepHadoop1(org.apache.hive.ptest.execution.TestScripts): 
> expected:<...yPort=3128"(..)
>   testPrepSvn(org.apache.hive.ptest.execution.TestScripts): 
> expected:<...yPort=3128"(..)
> {noformat}




[jira] [Updated] (HIVE-15460) Fix ptest2 test failures

2016-12-16 Thread Vihang Karajgaonkar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-15460:
---
Attachment: HIVE-15460.01.patch

Hi [~sseth] Can you please review this patch? Some of the recent changes in 
batch-exec.vm and source-prep.vm seem to have broken the ptest2 tests. Thanks!

> Fix ptest2 test failures
> 
>
> Key: HIVE-15460
> URL: https://issues.apache.org/jira/browse/HIVE-15460
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-15460.01.patch
>
>
> I see these failures when I try to run tests on ptest2
> {noformat}
> Failed tests:   testBatch(org.apache.hive.ptest.execution.TestScripts): 
> expected:<...yPort=3128"(..)
>   testAlternativeTestJVM(org.apache.hive.ptest.execution.TestScripts): 
> expected:<...yPort=3128"(..)
>   testPrepNone(org.apache.hive.ptest.execution.TestScripts): 
> expected:<...yPort=3128"(..)
>   testPrepGit(org.apache.hive.ptest.execution.TestScripts): 
> expected:<...yPort=3128"(..)
>   testPrepHadoop1(org.apache.hive.ptest.execution.TestScripts): 
> expected:<...yPort=3128"(..)
>   testPrepSvn(org.apache.hive.ptest.execution.TestScripts): 
> expected:<...yPort=3128"(..)
> {noformat}




[jira] [Commented] (HIVE-15335) Fast Decimal

2016-12-16 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756031#comment-15756031
 ] 

Hive QA commented on HIVE-15335:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843671/HIVE-15335.096.patch

{color:green}SUCCESS:{color} +1 due to 11 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10907 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=233)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=250)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=93)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_nested_subquery]
 (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_shared_alias]
 (batchId=84)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2621/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2621/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2621/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843671 - PreCommit-HIVE-Build

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch, 
> HIVE-15335.093.patch, HIVE-15335.094.patch, HIVE-15335.095.patch, 
> HIVE-15335.096.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal 
> internally as a BigDecimal, with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.
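
As a rough illustration of why in-place ("mutable*") arithmetic avoids extra allocations, here is a minimal Java sketch. It is not the HIVE-15335 implementation: the class MutableDecimalSketch, its long-based representation, and its same-scale restriction are assumptions made for the example, and only the method name mutableAdd is borrowed from the description above.

```java
import java.math.BigDecimal;

// Hypothetical illustration only; not Hive's HiveDecimalWritable.
class MutableDecimalSketch {
    // The decimal value is unscaled * 10^(-scale), e.g. 1250 with scale 2 is 12.50.
    private long unscaled;
    private final int scale;

    MutableDecimalSketch(long unscaled, int scale) {
        this.unscaled = unscaled;
        this.scale = scale;
    }

    // Adds 'other' into this object in place, so no new object is allocated.
    // Assumption: both operands share the same scale; overflow raises ArithmeticException.
    void mutableAdd(MutableDecimalSketch other) {
        if (other.scale != scale) {
            throw new IllegalArgumentException("scale mismatch");
        }
        unscaled = Math.addExact(unscaled, other.unscaled);
    }

    // Converts to BigDecimal only when a caller needs one (this does allocate).
    BigDecimal toBigDecimal() {
        return BigDecimal.valueOf(unscaled, scale);
    }
}
```

By contrast, BigDecimal.add returns a new BigDecimal on every call because BigDecimal is immutable, which is the kind of per-operation allocation the description wants to avoid.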




[jira] [Commented] (HIVE-15297) Hive should not split semicolon within quoted string literals

2016-12-16 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755953#comment-15755953
 ] 

Hive QA commented on HIVE-15297:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843659/HIVE-15297.04.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 414 failed/errored test(s), 10822 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=233)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=250)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_joins] 
(batchId=217)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_predicate_pushdown]
 (batchId=217)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_single_sourced_multi_insert]
 (batchId=217)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[ctas] 
(batchId=229)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_dynamic_partitions]
 (batchId=229)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_table]
 (batchId=229)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_directory]
 (batchId=229)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions]
 (batchId=229)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_table]
 (batchId=229)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[write_final_output_blobstore]
 (batchId=229)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_subquery] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_table_stats] 
(batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[analyze_tbl_part] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_join_pkfk]
 (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avrocountemptytbl] 
(batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input19] (batchId=79)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_overwrite_directory]
 (batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join46] (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_emit_interval] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin46] (batchId=53)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[nested_column_pruning] 
(batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample5] (batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[serde_opencsv] 
(batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_46] 
(batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[specialChar] (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[uniquejoin] (batchId=63)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_after_multiple_inserts]
 (batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_after_multiple_inserts_special_characters]
 (batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_all_types] 
(batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_orig_table] 
(batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[varchar_1] (batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[varchar_2] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[varchar_join1] 
(batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[varchar_nested_types] 
(batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[varchar_serde] 
(batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[varchar_union1] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_aggregate_9] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_binary_join_groupby]
 (batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_cast_constant] 
(batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_char_2] 
(batchId=63)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_char_mapjoin1] 
(batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_char_simple] 
(batchId=42)

[jira] [Commented] (HIVE-15335) Fast Decimal

2016-12-16 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755929#comment-15755929
 ] 

Sergey Shelukhin commented on HIVE-15335:
-

+1. I am assuming it is on by default, so there is also test coverage from the 
q files.
Remaining nits can be fixed on commit.
There is also a lot of trailing whitespace that can be fixed on commit.

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch, 
> HIVE-15335.093.patch, HIVE-15335.094.patch, HIVE-15335.095.patch, 
> HIVE-15335.096.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal 
> internally as a BigDecimal, with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.




[jira] [Commented] (HIVE-15419) Separate out storage-api to be released independently

2016-12-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755908#comment-15755908
 ] 

ASF GitHub Bot commented on HIVE-15419:
---

GitHub user omalley opened a pull request:

https://github.com/apache/hive/pull/125

HIVE-15419 Separate storage-api so that it can be released separately.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/omalley/hive hive-15419

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/125.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #125


commit 3a6ad18baa4dd9d13543200a01e15a374609c561
Author: Owen O'Malley 
Date:   2016-07-10T17:08:57Z

HIVE-14007. Replace hive-orc module with ORC 1.2.3 release.

commit 2f2c89e3017f4740c5645e672d2cee5b770ffa54
Author: Owen O'Malley 
Date:   2016-12-17T00:11:02Z

HIVE-15419. Separate storage-api to be released independently.




> Separate out storage-api to be released independently
> -
>
> Key: HIVE-15419
> URL: https://issues.apache.org/jira/browse/HIVE-15419
> Project: Hive
>  Issue Type: Task
>  Components: storage-api
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-15419.patch
>
>
> Currently, the Hive project releases a single monolithic release, but this 
> makes file formats reading directly into Hive's vector row batches a circular 
> dependence. Storage-api is a small module with the vectorized row batches and 
> SearchArgument that are necessary for efficient vectorized read and write. By 
> releasing storage-api independently, we can make an interface that the file 
> formats can read and write from.




[jira] [Updated] (HIVE-15419) Separate out storage-api to be released independently

2016-12-16 Thread Owen O'Malley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-15419:
-
Attachment: HIVE-15419.patch

This patch separates out the storage-api to make it releasable without the rest 
of Hive. In particular,
* It makes the storage-api/pom use apache as its parent pom rather than hive
* It sets the version for storage-api to 2.3.0-SNAPSHOT so that we catch any 
problems where the reference is to the 2.2.0-SNAPSHOT.
* It creates a variable in hive's pom.xml for storage-api.version.
* storage-api is still built as part of the build.
* Murmur3 and BloomFilter are moved over to the ORC version.

> Separate out storage-api to be released independently
> -
>
> Key: HIVE-15419
> URL: https://issues.apache.org/jira/browse/HIVE-15419
> Project: Hive
>  Issue Type: Task
>  Components: storage-api
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-15419.patch
>
>
> Currently, the Hive project releases a single monolithic release, but this 
> makes file formats reading directly into Hive's vector row batches a circular 
> dependence. Storage-api is a small module with the vectorized row batches and 
> SearchArgument that are necessary for efficient vectorized read and write. By 
> releasing storage-api independently, we can make an interface that the file 
> formats can read and write from.




[jira] [Updated] (HIVE-15419) Separate out storage-api to be released independently

2016-12-16 Thread Owen O'Malley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-15419:
-
Assignee: Owen O'Malley
  Status: Patch Available  (was: Open)

> Separate out storage-api to be released independently
> -
>
> Key: HIVE-15419
> URL: https://issues.apache.org/jira/browse/HIVE-15419
> Project: Hive
>  Issue Type: Task
>  Components: storage-api
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> Currently, the Hive project releases a single monolithic release, but this 
> makes file formats reading directly into Hive's vector row batches a circular 
> dependence. Storage-api is a small module with the vectorized row batches and 
> SearchArgument that are necessary for efficient vectorized read and write. By 
> releasing storage-api independently, we can make an interface that the file 
> formats can read and write from.




[jira] [Commented] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-16 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755860#comment-15755860
 ] 

Hive QA commented on HIVE-15376:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843655/HIVE-15376.9.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10821 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=233)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=250)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] 
(batchId=93)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_nested_subquery]
 (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_shared_alias]
 (batchId=84)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2619/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2619/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2619/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843655 - PreCommit-HIVE-Build

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, 
> HIVE-15376.6.patch, HIVE-15376.7.patch, HIVE-15376.8.patch, HIVE-15376.9.patch
>
>
> HIVE-12366 improved the heartbeater logic by narrowing the gap between 
> lock acquisition and the first heartbeat, but that is not enough. A problem 
> can still occur, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will time out and be 
> aborted, thus causing a failure.
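
The timing condition above can be written out as a small Java check. This is only a sketch of the arithmetic in the description: the class HeartbeatGapSketch and its method are hypothetical names, and the times are modeled as offsets from an arbitrary origin rather than read from Hive.

```java
import java.time.Duration;

// Hypothetical illustration of the condition "hive.txn.timeout < C - A".
class HeartbeatGapSketch {
    // timeA: when the transaction is opened.
    // timeC: when acquireLocks returns and the first heartbeat is sent.
    // txnTimeout: stands in for the configured hive.txn.timeout.
    // Returns true when the first heartbeat arrives too late to keep the transaction alive.
    static boolean abortedBeforeFirstHeartbeat(Duration timeA, Duration timeC, Duration txnTimeout) {
        Duration gap = timeC.minus(timeA);
        return txnTimeout.compareTo(gap) < 0;
    }
}
```

For example, with an assumed timeout of 300 seconds, a transaction opened at time 0 whose locks are only acquired at 400 seconds satisfies the condition and is aborted, while one whose locks are acquired at 100 seconds is not.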




[jira] [Comment Edited] (HIVE-15441) Provide a config to timeout long compiling queries

2016-12-16 Thread Chaoyu Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755837#comment-15755837
 ] 

Chaoyu Tang edited comment on HIVE-15441 at 12/17/16 12:02 AM:
---

[~thejas] No. HIVE-12431 only helps before the query acquires the 
compilation lock. What I meant is that once the compilation lock is acquired 
and the query starts compiling, the compilation is hard to interrupt. The 
CompilationTimeoutThread provided in this patch calls 
threadToInterrupt.interrupt() at timeout, but Thread.interrupt() only sets 
an interrupt flag; it is up to the target thread doing the compilation to 
check this flag and stop processing (see 
https://docs.oracle.com/javase/tutorial/essential/concurrency/interrupt.html). 
I don't think the compilation-related code contains anything like sleep/wait 
that would observe the interrupt flag and throw an InterruptedException to stop 
the ongoing compilation.
HIVE-4924 added the JDBC QueryTimeout, which counts the time for the 
whole query processing (compilation + execution). That said, HIVE-4924 should 
already cover the compilation timeout here: if a long-compiling query has not 
finished by its query timeout, it should be stopped through HIVE-4924.
But HIVE-4924 does not work as expected because the interrupt flag set via 
interrupt() is not checked (or handled) properly in the target thread 
processing the query. In HIVE-14799, I took a different approach to interrupt 
query processing when the query timeout is reached during execution.
Yeah, we do need a way to interrupt a query while it is being compiled, but 
setting an interrupt flag using Thread.interrupt() may not be sufficient to stop 
the processing.
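
To make the point about the interrupt flag concrete, here is a minimal Java sketch. It is not taken from the patch: InterruptSketch and both methods are hypothetical stand-ins for a long, CPU-bound compilation step, and the loops only count iterations.

```java
// Hypothetical illustration of cooperative interruption; not Hive code.
class InterruptSketch {

    // Never looks at the interrupt flag, so a pending interrupt() has no effect
    // and the loop runs to completion.
    static long uncooperative(long iterations) {
        long done = 0;
        for (long i = 0; i < iterations; i++) {
            done++;
        }
        return done;
    }

    // Polls the flag on every iteration and stops as soon as it is set.
    // Thread.interrupted() also clears the flag, as the JDK documents.
    static long cooperative(long iterations) throws InterruptedException {
        long done = 0;
        for (long i = 0; i < iterations; i++) {
            if (Thread.interrupted()) {
                throw new InterruptedException("stopped after " + done + " iterations");
            }
            done++;
        }
        return done;
    }
}
```

If the current thread has already been interrupted, uncooperative(1000) still returns 1000, whereas cooperative(1000) throws InterruptedException on its first check. A timeout thread that only calls interrupt() can therefore stop the second kind of code but not the first.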



was (Author: ctang.ma):
[~thejas] No. HIVE-12431 only helps before the query acquires the 
compilation lock. What I meant is that once the compilation lock is acquired 
and the query starts compiling, the compilation is hard to interrupt. The 
CompilationTimeoutThread provided in this patch calls 
threadToInterrupt.interrupt() at timeout, but Thread.interrupt() only sets 
an interrupt flag; it is up to the target thread doing the compilation to 
check this flag and stop processing (see 
https://docs.oracle.com/javase/tutorial/essential/concurrency/interrupt.html). 
I don't think the compilation-related code contains anything like sleep/wait 
that would observe the interrupt flag and throw an InterruptedException to stop 
the ongoing compilation.
HIVE-4924 added the JDBC QueryTimeout, which counts the time for the 
whole query processing (compilation + execution). That said, HIVE-4924 should 
already cover the compilation timeout here: if a long-compiling query has not 
finished by its query timeout, it should be stopped through HIVE-4924.
But HIVE-4924 does not work as expected because the interrupt flag set via 
interrupt() is not checked (or handled) properly in the target thread 
processing the query. In HIVE-14799, I took a different approach to interrupt 
query processing when the query timeout is reached during execution.


> Provide a config to timeout long compiling queries
> --
>
> Key: HIVE-15441
> URL: https://issues.apache.org/jira/browse/HIVE-15441
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15441.1.patch
>
>
> Sometimes Hive users have long-compiling queries that may need to scan 
> thousands of partitions or more (perhaps by accident). The compilation 
> process may take a very long time, especially in {{getInputSummary}}, where it 
> needs to make NN calls to get info about each input path.
> This is bad because it may block many other queries. Parallel compilation may 
> be useful, but {{getInputSummary}} still has a global lock. In this case, it 
> makes sense to provide the Hive admin with a config to put a timeout limit on 
> compilation, so that these "bad" queries can be blocked.
> Note that https://issues.apache.org/jira/browse/HIVE-12431 also tries to address 
> a similar issue. However, it cancels the queries that are waiting for the 
> compile lock, which I think is not so useful for our case since the *query 
> under compile is the one to be blamed.*




[jira] [Commented] (HIVE-14956) Parallelize TestHCatLoader

2016-12-16 Thread Vaibhav Gumashta (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755849#comment-15755849
 ] 

Vaibhav Gumashta commented on HIVE-14956:
-

[~vihangk1] Sure, thanks for taking this up. Assigning to you.

> Parallelize TestHCatLoader
> --
>
> Key: HIVE-14956
> URL: https://issues.apache.org/jira/browse/HIVE-14956
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>





[jira] [Updated] (HIVE-14956) Parallelize TestHCatLoader

2016-12-16 Thread Vaibhav Gumashta (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-14956:

Assignee: Vihang Karajgaonkar  (was: Vaibhav Gumashta)

> Parallelize TestHCatLoader
> --
>
> Key: HIVE-14956
> URL: https://issues.apache.org/jira/browse/HIVE-14956
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Vihang Karajgaonkar
>





[jira] [Comment Edited] (HIVE-14956) Parallelize TestHCatLoader

2016-12-16 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755840#comment-15755840
 ] 

Vihang Karajgaonkar edited comment on HIVE-14956 at 12/16/16 11:56 PM:
---

Hi [~vgumashta] Do you need any help to move this forward? I can help you with 
this if you are busy. I am guessing this would be similar to [HIVE-14891 | 
https://issues.apache.org/jira/browse/HIVE-14891] which you worked on earlier.


was (Author: vihangk1):
Hi [~vgumashta] Do you need any help to move this forward? I can help you with 
this if you are busy. I am guessing this would be similar to [HIVE-14956 | 
https://issues.apache.org/jira/browse/HIVE-14956] which you worked on earlier.

> Parallelize TestHCatLoader
> --
>
> Key: HIVE-14956
> URL: https://issues.apache.org/jira/browse/HIVE-14956
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>





[jira] [Commented] (HIVE-14956) Parallelize TestHCatLoader

2016-12-16 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755840#comment-15755840
 ] 

Vihang Karajgaonkar commented on HIVE-14956:


Hi [~vgumashta] Do you need any help to move this forward? I can help you with 
this if you are busy. I am guessing this would be similar to [HIVE-14956 | 
https://issues.apache.org/jira/browse/HIVE-14956] which you worked on earlier.

> Parallelize TestHCatLoader
> --
>
> Key: HIVE-14956
> URL: https://issues.apache.org/jira/browse/HIVE-14956
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>





[jira] [Commented] (HIVE-15441) Provide a config to timeout long compiling queries

2016-12-16 Thread Chaoyu Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755837#comment-15755837
 ] 

Chaoyu Tang commented on HIVE-15441:


[~thejas] No. HIVE-12431 only helps before the query acquires the 
compilation lock. What I meant is that once the compilation lock is acquired 
and the query starts compiling, the compilation is hard to interrupt. The 
CompilationTimeoutThread provided in this patch calls 
threadToInterrupt.interrupt() at timeout, but the Thread.interrupt() API 
only sets an interrupt flag, and it is up to the target thread, which is 
doing the compilation, to check this flag and stop processing (see 
https://docs.oracle.com/javase/tutorial/essential/concurrency/interrupt.html). 
I don't think the compilation-related code calls anything like sleep/wait 
that would detect the interrupt flag and throw InterruptedException to stop 
the ongoing compilation.
HIVE-4924 added the JDBC QueryTimeout, which counts the time for the 
whole query processing (compilation + execution). That said, HIVE-4924 should 
already cover the compilation timeout here: if a long-compiling query has not 
finished by its query timeout, it should be stopped through HIVE-4924.
But HIVE-4924 does not work as expected, because the interrupt flag set via 
interrupt() is not detected (or handled) properly in the target thread 
processing the query. In HIVE-14799, I took a different approach to interrupt 
query processing when the query timeout is reached during execution.
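
To illustrate the point about Thread.interrupt(), here is a minimal Java sketch. It is not taken from the Hive patch; InterruptSketch, busyWork and runWithInterrupt are hypothetical names. It shows that a CPU-bound loop that never blocks keeps running after interrupt() unless it polls Thread.currentThread().isInterrupted() itself.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch only: Thread.interrupt() sets a flag; CPU-bound work must poll it. */
final class InterruptSketch {

    /** Hypothetical stand-in for a compile step that never sleeps or waits. */
    static long busyWork(long iterations, boolean pollInterruptFlag) {
        long done = 0;
        for (long i = 0; i < iterations; i++) {
            // Cooperative cancellation: only happens if the loop checks the flag.
            if (pollInterruptFlag && Thread.currentThread().isInterrupted()) {
                break;
            }
            done++;
        }
        return done;
    }

    /** Interrupts a worker before it starts busyWork and returns the iterations it completed. */
    static long runWithInterrupt(long iterations, boolean pollInterruptFlag) {
        AtomicBoolean interruptSent = new AtomicBoolean(false);
        AtomicLong completed = new AtomicLong(-1);
        Thread worker = new Thread(() -> {
            while (!interruptSent.get()) {
                // Spin until the controlling thread has called interrupt().
            }
            completed.set(busyWork(iterations, pollInterruptFlag));
        });
        worker.start();
        worker.interrupt();      // sets the worker's interrupt flag, nothing more
        interruptSent.set(true);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("controller was interrupted", e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("polling:    " + runWithInterrupt(1_000_000L, true));
        System.out.println("no polling: " + runWithInterrupt(1_000_000L, false));
    }
}
```

In this sketch the polling worker stops before completing any iteration, while the non-polling worker runs all of its iterations even though its interrupt flag is set.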


> Provide a config to timeout long compiling queries
> --
>
> Key: HIVE-15441
> URL: https://issues.apache.org/jira/browse/HIVE-15441
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15441.1.patch
>
>
> Sometimes Hive users have long-compiling queries that may need to scan 
> thousands of partitions or more (perhaps by accident). The compilation 
> process may take a very long time, especially in {{getInputSummary}}, where it 
> needs to make NN calls to get info about each input path.
> This is bad because it may block many other queries. Parallel compilation may 
> be useful, but {{getInputSummary}} still has a global lock. In this case, it 
> makes sense to provide the Hive admin with a config that sets a timeout limit for 
> compilation, so that these "bad" queries can be blocked.
> Note that https://issues.apache.org/jira/browse/HIVE-12431 also tries to address 
> a similar issue. However, it cancels the queries that are waiting for the 
> compile lock, which I think is not so useful for our case, since the *query 
> under compilation is the one to blame.*




[jira] [Updated] (HIVE-15459) Fix unit test failures on master

2016-12-16 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-15459:

Status: Patch Available  (was: Open)

> Fix unit test failures on master
> 
>
> Key: HIVE-15459
> URL: https://issues.apache.org/jira/browse/HIVE-15459
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 2.2.0
>
> Attachments: HIVE-15459.patch
>
>
> Golden file updates missed in HIVE-15397 and HIVE-15192




[jira] [Updated] (HIVE-15459) Fix unit test failures on master

2016-12-16 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-15459:

Attachment: HIVE-15459.patch

> Fix unit test failures on master
> 
>
> Key: HIVE-15459
> URL: https://issues.apache.org/jira/browse/HIVE-15459
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 2.2.0
>
> Attachments: HIVE-15459.patch
>
>
> Golden file updates missed in HIVE-15397 and HIVE-15192




[jira] [Updated] (HIVE-15335) Fast Decimal

2016-12-16 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15335:

Attachment: HIVE-15335.096.patch

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch, 
> HIVE-15335.093.patch, HIVE-15335.094.patch, HIVE-15335.095.patch, 
> HIVE-15335.096.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal 
> internally as a BigDecimal, with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.




[jira] [Updated] (HIVE-15335) Fast Decimal

2016-12-16 Thread Matt McCline (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15335:

Status: Patch Available  (was: In Progress)

> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch, 
> HIVE-15335.093.patch, HIVE-15335.094.patch, HIVE-15335.095.patch, 
> HIVE-15335.096.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal 
> internally as a BigDecimal, with a faster version that does not allocate extra 
> objects.
> Replace the HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.




[jira] [Commented] (HIVE-15200) Support setOp in subQuery with parentheses

2016-12-16 Thread Pengcheng Xiong (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755787#comment-15755787
 ] 

Pengcheng Xiong commented on HIVE-15200:


[~ashutoshc], could you take a look? I will create an RB. Thanks.

> Support setOp in subQuery with parentheses
> --
>
> Key: HIVE-15200
> URL: https://issues.apache.org/jira/browse/HIVE-15200
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15200.01.patch, HIVE-15200.02.patch, 
> HIVE-15200.03.patch
>
>
> {code}
> explain select key from ((select key from src) union (select key from 
> src))subq;
> {code}
> will throw
> {code}
> FAILED: ParseException line 1:47 cannot recognize input near 'union' '(' 
> 'select' in subquery source
> {code}




[jira] [Commented] (HIVE-15426) Fix order guarantee of event executions for REPL LOAD

2016-12-16 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755784#comment-15755784
 ] 

Thejas M Nair commented on HIVE-15426:
--

+1

> Fix order guarantee of event executions for REPL LOAD
> -
>
> Key: HIVE-15426
> URL: https://issues.apache.org/jira/browse/HIVE-15426
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-15426.patch
>
>





[jira] [Commented] (HIVE-15200) Support setOp in subQuery with parentheses

2016-12-16 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755769#comment-15755769
 ] 

Hive QA commented on HIVE-15200:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843652/HIVE-15200.03.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 10807 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=233)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=108)

[union_remove_1.q,ppd_outer_join2.q,groupby1_noskew.q,join20.q,smb_mapjoin_13.q,multi_insert.q,groupby_rollup1.q,temp_table_gb1.q,vector_string_concat.q,smb_mapjoin_6.q,metadata_only_queries.q,auto_sortmerge_join_12.q,groupby_bigdata.q,groupby3_map_multi_distinct.q,innerjoin.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=250)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=39)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin]
 (batchId=150)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_nested_subquery]
 (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_shared_alias]
 (batchId=84)
org.apache.hadoop.hive.ql.parse.TestIUD.testSelectStarFromAnonymousVirtTable1Row
 (batchId=256)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2618/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2618/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2618/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843652 - PreCommit-HIVE-Build

> Support setOp in subQuery with parentheses
> --
>
> Key: HIVE-15200
> URL: https://issues.apache.org/jira/browse/HIVE-15200
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15200.01.patch, HIVE-15200.02.patch, 
> HIVE-15200.03.patch
>
>
> {code}
> explain select key from ((select key from src) union (select key from 
> src))subq;
> {code}
> will throw
> {code}
> FAILED: ParseException line 1:47 cannot recognize input near 'union' '(' 
> 'select' in subquery source
> {code}




[jira] [Commented] (HIVE-15414) Fix batchSize for TestNegativeCliDriver

2016-12-16 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755766#comment-15755766
 ] 

Vihang Karajgaonkar commented on HIVE-15414:


Hi [~spena] I checked the results of a couple of the latest pre-commit runs. Each 
batch of TestNegativeCliDriver now takes ~400-470 sec, which is 
~250-300 sec faster per batch than before.

[pre-commit-2614 | https://builds.apache.org/job/PreCommit-HIVE-Build/2614/]
{noformat}
2016-12-16 19:22:08,193  INFO [HostExecutor 26] LocalCommand.:45 Starting 
LocalCommandId=273266: ssh -v -i /home/hiveptest/.ssh/hive-ptest-user-key  -l 
hiveptest 107.178.216.213 'bash 
/home/hiveptest/107.178.216.213-hiveptest-1/scratch/hiveptest-84-TestNegativeCliDriver-nopart_insert.q-insert_into_with_schema.q-input41.q-and-397-more.sh'
2016-12-16 19:22:08,924  INFO [HostExecutor 39] LocalCommand.:45 Starting 
LocalCommandId=273268: ssh -v -i /home/hiveptest/.ssh/hive-ptest-user-key  -l 
hiveptest 104.154.143.187 'bash 
/home/hiveptest/104.154.143.187-hiveptest-0/scratch/hiveptest-85-TestNegativeCliDriver-udf_array_contains_wrong2.q-udf_invalid.q-authorization_rolehierarchy_privs.q-and-374-more.sh'
2016-12-16 19:29:02,702  INFO [HostExecutor 39] 
LocalCommand.awaitProcessCompletion:67 Finished LocalCommandId=273268. 
ElapsedTime(ms)=413777
2016-12-16 19:30:00,510  INFO [HostExecutor 26] 
LocalCommand.awaitProcessCompletion:67 Finished LocalCommandId=273266. 
ElapsedTime(ms)=472316
{noformat}

[pre-commit-2613 | https://builds.apache.org/job/PreCommit-HIVE-Build/2613/]
{noformat}
2016-12-16 18:36:15,845  INFO [HostExecutor 36] LocalCommand.:45 Starting 
LocalCommandId=272258: ssh -v -i /home/hiveptest/.ssh/hive-ptest-user-key  -l 
hiveptest 35.184.12.165 'bash 
/home/hiveptest/35.184.12.165-hiveptest-1/scratch/hiveptest-84-TestNegativeCliDriver-nopart_insert.q-insert_into_with_schema.q-input41.q-and-397-more.sh'
2016-12-16 18:36:16,066  INFO [HostExecutor 13] LocalCommand.:45 Starting 
LocalCommandId=272259: ssh -v -i /home/hiveptest/.ssh/hive-ptest-user-key  -l 
hiveptest 104.154.143.187 'bash 
/home/hiveptest/104.154.143.187-hiveptest-0/scratch/hiveptest-85-TestNegativeCliDriver-udf_array_contains_wrong2.q-udf_invalid.q-authorization_rolehierarchy_privs.q-and-370-more.sh'
2016-12-16 18:43:29,647  INFO [HostExecutor 36] 
HostExecutor.executeTestBatch:261 Completed executing tests for batch 
[84-TestNegativeCliDriver-nopart_insert.q-insert_into_with_schema.q-input41.q-and-397-more]
 on host 35.184.12.165. ElapsedTime(ms)=433802
2016-12-16 18:42:53,365  INFO [HostExecutor 13] 
HostExecutor.executeTestBatch:261 Completed executing tests for batch 
[85-TestNegativeCliDriver-udf_array_contains_wrong2.q-udf_invalid.q-authorization_rolehierarchy_privs.q-and-370-more]
 on host 104.154.143.187. ElapsedTime(ms)=397299
{noformat}


> Fix batchSize for TestNegativeCliDriver
> ---
>
> Key: HIVE-15414
> URL: https://issues.apache.org/jira/browse/HIVE-15414
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
>
> While analyzing the pre-commit console logs, I noticed that the 
> TestNegativeCliDriver batch contains ~770 qfiles, which doesn't look right.
> 2016-12-09 22:23:58,945 DEBUG [TestExecutor] ExecutionPhase.execute:96 
> PBatch: QFileTestBatch [batchId=84, size=774, driver=TestNegativeCliDriver, 
> queryFilesProperty=qfile, 
> name=84-TestNegativeCliDriver-nopart_insert.q-input41.q-having1.q-and-771-more..
>   
> I think {{qFileTest.clientNegative.batchSize = 1000}} in 
> {{test-configuration2.properties}} is probably the reason. 
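
To make the effect of the batch size concrete, here is a small Java sketch. It is not the ptest implementation; BatchSizeSketch and partition are hypothetical names, and the batch size of 400 is only an illustrative value. It shows that a batch size of 1000 puts all 774 qfiles into a single batch, while a smaller value splits them up.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: how a batch size setting carves a list of qfiles into batches. */
final class BatchSizeSketch {

    /** Splits qfiles into consecutive batches of at most batchSize entries. */
    static List<List<String>> partition(List<String> qfiles, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < qfiles.size(); start += batchSize) {
            int end = Math.min(start + batchSize, qfiles.size());
            batches.add(new ArrayList<>(qfiles.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> qfiles = new ArrayList<>();
        for (int i = 0; i < 774; i++) {
            qfiles.add("negative_test_" + i + ".q"); // placeholder file names
        }
        System.out.println("batchSize=1000 -> " + partition(qfiles, 1000).size() + " batch(es)");
        System.out.println("batchSize=400  -> " + partition(qfiles, 400).size() + " batch(es)");
    }
}
```

With a batch size larger than the number of qfiles, no parallelism across hosts is possible for this driver, which matches the single 774-file batch seen in the log line above.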




[jira] [Comment Edited] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-16 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755743#comment-15755743
 ] 

Eugene Koifman edited comment on HIVE-15376 at 12/16/16 11:04 PM:
--

acquireLocksWithHeartbeatDelay() is called from DbTxnManager.acquireLocks() 
(in current code)

Regarding Case 4:
But how do you know heartbeat is being sent?  


was (Author: ekoifman):
acquireLocksWithHeartbeatDelay() is called from DbTxnManager.acquireLocks()

Regarding Case 4:
But how do you know heartbeat is being sent?  

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, 
> HIVE-15376.6.patch, HIVE-15376.7.patch, HIVE-15376.8.patch, HIVE-15376.9.patch
>
>
> HIVE-12366 improved the heartbeater logic by reducing the gap between 
> the lock acquisition and the first heartbeat, but that's not enough; some 
> issues may remain, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.
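
The timing condition above can be checked with a small worked example. This Java sketch is only arithmetic over the times A, B and C from the description. The class and method names, the idea of scheduling the first heartbeat after a fixed initial delay, and all numeric values are illustrative assumptions, not Hive code or Hive defaults.

```java
/** Sketch only: when does the gap before the first heartbeat exceed the txn timeout? */
final class HeartbeatTimingSketch {

    /** True if txnTimeout < (first heartbeat time - open time), i.e. the txn is aborted. */
    static boolean timesOut(long openedAtMs, long firstHeartbeatAtMs, long txnTimeoutMs) {
        return txnTimeoutMs < firstHeartbeatAtMs - openedAtMs;
    }

    /** Time C: the first heartbeat is sent only once acquireLocks returns. */
    static long firstHeartbeatAfterLocks(long acquireCalledAtMs, long lockWaitMs) {
        return acquireCalledAtMs + lockWaitMs;
    }

    /** Assumed alternative: a heartbeat is scheduled a fixed delay after time B. */
    static long firstHeartbeatScheduledEarly(long acquireCalledAtMs, long initialDelayMs) {
        return acquireCalledAtMs + initialDelayMs;
    }

    public static void main(String[] args) {
        long a = 0L;               // Time A: transaction opened
        long b = 1_000L;           // Time B: acquireLocks called
        long lockWait = 400_000L;  // busy system: locks take 400 s to acquire
        long timeout = 300_000L;   // example value for hive.txn.timeout
        long delay = 150_000L;     // example initial heartbeat delay

        long c = firstHeartbeatAfterLocks(b, lockWait);
        System.out.println("heartbeat after locks, aborted: " + timesOut(a, c, timeout));

        long early = firstHeartbeatScheduledEarly(b, delay);
        System.out.println("heartbeat scheduled early, aborted: " + timesOut(a, early, timeout));
    }
}
```

With these example numbers, C - A is 401 s, which exceeds the 300 s timeout, so the transaction would be aborted. A heartbeat sent 150 s after time B arrives 151 s after the transaction was opened and keeps it alive.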




[jira] [Commented] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-16 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755743#comment-15755743
 ] 

Eugene Koifman commented on HIVE-15376:
---

acquireLocksWithHeartbeatDelay() is called from DbTxnManager.acquireLocks()

Regarding Case 4:
But how do you know heartbeat is being sent?  

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, 
> HIVE-15376.6.patch, HIVE-15376.7.patch, HIVE-15376.8.patch, HIVE-15376.9.patch
>
>
> HIVE-12366 improved the heartbeater logic by reducing the gap between 
> the lock acquisition and the first heartbeat, but that's not enough; some 
> issues may remain, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.




[jira] [Commented] (HIVE-15447) Log session ID in ATSHook

2016-12-16 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755702#comment-15755702
 ] 

Sergey Shelukhin commented on HIVE-15447:
-

+1 if test failures are unrelated

> Log session ID in ATSHook
> -
>
> Key: HIVE-15447
> URL: https://issues.apache.org/jira/browse/HIVE-15447
> Project: Hive
>  Issue Type: Bug
>  Components: Hooks
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-15447.1.patch
>
>
> Log the SessionID in addition to the log trace ID (which can be different).




[jira] [Updated] (HIVE-15297) Hive should not split semicolon within quoted string literals

2016-12-16 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15297:
---
Status: Patch Available  (was: Open)

> Hive should not split semicolon within quoted string literals
> -
>
> Key: HIVE-15297
> URL: https://issues.apache.org/jira/browse/HIVE-15297
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15297.01.patch, HIVE-15297.02.patch, 
> HIVE-15297.03.patch, HIVE-15297.04.patch
>
>
> String literals in a query cannot contain reserved symbols. The same set of 
> queries works fine in MySQL and PostgreSQL. 
> {code}
> hive> CREATE TABLE ts(s varchar(550));
> OK
> Time taken: 0.075 seconds
> hive> INSERT INTO ts VALUES ('Mozilla/5.0 (iPhone; CPU iPhone OS 5_0');
> MismatchedTokenException(14!=326)
>   at 
> org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
>   at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valueRowConstructor(HiveParser_FromClauseParser.java:7271)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesTableConstructor(HiveParser_FromClauseParser.java:7370)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesClause(HiveParser_FromClauseParser.java:7510)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.valuesClause(HiveParser.java:51854)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:45432)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:44578)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:8)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1694)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1176)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:204)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:402)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:326)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1169)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1288)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1095)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1083)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> FAILED: ParseException line 1:31 mismatched input '/' expecting ) near 
> 'Mozilla' in value row constructor
> hive>
> {code}
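
As a rough illustration of the behaviour the issue title asks for, the following Java sketch splits a command line on semicolons only when they fall outside quoted string literals. It is not the HIVE-15297 patch; QuoteAwareSplitter and splitStatements are hypothetical names, and the handling of backslash escapes inside literals is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: split on ';' unless it appears inside a quoted string literal. */
final class QuoteAwareSplitter {

    static List<String> splitStatements(String line) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0;          // 0 outside a literal, otherwise the opening quote character
        boolean escaped = false; // true right after a backslash inside a literal
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quote != 0) {
                // Inside a literal: keep everything, including semicolons.
                current.append(c);
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == quote) {
                    quote = 0;
                }
            } else if (c == '\'' || c == '"') {
                quote = c;
                current.append(c);
            } else if (c == ';') {
                addIfNotBlank(statements, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(statements, current);
        return statements;
    }

    private static void addIfNotBlank(List<String> out, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
    }

    public static void main(String[] args) {
        String input = "CREATE TABLE ts(s varchar(550)); "
                + "INSERT INTO ts VALUES ('Mozilla/5.0 (iPhone; CPU iPhone OS 5_0');";
        for (String statement : splitStatements(input)) {
            System.out.println(statement);
        }
    }
}
```

A naive split on every semicolon would cut the INSERT statement after `(iPhone`, leaving an unterminated literal for the parser, which is consistent with the ParseException shown above. The sketch keeps the literal intact and yields two statements.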




[jira] [Updated] (HIVE-15297) Hive should not split semicolon within quoted string literals

2016-12-16 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15297:
---
Attachment: HIVE-15297.04.patch

> Hive should not split semicolon within quoted string literals
> -
>
> Key: HIVE-15297
> URL: https://issues.apache.org/jira/browse/HIVE-15297
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15297.01.patch, HIVE-15297.02.patch, 
> HIVE-15297.03.patch, HIVE-15297.04.patch
>
>
> String literals in a query cannot contain reserved symbols. The same set of 
> queries works fine in MySQL and PostgreSQL. 
> {code}
> hive> CREATE TABLE ts(s varchar(550));
> OK
> Time taken: 0.075 seconds
> hive> INSERT INTO ts VALUES ('Mozilla/5.0 (iPhone; CPU iPhone OS 5_0');
> MismatchedTokenException(14!=326)
>   at 
> org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
>   at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valueRowConstructor(HiveParser_FromClauseParser.java:7271)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesTableConstructor(HiveParser_FromClauseParser.java:7370)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesClause(HiveParser_FromClauseParser.java:7510)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.valuesClause(HiveParser.java:51854)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:45432)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:44578)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:8)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1694)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1176)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:204)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:402)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:326)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1169)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1288)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1095)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1083)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> FAILED: ParseException line 1:31 mismatched input '/' expecting ) near 
> 'Mozilla' in value row constructor
> hive>
> {code}




[jira] [Updated] (HIVE-15297) Hive should not split semicolon within quoted string literals

2016-12-16 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15297:
---
Status: Open  (was: Patch Available)

> Hive should not split semicolon within quoted string literals
> -
>
> Key: HIVE-15297
> URL: https://issues.apache.org/jira/browse/HIVE-15297
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15297.01.patch, HIVE-15297.02.patch, 
> HIVE-15297.03.patch, HIVE-15297.04.patch
>
>
> String literals in a query cannot contain reserved symbols. The same set of 
> queries works fine in MySQL and PostgreSQL. 
> {code}
> hive> CREATE TABLE ts(s varchar(550));
> OK
> Time taken: 0.075 seconds
> hive> INSERT INTO ts VALUES ('Mozilla/5.0 (iPhone; CPU iPhone OS 5_0');
> MismatchedTokenException(14!=326)
>   at 
> org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
>   at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valueRowConstructor(HiveParser_FromClauseParser.java:7271)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesTableConstructor(HiveParser_FromClauseParser.java:7370)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesClause(HiveParser_FromClauseParser.java:7510)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.valuesClause(HiveParser.java:51854)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:45432)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:44578)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:8)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1694)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1176)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:204)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:402)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:326)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1169)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1288)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1095)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1083)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> FAILED: ParseException line 1:31 mismatched input '/' expecting ) near 
> 'Mozilla' in value row constructor
> hive>
> {code}




[jira] [Updated] (HIVE-15297) Hive should not split semicolon within quoted string literals

2016-12-16 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15297:
---
Attachment: HIVE-15297.04.patch

> Hive should not split semicolon within quoted string literals
> -
>
> Key: HIVE-15297
> URL: https://issues.apache.org/jira/browse/HIVE-15297
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15297.01.patch, HIVE-15297.02.patch, 
> HIVE-15297.03.patch
>
>
> String literals in a query cannot contain reserved symbols such as ';'. The 
> same set of queries works fine in MySQL and PostgreSQL. 
> {code}
> hive> CREATE TABLE ts(s varchar(550));
> OK
> Time taken: 0.075 seconds
> hive> INSERT INTO ts VALUES ('Mozilla/5.0 (iPhone; CPU iPhone OS 5_0');
> MismatchedTokenException(14!=326)
>   at 
> org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
>   at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valueRowConstructor(HiveParser_FromClauseParser.java:7271)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesTableConstructor(HiveParser_FromClauseParser.java:7370)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesClause(HiveParser_FromClauseParser.java:7510)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.valuesClause(HiveParser.java:51854)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:45432)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:44578)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:8)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1694)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1176)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:204)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:402)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:326)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1169)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1288)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1095)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1083)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> FAILED: ParseException line 1:31 mismatched input '/' expecting ) near 
> 'Mozilla' in value row constructor
> hive>
> {code}
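
The title of this issue suggests that the command line is split on every ';', including one inside a quoted literal. As a rough illustration only, here is a minimal Python sketch of quote-aware statement splitting. It is not the code from the attached patches; the function name split_statements and the backslash handling are assumptions made for this example.

```python
def split_statements(line):
    """Split a command line on ';', ignoring semicolons inside quoted literals.

    Hypothetical helper for illustration; not Hive's actual CLI code.
    """
    statements = []
    current = []
    quote = None      # quote character we are currently inside, or None
    escaped = False   # True when the previous character was a backslash
    for ch in line:
        if escaped:
            # Assumption: a backslash protects the next character anywhere.
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif quote is not None:
            current.append(ch)
            if ch == quote:
                quote = None          # closing quote found
        elif ch in ("'", '"'):
            current.append(ch)
            quote = ch                # opening quote found
        elif ch == ";":
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    statements.append("".join(current).strip())
    # Drop empty pieces such as the one after a trailing ';'.
    return [s for s in statements if s]


cmd = "INSERT INTO ts VALUES ('Mozilla/5.0 (iPhone; CPU iPhone OS 5_0'); SELECT 1;"
for stmt in split_statements(cmd):
    print(stmt)
```

With this approach the INSERT from the report stays in one piece, whereas a plain cmd.split(";") would cut it in the middle of the string literal.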




[jira] [Updated] (HIVE-15297) Hive should not split semicolon within quoted string literals

2016-12-16 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15297:
---
Attachment: (was: HIVE-15297.04.patch)

> Hive should not split semicolon within quoted string literals
> -
>
> Key: HIVE-15297
> URL: https://issues.apache.org/jira/browse/HIVE-15297
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15297.01.patch, HIVE-15297.02.patch, 
> HIVE-15297.03.patch
>
>
> String literals in a query cannot contain reserved symbols such as ';'. The 
> same set of queries works fine in MySQL and PostgreSQL. 
> {code}
> hive> CREATE TABLE ts(s varchar(550));
> OK
> Time taken: 0.075 seconds
> hive> INSERT INTO ts VALUES ('Mozilla/5.0 (iPhone; CPU iPhone OS 5_0');
> MismatchedTokenException(14!=326)
>   at 
> org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
>   at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valueRowConstructor(HiveParser_FromClauseParser.java:7271)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesTableConstructor(HiveParser_FromClauseParser.java:7370)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.valuesClause(HiveParser_FromClauseParser.java:7510)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.valuesClause(HiveParser.java:51854)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:45432)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:44578)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:8)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1694)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1176)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:204)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:402)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:326)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1169)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1288)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1095)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1083)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> FAILED: ParseException line 1:31 mismatched input '/' expecting ) near 
> 'Mozilla' in value row constructor
> hive>
> {code}




[jira] [Updated] (HIVE-15456) Umbrella JIRA to track all subquery redesign related tasks and issues

2016-12-16 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15456:
---
Summary: Umbrella JIRA to track all subquery redesign related tasks and 
issues  (was: This is an umbrella JIRA to track all subquery redesign related 
tasks and issues)

> Umbrella JIRA to track all subquery redesign related tasks and issues
> -
>
> Key: HIVE-15456
> URL: https://issues.apache.org/jira/browse/HIVE-15456
> Project: Hive
>  Issue Type: Task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
>
> With HIVE-15192 we are using Calcite's functionality to decorrelate and plan 
> subqueries. This JIRA is to track all planned improvements, tasks and known 
> bugs.




[jira] [Updated] (HIVE-15456) Umbrella JIRA for all subquery redesign related tasks and issues

2016-12-16 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15456:
---
Summary: Umbrella JIRA for all subquery redesign related tasks and issues  
(was: Umbrella JIRA to track all subquery redesign related tasks and issues)

> Umbrella JIRA for all subquery redesign related tasks and issues
> 
>
> Key: HIVE-15456
> URL: https://issues.apache.org/jira/browse/HIVE-15456
> Project: Hive
>  Issue Type: Task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
>
> With HIVE-15192 we are using Calcite's functionality to decorrelate and plan 
> subqueries. This JIRA is to track all planned improvements, tasks and known 
> bugs.




[jira] [Updated] (HIVE-15192) Use Calcite to de-correlate and plan subqueries

2016-12-16 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15192:
---
Issue Type: Sub-task  (was: Task)
Parent: HIVE-15456

> Use Calcite to de-correlate and plan subqueries
> ---
>
> Key: HIVE-15192
> URL: https://issues.apache.org/jira/browse/HIVE-15192
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Fix For: 2.2.0
>
> Attachments: HIVE-15192.10.patch, HIVE-15192.2.patch, 
> HIVE-15192.3.patch, HIVE-15192.4.patch, HIVE-15192.5.patch, 
> HIVE-15192.6.patch, HIVE-15192.7.patch, HIVE-15192.8.patch, 
> HIVE-15192.9.patch, HIVE-15192.patch
>
>
> Hive currently transforms subqueries into SEMI-JOIN or LEFT OUTER JOIN. This 
> transformation occurs on the query AST before the logical plan is generated. 
> These transformations are described in [Link to original spec | 
> https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf]. 
> Such transformations cannot handle many kinds of subqueries; as a result, 
> Hive imposes various restrictions on the queries it can handle, e.g. 
> Hive disallows nested subqueries. All current restrictions are detailed in 
> the document linked above.
> This patch is the first phase of removing these transformations and 
> leveraging Calcite's functionality to plan such queries. 
> The next phases will lift the restrictions one by one. 
> Note that this patch already lifts one restriction, *Restriction.6.m* (The LHS 
> in a SubQuery must have all its Column References be qualified).
> Known issues with this patch are:
>  * Return path tests fail for various reasons and are currently disabled. We 
> plan to fix and re-enable them later.
>  * Semi-join optimization (HIVE-15227) is disabled by default as it doesn't 
> work with this patch. We plan to fix this and re-enable it by default.




[jira] [Commented] (HIVE-15441) Provide a config to timeout long compiling queries

2016-12-16 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755659#comment-15755659
 ] 

Thejas M Nair commented on HIVE-15441:
--

[~ctang.ma] Are you suggesting that the change in HIVE-12431 would help? The 
goal here is different: to interrupt the query that is taking too long to 
compile, not the one that is waiting on the lock.


> Provide a config to timeout long compiling queries
> --
>
> Key: HIVE-15441
> URL: https://issues.apache.org/jira/browse/HIVE-15441
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15441.1.patch
>
>
> Sometimes Hive users have long-compiling queries that may need to scan 
> thousands of partitions or more (perhaps by accident). The compilation 
> process may take a very long time, especially in {{getInputSummary}}, where it 
> needs to make NN calls to get info about each input path.
> This is bad because it may block many other queries. Parallel compilation may 
> help, but {{getInputSummary}} still holds a global lock. In this case, it 
> makes sense to give the Hive admin a config that sets a timeout for 
> compilation, so that these "bad" queries can be blocked.
> Note that https://issues.apache.org/jira/browse/HIVE-12431 also tries to 
> address a similar issue. However, it cancels the queries that are waiting for 
> the compile lock, which I think is not so useful for our case since the *query 
> under compile is the one to blame.*
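
To make the proposal concrete, the following Python sketch shows one possible shape of a compilation timeout: a deadline that is checked between the per-path summary calls. It is an illustration under stated assumptions, not the attached patch. The names summarize_inputs, get_summary, CompileTimeoutError and timeout_s are hypothetical, and a real implementation inside Hive would also have to interrupt a call that is already blocked.

```python
import time


class CompileTimeoutError(Exception):
    """Raised when the (hypothetical) compile deadline is exceeded."""


def summarize_inputs(paths, get_summary, timeout_s, clock=time.monotonic):
    """Sum per-path summaries, giving up once timeout_s has elapsed.

    get_summary stands in for the expensive per-path lookup; the deadline
    is only checked between calls, so this is a cooperative timeout.
    """
    deadline = clock() + timeout_s
    total = 0
    for index, path in enumerate(paths):
        if clock() > deadline:
            raise CompileTimeoutError(
                "compilation exceeded %s s after %d of %d input paths"
                % (timeout_s, index, len(paths))
            )
        total += get_summary(path)
    return total


if __name__ == "__main__":
    # Three fake partitions of 10 bytes each; finishes well inside the limit.
    print(summarize_inputs(["p=1", "p=2", "p=3"], lambda p: 10, timeout_s=5))
```

The clock parameter exists only so the behaviour can be exercised deterministically with a fake clock.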




[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-16 Thread Wei Zheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15376:
-
Status: Patch Available  (was: Open)

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, 
> HIVE-15376.6.patch, HIVE-15376.7.patch, HIVE-15376.8.patch, HIVE-15376.9.patch
>
>
> HIVE-12366 improved the heartbeater logic by narrowing the gap between 
> lock acquisition and the first heartbeat, but that is not enough. A problem 
> can still occur, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will time out and be 
> aborted, thus causing a failure.
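
The timing condition above can be checked with a little arithmetic. The Python sketch below is only an illustration: the numbers are invented, the helper names are hypothetical, and the heartbeat interval of half the timeout is an assumption made for the example. It compares sending the first heartbeat at Time C with starting the heartbeater when the transaction is opened at Time A.

```python
def max_gap_without_heartbeat(open_time, heartbeat_times, end_time):
    """Longest stretch between open/heartbeats/end with no heartbeat sent."""
    points = [open_time] + sorted(heartbeat_times) + [end_time]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def is_aborted(open_time, heartbeat_times, end_time, txn_timeout):
    """Apply the condition from the description: timeout < longest gap."""
    gap = max_gap_without_heartbeat(open_time, heartbeat_times, end_time)
    return txn_timeout < gap


# Invented numbers, in seconds.
A = 0            # transaction opened
C = 405          # acquireLocks returns on a busy system
TIMEOUT = 300    # stands in for hive.txn.timeout

# First heartbeat only at Time C: the gap C - A = 405 exceeds the timeout.
print(is_aborted(A, [C], C, TIMEOUT))

# Heartbeater started at Time A, beating every TIMEOUT // 2 seconds (assumed).
beats = list(range(A, C + 1, TIMEOUT // 2))
print(is_aborted(A, beats, C, TIMEOUT))
```

In the first case the transaction is aborted; in the second the longest gap is one heartbeat interval, so it survives the slow lock acquisition.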




[jira] [Updated] (HIVE-15456) This is an umbrella JIRA to track all subquery redesign related tasks and issues

2016-12-16 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-15456:
---
Labels: sub-query  (was: )

> This is an umbrella JIRA to track all subquery redesign related tasks and 
> issues
> 
>
> Key: HIVE-15456
> URL: https://issues.apache.org/jira/browse/HIVE-15456
> Project: Hive
>  Issue Type: Task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
>
> With HIVE-15192 we are using Calcite's functionality to decorrelate and plan 
> subqueries. This JIRA is to track all planned improvements, tasks and known 
> bugs.




[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-16 Thread Wei Zheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15376:
-
Attachment: HIVE-15376.9.patch

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, 
> HIVE-15376.6.patch, HIVE-15376.7.patch, HIVE-15376.8.patch, HIVE-15376.9.patch
>
>
> HIVE-12366 improved the heartbeater logic by narrowing the gap between 
> lock acquisition and the first heartbeat, but that is not enough. A problem 
> can still occur, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will time out and be 
> aborted, thus causing a failure.




[jira] [Updated] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-16 Thread Wei Zheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-15376:
-
Status: Open  (was: Patch Available)

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, 
> HIVE-15376.6.patch, HIVE-15376.7.patch, HIVE-15376.8.patch
>
>
> HIVE-12366 improved the heartbeater logic by narrowing the gap between 
> lock acquisition and the first heartbeat, but that is not enough. A problem 
> can still occur, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will time out and be 
> aborted, thus causing a failure.




[jira] [Commented] (HIVE-15448) ChangeManager for replication

2016-12-16 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755645#comment-15755645
 ] 

Hive QA commented on HIVE-15448:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843641/HIVE-15448.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10824 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=233)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=250)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_nested_subquery]
 (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_shared_alias]
 (batchId=84)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2617/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2617/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2617/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843641 - PreCommit-HIVE-Build

> ChangeManager for replication
> -
>
> Key: HIVE-15448
> URL: https://issues.apache.org/jira/browse/HIVE-15448
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15448.1.patch
>
>
> The change manager implementation as described in 
> https://cwiki.apache.org/confluence/display/Hive/HiveReplicationv2Development#HiveReplicationv2Development-Changemanagement.
>  This issue tracks the infrastructure code. Hooking into actions will be 
> tracked in another ticket.
> ReplChangeManager includes:
> * method to generate checksum
> * method to convert file path to cm path
> * method to move table/partition/file into cm
> * thread to clear cm files once they expire
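
The four pieces listed above can be sketched roughly as follows. This Python code is not ReplChangeManager and does not reflect its real API or on-disk layout. It is a local-filesystem illustration in which every name (checksum, cm_path, move_to_cm, clear_expired) and the choice of SHA-256 with a checksum-named target file are assumptions made for the example.

```python
import hashlib
import os
import time


def checksum(path, chunk_size=1 << 20):
    """Hex digest of the file contents (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cm_path(cm_root, file_path):
    """Map a file to its location under the change-management root.

    Assumption: the target name is the content checksum, so identical
    files collapse to a single copy.
    """
    return os.path.join(cm_root, checksum(file_path))


def move_to_cm(cm_root, file_path):
    """Move a file into the cm directory instead of deleting it outright."""
    target = cm_path(cm_root, file_path)
    os.makedirs(cm_root, exist_ok=True)
    if os.path.exists(target):
        os.remove(file_path)          # same content is already kept
    else:
        os.replace(file_path, target)
    return target


def clear_expired(cm_root, max_age_s, now=None):
    """Delete cm files older than max_age_s; a cleaner thread could call this."""
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(cm_root)):
        full = os.path.join(cm_root, name)
        if os.path.isfile(full) and now - os.path.getmtime(full) > max_age_s:
            os.remove(full)
            removed.append(full)
    return removed
```

A periodic cleaner could wrap clear_expired in a loop on a background thread. How the real implementation schedules expiry is not stated in this message.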




[jira] [Commented] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-16 Thread Wei Zheng (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755647#comment-15755647
 ] 

Wei Zheng commented on HIVE-15376:
--

acquireLocksWithHeartbeatDelay was previously used only by the JUnit test 
TestDbTxnManager2, which now uses openTxn(Context, String, long), as we've 
moved the heartbeater starting logic from acquireLocksWithHeartbeatDelay to 
openTxn.

I moved the RO query heartbeater cancelling logic into if (!atLeastOneLock). 
Thanks for pointing that out.

The idea behind the change in TestDbTxnManager.testLockTimeout() was to get rid 
of the influence of the heartbeat that is introduced in acquireLocks. But I now 
realize I shouldn't have called openTxn and used a delay to accomplish that. I 
made a change that introduces an additional param for not starting the heartbeat.

Case 4 in TestDbTxnManager.testHeartbeater() proves that even when there's 
no open transaction, as long as a lock is required, we will send a heartbeat.

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, 
> HIVE-15376.6.patch, HIVE-15376.7.patch, HIVE-15376.8.patch
>
>
> HIVE-12366 improved the heartbeater logic by narrowing the gap between 
> lock acquisition and the first heartbeat, but that is not enough. A problem 
> can still occur, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will time out and be 
> aborted, thus causing a failure.




[jira] [Commented] (HIVE-15298) Unit test failures in TestCliDriver sample[2,4,6,7,9]

2016-12-16 Thread Jason Dere (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755643#comment-15755643
 ] 

Jason Dere commented on HIVE-15298:
---

+1

> Unit test failures in TestCliDriver sample[2,4,6,7,9]
> -
>
> Key: HIVE-15298
> URL: https://issues.apache.org/jira/browse/HIVE-15298
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Jason Dere
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-15298.1.patch, HIVE-15298.2.patch, 
> HIVE-15298.3.patch, HIVE-15298.4.patch, HIVE-15298.patch
>
>
> Failing for the past 5 builds:
> https://builds.apache.org/job/PreCommit-HIVE-Build/2301/testReport/




[jira] [Resolved] (HIVE-15183) Wrong result with NOT IN involving null values

2016-12-16 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg resolved HIVE-15183.

Resolution: Cannot Reproduce

With subquery redesign in HIVE-15192 this is not reproducible anymore.

> Wrong result with NOT IN involving null values
> --
>
> Key: HIVE-15183
> URL: https://issues.apache.org/jira/browse/HIVE-15183
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
>
>  Reproducer
> {code}
> create table t7(i int, j int);
> create table fixOb(i int, j int);
> insert into t7 values(null, 5);
> insert into t7 values(4, 15);
> insert into fixOb values(-1, 5);
> insert into fixOb values(-1, 15);
> {code}
> Query:
> {code} select * from fixOb where j NOT IN (select i from t7 where 
> t7.j=fixOb.j);  {code}
> Expected Result
> {noformat}
>  i  | j  
> ----+----
>  -1 | 15
> {noformat}
> Actual Result
> {noformat} No result {noformat}
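
The expected result follows from SQL's three-valued logic for NOT IN. As a cross-check of the reproducer, here is a small Python sketch that evaluates the correlated subquery row by row, with None standing in for NULL. The helper name sql_not_in is hypothetical and this is not how Hive or Calcite plans the query.

```python
def sql_not_in(value, candidates):
    """Three-valued NOT IN: returns True, False, or None for UNKNOWN."""
    if not candidates:
        return True                    # NOT IN over an empty set is TRUE
    if value is None:
        return None                    # NULL compared with anything is UNKNOWN
    if value in [c for c in candidates if c is not None]:
        return False                   # a definite match exists
    if any(c is None for c in candidates):
        return None                    # no match, but a NULL makes it UNKNOWN
    return True


# Rows from the reproducer as (i, j) tuples.
t7 = [(None, 5), (4, 15)]
fixOb = [(-1, 5), (-1, 15)]

result = []
for row in fixOb:
    outer_j = row[1]
    # Correlated subquery: select i from t7 where t7.j = fixOb.j
    subquery = [i for (i, j) in t7
                if j is not None and outer_j is not None and j == outer_j]
    # WHERE keeps a row only when the predicate is TRUE, not UNKNOWN.
    if sql_not_in(outer_j, subquery) is True:
        result.append(row)

print(result)
```

For the row (-1, 5) the subquery yields only NULL, so the predicate is UNKNOWN and the row is dropped; for (-1, 15) the subquery yields 4, so the predicate is TRUE and the row is kept.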




[jira] [Updated] (HIVE-15200) Support setOp in subQuery with parentheses

2016-12-16 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15200:
---
Status: Patch Available  (was: Open)

> Support setOp in subQuery with parentheses
> --
>
> Key: HIVE-15200
> URL: https://issues.apache.org/jira/browse/HIVE-15200
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15200.01.patch, HIVE-15200.02.patch, 
> HIVE-15200.03.patch
>
>
> {code}
> explain select key from ((select key from src) union (select key from 
> src))subq;
> {code}
> will throw
> {code}
> FAILED: ParseException line 1:47 cannot recognize input near 'union' '(' 
> 'select' in subquery source
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-15200) Support setOp in subQuery with parentheses

2016-12-16 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15200:
---
Attachment: HIVE-15200.03.patch

> Support setOp in subQuery with parentheses
> --
>
> Key: HIVE-15200
> URL: https://issues.apache.org/jira/browse/HIVE-15200
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15200.01.patch, HIVE-15200.02.patch, 
> HIVE-15200.03.patch
>
>
> {code}
> explain select key from ((select key from src) union (select key from 
> src))subq;
> {code}
> will throw
> {code}
> FAILED: ParseException line 1:47 cannot recognize input near 'union' '(' 
> 'select' in subquery source
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-15200) Support setOp in subQuery with parentheses

2016-12-16 Thread Pengcheng Xiong (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15200:
---
Status: Open  (was: Patch Available)

> Support setOp in subQuery with parentheses
> --
>
> Key: HIVE-15200
> URL: https://issues.apache.org/jira/browse/HIVE-15200
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15200.01.patch, HIVE-15200.02.patch, 
> HIVE-15200.03.patch
>
>
> {code}
> explain select key from ((select key from src) union (select key from 
> src))subq;
> {code}
> will throw
> {code}
> FAILED: ParseException line 1:47 cannot recognize input near 'union' '(' 
> 'select' in subquery source
> {code}




[jira] [Resolved] (HIVE-15443) There is a table stored as orc format, it contains a column with the type of array.When each cell of this column contains tens of strings, the queries reported A

2016-12-16 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-15443.
--
Resolution: Duplicate

Closing it as duplicate of HIVE-14483
Please reopen it if HIVE-14483 doesn't fix it. 

> A table stored in ORC format contains a column of type array. When each 
> cell of this column contains tens of strings, queries report 
> ArrayIndexOutOfBoundsException.
> ---
>
> Key: HIVE-15443
> URL: https://issues.apache.org/jira/browse/HIVE-15443
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
> Environment: centos+hive2.1.0+hadoop2.7.2
>Reporter: mortalee
>  Labels: patch
> Attachments: orc_array.patch
>
>
> java.lang.Exception: java.io.IOException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 1024
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
> Caused by: java.io.IOException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 1024
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:230)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:140)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 1024
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:106)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:42)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:228)
>   ... 12 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
>   at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
>   at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
>   at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
>   at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
>   at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
>   at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
>   at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
>   at 
> org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
>   at 
>

[jira] [Commented] (HIVE-15443) There is a table stored as orc format, it contains a column with the type of array. When each cell of this column contains tens of strings, the queries reported

2016-12-16 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755565#comment-15755565
 ] 

Prasanth Jayachandran commented on HIVE-15443:
--

Duplicate of HIVE-14483 ?

> There is a table stored as orc format, it contains a column with the type of 
> array. When each cell of this column contains tens of strings, the 
> queries reported ArrayIndexOutOfBoundsException.
> ---
>
> Key: HIVE-15443
> URL: https://issues.apache.org/jira/browse/HIVE-15443
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
> Environment: centos+hive2.1.0+hadoop2.7.2
>Reporter: mortalee
>  Labels: patch
> Attachments: orc_array.patch
>
>
> java.lang.Exception: java.io.IOException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 1024
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
> Caused by: java.io.IOException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 1024
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:230)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:140)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 1024
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:106)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:42)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:228)
>   ... 12 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
>   at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
>   at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
>   at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
>   at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
>   at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
>   at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
>   at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
>   at 
> org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
>   at 
>

[jira] [Commented] (HIVE-15447) Log session ID in ATSHook

2016-12-16 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755544#comment-15755544
 ] 

Hive QA commented on HIVE-15447:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843631/HIVE-15447.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10821 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=233)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=250)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin]
 (batchId=150)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_nested_subquery]
 (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_shared_alias]
 (batchId=84)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2616/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2616/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2616/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843631 - PreCommit-HIVE-Build

> Log session ID in ATSHook
> -
>
> Key: HIVE-15447
> URL: https://issues.apache.org/jira/browse/HIVE-15447
> Project: Hive
>  Issue Type: Bug
>  Components: Hooks
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-15447.1.patch
>
>
> Log the SessionID in addition to the log trace ID (which can be different).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-15426) Fix order guarantee of event executions for REPL LOAD

2016-12-16 Thread Sushanth Sowmyan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755516#comment-15755516
 ] 

Sushanth Sowmyan commented on HIVE-15426:
-

Created jiras for all the failing tests in HIVE-14547.

> Fix order guarantee of event executions for REPL LOAD
> -
>
> Key: HIVE-15426
> URL: https://issues.apache.org/jira/browse/HIVE-15426
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-15426.patch
>
>





[jira] [Updated] (HIVE-15450) Failing tests : testCliDriver.sample[24679]

2016-12-16 Thread Sushanth Sowmyan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-15450:

Summary: Failing tests : testCliDriver.sample[24679]  (was: Flaky tests : 
testCliDriver.sample[24679])

> Failing tests : testCliDriver.sample[24679]
> ---
>
> Key: HIVE-15450
> URL: https://issues.apache.org/jira/browse/HIVE-15450
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sushanth Sowmyan
>
> Noted during ptests: the .q.out comparison seems to be failing.
> A difference in the ordering of the output appears to cause this 
> failure.
> See https://builds.apache.org/job/PreCommit-HIVE-Build/2615/#showFailuresLink 
> for a recent job in which these tests fail.
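
Since the failure is attributed to output ordering rather than content, one way to check that is to compare the expected and actual output rows as multisets. A minimal Python sketch, assuming plain-text row output; `same_rows_ignoring_order` is a hypothetical helper and not part of the Hive test framework:

```python
from collections import Counter


def same_rows_ignoring_order(expected: str, actual: str) -> bool:
    """Return True when both outputs contain the same lines, in any order."""
    # Counter keeps duplicate rows significant, unlike a plain set.
    return Counter(expected.splitlines()) == Counter(actual.splitlines())


# Made-up sample rows, for illustration only.
expected_out = "1\tval_1\n2\tval_2\n3\tval_3\n"
actual_out = "3\tval_3\n1\tval_1\n2\tval_2\n"

print(same_rows_ignoring_order(expected_out, actual_out))
```

If this check passes while a line-by-line diff fails, the rows agree and only their order differs, which points at a missing ordering guarantee in the query rather than a wrong result.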




[jira] [Updated] (HIVE-15420) LLAP UI: Relativize resources to allow proxied/secured views

2016-12-16 Thread Gopal V (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-15420:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
 Release Note: LLAP UI: Relativize resources to allow proxied/secured views 
(Gopal V, reviewed by Rajesh Balamohan)
   Status: Resolved  (was: Patch Available)

> LLAP UI: Relativize resources to allow proxied/secured views 
> -
>
> Key: HIVE-15420
> URL: https://issues.apache.org/jira/browse/HIVE-15420
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Web UI
>Reporter: Gopal V
>Assignee: Gopal V
> Fix For: 2.2.0
>
> Attachments: HIVE-15420.1.patch
>
>
> If the UI is secured behind a gateway firewall instance, this allows for the 
> UI to function with a base URL like http:///proxy/
> NO PRECOMMIT TESTS
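
To illustrate why relative resource references matter behind a proxy prefix, here is a small Python sketch using the standard library's `urljoin`. The host name and paths are placeholders invented for the example and are not taken from the LLAP UI:

```python
from urllib.parse import urljoin


def resolve(page_url: str, reference: str) -> str:
    """Resolve a resource reference the way a browser would for a given page."""
    return urljoin(page_url, reference)


# Hypothetical page URL as seen through a proxy; host and path are placeholders.
page_url = "http://gateway.example.com/proxy/llap/index.html"

# An absolute-path reference drops the proxy prefix.
print(resolve(page_url, "/js/app.js"))
# A relative reference stays under the proxied base.
print(resolve(page_url, "js/app.js"))
```

The first reference resolves to a path outside `/proxy/`, which the gateway would presumably not route to the UI; the second keeps the prefix intact.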




[jira] [Commented] (HIVE-15446) Hive fails in recursive debug

2016-12-16 Thread Chaoyu Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755485#comment-15755485
 ] 

Chaoyu Tang commented on HIVE-15446:


The test failures are not related to this patch.

> Hive fails in recursive debug
> -
>
> Key: HIVE-15446
> URL: https://issues.apache.org/jira/browse/HIVE-15446
> Project: Hive
>  Issue Type: Bug
>  Components: Diagnosability
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>Priority: Minor
> Attachments: HIVE-15446.patch
>
>
> When running Hive in recursive debug mode, for example,
> ./bin/hive --debug:port=10008,childSuspend=y
> it fails with the error message:
> --
> ERROR: Cannot load this JVM TI agent twice, check your java command line for 
> duplicate jdwp options.
> Error occurred during initialization of VM
> agent library failed to init: jdwp
> --
> This happens because both HADOOP_OPTS and HADOOP_CLIENT_OPTS carry JVM debug 
> options when HADOOP.sh is invoked for the child process. HADOOP.sh appends 
> HADOOP_CLIENT_OPTS to HADOOP_OPTS, which duplicates the debug options.
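
The duplication described above can be sketched in Python as a merge that keeps only the first JDWP agent option. This is an illustration of the idea only, not the attached patch; `merge_jvm_opts` is a hypothetical helper, and the option prefixes it looks for are an assumption about how the debug options are spelled:

```python
import shlex

# Assumed spellings of the JDWP agent option (modern and legacy form).
JDWP_PREFIXES = ("-agentlib:jdwp", "-Xrunjdwp")


def merge_jvm_opts(base_opts: str, extra_opts: str) -> str:
    """Append extra_opts to base_opts, keeping only the first JDWP agent option."""
    merged = []
    seen_jdwp = False
    for opt in shlex.split(base_opts) + shlex.split(extra_opts):
        if opt.startswith(JDWP_PREFIXES):
            if seen_jdwp:
                continue  # drop the duplicate agent option
            seen_jdwp = True
        merged.append(opt)
    return shlex.join(merged)


debug = "-agentlib:jdwp=transport=dt_socket,server=y,address=10008,suspend=y"
print(merge_jvm_opts(f"-Xmx1g {debug}", f"{debug} -Dfoo=bar"))
```

With both inputs carrying the same agent option, the merged string contains it once, so the JVM would not be asked to load the agent twice.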




[jira] [Commented] (HIVE-15426) Fix order guarantee of event executions for REPL LOAD

2016-12-16 Thread Sushanth Sowmyan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755467#comment-15755467
 ] 

Sushanth Sowmyan commented on HIVE-15426:
-

Attached review board link over at : https://reviews.apache.org/r/54818/

Going through the list of tests under HIVE-14547 and reconciling.

> Fix order guarantee of event executions for REPL LOAD
> -
>
> Key: HIVE-15426
> URL: https://issues.apache.org/jira/browse/HIVE-15426
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-15426.patch
>
>





[jira] [Commented] (HIVE-15446) Hive fails in recursive debug

2016-12-16 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755452#comment-15755452
 ] 

Hive QA commented on HIVE-15446:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843628/HIVE-15446.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10807 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=233)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=250)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin]
 (batchId=150)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_nested_subquery]
 (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_shared_alias]
 (batchId=84)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=110)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2615/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2615/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2615/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843628 - PreCommit-HIVE-Build

> Hive fails in recursive debug
> -
>
> Key: HIVE-15446
> URL: https://issues.apache.org/jira/browse/HIVE-15446
> Project: Hive
>  Issue Type: Bug
>  Components: Diagnosability
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>Priority: Minor
> Attachments: HIVE-15446.patch
>
>
> When running Hive in recursive debug mode, for example,
> ./bin/hive --debug:port=10008,childSuspend=y
> it fails with the error message:
> --
> ERROR: Cannot load this JVM TI agent twice, check your java command line for 
> duplicate jdwp options.
> Error occurred during initialization of VM
> agent library failed to init: jdwp
> --
> This happens because both HADOOP_OPTS and HADOOP_CLIENT_OPTS carry JVM debug 
> options when HADOOP.sh is invoked for the child process. HADOOP.sh appends 
> HADOOP_CLIENT_OPTS to HADOOP_OPTS, which duplicates the debug options.




[jira] [Commented] (HIVE-15426) Fix order guarantee of event executions for REPL LOAD

2016-12-16 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755444#comment-15755444
 ] 

Thejas M Nair commented on HIVE-15426:
--

[~sushanth]
Can you please add a review board link or pull request?
Also, if the test failures are unrelated, can you please confirm whether the failed 
tests are already tracked under HIVE-14547?

> Fix order guarantee of event executions for REPL LOAD
> -
>
> Key: HIVE-15426
> URL: https://issues.apache.org/jira/browse/HIVE-15426
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-15426.patch
>
>





[jira] [Commented] (HIVE-15448) ChangeManager for replication

2016-12-16 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755423#comment-15755423
 ] 

Thejas M Nair commented on HIVE-15448:
--

[~daijy] Can you also please include a reviewboard link or pull request?


> ChangeManager for replication
> -
>
> Key: HIVE-15448
> URL: https://issues.apache.org/jira/browse/HIVE-15448
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15448.1.patch
>
>
> The change manager implementation as described in 
> https://cwiki.apache.org/confluence/display/Hive/HiveReplicationv2Development#HiveReplicationv2Development-Changemanagement.
>  This issue tracks the infrastructure code. Hooking into actions will be 
> tracked in another ticket.
> ReplChangeManager includes:
> * a method to generate a checksum
> * a method to convert a file path to a cm path
> * a method to move a table/partition/file into cm
> * a thread to clear cm files once they expire
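
The first two items in the list above (checksum generation and path conversion) can be sketched as follows. This is a hedged Python illustration only: the hash function (SHA-256), the flat layout under a cm root, and the names `file_checksum` and `cm_path` are all assumptions made for the example; the actual scheme is defined by the patch and the linked wiki page.

```python
import hashlib
import os
import tempfile


def file_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of the file contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cm_path(cm_root: str, path: str) -> str:
    """Hypothetical mapping: address a file in cm by its content checksum."""
    return os.path.join(cm_root, file_checksum(path))


# Demo with a throwaway file; "/cmroot" is a placeholder location.
with tempfile.TemporaryDirectory() as tmp:
    data_file = os.path.join(tmp, "part-00000")
    with open(data_file, "wb") as f:
        f.write(b"row1\nrow2\n")
    print(cm_path("/cmroot", data_file))
```

Addressing files by content rather than by original location means two files with identical bytes map to the same cm entry under this assumed layout.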




[jira] [Comment Edited] (HIVE-15335) Fast Decimal

2016-12-16 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755395#comment-15755395
 ] 

Sergey Shelukhin edited comment on HIVE-15335 at 12/16/16 8:08 PM:
---

Done with RB up to iter #7. Phew! Left a few comments... thanks for the added 
code comments :)
Reviewing FastHiveDecimalImpl, I eventually went into an "eyeball the code" 
mode (esp. with e.g. multiplication/division), parsing it all in detail might 
take longer than writing it ;) So I hope it has good test coverage. If someone 
wants to read these in great detail you are welcome to...



was (Author: sershe):
Done with RB up to iter #7. Phew! Left a few comments... thanks for the added 
code comments :)
Reviewing FastHiveDecimalImpl, I eventually went into an "eyeball the code" 
mode (esp. with e.g. multiplication/division), reading it all in detail might 
take longer than writing it ;) So I hope it has good test coverage. If someone 
wants to read these in great detail you are welcome to...


> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch, 
> HIVE-15335.093.patch, HIVE-15335.094.patch, HIVE-15335.095.patch
>
>
> Replace HiveDecimal implementation that currently represents the decimal 
> internally as a BigDecimal with a faster version that does not allocate extra 
> objects
> Replace HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.
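
As a rough picture of what a mutable decimal with in-place operations looks like, here is a toy Python sketch that stores a value as an unscaled integer plus a scale and adds another value into itself. It does not reflect the internals of FastHiveDecimalImpl, and Python cannot show the allocation savings the description is after; `MutableDecimal` and its `mutable_add` are hypothetical names borrowed loosely from the `mutable*` wording above.

```python
class MutableDecimal:
    """Toy decimal: value == unscaled * 10**-scale."""

    def __init__(self, unscaled: int, scale: int) -> None:
        self.unscaled = unscaled
        self.scale = scale

    def mutable_add(self, other: "MutableDecimal") -> None:
        """Add other into self in place, aligning both to the larger scale."""
        target = max(self.scale, other.scale)
        mine = self.unscaled * 10 ** (target - self.scale)
        theirs = other.unscaled * 10 ** (target - other.scale)
        self.unscaled = mine + theirs
        self.scale = target

    def __str__(self) -> str:
        sign = "-" if self.unscaled < 0 else ""
        # Pad so there is always at least one digit before the decimal point.
        digits = str(abs(self.unscaled)).rjust(self.scale + 1, "0")
        if self.scale == 0:
            return sign + digits
        return f"{sign}{digits[:-self.scale]}.{digits[-self.scale:]}"


total = MutableDecimal(1250, 2)          # 12.50
total.mutable_add(MutableDecimal(5, 1))  # add 0.5 in place
print(total)
```

The caller keeps reusing `total` instead of receiving a new object per operation, which is the usage pattern the mutable calls are meant to enable.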




[jira] [Commented] (HIVE-15335) Fast Decimal

2016-12-16 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755395#comment-15755395
 ] 

Sergey Shelukhin commented on HIVE-15335:
-

Done with RB up to iter #7. Phew! Left a few comments... thanks for the added 
code comments :)
Reviewing FastHiveDecimalImpl, I eventually went into an "eyeball the code" 
mode (esp. with e.g. multiplication/division), reading it all in detail might 
take longer than writing it ;) So I hope it has good test coverage. If someone 
wants to read these in great detail you are welcome to...


> Fast Decimal
> 
>
> Key: HIVE-15335
> URL: https://issues.apache.org/jira/browse/HIVE-15335
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, 
> HIVE-15335.03.patch, HIVE-15335.04.patch, HIVE-15335.05.patch, 
> HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch, 
> HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch, 
> HIVE-15335.093.patch, HIVE-15335.094.patch, HIVE-15335.095.patch
>
>
> Replace HiveDecimal implementation that currently represents the decimal 
> internally as a BigDecimal with a faster version that does not allocate extra 
> objects
> Replace HiveDecimalWritable implementation with a faster version that has new 
> mutable* calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc) and 
> stores the result as a fast decimal instead of a slow byte array containing a 
> serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.




[jira] [Updated] (HIVE-15448) ChangeManager for replication

2016-12-16 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-15448:
--
Status: Patch Available  (was: Open)

> ChangeManager for replication
> -
>
> Key: HIVE-15448
> URL: https://issues.apache.org/jira/browse/HIVE-15448
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15448.1.patch
>
>
> The change manager implementation as described in 
> https://cwiki.apache.org/confluence/display/Hive/HiveReplicationv2Development#HiveReplicationv2Development-Changemanagement.
>  This issue tracks the infrastructure code. Hooking into actions will be 
> tracked in another ticket.
> ReplChangeManager includes:
> * a method to generate a checksum
> * a method to convert a file path to a cm path
> * a method to move a table/partition/file into cm
> * a thread to clear cm files once they expire




[jira] [Updated] (HIVE-15448) ChangeManager for replication

2016-12-16 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-15448:
--
Attachment: HIVE-15448.1.patch

> ChangeManager for replication
> -
>
> Key: HIVE-15448
> URL: https://issues.apache.org/jira/browse/HIVE-15448
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15448.1.patch
>
>
> The change manager implementation as described in 
> https://cwiki.apache.org/confluence/display/Hive/HiveReplicationv2Development#HiveReplicationv2Development-Changemanagement.
>  This issue tracks the infrastructure code. Hooking into actions will be 
> tracked in another ticket.
> ReplChangeManager includes:
> * a method to generate a checksum
> * a method to convert a file path to a cm path
> * a method to move a table/partition/file into cm
> * a thread to clear cm files once they expire




[jira] [Commented] (HIVE-15439) Support INSERT OVERWRITE for internal druid datasources.

2016-12-16 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755356#comment-15755356
 ] 

Hive QA commented on HIVE-15439:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843622/HIVE-15439.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 10821 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=233)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=250)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries]
 (batchId=217)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_single_sourced_multi_insert]
 (batchId=217)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_queries] 
(batchId=89)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_single_sourced_multi_insert]
 (batchId=90)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbasestats] 
(batchId=88)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a]
 (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin]
 (batchId=150)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_nested_subquery]
 (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_shared_alias]
 (batchId=84)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2614/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2614/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2614/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843622 - PreCommit-HIVE-Build

> Support INSERT OVERWRITE for internal druid datasources.
> 
>
> Key: HIVE-15439
> URL: https://issues.apache.org/jira/browse/HIVE-15439
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15439.patch
>
>
> Add support for the SQL statement INSERT OVERWRITE TABLE druid_internal_table.
> Adding this support requires a new post-insert hook to update 
> the Druid metadata. Segment creation will be the same as for CTAS.
>  




[jira] [Commented] (HIVE-15428) HoS DPP doesn't remove cyclic dependency

2016-12-16 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755343#comment-15755343
 ] 

Chao Sun commented on HIVE-15428:
-

+1

> HoS DPP doesn't remove cyclic dependency
> 
>
> Key: HIVE-15428
> URL: https://issues.apache.org/jira/browse/HIVE-15428
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-15428.1.patch
>
>
> More details in HIVE-15357




[jira] [Commented] (HIVE-15445) Subquery failing with ClassCastException

2016-12-16 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755306#comment-15755306
 ] 

Jesus Camacho Rodriguez commented on HIVE-15445:


The failure is still present after HIVE-15192. That makes sense, since it fails at 
parsing time. Let me take another look at it: the solution might be as simple as 
adding a check for a constant null value.

> Subquery failing with ClassCastException
> 
>
> Key: HIVE-15445
> URL: https://issues.apache.org/jira/browse/HIVE-15445
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15445.patch
>
>
> To reproduce:
> {code:sql}
> CREATE TABLE table_7 (int_col INT);
> SELECT
> (t1.int_col) * (t1.int_col) AS int_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) t1
> WHERE
> (False) NOT IN (SELECT
> False AS boolean_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) tt1
> WHERE
> (t1.int_col) = (tt1.int_col));
> {code}
> The problem seems to be in the method that tries to resolve the subquery 
> column _MIN(NULL)_. It checks the column inspector and ends up returning a 
> constant expression instead of a column expression for _min(null)_.




[jira] [Commented] (HIVE-15445) Subquery failing with ClassCastException

2016-12-16 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755292#comment-15755292
 ] 

Vineet Garg commented on HIVE-15445:


I looked at the patch. I don't believe the subquery patch touches this part of the 
code. You should see the same issue with the subquery patch.

> Subquery failing with ClassCastException
> 
>
> Key: HIVE-15445
> URL: https://issues.apache.org/jira/browse/HIVE-15445
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15445.patch
>
>
> To reproduce:
> {code:sql}
> CREATE TABLE table_7 (int_col INT);
> SELECT
> (t1.int_col) * (t1.int_col) AS int_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) t1
> WHERE
> (False) NOT IN (SELECT
> False AS boolean_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) tt1
> WHERE
> (t1.int_col) = (tt1.int_col));
> {code}
> The problem seems to be in the method that tries to resolve the subquery 
> column _MIN(NULL)_. It checks the column inspector and ends up returning a 
> constant expression instead of a column expression for _min(null)_.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (HIVE-15445) Subquery failing with ClassCastException

2016-12-16 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755284#comment-15755284
 ] 

Jesus Camacho Rodriguez edited comment on HIVE-15445 at 12/16/16 7:19 PM:
--

Thanks for checking, [~ashutoshc]. Sure, that sounds good: once the subquery 
code is checked in, we can revisit this if the problem persists.

Cc [~vgarg]


was (Author: jcamachorodriguez):
Thanks for checking, [~ashutoshc]. Sure, that sounds good: once the subquery 
code is checked in, we can revisit this if the problem persists.

> Subquery failing with ClassCastException
> 
>
> Key: HIVE-15445
> URL: https://issues.apache.org/jira/browse/HIVE-15445
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15445.patch
>
>
> To reproduce:
> {code:sql}
> CREATE TABLE table_7 (int_col INT);
> SELECT
> (t1.int_col) * (t1.int_col) AS int_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) t1
> WHERE
> (False) NOT IN (SELECT
> False AS boolean_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) tt1
> WHERE
> (t1.int_col) = (tt1.int_col));
> {code}
> The problem seems to be in the method that tries to resolve the subquery 
> column _MIN(NULL)_. It checks the column inspector and ends up returning a 
> constant expression instead of a column expression for _min(null)_.




[jira] [Commented] (HIVE-15445) Subquery failing with ClassCastException

2016-12-16 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755284#comment-15755284
 ] 

Jesus Camacho Rodriguez commented on HIVE-15445:


Thanks for checking, [~ashutoshc]. Sure, that sounds good: once the subquery 
code is checked in, we can revisit this if the problem persists.

> Subquery failing with ClassCastException
> 
>
> Key: HIVE-15445
> URL: https://issues.apache.org/jira/browse/HIVE-15445
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15445.patch
>
>
> To reproduce:
> {code:sql}
> CREATE TABLE table_7 (int_col INT);
> SELECT
> (t1.int_col) * (t1.int_col) AS int_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) t1
> WHERE
> (False) NOT IN (SELECT
> False AS boolean_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) tt1
> WHERE
> (t1.int_col) = (tt1.int_col));
> {code}
> The problem seems to be in the method that tries to resolve the subquery 
> column _MIN(NULL)_. It checks the column inspector and ends up returning a 
> constant expression instead of a column expression for _min(null)_.




[jira] [Commented] (HIVE-13278) Avoid FileNotFoundException when map/reduce.xml is not available

2016-12-16 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755255#comment-15755255
 ] 

Hive QA commented on HIVE-13278:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843621/HIVE-13278.5.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 10820 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] 
(batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] 
(batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_1] 
(batchId=92)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=93)
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testCombinationInputFormat
 (batchId=254)
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testCombinationInputFormatWithAcid
 (batchId=254)
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testVectorReaderFooterSerialize
 (batchId=254)
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testVectorReaderNoFooterSerialize
 (batchId=254)
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testVectorization 
(batchId=254)
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testVectorizationWithAcid
 (batchId=254)
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testVectorizationWithBuckets
 (batchId=254)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2613/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2613/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2613/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843621 - PreCommit-HIVE-Build

> Avoid FileNotFoundException when map/reduce.xml is not available
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch, 
> HIVE-13278.3.patch, HIVE-13278.4.patch, HIVE-13278.5.patch
>
>
> Many redundant 'File not found' messages appear in the container log during 
> query execution with Hive on Spark.
> They don't prevent the query from running successfully, so this is marked 
> as Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
>

[jira] [Commented] (HIVE-15445) Subquery failing with ClassCastException

2016-12-16 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755244#comment-15755244
 ] 

Ashutosh Chauhan commented on HIVE-15445:
-

I think we can defer this till subquery work is complete, since this code is in 
flux at the moment.

> Subquery failing with ClassCastException
> 
>
> Key: HIVE-15445
> URL: https://issues.apache.org/jira/browse/HIVE-15445
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15445.patch
>
>
> To reproduce:
> {code:sql}
> CREATE TABLE table_7 (int_col INT);
> SELECT
> (t1.int_col) * (t1.int_col) AS int_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) t1
> WHERE
> (False) NOT IN (SELECT
> False AS boolean_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) tt1
> WHERE
> (t1.int_col) = (tt1.int_col));
> {code}
> The problem seems to be in the method that tries to resolve the subquery 
> column _MIN(NULL)_. It checks the column inspector and ends up returning a 
> constant expression instead of a column expression for _min(null)_.




[jira] [Commented] (HIVE-15428) HoS DPP doesn't remove cyclic dependency

2016-12-16 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755226#comment-15755226
 ] 

Chao Sun commented on HIVE-15428:
-

[~lirui] I agree - removing the smaller one doesn't make sense to me either. 
I'll take a look at the patch.

> HoS DPP doesn't remove cyclic dependency
> 
>
> Key: HIVE-15428
> URL: https://issues.apache.org/jira/browse/HIVE-15428
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-15428.1.patch
>
>
> More details in HIVE-15357




[jira] [Commented] (HIVE-15445) Subquery failing with ClassCastException

2016-12-16 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755220#comment-15755220
 ] 

Ashutosh Chauhan commented on HIVE-15445:
-

If it's specific to subqueries, then HIVE-15192 may have an impact on this.

> Subquery failing with ClassCastException
> 
>
> Key: HIVE-15445
> URL: https://issues.apache.org/jira/browse/HIVE-15445
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15445.patch
>
>
> To reproduce:
> {code:sql}
> CREATE TABLE table_7 (int_col INT);
> SELECT
> (t1.int_col) * (t1.int_col) AS int_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) t1
> WHERE
> (False) NOT IN (SELECT
> False AS boolean_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) tt1
> WHERE
> (t1.int_col) = (tt1.int_col));
> {code}
> The problem seems to be in the method that tries to resolve the subquery 
> column _MIN(NULL)_. It checks the column inspector and ends up returning a 
> constant expression instead of a column expression for _min(null)_.




[jira] [Updated] (HIVE-15447) Log session ID in ATSHook

2016-12-16 Thread Jason Dere (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-15447:
--
Status: Patch Available  (was: Open)

> Log session ID in ATSHook
> -
>
> Key: HIVE-15447
> URL: https://issues.apache.org/jira/browse/HIVE-15447
> Project: Hive
>  Issue Type: Bug
>  Components: Hooks
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-15447.1.patch
>
>
> Log the SessionID in addition to the log trace ID (which can be different).




[jira] [Updated] (HIVE-15447) Log session ID in ATSHook

2016-12-16 Thread Jason Dere (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-15447:
--
Attachment: HIVE-15447.1.patch

> Log session ID in ATSHook
> -
>
> Key: HIVE-15447
> URL: https://issues.apache.org/jira/browse/HIVE-15447
> Project: Hive
>  Issue Type: Bug
>  Components: Hooks
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-15447.1.patch
>
>
> Log the SessionID in addition to the log trace ID (which can be different).




[jira] [Updated] (HIVE-15192) Use Calcite to de-correlate and plan subqueries

2016-12-16 Thread Ashutosh Chauhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-15192:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Vineet!

> Use Calcite to de-correlate and plan subqueries
> ---
>
> Key: HIVE-15192
> URL: https://issues.apache.org/jira/browse/HIVE-15192
> Project: Hive
>  Issue Type: Task
>  Components: Logical Optimizer
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>  Labels: sub-query
> Fix For: 2.2.0
>
> Attachments: HIVE-15192.10.patch, HIVE-15192.2.patch, 
> HIVE-15192.3.patch, HIVE-15192.4.patch, HIVE-15192.5.patch, 
> HIVE-15192.6.patch, HIVE-15192.7.patch, HIVE-15192.8.patch, 
> HIVE-15192.9.patch, HIVE-15192.patch
>
>
> Hive currently transforms subqueries into SEMI-JOIN or LEFT OUTER JOIN. This 
> transformation occurs on the query AST before the logical plan is generated. 
> These transformations are described at [Link to original spec | 
> https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf]. 
> Such transformations can't handle many kinds of subqueries; as a result, 
> Hive imposes various restrictions on the types of queries it can handle, 
> e.g. Hive disallows nested subqueries. All current restrictions are detailed 
> in the document linked above.
> This patch is the first phase of getting rid of these transformations and 
> leveraging Calcite's functionality to plan such queries. 
> The next phases will lift the restrictions one by one. 
> Note that this patch already lifts one restriction, *Restriction.6.m* (The 
> LHS in a SubQuery must have all its Column References be qualified).
> Known issues with this patch are:
>  * Return path tests fail for various reasons and are currently disabled. We 
> plan to fix and re-enable them later.
>  * Semi-join optimization (HIVE-15227) is disabled by default as it doesn't 
> work with this patch. We plan to fix this and re-enable it by default.
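
As an illustration of the SEMI-JOIN rewrite mentioned in the description, here 
is a minimal sketch in Python using the standard-library sqlite3 module rather 
than Hive. The tables orders and customers and their columns are hypothetical. 
SQLite has no LEFT SEMI JOIN syntax, so the semi-join is emulated by joining 
against the de-duplicated inner keys. This only shows the equivalence such a 
rewrite relies on; it is not Hive's or Calcite's planner code.

```python
import sqlite3

# Hypothetical tables, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (o_id INTEGER, o_cust INTEGER);
    CREATE TABLE customers (c_id INTEGER);
    INSERT INTO orders VALUES (1, 10), (2, 20), (3, 10), (4, 30);
    INSERT INTO customers VALUES (10), (10), (30);
""")

# Original form: an IN subquery in the WHERE clause.
in_subquery = """
    SELECT o_id FROM orders
    WHERE o_cust IN (SELECT c_id FROM customers)
    ORDER BY o_id
"""

# Semi-join semantics: each outer row appears at most once, however many
# inner rows match. Emulated here with a join on the distinct inner keys.
semi_join = """
    SELECT o.o_id FROM orders o
    JOIN (SELECT DISTINCT c_id FROM customers) c ON o.o_cust = c.c_id
    ORDER BY o.o_id
"""

# A plain inner join is NOT equivalent: duplicate inner keys duplicate rows.
naive_join = """
    SELECT o.o_id FROM orders o
    JOIN customers c ON o.o_cust = c.c_id
    ORDER BY o.o_id
"""

in_rows = conn.execute(in_subquery).fetchall()
semi_rows = conn.execute(semi_join).fetchall()
naive_rows = conn.execute(naive_join).fetchall()

print(in_rows == semi_rows)   # the two forms agree
print(len(naive_rows))        # the plain join returns extra rows
```

The plain join is included to show why the rewrite targets a semi-join and not 
an ordinary inner join: customer 10 appears twice, so the plain join repeats 
the matching orders.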




[jira] [Commented] (HIVE-15445) Subquery failing with ClassCastException

2016-12-16 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755167#comment-15755167
 ] 

Jesus Camacho Rodriguez commented on HIVE-15445:


Patch seems to cause regressions in constant folding; I will check for another 
way of fixing this issue.

> Subquery failing with ClassCastException
> 
>
> Key: HIVE-15445
> URL: https://issues.apache.org/jira/browse/HIVE-15445
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15445.patch
>
>
> To reproduce:
> {code:sql}
> CREATE TABLE table_7 (int_col INT);
> SELECT
> (t1.int_col) * (t1.int_col) AS int_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) t1
> WHERE
> (False) NOT IN (SELECT
> False AS boolean_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) tt1
> WHERE
> (t1.int_col) = (tt1.int_col));
> {code}
> The problem seems to be in the method that tries to resolve the subquery 
> column _MIN(NULL)_. It checks the column inspector and ends up returning a 
> constant expression instead of a column expression for _min(null)_.




[jira] [Updated] (HIVE-15446) Hive fails in recursive debug

2016-12-16 Thread Chaoyu Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-15446:
---
Status: Patch Available  (was: Open)

When HADOOP.sh invokes the child process, the old HADOOP_CLIENT_OPTS (with the 
main debug options) should be removed from HADOOP_OPTS before the new 
HADOOP_CLIENT_OPTS (with the child debug options) is appended to HADOOP_OPTS.
[~ashutoshc], could you review the patch? Thanks.

> Hive fails in recursive debug
> -
>
> Key: HIVE-15446
> URL: https://issues.apache.org/jira/browse/HIVE-15446
> Project: Hive
>  Issue Type: Bug
>  Components: Diagnosability
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>Priority: Minor
> Attachments: HIVE-15446.patch
>
>
> When running Hive in recursive debug mode, for example,
> ./bin/hive --debug:port=10008,childSuspend=y
> it fails with the error message:
> --
> ERROR: Cannot load this JVM TI agent twice, check your java command line for 
> duplicate jdwp options.
> Error occurred during initialization of VM
> agent library failed to init: jdwp
> --
> This is because HADOOP_OPTS and HADOOP_CLIENT_OPTS both contain JVM debug 
> options when HADOOP.sh is invoked for the child process. HADOOP.sh appends 
> HADOOP_CLIENT_OPTS to HADOOP_OPTS, which leads to the duplicated debug 
> options.
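
The duplication described above can be sketched in Python as follows. This is 
only an illustration of the idea (drop the parent's jdwp agent before the 
child's options are appended); the attached patch changes the shell scripts and 
is not reproduced here. The function names and the sample option strings, 
including the parent port 8000, are assumptions made for the example.

```python
import shlex


def strip_jdwp(opts):
    """Return the option string with any JDWP agent flags removed."""
    kept = [
        tok for tok in shlex.split(opts)
        if not (tok.startswith("-agentlib:jdwp") or tok.startswith("-Xrunjdwp"))
    ]
    return " ".join(kept)


def build_child_opts(hadoop_opts, child_client_opts):
    """Drop the parent's debug agent, then append the child's options,
    so the final command line carries exactly one jdwp agent."""
    parts = (strip_jdwp(hadoop_opts), child_client_opts)
    return " ".join(p for p in parts if p)


# Sample values (assumed): the parent already debugs on one port,
# the child should suspend and listen on port 10008.
parent_opts = "-Xmx1g -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000"
child_opts = "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=10008"

combined = build_child_opts(parent_opts, child_opts)
print(combined)
```

Simply concatenating parent_opts and child_opts would leave two -agentlib:jdwp 
entries on the command line, which is the condition that triggers the "Cannot 
load this JVM TI agent twice" error.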




[jira] [Commented] (HIVE-15445) Subquery failing with ClassCastException

2016-12-16 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755139#comment-15755139
 ] 

Hive QA commented on HIVE-15445:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843616/HIVE-15445.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 247 failed/errored test(s), 10787 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=234)
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed 
out) (batchId=144)

[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,cbo_rp_subq_not_in.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join26] (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join27] (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnStatsUpdateForStatsOptimizer_2]
 (batchId=27)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[combine2] (batchId=6)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constGby] (batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constprog_partitioner] 
(batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer10] 
(batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer11] 
(batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer13] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer15] 
(batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer7] 
(batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer8] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer9] 
(batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_genericudaf] 
(batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[decimal_udf] (batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dynamic_rdd_cache] 
(batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[except_all] (batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explain_logical] 
(batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[fold_case] (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[fold_eq_with_case_when] 
(batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby4_map] 
(batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby4_map_skew] 
(batchId=53)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_cube1] 
(batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_duplicate_key] 
(batchId=6)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_position] 
(batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_rollup1] 
(batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_11] 
(batchId=55)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_1_23] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_2] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_3] 
(batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_4] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_5] 
(batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_6] 
(batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_7] 
(batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_9] 
(batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_skew_1_23] 
(batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_test_1] 
(batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[infer_bucket_sort_grouping_operators]
 (batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input30] (batchId=27)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input31] (batchId=55)

[jira] [Updated] (HIVE-15446) Hive fails in recursive debug

2016-12-16 Thread Chaoyu Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-15446:
---
Attachment: HIVE-15446.patch

> Hive fails in recursive debug
> -
>
> Key: HIVE-15446
> URL: https://issues.apache.org/jira/browse/HIVE-15446
> Project: Hive
>  Issue Type: Bug
>  Components: Diagnosability
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>Priority: Minor
> Attachments: HIVE-15446.patch
>
>
> When running Hive in recursive debug mode, for example,
> ./bin/hive --debug:port=10008,childSuspend=y
> it fails with the error message:
> --
> ERROR: Cannot load this JVM TI agent twice, check your java command line for 
> duplicate jdwp options.
> Error occurred during initialization of VM
> agent library failed to init: jdwp
> --
> This is because HADOOP_OPTS and HADOOP_CLIENT_OPTS both contain JVM debug 
> options when HADOOP.sh is invoked for the child process. HADOOP.sh appends 
> HADOOP_CLIENT_OPTS to HADOOP_OPTS, which leads to the duplicated debug 
> options.




[jira] [Commented] (HIVE-15376) Improve heartbeater scheduling for transactions

2016-12-16 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755093#comment-15755093
 ] 

Eugene Koifman commented on HIVE-15376:
---

Patch 8:
Why did you remove acquireLocksWithHeartbeatDelay()?
It seems that in the process you changed when the heartbeat starts for an RO 
query. In particular, it now starts before the lock record is even created.
It also looks like your cancelling logic is never reached when there is no 
lock, due to "if (!atLeastOneLock) {".

Can you explain your changes in TestDbTxnManager.testLockTimeout()? This 
used to test lock expiration for locks outside of a txn (e.g. an RO query). You 
have now added a txn, so it tests a txn timeout instead.

Regarding Case 4, which you added in TestDbTxnManager.testHeartbeater(): since 
it doesn't run the reaper process, what does it test?

> Improve heartbeater scheduling for transactions
> ---
>
> Key: HIVE-15376
> URL: https://issues.apache.org/jira/browse/HIVE-15376
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-15376.1.patch, HIVE-15376.2.patch, 
> HIVE-15376.3.patch, HIVE-15376.4.patch, HIVE-15376.5.patch, 
> HIVE-15376.6.patch, HIVE-15376.7.patch, HIVE-15376.8.patch
>
>
> HIVE-12366 improved the heartbeater logic by reducing the gap between 
> lock acquisition and the first heartbeat, but that's not enough. Some 
> issues may remain, e.g.
>  Time A: a transaction is opened
>  Time B: acquireLocks is called (blocking call), but it can take a long time 
> to actually acquire the locks and return if the system is busy
>  Time C: as acquireLocks returns, the first heartbeat is sent
> If hive.txn.timeout < C - A, then the transaction will be timed out and 
> aborted, thus causing failure.
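
The timing condition in the description can be sketched in Python as follows. 
This is a toy model of the A/B/C timeline, not Hive's heartbeater code. The 
function names and the sample values (a 300-second timeout, a 150-second 
heartbeat interval, a 400-second lock wait) are assumptions chosen for 
illustration.

```python
def longest_gap_without_heartbeat(lock_wait_s, heartbeat_interval_s, heartbeat_from_open):
    """Longest stretch, in seconds, that the transaction goes without a
    heartbeat between being opened (Time A) and acquireLocks returning (Time C)."""
    if heartbeat_from_open:
        # Heartbeats keep ticking while acquireLocks blocks, so the gap is
        # bounded by the heartbeat interval (or by the wait, if shorter).
        return min(lock_wait_s, heartbeat_interval_s)
    # The first heartbeat is sent only when acquireLocks returns: gap is C - A.
    return lock_wait_s


def is_timed_out(gap_s, txn_timeout_s):
    """Condition from the description: hive.txn.timeout < C - A."""
    return txn_timeout_s < gap_s


TXN_TIMEOUT_S = 300         # assumed value for hive.txn.timeout
HEARTBEAT_INTERVAL_S = 150  # assumed heartbeat interval
LOCK_WAIT_S = 400           # assumed time spent blocked in acquireLocks

late_start = is_timed_out(
    longest_gap_without_heartbeat(LOCK_WAIT_S, HEARTBEAT_INTERVAL_S, False),
    TXN_TIMEOUT_S,
)
early_start = is_timed_out(
    longest_gap_without_heartbeat(LOCK_WAIT_S, HEARTBEAT_INTERVAL_S, True),
    TXN_TIMEOUT_S,
)
print(late_start, early_start)
```

With these sample values, a heartbeat that starts only after acquireLocks 
returns lets the transaction time out, while one that starts when the 
transaction is opened does not.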



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-15439) Support INSERT OVERWRITE for internal druid datasources.

2016-12-16 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755078#comment-15755078
 ] 

slim bouguerra commented on HIVE-15439:
---

[~ashutoshc] and [~jcamachorodriguez], can you please take a look here?

> Support INSERT OVERWRITE for internal druid datasources.
> 
>
> Key: HIVE-15439
> URL: https://issues.apache.org/jira/browse/HIVE-15439
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15439.patch
>
>
> Add support for the SQL statement INSERT OVERWRITE TABLE druid_internal_table.
> To add this support, we will need to add a new post-insert hook to update 
> the Druid metadata. Creation of the segment will be the same as for CTAS.
>  




[jira] [Commented] (HIVE-15439) Support INSERT OVERWRITE for internal druid datasources.

2016-12-16 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755072#comment-15755072
 ] 

ASF GitHub Bot commented on HIVE-15439:
---

GitHub user b-slim opened a pull request:

https://github.com/apache/hive/pull/124

[HIVE-15439] adding support for insert overwite

Add support for the SQL statement INSERT OVERWRITE TABLE druid_internal_table.
To add this support, we will need to add a new post-insert hook to 
update the Druid metadata. Creation of the segment will be the same as for CTAS.
https://issues.apache.org/jira/browse/HIVE-15439

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/b-slim/hive HIVE-15439-adding-insert

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/124.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #124


commit 26b2667d490c6de71d691aa38322b0c1c7e6778e
Author: Slim Bouguerra 
Date:   2016-12-15T23:22:11Z

adding commit insert function def to MetaHook

commit 02f6a8935ab58eb663b6ccf34a931e0e68d4af14
Author: Slim Bouguerra 
Date:   2016-12-16T00:44:44Z

adding commit insert to the FileSkinPlan

commit 0e9b1f6e5eca815be6a9724188275f32e65fd40c
Author: Slim Bouguerra 
Date:   2016-12-16T17:47:29Z

add joda dependency for UTs and add copyrights




> Support INSERT OVERWRITE for internal druid datasources.
> 
>
> Key: HIVE-15439
> URL: https://issues.apache.org/jira/browse/HIVE-15439
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15439.patch
>
>
> Add support for the SQL statement INSERT OVERWRITE TABLE druid_internal_table.
> To add this support, we will need to add a new post-insert hook to update 
> the Druid metadata. Creation of the segment will be the same as for CTAS.
>  




[jira] [Updated] (HIVE-15439) Support INSERT OVERWRITE for internal druid datasources.

2016-12-16 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-15439:
--
Attachment: HIVE-15439.patch

> Support INSERT OVERWRITE for internal druid datasources.
> 
>
> Key: HIVE-15439
> URL: https://issues.apache.org/jira/browse/HIVE-15439
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15439.patch
>
>
> Add support for the SQL statement INSERT OVERWRITE TABLE druid_internal_table.
> To add this support, we will need to add a new post-insert hook to update 
> the Druid metadata. Creation of the segment will be the same as for CTAS.
>  




[jira] [Updated] (HIVE-15439) Support INSERT OVERWRITE for internal druid datasources.

2016-12-16 Thread slim bouguerra (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-15439:
--
Status: Patch Available  (was: Open)

> Support INSERT OVERWRITE for internal druid datasources.
> 
>
> Key: HIVE-15439
> URL: https://issues.apache.org/jira/browse/HIVE-15439
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>
> Add support for the SQL statement INSERT OVERWRITE TABLE druid_internal_table.
> To add this support, we will need to add a new post-insert hook to update 
> the Druid metadata. Creation of the segment will be the same as for CTAS.
>  




[jira] [Updated] (HIVE-13278) Avoid FileNotFoundException when map/reduce.xml is not available

2016-12-16 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-13278:

Attachment: HIVE-13278.5.patch

[~lirui] Yes, I think this solution should work and is much cleaner! Thanks 
for the suggestion. Attaching patch v5 to test.

> Avoid FileNotFoundException when map/reduce.xml is not available
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HIVE-13278.1.patch, HIVE-13278.2.patch, 
> HIVE-13278.3.patch, HIVE-13278.4.patch, HIVE-13278.5.patch
>
>
> Many redundant 'File not found' messages appear in the container log during 
> query execution with Hive on Spark.
> They don't prevent the query from running successfully, so this is marked 
> as Minor for now.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}
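
As a rough illustration of the idea in the issue title (treating a missing plan file as a normal condition rather than logging an exception), here is a minimal sketch. It is not the HIVE-13278 patch and does not use Hive's or Hadoop's APIs: the class `PlanFileLookupSketch`, the method `readPlanIfPresent`, and the use of the local filesystem through `java.nio.file` are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical illustration only; not Hive's Utilities class and not the actual patch.
final class PlanFileLookupSketch {

    // Reads the serialized plan if the file exists. A missing file is reported as
    // an empty Optional instead of an exception, so the caller has no stack trace to log.
    static Optional<byte[]> readPlanIfPresent(Path planFile) throws IOException {
        if (!Files.exists(planFile)) {
            return Optional.empty();
        }
        try {
            return Optional.of(Files.readAllBytes(planFile));
        } catch (NoSuchFileException e) {
            // The file vanished between the check and the read; still not an error here.
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed path, chosen so that the file does not exist.
        Path missing = Paths.get("no-such-dir", "reduce.xml");
        System.out.println("reduce plan present: " + readPlanIfPresent(missing).isPresent());
    }
}
```

The existence check alone is racy, which is why the sketch also catches `NoSuchFileException` around the read.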




[jira] [Commented] (HIVE-14688) Hive drop call fails in presence of TDE

2016-12-16 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755017#comment-15755017
 ] 

Eugene Koifman commented on HIVE-14688:
---

Is there a test that covers this?

> Hive drop call fails in presence of TDE
> ---
>
> Key: HIVE-14688
> URL: https://issues.apache.org/jira/browse/HIVE-14688
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Deepesh Khandelwal
>Assignee: Wei Zheng
> Attachments: HIVE-14688.1.patch
>
>
> In Hadoop 2.8.0, TDE trash collection was fixed through HDFS-8831. This 
> enables us to make drop table calls for Hive managed tables where the Hive 
> metastore warehouse directory is in an encrypted zone. However, even with the 
> feature in HDFS, Hive drop table currently fails:
> {noformat}
> $ hdfs crypto -listZones
> /apps/hive/warehouse  key2
> $ hdfs dfs -ls /apps/hive/warehouse
> Found 1 items
> drwxrwxrwt   - hdfs hdfs  0 2016-09-01 02:54 /apps/hive/warehouse/.Trash
> hive> create table abc(a string, b int);
> OK
> Time taken: 5.538 seconds
> hive> dfs -ls /apps/hive/warehouse;
> Found 2 items
> drwxrwxrwt   - hdfs   hdfs  0 2016-09-01 02:54 /apps/hive/warehouse/.Trash
> drwxrwxrwx   - deepesh hdfs  0 2016-09-01 17:15 /apps/hive/warehouse/abc
> hive> drop table if exists abc;
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Unable to drop default.abc because it is in an encryption zone and trash is enabled.  Use PURGE option to skip trash.)
> {noformat}
> The problem lies here:
> {code:title=metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java}
> private void checkTrashPurgeCombination(Path pathToData, String objectName, boolean ifPurge)
> ...
>   if (trashEnabled) {
>     try {
>       HadoopShims.HdfsEncryptionShim shim =
>         ShimLoader.getHadoopShims().createHdfsEncryptionShim(FileSystem.get(hiveConf), hiveConf);
>       if (shim.isPathEncrypted(pathToData)) {
>         throw new MetaException("Unable to drop " + objectName + " because it is in an encryption zone" +
>           " and trash is enabled.  Use PURGE option to skip trash.");
>       }
>     } catch (IOException ex) {
>       MetaException e = new MetaException(ex.getMessage());
>       e.initCause(ex);
>       throw e;
>     }
>   }
> {code}
> As we can see, we are assuming that a delete won't succeed in an 
> encrypted zone. We need to modify this logic.
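
One way to picture the change being asked for is to model the check as a pure decision function. The sketch below is only that: it calls no Hadoop or Hive APIs, and the class `TrashPurgeDecisionSketch`, the method `shouldRejectDrop`, and especially the `fsCanTrashInEncryptionZone` flag are hypothetical. The issue does not say how the metastore would detect that the underlying HDFS supports trash inside an encryption zone.

```java
// Hypothetical model of the decision only; it does not call Hadoop or Hive APIs.
final class TrashPurgeDecisionSketch {

    // Returns true when the drop should be rejected up front.
    // fsCanTrashInEncryptionZone is an assumed capability flag; the source does
    // not specify how such a capability would be detected.
    static boolean shouldRejectDrop(boolean trashEnabled,
                                    boolean pathEncrypted,
                                    boolean ifPurge,
                                    boolean fsCanTrashInEncryptionZone) {
        if (ifPurge || !trashEnabled || !pathEncrypted) {
            return false; // nothing is moved to trash, or the path is not encrypted
        }
        // Current behaviour corresponds to this flag always being false.
        return !fsCanTrashInEncryptionZone;
    }

    public static void main(String[] args) {
        // Encrypted path, trash enabled, no PURGE: rejected only without trash support.
        System.out.println(shouldRejectDrop(true, true, false, false));
        System.out.println(shouldRejectDrop(true, true, false, true));
    }
}
```

With the flag fixed to `false`, the function matches the behaviour shown in the quoted snippet, where any encrypted path with trash enabled is rejected unless PURGE is used.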




[jira] [Updated] (HIVE-15122) Hive: Upcasting types should not obscure stats (min/max/ndv)

2016-12-16 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15122:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Test failures are unrelated. Pushed to master; thanks for reviewing, [~ashutoshc]!

> Hive: Upcasting types should not obscure stats (min/max/ndv)
> 
>
> Key: HIVE-15122
> URL: https://issues.apache.org/jira/browse/HIVE-15122
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Jesus Camacho Rodriguez
> Fix For: 2.2.0
>
> Attachments: HIVE-15122.03.patch, HIVE-15122.patch
>
>
> A UDFToLong breaks PK/FK inferences and triggers mis-estimation of joins in 
> LLAP.
> Snippet from the bad plan.
> {code}
> | STAGE PLANS:
> |   Stage: Stage-1
> |     Tez
> |       DagId: hive_20161031222730_a700058f-78eb-40d6-a67d-43add60a50e2:6
> |       Edges:
> |         Map 2 <- Map 1 (BROADCAST_EDGE)
> |         Map 3 <- Map 2 (BROADCAST_EDGE)
> |         Reducer 4 <- Map 3 (CUSTOM_SIMPLE_EDGE), Map 7 (CUSTOM_SIMPLE_EDGE), Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE)
> |         Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
> |         Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
> |       DagName:
> |       Vertices:
> |         Map 1
> |             Map Operator Tree:
> |                 TableScan
> |                   alias: supplier
> |                   filterExpr: (s_suppkey is not null and s_nationkey is not null) (type: boolean)
> |                   Statistics: Num rows: 1000 Data size: 16000 Basic stats: COMPLETE Column stats: COMPLETE
> |                   Filter Operator
> |                     predicate: (s_suppkey is not null and s_nationkey is not null) (type: boolean)
> |                     Statistics: Num rows: 1000 Data size: 16000 Basic stats: COMPLETE Column stats: COMPLETE
> |                     Select Operator
> |

[jira] [Comment Edited] (HIVE-15445) Subquery failing with ClassCastException

2016-12-16 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15754937#comment-15754937
 ] 

Jesus Camacho Rodriguez edited comment on HIVE-15445 at 12/16/16 4:57 PM:
--

It seems it was introduced in HIVE-9195. I think the method used to create a 
ColumnExprDesc from a ColumnInfo should not check the object inspectors for 
constants; there are other methods in Hive that take care of that.

I am submitting a patch that disables that check. If other methods do indeed 
take care of that, then we should not see ptest failures...


was (Author: jcamachorodriguez):
It seems it was introduced in HIVE-9195. I think the method used to create a 
ColumnExprDesc from a ColumnInfo should not check the object inspectors for 
constants; there are other methods in Hive that take care of that.

I am submitting a patch that disables the check of the object inspectors when 
we are creating the ColumnExprDesc from the ColumnInfo. If other methods take 
care of that indeed, then we should not see ptest failures...

> Subquery failing with ClassCastException
> 
>
> Key: HIVE-15445
> URL: https://issues.apache.org/jira/browse/HIVE-15445
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15445.patch
>
>
> To reproduce:
> {code:sql}
> CREATE TABLE table_7 (int_col INT);
> SELECT
> (t1.int_col) * (t1.int_col) AS int_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) t1
> WHERE
> (False) NOT IN (SELECT
> False AS boolean_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) tt1
> WHERE
> (t1.int_col) = (tt1.int_col));
> {code}
> The problem seems to be in the method that tries to resolve the subquery 
> column _MIN(NULL)_. It checks the column inspector and ends up returning a 
> constant expression instead of a column expression for _min(null)_.
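
The failure mode described above can be reproduced in miniature. The sketch below uses hypothetical stand-in classes (`ExprDesc`, `ColumnDesc`, `ConstantDesc`), not Hive's expression descriptor classes. It also assumes that the exception comes from a caller casting the resolved expression to a column expression; the issue text does not show the actual stack trace.

```java
// Hypothetical stand-ins; these are not Hive's expression descriptor classes.
final class ConstantFoldingCastSketch {

    static class ExprDesc { }

    static final class ColumnDesc extends ExprDesc {
        final String name;
        ColumnDesc(String name) { this.name = name; }
    }

    static final class ConstantDesc extends ExprDesc {
        final Object value;
        ConstantDesc(Object value) { this.value = value; }
    }

    // Resolver that checks the column's inspector and, when it is constant,
    // returns a constant expression instead of a column expression.
    static ExprDesc resolve(String columnName, boolean inspectorIsConstant, Object constantValue) {
        return inspectorIsConstant ? new ConstantDesc(constantValue) : new ColumnDesc(columnName);
    }

    // Caller that assumes every resolved subquery column is a column expression.
    static String columnNameOf(ExprDesc resolved) {
        return ((ColumnDesc) resolved).name; // throws ClassCastException for a ConstantDesc
    }

    public static void main(String[] args) {
        System.out.println(columnNameOf(resolve("int_col", false, null)));
        try {
            // Models a column such as MIN(NULL) whose inspector reports a constant.
            columnNameOf(resolve("int_col", true, null));
        } catch (ClassCastException e) {
            System.out.println("ClassCastException from the constant branch");
        }
    }
}
```

In this model, making `resolve` always return a `ColumnDesc` removes the exception, which corresponds loosely to disabling the inspector check when building the column expression.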




[jira] [Updated] (HIVE-15445) Subquery failing with ClassCastException

2016-12-16 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-15445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15445:
---
Description: 
To reproduce:

{code:sql}
CREATE TABLE table_7 (int_col INT);

SELECT
(t1.int_col) * (t1.int_col) AS int_col
FROM (
SELECT
MIN(NULL) OVER () AS int_col
FROM table_7
) t1
WHERE
(False) NOT IN (SELECT
False AS boolean_col
FROM (
SELECT
MIN(NULL) OVER () AS int_col
FROM table_7
) tt1
WHERE
(t1.int_col) = (tt1.int_col));
{code}

The problem seems to be in the method that tries to resolve the subquery column 
_MIN(NULL)_. It checks the column inspector and ends up returning a constant 
expression instead of a column expression for _min(null)_.

  was:
To reproduce:

{code:sql}
CREATE TABLE table_7 (int_col INT);

SELECT
(t1.int_col) * (t1.int_col) AS int_col
FROM (
SELECT
MIN(NULL) OVER () AS int_col
FROM table_7
) t1
WHERE
(False) NOT IN (SELECT
False AS boolean_col
FROM (
SELECT
MIN(NULL) OVER () AS int_col
FROM table_7
) tt1
WHERE
(t1.int_col) = (tt1.int_col));
{code}

The problem seems to be in the method that tries to resolve the subquery column 
_MIN(NULL)_. It checks the column inspector and ends up returning a constant 
descriptor instead of a column descriptor for _min(null)_.


> Subquery failing with ClassCastException
> 
>
> Key: HIVE-15445
> URL: https://issues.apache.org/jira/browse/HIVE-15445
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-15445.patch
>
>
> To reproduce:
> {code:sql}
> CREATE TABLE table_7 (int_col INT);
> SELECT
> (t1.int_col) * (t1.int_col) AS int_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) t1
> WHERE
> (False) NOT IN (SELECT
> False AS boolean_col
> FROM (
> SELECT
> MIN(NULL) OVER () AS int_col
> FROM table_7
> ) tt1
> WHERE
> (t1.int_col) = (tt1.int_col));
> {code}
> The problem seems to be in the method that tries to resolve the subquery 
> column _MIN(NULL)_. It checks the column inspector and ends up returning a 
> constant expression instead of a column expression for _min(null)_.



