[jira] [Commented] (HIVE-15956) StackOverflowError when drop lots of partitions

2017-03-11 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906444#comment-15906444
 ] 

Zoltan Haindrich commented on HIVE-15956:
-

[~niklaus.xiao] I've tried your patch...and I still see the problem after 
applying the fix. Did it work for you? It's possible that I've made a mistake 
somewhere, but even after a clean rebuild it still failed.

> StackOverflowError when drop lots of partitions
> ---
>
> Key: HIVE-15956
> URL: https://issues.apache.org/jira/browse/HIVE-15956
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.3.0, 2.2.0
>Reporter: Niklaus Xiao
>Assignee: Niklaus Xiao
> Attachments: HIVE-15956.patch
>
>
> Repro steps:
> 1. Create a partitioned table and add a large number of partitions (well over 9000):
> {code}
> create table test_partition(id int) partitioned by (dt int);
> alter table test_partition add partition(dt=1);
> alter table test_partition add partition(dt=3);
> alter table test_partition add partition(dt=4);
> ...
> {code}
> 2. Drop 9000 partitions:
> {code}
> alter table test_partition drop partition(dt<9000);
> {code}
> Step 2 will fail with StackOverflowError:
> {code}
> Exception in thread "pool-7-thread-161" java.lang.StackOverflowError
> at 
> org.datanucleus.query.expression.ExpressionCompiler.isOperator(ExpressionCompiler.java:819)
> at 
> org.datanucleus.query.expression.ExpressionCompiler.compileOrAndExpression(ExpressionCompiler.java:190)
> at 
> org.datanucleus.query.expression.ExpressionCompiler.compileExpression(ExpressionCompiler.java:179)
> at 
> org.datanucleus.query.expression.ExpressionCompiler.compileOrAndExpression(ExpressionCompiler.java:192)
> at 
> org.datanucleus.query.expression.ExpressionCompiler.compileExpression(ExpressionCompiler.java:179)
> at 
> org.datanucleus.query.expression.ExpressionCompiler.compileOrAndExpression(ExpressionCompiler.java:192)
> at 
> org.datanucleus.query.expression.ExpressionCompiler.compileExpression(ExpressionCompiler.java:179)
> {code}
> {code}
> Exception in thread "pool-7-thread-198" java.lang.StackOverflowError
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:83)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> {code}
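
The traces above come from recursively compiling and binding a partition filter that has one
predicate per partition. A minimal, self-contained sketch of that mechanism follows; it is
illustrative only (not Hive or DataNucleus code, and the class and method names are invented):

{code}
// Illustrative only -- not Hive or DataNucleus code. The drop-partition filter expands into
// one predicate per partition, chained with OR into a left-deep tree, and that tree is then
// compiled/bound recursively, one (or more) stack frames per level.
public class DeepOrExpressionDemo {

  interface Expr {
    int depth();
  }

  static final class Leaf implements Expr {
    final String predicate;
    Leaf(String predicate) { this.predicate = predicate; }
    @Override public int depth() { return 1; }
  }

  static final class Or implements Expr {
    final Expr left, right;
    Or(Expr left, Expr right) { this.left = left; this.right = right; }
    // One recursive call per OR level, analogous to the compileExpression/bind frames above.
    @Override public int depth() { return 1 + Math.max(left.depth(), right.depth()); }
  }

  public static void main(String[] args) {
    Expr filter = new Leaf("dt = 0");
    for (int i = 1; i < 9000; i++) {
      filter = new Or(filter, new Leaf("dt = " + i));   // ~9000 nested levels
    }
    // The real compiler spends several frames per OR node; whether this exact demo overflows
    // depends on -Xss, but recursion depth grows linearly with the partition count either way.
    System.out.println("expression depth = " + filter.depth());
  }
}
{code}

The key point is that the recursion depth grows linearly with the number of partitions being
dropped, so a large enough drop will exhaust the default thread stack.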



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15867) Add blobstore tests for import/export

2017-03-11 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906445#comment-15906445
 ] 

Sahil Takiar commented on HIVE-15867:
-

+1 (non-binding, still need a committer to take a look)

> Add blobstore tests for import/export
> -
>
> Key: HIVE-15867
> URL: https://issues.apache.org/jira/browse/HIVE-15867
> Project: Hive
>  Issue Type: Bug
>Reporter: Thomas Poepping
>Assignee: Juan Rodríguez Hortalá
> Attachments: HIVE-15867.patch
>
>
> This patch adds ten separate tests that exercise import and export operations 
> against blobstore filesystems:
> * Import addpartition
> ** blobstore -> file
> ** file -> blobstore
> ** blobstore -> blobstore
> ** blobstore -> hdfs
> * import/export
> ** blobstore -> file
> ** file -> blobstore
> ** blobstore -> blobstore (partitioned and non-partitioned)
> ** blobstore -> HDFS (partitioned and non-partitioned)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11019) Can't create an Avro table with uniontype column correctly

2017-03-11 Thread Nikita Goyal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906418#comment-15906418
 ] 

Nikita Goyal commented on HIVE-11019:
-

I have been facing the same issue. Can anyone look?

> Can't create an Avro table with uniontype column correctly
> --
>
> Key: HIVE-11019
> URL: https://issues.apache.org/jira/browse/HIVE-11019
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Bing Li
>Assignee: Bing Li
>
> I tried the example in 
> https://cwiki.apache.org/confluence/display/Hive/AvroSerDe
> And found that it can't create an AVRO table correctly with uniontype
> hive> create table avro_union(union1 uniontype<float, boolean, string>) STORED 
> AS AVRO;
> OK
> Time taken: 0.083 seconds
> hive> describe avro_union;
> OK
> union1  uniontype  
>   
> Time taken: 0.058 seconds, Fetched: 1 row(s)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-03-11 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906367#comment-15906367
 ] 

Sergey Shelukhin commented on HIVE-15665:
-

Indexes are going to be read like regular streams... I was able to work on this 
a little bit more, I now have unfinished code for all 3 metadata cache 
constituents :)

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.WIP.patch
>
>
> OrcFileMetadata internally has filestats, stripestats etc which are allocated 
> in heap. On large data sets, this could have an impact on the heap usage and 
> the memory usage by different executors in LLAP.
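
As a rough illustration of why this matters, here is a back-of-envelope estimate with assumed
(not measured) numbers showing how stripe-level column statistics alone can add up when
metadata for many files is cached on-heap:

{code}
// All four inputs are assumptions for the sake of the estimate, not measurements.
public class OrcMetadataFootprintEstimate {
  public static void main(String[] args) {
    long cachedFiles = 10_000;          // assumed number of files with cached metadata
    long stripesPerFile = 50;           // assumed stripes per file
    long columns = 200;                 // assumed columns per file
    long bytesPerColumnStat = 100;      // assumed on-heap size of one column statistics object
    long total = cachedFiles * stripesPerFile * columns * bytesPerColumnStat;
    System.out.println("~" + total / (1024 * 1024) + " MB of stripe-level column stats alone");
  }
}
{code}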



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16133) Footer cache in Tez AM can take too much memory

2017-03-11 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16133:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the reviews!

> Footer cache in Tez AM can take too much memory
> ---
>
> Key: HIVE-16133
> URL: https://issues.apache.org/jira/browse/HIVE-16133
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-16133.01.patch, HIVE-16133.02.patch, 
> HIVE-16133.02.patch, HIVE-16133.03.patch, HIVE-16133.04.patch, 
> HIVE-16133.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16182) Semijoin: Avoid VectorHashKeyWrapper allocations for the bloom hash aggregate

2017-03-11 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906363#comment-15906363
 ] 

Sergey Shelukhin commented on HIVE-16182:
-

+1

> Semijoin: Avoid VectorHashKeyWrapper allocations for the bloom hash aggregate
> -
>
> Key: HIVE-16182
> URL: https://issues.apache.org/jira/browse/HIVE-16182
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
>  Labels: performance
> Attachments: HIVE-16182.1.patch
>
>
> To avoid GC spam during the hash aggregate part of the bloom filter, the key 
> for the semijoin can be special-cased as an immutable empty key.
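
For context, a minimal sketch of the general "immutable empty key" idea described above (this
is not the actual VectorHashKeyWrapper change from the patch; the class name is invented):

{code}
// Sketch of the pattern only: when the group-by key set is empty, hand out one shared
// immutable key object instead of allocating a fresh key wrapper per row or batch.
public final class EmptyKeyWrapperSketch {
  private static final EmptyKeyWrapperSketch INSTANCE = new EmptyKeyWrapperSketch();

  private EmptyKeyWrapperSketch() {}

  public static EmptyKeyWrapperSketch getInstance() { return INSTANCE; }

  @Override public int hashCode() { return 0; }   // every empty key hashes to the same bucket
  @Override public boolean equals(Object o) { return o instanceof EmptyKeyWrapperSketch; }
  @Override public String toString() { return "<empty key>"; }
}
{code}

Because the bloom-filter aggregate has no grouping columns, every row maps to the same shared
key object, so the hot loop allocates nothing and produces no GC churn.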



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15978) Support regr_* functions

2017-03-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906337#comment-15906337
 ] 

Hive QA commented on HIVE-15978:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12857491/HIVE-15978.1.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10420 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udaf_binarysetfunctions] 
(batchId=35)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4091/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4091/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4091/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12857491 - PreCommit-HIVE-Build

> Support regr_* functions
> 
>
> Key: HIVE-15978
> URL: https://issues.apache.org/jira/browse/HIVE-15978
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Carter Shanklin
>Assignee: Zoltan Haindrich
> Attachments: HIVE-15978.1.patch
>
>
> Support the standard regr_* functions, regr_slope, regr_intercept, regr_r2, 
> regr_sxx, regr_syy, regr_sxy, regr_avgx, regr_avgy, regr_count. SQL reference 
> section 10.9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15978) Support regr_* functions

2017-03-11 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-15978:

Status: Patch Available  (was: Open)

> Support regr_* functions
> 
>
> Key: HIVE-15978
> URL: https://issues.apache.org/jira/browse/HIVE-15978
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Carter Shanklin
>Assignee: Zoltan Haindrich
> Attachments: HIVE-15978.1.patch
>
>
> Support the standard regr_* functions, regr_slope, regr_intercept, regr_r2, 
> regr_sxx, regr_syy, regr_sxy, regr_avgx, regr_avgy, regr_count. SQL reference 
> section 10.9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15978) Support regr_* functions

2017-03-11 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-15978:

Attachment: HIVE-15978.1.patch

[~pxiong] now I see that I never had any chance to avoid declaring these as 
aggregators :)
patch #1) I've retrofitted some existing aggregators to serve the regr_* 
methods.
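
For reference, a hedged sketch of the arithmetic the standard regr_* aggregates boil down to
(the usual sum-of-products formulation from the SQL spec; this is not the GenericUDAF code in
the patch, and edge cases such as var_pop(x) = 0 are omitted):

{code}
public class RegrSketch {
  long n;
  double sx, sy, sxx, syy, sxy;

  // regr_* functions take (y, x) and skip pairs where either side is null.
  void add(double y, double x) {
    n++; sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y;
  }

  double count()     { return n; }
  double avgx()      { return sx / n; }
  double avgy()      { return sy / n; }
  double regrSxx()   { return sxx - sx * sx / n; }   // n * var_pop(x)
  double regrSyy()   { return syy - sy * sy / n; }   // n * var_pop(y)
  double regrSxy()   { return sxy - sx * sy / n; }   // n * covar_pop(y, x)
  double slope()     { return regrSxy() / regrSxx(); }
  double intercept() { return avgy() - slope() * avgx(); }
  double r2()        { return regrSyy() == 0 ? 1.0
                                             : regrSxy() * regrSxy() / (regrSxx() * regrSyy()); }

  public static void main(String[] args) {
    RegrSketch r = new RegrSketch();
    r.add(2.0, 1.0); r.add(4.0, 2.0); r.add(6.1, 3.0);
    System.out.printf("slope=%.3f intercept=%.3f r2=%.4f%n", r.slope(), r.intercept(), r.r2());
  }
}
{code}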


> Support regr_* functions
> 
>
> Key: HIVE-15978
> URL: https://issues.apache.org/jira/browse/HIVE-15978
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Carter Shanklin
>Assignee: Zoltan Haindrich
> Attachments: HIVE-15978.1.patch
>
>
> Support the standard regr_* functions, regr_slope, regr_intercept, regr_r2, 
> regr_sxx, regr_syy, regr_sxy, regr_avgx, regr_avgy, regr_count. SQL reference 
> section 10.9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16182) Semijoin: Avoid VectorHashKeyWrapper allocations for the bloom hash aggregate

2017-03-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906307#comment-15906307
 ] 

Hive QA commented on HIVE-16182:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12857490/HIVE-16182.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 10339 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4090/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4090/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4090/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12857490 - PreCommit-HIVE-Build

> Semijoin: Avoid VectorHashKeyWrapper allocations for the bloom hash aggregate
> -
>
> Key: HIVE-16182
> URL: https://issues.apache.org/jira/browse/HIVE-16182
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
>  Labels: performance
> Attachments: HIVE-16182.1.patch
>
>
> To avoid GC spam during the hash aggregate part of the bloom filter, the key 
> for the semijoin can be special-cased as an immutable empty key.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16182) Semijoin: Avoid VectorHashKeyWrapper allocations for the bloom hash aggregate

2017-03-11 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-16182:
---
Status: Patch Available  (was: Open)

> Semijoin: Avoid VectorHashKeyWrapper allocations for the bloom hash aggregate
> -
>
> Key: HIVE-16182
> URL: https://issues.apache.org/jira/browse/HIVE-16182
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
>  Labels: performance
> Attachments: HIVE-16182.1.patch
>
>
> To avoid GC spam during the hash aggregate part of the bloom filter, the key 
> for the semijoin can be special-cased as an immutable empty key.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16182) Semijoin: Avoid VectorHashKeyWrapper allocations for the bloom hash aggregate

2017-03-11 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-16182:
---
Labels: performance  (was: )

> Semijoin: Avoid VectorHashKeyWrapper allocations for the bloom hash aggregate
> -
>
> Key: HIVE-16182
> URL: https://issues.apache.org/jira/browse/HIVE-16182
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
>  Labels: performance
> Attachments: HIVE-16182.1.patch
>
>
> To avoid GC spam during the hash aggregate part of the bloom filter, the key 
> for the semijoin can be special-cased as an immutable empty key.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16182) Semijoin: Avoid VectorHashKeyWrapper allocations for the bloom hash aggregate

2017-03-11 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-16182:
--

Assignee: Gopal V

> Semijoin: Avoid VectorHashKeyWrapper allocations for the bloom hash aggregate
> -
>
> Key: HIVE-16182
> URL: https://issues.apache.org/jira/browse/HIVE-16182
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Affects Versions: 2.2.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-16182.1.patch
>
>
> To avoid GC spam during the hash aggregate part of the bloom filter, the key 
> for the semijoin can be special-cased as an immutable empty key.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16182) Semijoin: Avoid VectorHashKeyWrapper allocations for the bloom hash aggregate

2017-03-11 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-16182:
---
Attachment: HIVE-16182.1.patch

> Semijoin: Avoid VectorHashKeyWrapper allocations for the bloom hash aggregate
> -
>
> Key: HIVE-16182
> URL: https://issues.apache.org/jira/browse/HIVE-16182
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Affects Versions: 2.2.0
>Reporter: Gopal V
> Attachments: HIVE-16182.1.patch
>
>
> To avoid GC spam during the hash aggregate part of the bloom filter, the key 
> for the semijoin can be special-cased as an immutable empty key.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16181) Make logic for hdfs directory location extraction more generic, in webhcat test driver

2017-03-11 Thread Aswathy Chellammal Sreekumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aswathy Chellammal Sreekumar updated HIVE-16181:

Attachment: HIVE-16181.1.patch

> Make logic for hdfs directory location extraction more generic, in webhcat 
> test driver
> --
>
> Key: HIVE-16181
> URL: https://issues.apache.org/jira/browse/HIVE-16181
> Project: Hive
>  Issue Type: Test
>  Components: WebHCat
>Reporter: Aswathy Chellammal Sreekumar
>Priority: Minor
> Attachments: HIVE-16181.1.patch
>
>
> Patch to make regular expression for directory location lookup in 
> setLocationPermGroup of TestDriverCurl more generic to accommodate patterns 
> without port number like hdfs://mycluster//hive/warehouse/
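
TestDriverCurl itself is a Perl driver, so the following is only a language-neutral
illustration (written in Java, with an invented pattern, not the expression from
setLocationPermGroup) of making the port component optional so that both location forms match:

{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Demo pattern only: the authority group tolerates a host with or without a :port suffix.
public class HdfsLocationPatternDemo {
  public static void main(String[] args) {
    Pattern location = Pattern.compile("^hdfs://([^/:]+)(:\\d+)?(/.*)$");
    String[] samples = {
        "hdfs://namenode:8020/hive/warehouse/t1",   // host:port form
        "hdfs://mycluster//hive/warehouse/"         // nameservice form, no port
    };
    for (String s : samples) {
      Matcher m = location.matcher(s);
      if (m.matches()) {
        System.out.println(s + " -> host=" + m.group(1) + " path=" + m.group(3));
      }
    }
  }
}
{code}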



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16132) DataSize stats don't seem correct in semijoin opt branch

2017-03-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906269#comment-15906269
 ] 

Hive QA commented on HIVE-16132:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12857484/HIVE-16132.6.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10339 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=153)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4089/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4089/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4089/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12857484 - PreCommit-HIVE-Build

> DataSize stats don't seem correct in semijoin opt branch
> 
>
> Key: HIVE-16132
> URL: https://issues.apache.org/jira/browse/HIVE-16132
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16132.1.patch, HIVE-16132.2.patch, 
> HIVE-16132.3.patch, HIVE-16132.4.patch, HIVE-16132.5.patch, HIVE-16132.6.patch
>
>
> For the following operator tree snippet, the second Select is the start of a 
> semijoin optimization branch. Take a look at the Data size - it is the same 
> as the data size for its parent Select, even though the second select has 
> only a single bigint column in its projection (the parent has 2 columns). I 
> would expect the size to be 533328 (16 bytes * 33333 rows).
> Fixing this estimate may become important if we need to estimate the cost of 
> generating the min/max/bloomfilter.
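
The expected number falls out directly from the projection described above; the 16-byte
figure is taken as the estimator's assumed per-value size for a bigint:

{code}
// Worked arithmetic for the expectation stated in the description.
public class SemijoinBranchSizeEstimate {
  public static void main(String[] args) {
    long numRows = 33333;
    long bytesPerBigint = 16;                        // assumed per-value estimate for bigint
    System.out.println(numRows * bytesPerBigint);    // 533328, the expected Data size
  }
}
{code}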



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16132) DataSize stats don't seem correct in semijoin opt branch

2017-03-11 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16132:
--
Attachment: HIVE-16132.6.patch

Needed a code refresh locally. Result files updated.

> DataSize stats don't seem correct in semijoin opt branch
> 
>
> Key: HIVE-16132
> URL: https://issues.apache.org/jira/browse/HIVE-16132
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16132.1.patch, HIVE-16132.2.patch, 
> HIVE-16132.3.patch, HIVE-16132.4.patch, HIVE-16132.5.patch, HIVE-16132.6.patch
>
>
> For the following operator tree snippet, the second Select is the start of a 
> semijoin optimization branch. Take a look at the Data size - it is the same 
> as the data size for its parent Select, even though the second select has 
> only a single bigint column in its projection (the parent has 2 columns). I 
> would expect the size to be 533328 (16 bytes * 33333 rows).
> Fixing this estimate may become important if we need to estimate the cost of 
> generating the min/max/bloomfilter.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-8750) Commit initial encryption work

2017-03-11 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906150#comment-15906150
 ] 

Lefty Leverenz edited comment on HIVE-8750 at 3/11/17 10:24 AM:


The encryption branch was merged to trunk for release 1.1.0 (formerly known as 
0.15). See HIVE-9264.

So *hive.exec.stagingdir* and *hive.exec.copyfile.maxsize* need to be 
documented in the wiki.

* [Configuration Properties -- Query and DDL Execution | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution]

Adding a TODOC15 label (for release 1.1.0).

Edit (11/Mar/17):  HIVE-14864 corrects the description of 
*hive.exec.copyfile.maxsize* in release 2.2.0 -- its value is in bytes, not 
megabytes.


was (Author: le...@hortonworks.com):
The encryption branch was merged to trunk for release 1.1.0 (formerly known as 
0.15). See HIVE-9264.

So *hive.exec.stagingdir* and *hive.exec.copyfile.maxsize* need to be 
documented in the wiki.

* [Configuration Properties -- Query and DDL Execution | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution]

Adding a TODOC15 label (for release 1.1.0).

> Commit initial encryption work
> --
>
> Key: HIVE-8750
> URL: https://issues.apache.org/jira/browse/HIVE-8750
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Sergio Peña
>  Labels: TODOC15
> Fix For: encryption-branch, 1.1.0
>
> Attachments: HIVE-8750.1.patch
>
>
> I believe Sergio has some work done for encryption. In this item we'll commit 
> it to branch.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-14864) Distcp is not called from MoveTask when src is a directory

2017-03-11 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906158#comment-15906158
 ] 

Lefty Leverenz commented on HIVE-14864:
---

Doc note:  This adds *hive.exec.copyfile.maxnumfiles* to HiveConf.java and 
corrects the description of *hive.exec.copyfile.maxsize* (added in 1.1.0 by 
HIVE-8750 but not documented yet) so they need to be documented in the wiki.

* [Configuration Properties -- Query and DDL Execution | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution]

Added a TODOC2.2 label.

> Distcp is not called from MoveTask when src is a directory
> --
>
> Key: HIVE-14864
> URL: https://issues.apache.org/jira/browse/HIVE-14864
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Sahil Takiar
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-14864.1.patch, HIVE-14864.2.patch, 
> HIVE-14864.3.patch, HIVE-14864.4.patch, HIVE-14864.patch
>
>
> In FileUtils.java the following code does not get executed even when src 
> directory size is greater than HIVE_EXEC_COPYFILE_MAXSIZE because 
> srcFS.getFileStatus(src).getLen() returns 0 when src is a directory. We 
> should use srcFS.getContentSummary(src).getLength() instead.
> {noformat}
> /* Run distcp if source file/dir is too big */
> if (srcFS.getUri().getScheme().equals("hdfs") &&
>     srcFS.getFileStatus(src).getLen() > conf.getLongVar(HiveConf.ConfVars.HIVE_EXEC_COPYFILE_MAXSIZE)) {
>   LOG.info("Source is " + srcFS.getFileStatus(src).getLen() + " bytes. (MAX: "
>       + conf.getLongVar(HiveConf.ConfVars.HIVE_EXEC_COPYFILE_MAXSIZE) + ")");
>   LOG.info("Launch distributed copy (distcp) job.");
>   HiveConfUtil.updateJobCredentialProviders(conf);
>   copied = shims.runDistCp(src, dst, conf);
>   if (copied && deleteSource) {
>     srcFS.delete(src, true);
>   }
> }
> {noformat}
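
A small sketch of the suggested change (an assumed helper shape, not the committed patch):
size the source with getContentSummary(), which sums every file under a directory, before
deciding whether to launch distcp.

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// getContentSummary() returns the total length of all files under a directory, whereas
// getFileStatus().getLen() is 0 for a directory, which is what defeats the check quoted above.
public class CopySizeCheck {
  static boolean needsDistCp(FileSystem srcFS, Path src, long maxCopySizeBytes)
      throws IOException {
    long srcLen = srcFS.getContentSummary(src).getLength();
    return "hdfs".equals(srcFS.getUri().getScheme()) && srcLen > maxCopySizeBytes;
  }
}
{code}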



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-14864) Distcp is not called from MoveTask when src is a directory

2017-03-11 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-14864:
--
Labels: TODOC2.2  (was: )

> Distcp is not called from MoveTask when src is a directory
> --
>
> Key: HIVE-14864
> URL: https://issues.apache.org/jira/browse/HIVE-14864
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Sahil Takiar
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-14864.1.patch, HIVE-14864.2.patch, 
> HIVE-14864.3.patch, HIVE-14864.4.patch, HIVE-14864.patch
>
>
> In FileUtils.java the following code does not get executed even when src 
> directory size is greater than HIVE_EXEC_COPYFILE_MAXSIZE because 
> srcFS.getFileStatus(src).getLen() returns 0 when src is a directory. We 
> should use srcFS.getContentSummary(src).getLength() instead.
> {noformat}
> /* Run distcp if source file/dir is too big */
> if (srcFS.getUri().getScheme().equals("hdfs") &&
>     srcFS.getFileStatus(src).getLen() > conf.getLongVar(HiveConf.ConfVars.HIVE_EXEC_COPYFILE_MAXSIZE)) {
>   LOG.info("Source is " + srcFS.getFileStatus(src).getLen() + " bytes. (MAX: "
>       + conf.getLongVar(HiveConf.ConfVars.HIVE_EXEC_COPYFILE_MAXSIZE) + ")");
>   LOG.info("Launch distributed copy (distcp) job.");
>   HiveConfUtil.updateJobCredentialProviders(conf);
>   copied = shims.runDistCp(src, dst, conf);
>   if (copied && deleteSource) {
>     srcFS.delete(src, true);
>   }
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-8750) Commit initial encryption work

2017-03-11 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906150#comment-15906150
 ] 

Lefty Leverenz commented on HIVE-8750:
--

The encryption branch was merged to trunk for release 1.1.0 (formerly known as 
0.15). See HIVE-9264.

So *hive.exec.stagingdir* and *hive.exec.copyfile.maxsize* need to be 
documented in the wiki.

* [Configuration Properties -- Query and DDL Execution | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution]

Adding a TODOC15 label (for release 1.1.0).

> Commit initial encryption work
> --
>
> Key: HIVE-8750
> URL: https://issues.apache.org/jira/browse/HIVE-8750
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Sergio Peña
>  Labels: TODOC15
> Fix For: encryption-branch, 1.1.0
>
> Attachments: HIVE-8750.1.patch
>
>
> I believe Sergio has some work done for encryption. In this item we'll commit 
> it to branch.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-8750) Commit initial encryption work

2017-03-11 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-8750:
-
Labels: TODOC15  (was: )

> Commit initial encryption work
> --
>
> Key: HIVE-8750
> URL: https://issues.apache.org/jira/browse/HIVE-8750
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Sergio Peña
>  Labels: TODOC15
> Fix For: encryption-branch, 1.1.0
>
> Attachments: HIVE-8750.1.patch
>
>
> I believe Sergio has some work done for encryption. In this item we'll commit 
> it to branch.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-8750) Commit initial encryption work

2017-03-11 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-8750:
-
Fix Version/s: 1.1.0

> Commit initial encryption work
> --
>
> Key: HIVE-8750
> URL: https://issues.apache.org/jira/browse/HIVE-8750
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Sergio Peña
> Fix For: encryption-branch, 1.1.0
>
> Attachments: HIVE-8750.1.patch
>
>
> I believe Sergio has some work done for encryption. In this item we'll commit 
> it to branch.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16180) LLAP: Native memory leak in EncodedReader

2017-03-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906144#comment-15906144
 ] 

Hive QA commented on HIVE-16180:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12857456/HIVE-16180.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10339 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=141)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4088/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4088/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4088/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12857456 - PreCommit-HIVE-Build

> LLAP: Native memory leak in EncodedReader
> -
>
> Key: HIVE-16180
> URL: https://issues.apache.org/jira/browse/HIVE-16180
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Critical
> Attachments: DirectCleaner.java, FullGC-15GB-cleanup.png, 
> Full-gc-native-mem-cleanup.png, HIVE-16180.1.patch, HIVE-16180.2.patch, 
> Native-mem-spike.png
>
>
> Observed this in an internal test run. There is a native memory leak in the Orc 
> EncodedReaderImpl that can cause the YARN pmem monitor to kill the container 
> running the daemon. Direct byte buffers are null'ed out, but their native memory is 
> not guaranteed to be reclaimed until the next full GC. To show this issue, attaching 
> a small test program that allocates 3x256MB direct byte buffers. The first buffer is 
> null'ed out, but its native memory remains in use. The second buffer uses a Cleaner 
> to release its native allocation. The third buffer is also null'ed, but this time 
> System.gc() is invoked, which cleans up all remaining native memory. Output from the 
> test program is below:
> {code}
> Allocating 3x256MB direct memory..
> Native memory used: 786432000
> Native memory used after data1=null: 786432000
> Native memory used after data2.clean(): 524288000
> Native memory used after data3=null: 524288000
> Native memory used without gc: 524288000
> Native memory used after gc: 0
> {code}
> Longer term improvements/solutions:
> 1) Use DirectBufferPool from hadoop or netty's 
> https://netty.io/4.0/api/io/netty/buffer/PooledByteBufAllocator.html as 
> direct byte buffer allocations are expensive (System.gc() + 100ms thread 
> sleep).
> 2) Use HADOOP-12760 for proper cleaner invocation in JDK8 and JDK9
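
A minimal sketch of the behavior described above, using the JDK8-era internal cleaner API
(the HADOOP-12760 approach is the portable route on JDK9+; buffer sizes mirror the attached
test program, and this is not the attached DirectCleaner.java itself):

{code}
import java.nio.ByteBuffer;

public class DirectBufferCleanupDemo {
  private static final int SIZE = 256 * 1024 * 1024;

  public static void main(String[] args) {
    ByteBuffer data1 = ByteBuffer.allocateDirect(SIZE);
    data1 = null;   // native memory stays allocated until the buffer object is actually GC'ed

    ByteBuffer data2 = ByteBuffer.allocateDirect(SIZE);
    // Explicitly release the native allocation without waiting for a full GC (JDK8 internal API).
    ((sun.nio.ch.DirectBuffer) data2).cleaner().clean();

    ByteBuffer data3 = ByteBuffer.allocateDirect(SIZE);
    data3 = null;
    System.gc();    // a full GC finally reclaims the native memory behind data1 and data3
  }
}
{code}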



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15981) Allow empty grouping sets

2017-03-11 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906141#comment-15906141
 ] 

Lefty Leverenz commented on HIVE-15981:
---

Should this behavioral change be documented in the wiki?

* [Group By | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy]

If so, please add a TODOC2.2 label.

> Allow empty grouping sets
> -
>
> Key: HIVE-15981
> URL: https://issues.apache.org/jira/browse/HIVE-15981
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Carter Shanklin
>Assignee: Zoltan Haindrich
> Fix For: 2.2.0
>
> Attachments: HIVE-15981.1.patch, HIVE-15981.2.patch
>
>
> group by () should be treated as equivalent to no group by clause. Currently 
> it throws a parse error



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16132) DataSize stats don't seem correct in semijoin opt branch

2017-03-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906131#comment-15906131
 ] 

Hive QA commented on HIVE-16132:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12857449/HIVE-16132.5.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10339 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction]
 (batchId=148)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4087/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4087/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4087/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12857449 - PreCommit-HIVE-Build

> DataSize stats don't seem correct in semijoin opt branch
> 
>
> Key: HIVE-16132
> URL: https://issues.apache.org/jira/browse/HIVE-16132
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16132.1.patch, HIVE-16132.2.patch, 
> HIVE-16132.3.patch, HIVE-16132.4.patch, HIVE-16132.5.patch
>
>
> For the following operator tree snippet, the second Select is the start of a 
> semijoin optimization branch. Take a look at the Data size - it is the same 
> as the data size for its parent Select, even though the second select has 
> only a single bigint column in its projection (the parent has 2 columns). I 
> would expect the size to be 533328 (16 bytes * 33333 rows).
> Fixing this estimate may become important if we need to estimate the cost of 
> generating the min/max/bloomfilter.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)