from:"Barnabas Maidics \(JIRA\)"

[jira] [Assigned] (HIVE-24196) Refactor getAcidState in AcidUtils to use HMS endpoint

2020-09-24 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-24196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics reassigned HIVE-24196:
---


> Refactor getAcidState in AcidUtils to use HMS endpoint
> --
>
> Key: HIVE-24196
> URL: https://issues.apache.org/jira/browse/HIVE-24196
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Comment Edited] (HIVE-23987) Upgrade arrow version to 0.11.0

2020-08-04 Thread Barnabas Maidics (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-23987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170807#comment-17170807
 ] 

Barnabas Maidics edited comment on HIVE-23987 at 8/4/20, 2:57 PM:
--

Hi [~ShubhamChaurasia],

In [HIVE-23034|https://issues.apache.org/jira/browse/HIVE-23034], you added the 
BaseJdbcWithMiniLlap#testInvalidReferenceCountScenario test, which fails for me 
after upgrading Apache Arrow from 0.10.0 to 0.11.0. 

{code:java}
TestJdbcWithMiniLlapVectorArrowBatch>BaseJdbcWithMiniLlap.testInvalidReferenceCountScenario:408
 expected:<16384> but was:<26624>
{code}

How can this value change after upgrading the dependency? 

Thanks for your help.


was (Author: b.maidics):
Hi [~ShubhamChaurasia],

In [HIVE-23034|https://issues.apache.org/jira/browse/HIVE-23034], you added the 
BaseJdbcWithMiniLlap#testInvalidReferenceCountScenario test, which fails for me 
after upgrading Apache Arrow from 0.10.0 to 0.11.0. 

{code:java}
TestJdbcWithMiniLlapVectorArrowBatch>BaseJdbcWithMiniLlap.testInvalidReferenceCountScenario:408
 expected:<16384> but was:<26624>
{code}

I'm not sure where the number 16384 comes from. How does the insert statement in 
the test generate 16384 rows, and how can that change after 
upgrading the dependency? 

Thanks for your help.

> Upgrade arrow version to 0.11.0
> ---
>
> Key: HIVE-23987
> URL: https://issues.apache.org/jira/browse/HIVE-23987
> Project: Hive
>  Issue Type: Improvement
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>
> As part of [HIVE-23890|https://issues.apache.org/jira/browse/HIVE-23890], 
> we're introducing FlatBuffers as a dependency. 
> Arrow 0.10.0 has an unofficial FlatBuffers dependency, which is incompatible 
> with the official one: https://issues.apache.org/jira/browse/ARROW-3175
> It was fixed in 0.11.0. We should upgrade to that version.




[jira] [Commented] (HIVE-23987) Upgrade arrow version to 0.11.0

2020-08-04 Thread Barnabas Maidics (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-23987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170807#comment-17170807
 ] 

Barnabas Maidics commented on HIVE-23987:
-

Hi [~ShubhamChaurasia],

In [HIVE-23034|https://issues.apache.org/jira/browse/HIVE-23034], you added the 
BaseJdbcWithMiniLlap#testInvalidReferenceCountScenario test, which fails for me 
after upgrading Apache Arrow from 0.10.0 to 0.11.0. 

{code:java}
TestJdbcWithMiniLlapVectorArrowBatch>BaseJdbcWithMiniLlap.testInvalidReferenceCountScenario:408
 expected:<16384> but was:<26624>
{code}

I'm not sure where the number 16384 comes from. How does the insert statement in 
the test generate 16384 rows, and how can that change after 
upgrading the dependency? 

Thanks for your help.

> Upgrade arrow version to 0.11.0
> ---
>
> Key: HIVE-23987
> URL: https://issues.apache.org/jira/browse/HIVE-23987
> Project: Hive
>  Issue Type: Improvement
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>
> As part of [HIVE-23890|https://issues.apache.org/jira/browse/HIVE-23890], 
> we're introducing FlatBuffers as a dependency. 
> Arrow 0.10.0 has an unofficial FlatBuffers dependency, which is incompatible 
> with the official one: https://issues.apache.org/jira/browse/ARROW-3175
> It was fixed in 0.11.0. We should upgrade to that version.




[jira] [Work started] (HIVE-23987) Upgrade arrow version to 0.11.0

2020-08-04 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23987 started by Barnabas Maidics.
---
> Upgrade arrow version to 0.11.0
> ---
>
> Key: HIVE-23987
> URL: https://issues.apache.org/jira/browse/HIVE-23987
> Project: Hive
>  Issue Type: Improvement
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>
> As part of [HIVE-23890|https://issues.apache.org/jira/browse/HIVE-23890], 
> we're introducing FlatBuffers as a dependency. 
> Arrow 0.10.0 has an unofficial FlatBuffers dependency, which is incompatible 
> with the official one: https://issues.apache.org/jira/browse/ARROW-3175
> It was fixed in 0.11.0. We should upgrade to that version.




[jira] [Assigned] (HIVE-23987) Upgrade arrow version to 0.11.0

2020-08-04 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics reassigned HIVE-23987:
---


> Upgrade arrow version to 0.11.0
> ---
>
> Key: HIVE-23987
> URL: https://issues.apache.org/jira/browse/HIVE-23987
> Project: Hive
>  Issue Type: Improvement
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>
> As part of [HIVE-23890|https://issues.apache.org/jira/browse/HIVE-23890], 
> we're introducing FlatBuffers as a dependency. 
> Arrow 0.10.0 has an unofficial FlatBuffers dependency, which is incompatible 
> with the official one: https://issues.apache.org/jira/browse/ARROW-3175
> It was fixed in 0.11.0. We should upgrade to that version.




[jira] [Updated] (HIVE-23890) Create HMS endpoint for querying file lists using FlatBuffers as serialization

2020-07-23 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-23890:

Description: 
New thrift objects would be:


{code:java}
struct GetFileListRequest {
1: optional string catName,
2: required string dbName,
3: required string tableName,
4: required list<string> partVals,
6: optional string validWriteIdList
}

struct GetFileListResponse {
1: required binary fileListData
}
{code}


The binary field in GetFileListResponse would hold a FlatBuffer 
object.
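
A minimal Java sketch of how the two proposed structs could look to a caller, assuming partVals is a list of strings (the element type is an assumption here). The class names GetFileListRequestSketch and GetFileListResponseSketch and the method fileListBuffer are hypothetical stand-ins; the real classes would be generated by the Thrift compiler, and decoding the payload would need FlatBuffers-generated accessors that are not shown.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for the proposed GetFileListRequest struct.
final class GetFileListRequestSketch {
    final String catName;          // optional: may be null
    final String dbName;           // required
    final String tableName;        // required
    final List<String> partVals;   // required; element type is an assumption
    final String validWriteIdList; // optional: may be null

    GetFileListRequestSketch(String catName, String dbName, String tableName,
                             List<String> partVals, String validWriteIdList) {
        this.catName = catName;
        this.dbName = Objects.requireNonNull(dbName, "dbName is required");
        this.tableName = Objects.requireNonNull(tableName, "tableName is required");
        Objects.requireNonNull(partVals, "partVals is required");
        // Defensive copy so later changes to the caller's list do not leak in.
        this.partVals = Collections.unmodifiableList(new ArrayList<>(partVals));
        this.validWriteIdList = validWriteIdList;
    }
}

// Hypothetical stand-in for the proposed GetFileListResponse struct.
final class GetFileListResponseSketch {
    private final byte[] fileListData; // required: opaque FlatBuffer bytes

    GetFileListResponseSketch(byte[] fileListData) {
        Objects.requireNonNull(fileListData, "fileListData is required");
        this.fileListData = fileListData.clone();
    }

    // Read-only view of the payload that a FlatBuffers reader could consume.
    ByteBuffer fileListBuffer() {
        return ByteBuffer.wrap(fileListData).asReadOnlyBuffer();
    }
}
```

Keeping the payload as an opaque byte buffer means the Thrift layer never has to know the file list layout; only the FlatBuffers schema would describe it.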

> Create HMS endpoint for querying file lists using FlatBuffers as serialization
> --
>
> Key: HIVE-23890
> URL: https://issues.apache.org/jira/browse/HIVE-23890
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>
> New thrift objects would be:
> {code:java}
> struct GetFileListRequest {
> 1: optional string catName,
> 2: required string dbName,
> 3: required string tableName,
> 4: required list<string> partVals,
> 6: optional string validWriteIdList
> }
> struct GetFileListResponse {
> 1: required binary fileListData
> }
> {code}
> The binary field in GetFileListResponse would hold a 
> FlatBuffer object.




[jira] [Assigned] (HIVE-23890) Create HMS endpoint for querying file lists using FlatBuffers as serialization

2020-07-21 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics reassigned HIVE-23890:
---


> Create HMS endpoint for querying file lists using FlatBuffers as serialization
> --
>
> Key: HIVE-23890
> URL: https://issues.apache.org/jira/browse/HIVE-23890
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>





[jira] [Updated] (HIVE-23849) Hive skips the creation of ColumnAccessInfo when creating a view

2020-07-16 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-23849:

Labels:   (was: pull-request-available)

> Hive skips the creation of ColumnAccessInfo when creating a view
> 
>
> Key: HIVE-23849
> URL: https://issues.apache.org/jira/browse/HIVE-23849
> Project: Hive
>  Issue Type: Bug
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When creating a view, Hive skips the creation of ColumnAccessInfo that should 
> be created at [step 
> 8|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12601].
>  This causes an authorization error. 
> Currently, this issue is "hidden" when CBO is enabled: since 
> HIVE-14496, CalcitePlanner creates this ColumnAccessInfo at [step 
> 2|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12459].
>  With CBO turned off, the issue is still there. 
> I think the return statement in [step 
> 5|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12574]
>  is not necessary.
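
The control-flow problem described above can be illustrated with a toy Java model. This is only a sketch and not the SemanticAnalyzer code: the class EarlyReturnSketch, the method analyze, the flag names, and the step labels are all hypothetical, and only the step numbers 5 and 8 are taken from the description.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a multi-step analysis method; not Hive code.
final class EarlyReturnSketch {
    // Returns the labels of the steps that actually ran.
    static List<String> analyze(boolean isCreateView, boolean returnAfterViewStep) {
        List<String> executed = new ArrayList<>();
        executed.add("step 2: generate operator tree");
        if (isCreateView) {
            executed.add("step 5: handle view creation");
            if (returnAfterViewStep) {
                // Early return: everything after this point is skipped,
                // including the ColumnAccessInfo step below.
                return executed;
            }
        }
        executed.add("step 8: create ColumnAccessInfo");
        return executed;
    }
}
```

In this model, analyze(true, true) never reaches step 8, while analyze(true, false) does, which is the behavior the proposed removal of the return statement aims for.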




[jira] [Updated] (HIVE-23849) Hive skips the creation of ColumnAccessInfo when creating a view

2020-07-15 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-23849:

Description: 
When creating a view, Hive skips the creation of ColumnAccessInfo that should 
be created at [step 8|#L12601]]. This causes Authorization error. 

Currently, this issue is "hidden" when CBO is enabled. By introducing 
HIVE-14496, CalcitePlanner creates this ColumnAccessInfo at [step 
2|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12459].
 But after turning off CBO, the issue is still there. 

I think the return statement in [step 
5|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12574]]
 is not necessary.

  was:
When creating a view, Hive skips the creation of ColumnAccessInfo that should 
be created at [step 8|#L12601]]. This causes Authorization error. 

Currently, this issue is "hidden" when CBO is enabled. By introducing 
HIVE-14496, CalcitePlanner creates this ColumnAccessInfo at [step 
2|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12459].
 But after turning off CBO, the issue is still there. 

I think the return statement in [[step 
5|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12574]|#L12574]]
 is not necessary.


> Hive skips the creation of ColumnAccessInfo when creating a view
> 
>
> Key: HIVE-23849
> URL: https://issues.apache.org/jira/browse/HIVE-23849
> Project: Hive
>  Issue Type: Bug
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>
> When creating a view, Hive skips the creation of ColumnAccessInfo that should 
> be created at [step 8|#L12601]]. This causes Authorization error. 
> Currently, this issue is "hidden" when CBO is enabled. By introducing 
> HIVE-14496, CalcitePlanner creates this ColumnAccessInfo at [step 
> 2|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12459].
>  But after turning off CBO, the issue is still there. 
> I think the return statement in [step 
> 5|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12574]]
>  is not necessary.




[jira] [Updated] (HIVE-23849) Hive skips the creation of ColumnAccessInfo when creating a view

2020-07-15 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-23849:

Description: 
When creating a view, Hive skips the creation of ColumnAccessInfo that should 
be created at [step 8|#L12601]]. This causes Authorization error. 

Currently, this issue is "hidden" when CBO is enabled. By introducing 
HIVE-14496, CalcitePlanner creates this ColumnAccessInfo at [step 
2|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12459].
 But after turning off CBO, the issue is still there. 

I think the return statement in [[step 
5|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12574]|#L12574]]
 is not necessary.

  was:
When creating a view, Hive skips the creation of ColumnAccessInfo that should 
be created at [step 
8|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12601]
 This causes Authorization error. 

Currently, this issue is "hidden" when CBO is enabled. By introducing 
HIVE-14496, CalcitePlanner creates this ColumnAccessInfo at [step 2|#L12460]]. 
But after turning off CBO, the issue is still there. 

I think the return statement in [step 5|#L12574]] is not necessary.


> Hive skips the creation of ColumnAccessInfo when creating a view
> 
>
> Key: HIVE-23849
> URL: https://issues.apache.org/jira/browse/HIVE-23849
> Project: Hive
>  Issue Type: Bug
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>
> When creating a view, Hive skips the creation of ColumnAccessInfo that should 
> be created at [step 8|#L12601]]. This causes Authorization error. 
> Currently, this issue is "hidden" when CBO is enabled. By introducing 
> HIVE-14496, CalcitePlanner creates this ColumnAccessInfo at [step 
> 2|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12459].
>  But after turning off CBO, the issue is still there. 
> I think the return statement in [[step 
> 5|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12574]|#L12574]]
>  is not necessary.




[jira] [Updated] (HIVE-23849) Hive skips the creation of ColumnAccessInfo when creating a view

2020-07-15 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-23849:

Description: 
When creating a view, Hive skips the creation of ColumnAccessInfo that should 
be created at [step 8|#L12601]. This causes Authorization error. 

Currently, this issue is "hidden" when CBO is enabled. By introducing 
HIVE-14496, CalcitePlanner creates this ColumnAccessInfo at [step 
2|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12459].
 But after turning off CBO, the issue is still there. 

I think the return statement in [step 
5|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12574]]
 is not necessary.

  was:
When creating a view, Hive skips the creation of ColumnAccessInfo that should 
be created at [step 8|#L12601]]. This causes Authorization error. 

Currently, this issue is "hidden" when CBO is enabled. By introducing 
HIVE-14496, CalcitePlanner creates this ColumnAccessInfo at [step 
2|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12459].
 But after turning off CBO, the issue is still there. 

I think the return statement in [step 
5|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12574]]
 is not necessary.


> Hive skips the creation of ColumnAccessInfo when creating a view
> 
>
> Key: HIVE-23849
> URL: https://issues.apache.org/jira/browse/HIVE-23849
> Project: Hive
>  Issue Type: Bug
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>
> When creating a view, Hive skips the creation of ColumnAccessInfo that should 
> be created at [step 8|#L12601]. This causes Authorization error. 
> Currently, this issue is "hidden" when CBO is enabled. By introducing 
> HIVE-14496, CalcitePlanner creates this ColumnAccessInfo at [step 
> 2|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12459].
>  But after turning off CBO, the issue is still there. 
> I think the return statement in [step 
> 5|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12574]]
>  is not necessary.




[jira] [Updated] (HIVE-23849) Hive skips the creation of ColumnAccessInfo when creating a view

2020-07-15 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-23849:

Description: 
When creating a view, Hive skips the creation of ColumnAccessInfo that should 
be created at [step 8|#L12601]. This causes Authorization error. 

Currently, this issue is "hidden" when CBO is enabled. By introducing 
HIVE-14496, CalcitePlanner creates this ColumnAccessInfo at [step 
2|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12459].
 But after turning off CBO, the issue is still there. 

I think the return statement in [step 
5|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12574]
 is not necessary.

  was:
When creating a view, Hive skips the creation of ColumnAccessInfo that should 
be created at [step 8|#L12601]. This causes Authorization error. 

Currently, this issue is "hidden" when CBO is enabled. By introducing 
HIVE-14496, CalcitePlanner creates this ColumnAccessInfo at [step 
2|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12459].
 But after turning off CBO, the issue is still there. 

I think the return statement in [step 
5|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12574]]
 is not necessary.


> Hive skips the creation of ColumnAccessInfo when creating a view
> 
>
> Key: HIVE-23849
> URL: https://issues.apache.org/jira/browse/HIVE-23849
> Project: Hive
>  Issue Type: Bug
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>
> When creating a view, Hive skips the creation of ColumnAccessInfo that should 
> be created at [step 8|#L12601]. This causes Authorization error. 
> Currently, this issue is "hidden" when CBO is enabled. By introducing 
> HIVE-14496, CalcitePlanner creates this ColumnAccessInfo at [step 
> 2|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12459].
>  But after turning off CBO, the issue is still there. 
> I think the return statement in [step 
> 5|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12574]
>  is not necessary.




[jira] [Updated] (HIVE-23849) Hive skips the creation of ColumnAccessInfo when creating a view

2020-07-15 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-23849:

Description: 
When creating a view, Hive skips the creation of ColumnAccessInfo that should 
be created at [step 
8|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12601]
 This causes Authorization error. 

Currently, this issue is "hidden" when CBO is enabled. By introducing 
HIVE-14496, CalcitePlanner creates this ColumnAccessInfo at [step 2|#L12460]]. 
But after turning off CBO, the issue is still there. 

I think the return statement in [step 5|#L12574]] is not necessary.

  was:
When creating a view, Hive skips the creation of ColumnAccessInfo that should 
be created at [step 
8|[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12601]].
 This causes Authorization error. 

Currently, this issue is "hidden" when CBO is enabled. By introducing 
[HIVE-14496|https://issues.apache.org/jira/browse/HIVE-14496], CalcitePlanner 
creates this ColumnAccessInfo at [step 
2|[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12460]].
 But after turning off CBO, the issue is still there. 

I think the return statement in [step 
5|[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12574]]
 is not necessary.


> Hive skips the creation of ColumnAccessInfo when creating a view
> 
>
> Key: HIVE-23849
> URL: https://issues.apache.org/jira/browse/HIVE-23849
> Project: Hive
>  Issue Type: Bug
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>
> When creating a view, Hive skips the creation of ColumnAccessInfo that should 
> be created at [step 
> 8|https://github.com/apache/hive/blob/11e069b277fd1a18899b8ca1d2926fcbe73f17f2/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12601]
>  This causes Authorization error. 
> Currently, this issue is "hidden" when CBO is enabled. By introducing 
> HIVE-14496, CalcitePlanner creates this ColumnAccessInfo at [step 
> 2|#L12460]]. But after turning off CBO, the issue is still there. 
> I think the return statement in [step 5|#L12574]] is not necessary.




[jira] [Assigned] (HIVE-23849) Hive skips the creation of ColumnAccessInfo when creating a view

2020-07-15 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics reassigned HIVE-23849:
---


> Hive skips the creation of ColumnAccessInfo when creating a view
> 
>
> Key: HIVE-23849
> URL: https://issues.apache.org/jira/browse/HIVE-23849
> Project: Hive
>  Issue Type: Bug
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>
> When creating a view, Hive skips the creation of ColumnAccessInfo that should 
> be created at [step 
> 8|[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12601]].
>  This causes Authorization error. 
> Currently, this issue is "hidden" when CBO is enabled. By introducing 
> [HIVE-14496|https://issues.apache.org/jira/browse/HIVE-14496], CalcitePlanner 
> creates this ColumnAccessInfo at [step 
> 2|[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12460]].
>  But after turning off CBO, the issue is still there. 
> I think the return statement in [step 
> 5|[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L12574]]
>  is not necessary.




[jira] [Assigned] (HIVE-23774) Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java

2020-06-29 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics reassigned HIVE-23774:
---


> Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java
> 
>
> Key: HIVE-23774
> URL: https://issues.apache.org/jira/browse/HIVE-23774
> Project: Hive
>  Issue Type: Improvement
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Trivial
>
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1589]
> This log is not needed at INFO log level.




[jira] [Work started] (HIVE-23774) Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java

2020-06-29 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23774 started by Barnabas Maidics.
---
> Reduce log level at aggrColStatsForPartitions in MetaStoreDirectSql.java
> 
>
> Key: HIVE-23774
> URL: https://issues.apache.org/jira/browse/HIVE-23774
> Project: Hive
>  Issue Type: Improvement
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Trivial
>
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1589]
> This log is not needed at INFO log level.




[jira] [Assigned] (HIVE-23738) DBLockManager::lock() : Move lock request to debug level

2020-06-23 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics reassigned HIVE-23738:
---

Assignee: Barnabas Maidics

> DBLockManager::lock() : Move lock request to debug level
> 
>
> Key: HIVE-23738
> URL: https://issues.apache.org/jira/browse/HIVE-23738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Barnabas Maidics
>Priority: Trivial
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java#L102]
>  
> For Q78 at 30 TB scale, it ends up dumping a couple of MBs of logs at INFO level 
> just to print the lock request type. If possible, this should be moved to DEBUG level.
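
A minimal sketch of the requested change, written against java.util.logging so that it is self-contained. This is a swapped-in logging API chosen for illustration, with Level.FINE standing in for DEBUG; the class LockRequestLoggingSketch, the method logLockRequest, and the message text are hypothetical and not taken from DbLockManager.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative only: logs a lock request at FINE instead of INFO.
final class LockRequestLoggingSketch {
    // Returns true if a message was emitted, false if the level was disabled.
    static boolean logLockRequest(Logger log, Object lockRequest) {
        if (!log.isLoggable(Level.FINE)) {
            // Skip building the (potentially large) message string entirely.
            return false;
        }
        log.fine("Requesting lock: " + lockRequest);
        return true;
    }
}
```

With the logger at INFO, the call returns false without building the message, so large lock requests no longer add to the log volume.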




[jira] [Work started] (HIVE-23738) DBLockManager::lock() : Move lock request to debug level

2020-06-23 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23738 started by Barnabas Maidics.
---
> DBLockManager::lock() : Move lock request to debug level
> 
>
> Key: HIVE-23738
> URL: https://issues.apache.org/jira/browse/HIVE-23738
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Barnabas Maidics
>Priority: Trivial
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbLockManager.java#L102]
>  
> For Q78 at 30 TB scale, it ends up dumping a couple of MBs of logs at INFO level 
> just to print the lock request type. If possible, this should be moved to DEBUG level.




[jira] [Updated] (HIVE-23211) Fix metastore schema differences between init scripts, and upgrade scripts

2020-04-20 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-23211:

Status: Open  (was: Patch Available)

> Fix metastore schema differences between init scripts, and upgrade scripts
> --
>
> Key: HIVE-23211
> URL: https://issues.apache.org/jira/browse/HIVE-23211
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-23211.1.patch, HIVE-23211.2.patch, 
> HIVE-23211.3.patch
>
>
> There are some differences (character encoding, defaults, etc.) in the metastore 
> schema depending on whether we initialize with the init scripts or upgrade with 
> the upgrade scripts. The schema should be identical.




[jira] [Updated] (HIVE-23211) Fix metastore schema differences between init scripts, and upgrade scripts

2020-04-20 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-23211:

Attachment: HIVE-23211.3.patch
Status: Patch Available  (was: Open)

> Fix metastore schema differences between init scripts, and upgrade scripts
> --
>
> Key: HIVE-23211
> URL: https://issues.apache.org/jira/browse/HIVE-23211
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-23211.1.patch, HIVE-23211.2.patch, 
> HIVE-23211.3.patch
>
>
> There are some differences (character encoding, defaults, etc.) in the metastore 
> schema depending on whether we initialize with the init scripts or upgrade with 
> the upgrade scripts. The schema should be identical.




[jira] [Updated] (HIVE-23211) Fix metastore schema differences between init scripts, and upgrade scripts

2020-04-17 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-23211:

Status: Open  (was: Patch Available)

> Fix metastore schema differences between init scripts, and upgrade scripts
> --
>
> Key: HIVE-23211
> URL: https://issues.apache.org/jira/browse/HIVE-23211
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-23211.1.patch, HIVE-23211.2.patch
>
>
> There are some differences (character encoding, defaults, etc.) in the metastore 
> schema depending on whether we initialize with the init scripts or upgrade with 
> the upgrade scripts. The schema should be identical.




[jira] [Updated] (HIVE-23211) Fix metastore schema differences between init scripts, and upgrade scripts

2020-04-17 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-23211:

Attachment: HIVE-23211.2.patch
Status: Patch Available  (was: Open)

> Fix metastore schema differences between init scripts, and upgrade scripts
> --
>
> Key: HIVE-23211
> URL: https://issues.apache.org/jira/browse/HIVE-23211
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-23211.1.patch, HIVE-23211.2.patch
>
>
> The metastore schema differs (character encoding, defaults, etc.) depending on
> whether it is initialized with the init scripts or upgraded with the upgrade
> scripts. The schema should be identical in both cases.




[jira] [Updated] (HIVE-23211) Fix metastore schema differences between init scripts, and upgrade scripts

2020-04-16 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-23211:

Attachment: HIVE-23211.1.patch
Status: Patch Available  (was: In Progress)

> Fix metastore schema differences between init scripts, and upgrade scripts
> --
>
> Key: HIVE-23211
> URL: https://issues.apache.org/jira/browse/HIVE-23211
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-23211.1.patch
>
>
> The metastore schema differs (character encoding, defaults, etc.) depending on
> whether it is initialized with the init scripts or upgraded with the upgrade
> scripts. The schema should be identical in both cases.




[jira] [Updated] (HIVE-23211) Fix metastore schema differences between init scripts, and upgrade scripts

2020-04-16 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-23211:

Attachment: (was: HIVE-23211.1.patch)

> Fix metastore schema differences between init scripts, and upgrade scripts
> --
>
> Key: HIVE-23211
> URL: https://issues.apache.org/jira/browse/HIVE-23211
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-23211.1.patch
>
>
> The metastore schema differs (character encoding, defaults, etc.) depending on
> whether it is initialized with the init scripts or upgraded with the upgrade
> scripts. The schema should be identical in both cases.




[jira] [Updated] (HIVE-23211) Fix metastore schema differences between init scripts, and upgrade scripts

2020-04-16 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-23211:

Attachment: HIVE-23211.1.patch

> Fix metastore schema differences between init scripts, and upgrade scripts
> --
>
> Key: HIVE-23211
> URL: https://issues.apache.org/jira/browse/HIVE-23211
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-23211.1.patch
>
>
> The metastore schema differs (character encoding, defaults, etc.) depending on
> whether it is initialized with the init scripts or upgraded with the upgrade
> scripts. The schema should be identical in both cases.




[jira] [Assigned] (HIVE-23211) Fix metastore schema differences between init scripts, and upgrade scripts

2020-04-15 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics reassigned HIVE-23211:
---


> Fix metastore schema differences between init scripts, and upgrade scripts
> --
>
> Key: HIVE-23211
> URL: https://issues.apache.org/jira/browse/HIVE-23211
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>
> The metastore schema differs (character encoding, defaults, etc.) depending on
> whether it is initialized with the init scripts or upgraded with the upgrade
> scripts. The schema should be identical in both cases.




[jira] [Work started] (HIVE-23211) Fix metastore schema differences between init scripts, and upgrade scripts

2020-04-15 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-23211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23211 started by Barnabas Maidics.
---
> Fix metastore schema differences between init scripts, and upgrade scripts
> --
>
> Key: HIVE-23211
> URL: https://issues.apache.org/jira/browse/HIVE-23211
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>
> The metastore schema differs (character encoding, defaults, etc.) depending on
> whether it is initialized with the init scripts or upgraded with the upgrade
> scripts. The schema should be identical in both cases.




[jira] [Updated] (HIVE-22976) Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 constraint

2020-03-08 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-22976:

Attachment: HIVE-22976.4.patch
Status: Patch Available  (was: Open)

> Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 
> constraint
> --
>
> Key: HIVE-22976
> URL: https://issues.apache.org/jira/browse/HIVE-22976
> Project: Hive
>  Issue Type: Bug
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Minor
> Attachments: HIVE-22976.1.patch, HIVE-22976.2.patch, 
> HIVE-22976.3.patch, HIVE-22976.4.patch
>
>
> The schema init script (>=hive-schema-3.0.0) contains a constraint addition, 
> which is missing from the upgrade scripts in oracle and mssql. 




[jira] [Updated] (HIVE-22976) Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 constraint

2020-03-08 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-22976:

Status: Open  (was: Patch Available)

> Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 
> constraint
> --
>
> Key: HIVE-22976
> URL: https://issues.apache.org/jira/browse/HIVE-22976
> Project: Hive
>  Issue Type: Bug
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Minor
> Attachments: HIVE-22976.1.patch, HIVE-22976.2.patch, 
> HIVE-22976.3.patch, HIVE-22976.4.patch
>
>
> The schema init script (>=hive-schema-3.0.0) contains a constraint addition, 
> which is missing from the upgrade scripts in oracle and mssql. 




[jira] [Updated] (HIVE-22976) Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 constraint

2020-03-07 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-22976:

Attachment: HIVE-22976.3.patch
Status: Patch Available  (was: Open)

> Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 
> constraint
> --
>
> Key: HIVE-22976
> URL: https://issues.apache.org/jira/browse/HIVE-22976
> Project: Hive
>  Issue Type: Bug
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Minor
> Attachments: HIVE-22976.1.patch, HIVE-22976.2.patch, 
> HIVE-22976.3.patch
>
>
> The schema init script (>=hive-schema-3.0.0) contains a constraint addition, 
> which is missing from the upgrade scripts in oracle and mssql. 




[jira] [Updated] (HIVE-22976) Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 constraint

2020-03-07 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-22976:

Status: Open  (was: Patch Available)

> Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 
> constraint
> --
>
> Key: HIVE-22976
> URL: https://issues.apache.org/jira/browse/HIVE-22976
> Project: Hive
>  Issue Type: Bug
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Minor
> Attachments: HIVE-22976.1.patch, HIVE-22976.2.patch
>
>
> The schema init script (>=hive-schema-3.0.0) contains a constraint addition, 
> which is missing from the upgrade scripts in oracle and mssql. 




[jira] [Updated] (HIVE-22976) Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 constraint

2020-03-06 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-22976:

Status: Open  (was: Patch Available)

> Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 
> constraint
> --
>
> Key: HIVE-22976
> URL: https://issues.apache.org/jira/browse/HIVE-22976
> Project: Hive
>  Issue Type: Bug
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Minor
> Attachments: HIVE-22976.1.patch, HIVE-22976.2.patch
>
>
> The schema init script (>=hive-schema-3.0.0) contains a constraint addition, 
> which is missing from the upgrade scripts in oracle and mssql. 




[jira] [Updated] (HIVE-22976) Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 constraint

2020-03-06 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-22976:

Attachment: HIVE-22976.2.patch
Status: Patch Available  (was: Open)

> Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 
> constraint
> --
>
> Key: HIVE-22976
> URL: https://issues.apache.org/jira/browse/HIVE-22976
> Project: Hive
>  Issue Type: Bug
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Minor
> Attachments: HIVE-22976.1.patch, HIVE-22976.2.patch
>
>
> The schema init script (>=hive-schema-3.0.0) contains a constraint addition, 
> which is missing from the upgrade scripts in oracle and mssql. 




[jira] [Updated] (HIVE-22976) Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 constraint

2020-03-05 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-22976:

Attachment: HIVE-22976.1.patch
Status: Patch Available  (was: In Progress)

> Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 
> constraint
> --
>
> Key: HIVE-22976
> URL: https://issues.apache.org/jira/browse/HIVE-22976
> Project: Hive
>  Issue Type: Bug
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Minor
> Attachments: HIVE-22976.1.patch
>
>
> The schema init script (>=hive-schema-3.0.0) contains a constraint addition, 
> which is missing from the upgrade scripts in oracle and mssql. 




[jira] [Updated] (HIVE-22976) Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 constraint

2020-03-05 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-22976:

Attachment: (was: HIVE-22976.1.patch)

> Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 
> constraint
> --
>
> Key: HIVE-22976
> URL: https://issues.apache.org/jira/browse/HIVE-22976
> Project: Hive
>  Issue Type: Bug
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Minor
> Attachments: HIVE-22976.1.patch
>
>
> The schema init script (>=hive-schema-3.0.0) contains a constraint addition, 
> which is missing from the upgrade scripts in oracle and mssql. 




[jira] [Work started] (HIVE-22976) Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 constraint

2020-03-05 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-22976 started by Barnabas Maidics.
---
> Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 
> constraint
> --
>
> Key: HIVE-22976
> URL: https://issues.apache.org/jira/browse/HIVE-22976
> Project: Hive
>  Issue Type: Bug
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Minor
> Attachments: HIVE-22976.1.patch
>
>
> The schema init script (>=hive-schema-3.0.0) contains a constraint addition, 
> which is missing from the upgrade scripts in oracle and mssql. 




[jira] [Updated] (HIVE-22976) Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 constraint

2020-03-05 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-22976:

Attachment: HIVE-22976.1.patch

> Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 
> constraint
> --
>
> Key: HIVE-22976
> URL: https://issues.apache.org/jira/browse/HIVE-22976
> Project: Hive
>  Issue Type: Bug
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Minor
> Attachments: HIVE-22976.1.patch
>
>
> The schema init script (>=hive-schema-3.0.0) contains a constraint addition, 
> which is missing from the upgrade scripts in oracle and mssql. 




[jira] [Updated] (HIVE-22976) Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 constraint

2020-03-04 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-22976:

Description: The schema init script (>=hive-schema-3.0.0) contains a 
constraint addition, which is missing from the upgrade scripts in oracle and 
mssql.   (was: The schema init script (hive-schema-3.1.3000) contains a 
constraint addition, which is missing from the upgrade scripts in oracle and 
mssql. )

> Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 
> constraint
> --
>
> Key: HIVE-22976
> URL: https://issues.apache.org/jira/browse/HIVE-22976
> Project: Hive
>  Issue Type: Bug
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Minor
>
> The schema init script (>=hive-schema-3.0.0) contains a constraint addition, 
> which is missing from the upgrade scripts in oracle and mssql. 




[jira] [Assigned] (HIVE-22976) Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 constraint

2020-03-04 Thread Barnabas Maidics (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics reassigned HIVE-22976:
---


> Oracle and MSSQL upgrade script missing the addition of WM_RESOURCEPLAN_FK1 
> constraint
> --
>
> Key: HIVE-22976
> URL: https://issues.apache.org/jira/browse/HIVE-22976
> Project: Hive
>  Issue Type: Bug
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Minor
>
> The schema init script (hive-schema-3.1.3000) contains a constraint addition, 
> which is missing from the upgrade scripts in oracle and mssql. 




[jira] [Updated] (HIVE-22037) HS2 should log when shutting down due to OOM

2019-07-25 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-22037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-22037:

Attachment: HIVE-22037.3.patch
Status: Patch Available  (was: Open)

> HS2 should log when shutting down due to OOM
> 
>
> Key: HIVE-22037
> URL: https://issues.apache.org/jira/browse/HIVE-22037
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-22037.2.patch, HIVE-22037.3.patch, HIVE-22037.patch
>
>
> Currently, if HS2 runs into an OOM issue, ThreadPoolExecutorWithOomHook kicks in
> and runs oomHook, which stops HS2. All of this happens without logging: the log
> only shows that HS2 stopped.
> We should log the stack trace.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

[jira] [Updated] (HIVE-22037) HS2 should log when shutting down due to OOM

2019-07-25 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-22037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-22037:

Status: Open  (was: Patch Available)

> HS2 should log when shutting down due to OOM
> 
>
> Key: HIVE-22037
> URL: https://issues.apache.org/jira/browse/HIVE-22037
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-22037.2.patch, HIVE-22037.3.patch, HIVE-22037.patch
>
>
> Currently, if HS2 runs into an OOM issue, ThreadPoolExecutorWithOomHook kicks in
> and runs oomHook, which stops HS2. All of this happens without logging: the log
> only shows that HS2 stopped.
> We should log the stack trace.




[jira] [Updated] (HIVE-22037) HS2 should log when shutting down due to OOM

2019-07-24 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-22037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-22037:

Attachment: HIVE-22037.2.patch
Status: Patch Available  (was: Open)

> HS2 should log when shutting down due to OOM
> 
>
> Key: HIVE-22037
> URL: https://issues.apache.org/jira/browse/HIVE-22037
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-22037.2.patch, HIVE-22037.patch
>
>
> Currently, if HS2 runs into an OOM issue, ThreadPoolExecutorWithOomHook kicks in
> and runs oomHook, which stops HS2. All of this happens without logging: the log
> only shows that HS2 stopped.
> We should log the stack trace.




[jira] [Updated] (HIVE-22037) HS2 should log when shutting down due to OOM

2019-07-24 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-22037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-22037:

Status: Open  (was: Patch Available)

> HS2 should log when shutting down due to OOM
> 
>
> Key: HIVE-22037
> URL: https://issues.apache.org/jira/browse/HIVE-22037
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-22037.patch
>
>
> Currently, if HS2 runs into an OOM issue, ThreadPoolExecutorWithOomHook kicks in
> and runs oomHook, which stops HS2. All of this happens without logging: the log
> only shows that HS2 stopped.
> We should log the stack trace.




[jira] [Updated] (HIVE-22037) HS2 should log when shutting down due to OOM

2019-07-24 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-22037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-22037:

Attachment: HIVE-22037.patch
Status: Patch Available  (was: In Progress)

> HS2 should log when shutting down due to OOM
> 
>
> Key: HIVE-22037
> URL: https://issues.apache.org/jira/browse/HIVE-22037
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-22037.patch
>
>
> Currently, if HS2 runs into an OOM issue, ThreadPoolExecutorWithOomHook kicks in
> and runs oomHook, which stops HS2. All of this happens without logging: the log
> only shows that HS2 stopped.
> We should log the stack trace.




[jira] [Updated] (HIVE-22037) HS2 should log when shutting down due to OOM

2019-07-24 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-22037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-22037:

Attachment: (was: HIVE-22037.1.patch)

> HS2 should log when shutting down due to OOM
> 
>
> Key: HIVE-22037
> URL: https://issues.apache.org/jira/browse/HIVE-22037
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>
> Currently, if HS2 runs into an OOM issue, ThreadPoolExecutorWithOomHook kicks in
> and runs oomHook, which stops HS2. All of this happens without logging: the log
> only shows that HS2 stopped.
> We should log the stack trace.




[jira] [Updated] (HIVE-22037) HS2 should log when shutting down due to OOM

2019-07-24 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-22037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-22037:

Attachment: (was: HIVE-22037.1.patch)

> HS2 should log when shutting down due to OOM
> 
>
> Key: HIVE-22037
> URL: https://issues.apache.org/jira/browse/HIVE-22037
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-22037.1.patch
>
>
> Currently, if HS2 runs into an OOM issue, ThreadPoolExecutorWithOomHook kicks in
> and runs oomHook, which stops HS2. All of this happens without logging: the log
> only shows that HS2 stopped.
> We should log the stack trace.




[jira] [Updated] (HIVE-22037) HS2 should log when shutting down due to OOM

2019-07-24 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-22037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-22037:

Attachment: HIVE-22037.1.patch

> HS2 should log when shutting down due to OOM
> 
>
> Key: HIVE-22037
> URL: https://issues.apache.org/jira/browse/HIVE-22037
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-22037.1.patch
>
>
> Currently, if HS2 runs into an OOM issue, ThreadPoolExecutorWithOomHook kicks in
> and runs oomHook, which stops HS2. All of this happens without logging: the log
> only shows that HS2 stopped.
> We should log the stack trace.




[jira] [Work started] (HIVE-22037) HS2 should log when shutting down due to OOM

2019-07-24 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-22037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-22037 started by Barnabas Maidics.
---
> HS2 should log when shutting down due to OOM
> 
>
> Key: HIVE-22037
> URL: https://issues.apache.org/jira/browse/HIVE-22037
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-22037.1.patch
>
>
> Currently, if HS2 runs into an OOM issue, ThreadPoolExecutorWithOomHook kicks in
> and runs oomHook, which stops HS2. All of this happens without logging: the log
> only shows that HS2 stopped.
> We should log the stack trace.




[jira] [Updated] (HIVE-22037) HS2 should log when shutting down due to OOM

2019-07-24 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-22037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-22037:

Attachment: HIVE-22037.1.patch

> HS2 should log when shutting down due to OOM
> 
>
> Key: HIVE-22037
> URL: https://issues.apache.org/jira/browse/HIVE-22037
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-22037.1.patch
>
>
> Currently, if HS2 runs into an OOM issue, ThreadPoolExecutorWithOomHook kicks in
> and runs oomHook, which stops HS2. All of this happens without logging: the log
> only shows that HS2 stopped.
> We should log the stack trace.




[jira] [Assigned] (HIVE-22037) HS2 should log when shutting down due to OOM

2019-07-24 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-22037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics reassigned HIVE-22037:
---


> HS2 should log when shutting down due to OOM
> 
>
> Key: HIVE-22037
> URL: https://issues.apache.org/jira/browse/HIVE-22037
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
>
> Currently, if HS2 runs into an OOM issue, ThreadPoolExecutorWithOomHook kicks in
> and runs oomHook, which stops HS2. All of this happens without logging: the log
> only shows that HS2 stopped.
> We should log the stack trace.




[jira] [Updated] (HIVE-20758) Constraints: Show create table does not show constraints

2019-02-13 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20758:

Attachment: HIVE-20758.7.patch
Status: Patch Available  (was: Open)

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-20758.1.patch, HIVE-20758.2.patch, 
> HIVE-20758.3.patch, HIVE-20758.4.patch, HIVE-20758.5.patch, 
> HIVE-20758.6.patch, HIVE-20758.7.patch, Screen Shot 2019-01-23 at 11.52.04.png
>
>
> Even though the desc formatted shows the constraints, the show create table 
> does not
> {code}
> | # Primary Key | NULL | NULL |
> | Table: | tpcds_bin_partitioned_orc_1.inventory | NULL |
> | Constraint Name: | pk_in | NULL |
> | Column Names: | inv_date_sk | inv_item_sk |
> | | NULL | NULL |
> | # Foreign Keys | NULL | NULL |
> | Table: | tpcds_bin_partitioned_orc_1.inventory | NULL |
> | Constraint Name: | inv_d | NULL |
> | Parent Column Name:tpcds_bin_partitioned_orc_1.date_dim.d_date_sk | Column Name:inv_date_sk | Key Sequence:1 |
> | | NULL | NULL |
> | Constraint Name: | inv_i | NULL |
> | Parent Column Name:tpcds_bin_partitioned_orc_1.item.i_item_sk | Column Name:inv_item_sk | Key Sequence:1 |
> | | NULL | NULL |
> | Constraint Name: | inv_w | NULL |
> | Parent Column Name:tpcds_bin_partitioned_orc_1.warehouse.w_warehouse_sk | Column Name:inv_warehouse_sk | Key Sequence:1 |
> | | NULL | NULL |
> {code}
> But 
> {code}
> ++
> |   createtab_stmt   |
> ++
> | CREATE TABLE `inventory`(  |
> |   `inv_item_sk` bigint,|
> |   `inv_warehouse_sk` bigint,   |
> |   `inv_quantity_on_hand` int,  |
> |   `inv_date_sk` bigint)|
> | ROW FORMAT SERDE   |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'  |
> | STORED AS INPUTFORMAT  |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  |
> | OUTPUTFORMAT   |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' |
> | LOCATION   |
> |   'hdfs:///warehouse/tablespace/managed/hive/tpcds_bin_partitioned_orc_1.db/inventory' |
> | TBLPROPERTIES (|
> |   'bucketing_version'='2', |
> |   'transactional'='true',  |
> |   'transactional_properties'='default',|
> |

[jira] [Updated] (HIVE-20758) Constraints: Show create table does not show constraints

2019-02-13 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20758:

Status: Open  (was: Patch Available)

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-20758.1.patch, HIVE-20758.2.patch, 
> HIVE-20758.3.patch, HIVE-20758.4.patch, HIVE-20758.5.patch, 
> HIVE-20758.6.patch, HIVE-20758.7.patch, Screen Shot 2019-01-23 at 11.52.04.png
>
>
> Even though the desc formatted shows the constraints, the show create table 
> does not
> {code}
> | # Primary Key | NULL | NULL |
> | Table: | tpcds_bin_partitioned_orc_1.inventory | NULL |
> | Constraint Name: | pk_in | NULL |
> | Column Names: | inv_date_sk | inv_item_sk |
> | | NULL | NULL |
> | # Foreign Keys | NULL | NULL |
> | Table: | tpcds_bin_partitioned_orc_1.inventory | NULL |
> | Constraint Name: | inv_d | NULL |
> | Parent Column Name:tpcds_bin_partitioned_orc_1.date_dim.d_date_sk | Column Name:inv_date_sk | Key Sequence:1 |
> | | NULL | NULL |
> | Constraint Name: | inv_i | NULL |
> | Parent Column Name:tpcds_bin_partitioned_orc_1.item.i_item_sk | Column Name:inv_item_sk | Key Sequence:1 |
> | | NULL | NULL |
> | Constraint Name: | inv_w | NULL |
> | Parent Column Name:tpcds_bin_partitioned_orc_1.warehouse.w_warehouse_sk | Column Name:inv_warehouse_sk | Key Sequence:1 |
> | | NULL | NULL |
> {code}
> But 
> {code}
> ++
> |   createtab_stmt   |
> ++
> | CREATE TABLE `inventory`(  |
> |   `inv_item_sk` bigint,|
> |   `inv_warehouse_sk` bigint,   |
> |   `inv_quantity_on_hand` int,  |
> |   `inv_date_sk` bigint)|
> | ROW FORMAT SERDE   |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'  |
> | STORED AS INPUTFORMAT  |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  |
> | OUTPUTFORMAT   |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' |
> | LOCATION   |
> |   'hdfs:///warehouse/tablespace/managed/hive/tpcds_bin_partitioned_orc_1.db/inventory' |
> | TBLPROPERTIES (|
> |   'bucketing_version'='2', |
> |   'transactional'='true',  |
> |   'transactional_properties'='default',|
> |   'transient_lastDdlTime'='1539710410')

[jira] [Updated] (HIVE-20758) Constraints: Show create table does not show constraints

2019-02-13 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20758:

Attachment: HIVE-20758.6.patch
Status: Patch Available  (was: Open)

Adding tests for check and default constraints.

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Gopal V
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20758.1.patch, HIVE-20758.2.patch, 
> HIVE-20758.3.patch, HIVE-20758.4.patch, HIVE-20758.5.patch, 
> HIVE-20758.6.patch, Screen Shot 2019-01-23 at 11.52.04.png

[jira] [Updated] (HIVE-20758) Constraints: Show create table does not show constraints

2019-02-13 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20758:

Status: Open  (was: Patch Available)

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Gopal V
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20758.1.patch, HIVE-20758.2.patch, 
> HIVE-20758.3.patch, HIVE-20758.4.patch, HIVE-20758.5.patch, Screen Shot 
> 2019-01-23 at 11.52.04.png

[jira] [Updated] (HIVE-20758) Constraints: Show create table does not show constraints

2019-02-08 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20758:

Attachment: HIVE-20758.5.patch
Status: Patch Available  (was: Open)

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Gopal V
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20758.1.patch, HIVE-20758.2.patch, 
> HIVE-20758.3.patch, HIVE-20758.4.patch, HIVE-20758.5.patch, Screen Shot 
> 2019-01-23 at 11.52.04.png

[jira] [Updated] (HIVE-20758) Constraints: Show create table does not show constraints

2019-02-08 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20758:

Status: Open  (was: Patch Available)

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Gopal V
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20758.1.patch, HIVE-20758.2.patch, 
> HIVE-20758.3.patch, HIVE-20758.4.patch, HIVE-20758.5.patch, Screen Shot 
> 2019-01-23 at 11.52.04.png

[jira] [Commented] (HIVE-20758) Constraints: Show create table does not show constraints

2019-02-07 Thread Barnabas Maidics (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762780#comment-16762780
 ] 

Barnabas Maidics commented on HIVE-20758:
-

Thanks [~klcopp]!

I added the show create table commands to the create_with_constraints.q file, 
which already tests describe formatted/extended.

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Gopal V
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20758.1.patch, HIVE-20758.2.patch, 
> HIVE-20758.3.patch, HIVE-20758.4.patch, Screen Shot 2019-01-23 at 11.52.04.png

[jira] [Updated] (HIVE-20758) Constraints: Show create table does not show constraints

2019-02-07 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20758:

Attachment: HIVE-20758.4.patch
Status: Patch Available  (was: Open)

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Gopal V
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20758.1.patch, HIVE-20758.2.patch, 
> HIVE-20758.3.patch, HIVE-20758.4.patch, Screen Shot 2019-01-23 at 11.52.04.png

[jira] [Updated] (HIVE-20758) Constraints: Show create table does not show constraints

2019-02-07 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20758:

Status: Open  (was: Patch Available)

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Gopal V
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20758.1.patch, HIVE-20758.2.patch, 
> HIVE-20758.3.patch, Screen Shot 2019-01-23 at 11.52.04.png

[jira] [Updated] (HIVE-20758) Constraints: Show create table does not show constraints

2019-02-06 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20758:

Attachment: HIVE-20758.3.patch
Status: Patch Available  (was: Open)

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Gopal V
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20758.1.patch, HIVE-20758.2.patch, 
> HIVE-20758.3.patch, Screen Shot 2019-01-23 at 11.52.04.png

[jira] [Updated] (HIVE-20758) Constraints: Show create table does not show constraints

2019-02-06 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20758:

Status: Open  (was: Patch Available)

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Gopal V
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20758.1.patch, HIVE-20758.2.patch, Screen Shot 
> 2019-01-23 at 11.52.04.png

[jira] [Updated] (HIVE-20758) Constraints: Show create table does not show constraints

2019-02-05 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20758:

Attachment: HIVE-20758.2.patch
Status: Patch Available  (was: Open)

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Gopal V
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20758.1.patch, HIVE-20758.2.patch, Screen Shot 
> 2019-01-23 at 11.52.04.png

[jira] [Updated] (HIVE-20758) Constraints: Show create table does not show constraints

2019-02-05 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20758:

Status: Open  (was: Patch Available)

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Gopal V
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20758.1.patch, HIVE-20758.2.patch, Screen Shot 
> 2019-01-23 at 11.52.04.png

[jira] [Updated] (HIVE-20758) Constraints: Show create table does not show constraints

2019-01-23 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20758:

Attachment: (was: HIVE-20758.1.patch)

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Gopal V
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20758.1.patch, Screen Shot 2019-01-23 at 
> 11.52.04.png

[jira] [Updated] (HIVE-20758) Constraints: Show create table does not show constraints

2019-01-23 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20758:

Attachment: HIVE-20758.1.patch
Status: Patch Available  (was: Open)

Resubmitting the patch, since the Precommit test was triggered by an image :)

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Gopal V
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20758.1.patch, Screen Shot 2019-01-23 at 
> 11.52.04.png

[jira] [Updated] (HIVE-20758) Constraints: Show create table does not show constraints

2019-01-23 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20758:

Status: Open  (was: Patch Available)

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-20758.1.patch, Screen Shot 2019-01-23 at 
> 11.52.04.png
>
>

[jira] [Commented] (HIVE-20758) Constraints: Show create table does not show constraints

2019-01-23 Thread Barnabas Maidics (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749824#comment-16749824
 ] 

Barnabas Maidics commented on HIVE-20758:
-

Hi [~gopalv] ,

I created a solution for this. My patch does not add new functionality to the 
Metastore as [~vgarg] suggested, but it solves the problem.

The only limitation I couldn't solve is that, as far as I can tell, we don't 
store in the Metastore the additional keywords that belong to a constraint 
(like DISABLE or RELY), so I couldn't add them to the result. Do you think these 
should be added to the Metastore, or is it okay without them for now?

Otherwise, show create table now shows all the constraints mentioned in the 
Jira (without additional keywords) in the following format:

!Screen Shot 2019-01-23 at 11.52.04.png!
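
The screenshot is not reproduced in this archive. As a rough illustration only, 
the sketch below shows one way constraint names and columns, as listed by desc 
formatted, could be rendered as DDL text. The class, the helper names and the 
ALTER TABLE ... ADD CONSTRAINT output form are assumptions made for this 
example; they are not taken from the patch, and the patch may print something 
different. Keywords such as DISABLE or RELY are left out, matching the 
limitation described above, so the output is illustrative text rather than a 
statement Hive is guaranteed to accept as written.

```java
import java.util.Arrays;
import java.util.List;

public class ConstraintDdlSketch {

    // Hypothetical helper (not part of Hive): renders a primary key as DDL text
    // from the constraint name and column list reported by "desc formatted".
    static String primaryKey(String table, String name, List<String> columns) {
        return "ALTER TABLE " + table + " ADD CONSTRAINT " + name
                + " PRIMARY KEY (" + String.join(", ", columns) + ")";
    }

    // Hypothetical helper: renders a single-column foreign key the same way.
    static String foreignKey(String table, String name, String column,
                             String parentTable, String parentColumn) {
        return "ALTER TABLE " + table + " ADD CONSTRAINT " + name
                + " FOREIGN KEY (" + column + ") REFERENCES "
                + parentTable + "(" + parentColumn + ")";
    }

    public static void main(String[] args) {
        // Values copied from the desc formatted output quoted in the issue.
        String table = "tpcds_bin_partitioned_orc_1.inventory";
        System.out.println(primaryKey(table, "pk_in",
                Arrays.asList("inv_date_sk", "inv_item_sk")));
        System.out.println(foreignKey(table, "inv_d", "inv_date_sk",
                "tpcds_bin_partitioned_orc_1.date_dim", "d_date_sk"));
    }
}
```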

 

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-20758.1.patch, Screen Shot 2019-01-23 at 
> 11.52.04.png
>
>

[jira] [Updated] (HIVE-20758) Constraints: Show create table does not show constraints

2019-01-23 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20758:

Attachment: Screen Shot 2019-01-23 at 11.52.04.png

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-20758.1.patch, Screen Shot 2019-01-23 at 
> 11.52.04.png
>
>

[jira] [Updated] (HIVE-20758) Constraints: Show create table does not show constraints

2019-01-23 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20758:

Attachment: HIVE-20758.1.patch
Status: Patch Available  (was: Open)

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-20758.1.patch
>
>

[jira] [Assigned] (HIVE-20758) Constraints: Show create table does not show constraints

2019-01-23 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics reassigned HIVE-20758:
---

Assignee: Barnabas Maidics

> Constraints: Show create table does not show constraints
> 
>
> Key: HIVE-20758
> URL: https://issues.apache.org/jira/browse/HIVE-20758
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Barnabas Maidics
>Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HIVE-21125) Hive does not check for dependent materialized views when issuing a DROP TABLE command

2019-01-16 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-21125:

Description: 
Dropping a table leads to undefined behavior when that table is the source of 
an existing materialized view. The following behavior is observed:
 
 * Table still appears in 'show tables' despite not being in metastore
 * Actions on table hang and then display a "could not fetch table" error
 * Rebuilding any dependent materialized view has same error

It seems that the root cause is the fact that users are allowed to issue a DROP 
TABLE command against a table even if there is a materialized view using this 
table at the time. This is not something I have seen other query languages 
permit. 

Repro steps: Launch these commands from any Hive 3 client:
{code:java}
create table footable (id int); insert into footable values (1), (2), (3);
create materialized view mv_footable as select count(*) from footable;
drop table footable;

--These lines have unexpected behavior
show tables;
select * from footable;
alter materialized view mv_footable rebuild;{code}

  was:
Dropping a table leads to undefined behavior when that table is the source of 
an existing materialized view. The following behavior is observed:
  
 * Table still appears in 'show tables' despite not being in metastore
 * Actions on table hang and then display a "could not fetch table" error
 * Rebuilding any dependent materialized view has same error

It seems that the root cause is the fact that users are allowed to issue a DROP 
TABLE command against a table even if there is a materialized view using this 
table at the time. This is not something I have seen other query languages 
permit. 

Repro steps: Launch these commands from any Hive 3 client:
{code:java}
create table footable (id int); insert into footable values (1), (2), (3);
create materialized view mv_footable as select count(*) from footable;
drop table footable;

--These lines have unexpected behavior
show tables;
select * from footable;
alter materialized view mv_footable rebuild;{code}


> Hive does not check for dependent materialized views when issuing a DROP 
> TABLE command
> --
>
> Key: HIVE-21125
> URL: https://issues.apache.org/jira/browse/HIVE-21125
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Affects Versions: 3.1.0
>Reporter: Taylor Cox
>Priority: Major
>

[jira] [Comment Edited] (HIVE-21072) NPE when running partitioned CTAS statements

2019-01-16 Thread Barnabas Maidics (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744000#comment-16744000
 ] 

Barnabas Maidics edited comment on HIVE-21072 at 1/16/19 12:53 PM:
---

[~jcamachorodriguez], yes that patch you referred to fixes this issue. 


was (Author: b.maidics):
[~jcamachorodriguez], yes, that patch you referred to fixes this issue. 

> NPE when running partitioned CTAS statements
> 
>
> Key: HIVE-21072
> URL: https://issues.apache.org/jira/browse/HIVE-21072
> Project: Hive
>  Issue Type: Bug
>Reporter: Liang-Chi Hsieh
>Priority: Major
>  Labels: pull-request-available
>
> HIVE-20241 adds support of partitioned CTAS statements:
> {code:sql}
> CREATE TABLE partition_ctas_1 PARTITIONED BY (key) AS
> SELECT value, key FROM src where key > 200 and key < 300;{code}
>  
> However, I've tried this feature by checking out latest branch-3, and 
> encountered NPE:
> {code:java}
> hive> CREATE TABLE t PARTITIONED BY (part) AS SELECT 1 as id, "a" as part;
> FAILED: NullPointerException null
> {code}
> I also ran the query test partition_ctas.q. The test passes when using 
> TestMiniLlapLocalCliDriver, but when I go to test it with TestCliDriver 
> manually, it also throws NullPointerException:
> {code:java}
> 2018-12-25T05:58:22,221 ERROR [a96009a7-3dda-4d95-9536-e2e16d976856 main] ql.Driver: FAILED: NullPointerException null
> java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.usePartitionColumns(GenMapRedUtils.java:2103)
>     at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.createMRWorkForMergingFiles(GenMapRedUtils.java:1323)
>     at org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.process(GenMRFileSink1.java:113)
>     at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>     at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>     at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:54)
>     at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
>     at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
>     at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
>     at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>     at org.apache.hadoop.hive.ql.parse.MapReduceCompiler.generateTaskTree(MapReduceCompiler.java:323)
>     at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:244)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12503)
>     at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:357)
>     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
>     at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:166)
>     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:664)
>     at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1854)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1801)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1796)
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
> {code}




[jira] [Commented] (HIVE-21072) NPE when running partitioned CTAS statements

2019-01-16 Thread Barnabas Maidics (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744000#comment-16744000
 ] 

Barnabas Maidics commented on HIVE-21072:
-

[~jcamachorodriguez], yes, that patch you referred to fixes this issue. 

> NPE when running partitioned CTAS statements
> 
>
> Key: HIVE-21072
> URL: https://issues.apache.org/jira/browse/HIVE-21072
> Project: Hive
>  Issue Type: Bug
>Reporter: Liang-Chi Hsieh
>Priority: Major
>  Labels: pull-request-available
>

[jira] [Commented] (HIVE-21072) NPE when running partitioned CTAS statements

2019-01-15 Thread Barnabas Maidics (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743070#comment-16743070
 ] 

Barnabas Maidics commented on HIVE-21072:
-

I also tried running a partitioned CTAS and ran into the same error. As far as 
I can see, it works fine when the execution engine is Tez (that is why 
partition_ctas passed with TestMiniLlapLocalCliDriver). 

With MR, the NPE is thrown because Hive tries to create a map-only merge job 
(_GenMapRedUtils.createMRWorkForMergingFiles_), but the _tableInfo_ of the 
_FileSinkDesc_ has no entry with the key "partition_columns", so we end up 
calling split on null.
{code:java}
String[] partNames = properties.getProperty(
    org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_PARTITION_COLUMNS)
    .split("/");
{code}
A possible quick fix is to *set _hive.merge.mapfiles_ to _false_* (it is true 
by default), so that these steps are skipped 
(_GenMapRedUtils.isMergeRequired_ will return _false_).
But maybe I'm missing something about this feature. [~jcamachorodriguez], what 
do you think the long-term fix for this would be?
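
To make the failure mode concrete, here is a minimal standalone sketch of the 
same pattern: Properties.getProperty returns null for a missing key, so 
chaining .split("/") on it throws. The class and method names are hypothetical 
and the null-safe variant is only an illustration of the mechanism, not the fix 
that was applied in Hive; whether an empty result is the right behaviour for 
the merge step is a separate question.

```java
import java.util.Properties;

public class PartitionColumnsSketch {

    // Key named in the comment above; in Hive it is referenced through
    // hive_metastoreConstants.META_TABLE_PARTITION_COLUMNS.
    static final String PARTITION_COLUMNS_KEY = "partition_columns";

    // Hypothetical null-safe variant: returns an empty array instead of
    // throwing when the property is missing or empty.
    static String[] partitionNames(Properties properties) {
        String value = properties.getProperty(PARTITION_COLUMNS_KEY);
        if (value == null || value.isEmpty()) {
            return new String[0];
        }
        return value.split("/");
    }

    public static void main(String[] args) {
        Properties withColumns = new Properties();
        withColumns.setProperty(PARTITION_COLUMNS_KEY, "year/month");
        System.out.println(partitionNames(withColumns).length);    // two names

        // No "partition_columns" entry: the unguarded getProperty(...).split("/")
        // would throw a NullPointerException here.
        Properties withoutColumns = new Properties();
        System.out.println(partitionNames(withoutColumns).length); // no names
    }
}
```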

> NPE when running partitioned CTAS statements
> 
>
> Key: HIVE-21072
> URL: https://issues.apache.org/jira/browse/HIVE-21072
> Project: Hive
>  Issue Type: Bug
>Reporter: Liang-Chi Hsieh
>Priority: Major
>  Labels: pull-request-available
>

[jira] [Updated] (HIVE-21072) NPE when running partitioned CTAS statements

2019-01-15 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-21072:

Description: 
HIVE-20241 adds support of partitioned CTAS statements:
{code:sql}
CREATE TABLE partition_ctas_1 PARTITIONED BY (key) AS
SELECT value, key FROM src where key > 200 and key < 300;{code}
 
However, I've tried this feature by checking out latest branch-3, and 
encountered NPE:
{code:java}
hive> CREATE TABLE t PARTITIONED BY (part) AS SELECT 1 as id, "a" as part;
FAILED: NullPointerException null
{code}
I also ran the query test partition_ctas.q. The test passes when using 
TestMiniLlapLocalCliDriver, but when I go to test it with TestCliDriver 
manually, it also throws NullPointerException:
{code:java}
2018-12-25T05:58:22,221 ERROR [a96009a7-3dda-4d95-9536-e2e16d976856 main] ql.Driver: FAILED: NullPointerException null
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.usePartitionColumns(GenMapRedUtils.java:2103)
    at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.createMRWorkForMergingFiles(GenMapRedUtils.java:1323)
    at org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.process(GenMRFileSink1.java:113)
    at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
    at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:54)
    at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
    at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
    at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:65)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
    at org.apache.hadoop.hive.ql.parse.MapReduceCompiler.generateTaskTree(MapReduceCompiler.java:323)
    at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:244)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12503)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:357)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
    at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:166)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:664)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1854)
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1801)
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1796)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
{code}

[jira] [Commented] (HIVE-20241) Support partitioning spec in CTAS statements

2019-01-09 Thread Barnabas Maidics (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-20241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738056#comment-16738056
 ] 

Barnabas Maidics commented on HIVE-20241:
-

I think the CTAS documentation should be updated: 
[https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect(CTAS).]
It says: 
"CTAS has these restrictions:
 * The target table cannot be a partitioned table."

As I understand it, this is no longer a restriction. 

 

> Support partitioning spec in CTAS statements
> 
>
> Key: HIVE-20241
> URL: https://issues.apache.org/jira/browse/HIVE-20241
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: TODOC3.2
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-20241.01.patch, HIVE-20241.01.patch, 
> HIVE-20241.01.patch, HIVE-20241.02.patch, HIVE-20241.03.patch, 
> HIVE-20241.patch
>
>
> Currently, for partitioned tables, we have to declare the table and insert 
> the data in separate operations. This issue is to extend the CTAS statement 
> to support specifying partition columns.
> For instance:
> {code:sql}
> CREATE TABLE partition_ctas_1 PARTITIONED BY (key) AS
> SELECT value, key FROM src where key > 200 and key < 300;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2019-01-04 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Attachment: (was: HIVE-20760.13.patch)

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.10.patch, HIVE-20760.11.patch, 
> HIVE-20760.12.patch, HIVE-20760.13.patch, HIVE-20760.4.patch, 
> HIVE-20760.5.patch, HIVE-20760.6.patch, HIVE-20760.7.patch, 
> HIVE-20760.8.patch, HIVE-20760.9.patch, HIVE-20760.patch, 
> hiveconf_interned.html, hiveconf_original.html
>
>
> The issue is that every Hive task has to load its own version of 
> {{HiveConf}}. When running with a large number of cores per executor (HoS), 
> there is a significant (~10%) amount of memory wasted due to this 
> duplication. 
> I looked into the problem and found a way to reduce the overhead caused by 
> the multiple HiveConf objects.
> I've created an implementation of Properties, somewhat similar to 
> CopyOnFirstWriteProperties. CopyOnFirstWriteProperties can't be used to solve 
> this problem, because it drops the interned Properties right after we add a 
> new property.
> So my implementation looks like this:
>  * When we create a new HiveConf from an existing one (copy constructor), we 
> change the properties object stored by HiveConf to the new Properties 
> implementation (HiveConfProperties). There are two possible ways to do this: 
> either we change the visibility of the properties field in the ancestor class 
> (Configuration, which comes from Hadoop) to protected, or, more simply, we 
> just change the type using reflection.
>  * HiveConfProperties immediately interns the given properties. After this, 
> every time we add a new property to HiveConf, we add it to an additional 
> Properties object. This way, if we create multiple HiveConf objects with the 
> same base properties, they will use the same Properties object, but each 
> session/task can add its own unique properties.
>  * Getting a property from HiveConfProperties would look like this (I stored 
> the non-interned properties in the superclass):
>                 String property = super.getProperty(key);
>                 if (property == null) property = interned.getProperty(key);
>                 return property;
> Running some tests showed that the interning works (with 50 connections to 
> HiveServer2, heap dumps taken after the sessions are created for queries): 
> Overall memory:
>          original: 34,599K              interned: 20,582K
> Retained memory of HiveConfs:
>          original: 16,366K              interned: 10,804K
> I have attached the JXray reports for the heap dumps.
> What are your thoughts about this solution? 
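
To make the lookup order in the quoted description concrete, here is a minimal sketch, not the code from the attached patches. The names HiveConfPropertiesSketch, ConfStandIn and swapProperties are hypothetical, and ConfStandIn only stands in for a configuration class with a private properties field. The sketch shares one base Properties object between instances and does not show the interning (deduplication) step itself.

```java
import java.lang.reflect.Field;
import java.util.Properties;

public class LayeredPropertiesSketch {

    /** Hypothetical layered Properties: shared base plus per-instance overlay. */
    static class HiveConfPropertiesSketch extends Properties {
        // Shared base properties; this class never writes to them.
        private final Properties interned;

        HiveConfPropertiesSketch(Properties interned) {
            this.interned = interned;
        }

        @Override
        public String getProperty(String key) {
            // Values set on this instance live in the superclass and win.
            String property = super.getProperty(key);
            if (property == null) {
                // Fall back to the shared base.
                property = interned.getProperty(key);
            }
            return property;
        }
    }

    /** Hypothetical stand-in for a configuration class with a private field. */
    static class ConfStandIn {
        private Properties properties = new Properties();

        String get(String key) { return properties.getProperty(key); }
        void set(String key, String value) { properties.setProperty(key, value); }
    }

    /** Replaces the private field via reflection (the second option above). */
    static void swapProperties(ConfStandIn conf, Properties replacement)
            throws ReflectiveOperationException {
        Field field = ConfStandIn.class.getDeclaredField("properties");
        field.setAccessible(true);
        field.set(conf, replacement);
    }

    public static void main(String[] args) throws Exception {
        Properties base = new Properties();
        base.setProperty("shared.key", "base-value");

        ConfStandIn session1 = new ConfStandIn();
        ConfStandIn session2 = new ConfStandIn();
        swapProperties(session1, new HiveConfPropertiesSketch(base));
        swapProperties(session2, new HiveConfPropertiesSketch(base));

        // Writes go to the per-instance overlay, not to the shared base.
        session1.set("shared.key", "overridden");
        session1.set("session.only", "1");

        System.out.println(session1.get("shared.key"));   // overridden
        System.out.println(session2.get("shared.key"));   // base-value
        System.out.println(session2.get("session.only")); // null
    }
}
```

In this sketch both sessions hold a reference to the same base object, so only the properties a session sets itself take additional memory.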




[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2019-01-04 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Attachment: HIVE-20760.13.patch
Status: Patch Available  (was: Open)

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.10.patch, HIVE-20760.11.patch, 
> HIVE-20760.12.patch, HIVE-20760.13.patch, HIVE-20760.4.patch, 
> HIVE-20760.5.patch, HIVE-20760.6.patch, HIVE-20760.7.patch, 
> HIVE-20760.8.patch, HIVE-20760.9.patch, HIVE-20760.patch, 
> hiveconf_interned.html, hiveconf_original.html
>
>

[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2019-01-04 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Attachment: HIVE-20760.13.patch

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.10.patch, HIVE-20760.11.patch, 
> HIVE-20760.12.patch, HIVE-20760.13.patch, HIVE-20760.4.patch, 
> HIVE-20760.5.patch, HIVE-20760.6.patch, HIVE-20760.7.patch, 
> HIVE-20760.8.patch, HIVE-20760.9.patch, HIVE-20760.patch, 
> hiveconf_interned.html, hiveconf_original.html
>
>

[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2019-01-04 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Status: Open  (was: Patch Available)

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.10.patch, HIVE-20760.11.patch, 
> HIVE-20760.12.patch, HIVE-20760.13.patch, HIVE-20760.4.patch, 
> HIVE-20760.5.patch, HIVE-20760.6.patch, HIVE-20760.7.patch, 
> HIVE-20760.8.patch, HIVE-20760.9.patch, HIVE-20760.patch, 
> hiveconf_interned.html, hiveconf_original.html
>
>

[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2018-12-18 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Attachment: HIVE-20760.12.patch
Status: Patch Available  (was: Open)

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.10.patch, HIVE-20760.11.patch, 
> HIVE-20760.12.patch, HIVE-20760.4.patch, HIVE-20760.5.patch, 
> HIVE-20760.6.patch, HIVE-20760.7.patch, HIVE-20760.8.patch, 
> HIVE-20760.9.patch, HIVE-20760.patch, hiveconf_interned.html, 
> hiveconf_original.html
>
>

[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2018-12-18 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Status: Open  (was: Patch Available)

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.10.patch, HIVE-20760.11.patch, 
> HIVE-20760.4.patch, HIVE-20760.5.patch, HIVE-20760.6.patch, 
> HIVE-20760.7.patch, HIVE-20760.8.patch, HIVE-20760.9.patch, HIVE-20760.patch, 
> hiveconf_interned.html, hiveconf_original.html
>
>

[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2018-12-14 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Attachment: HIVE-20760.11.patch
Status: Patch Available  (was: Open)

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.10.patch, HIVE-20760.11.patch, 
> HIVE-20760.4.patch, HIVE-20760.5.patch, HIVE-20760.6.patch, 
> HIVE-20760.7.patch, HIVE-20760.8.patch, HIVE-20760.9.patch, HIVE-20760.patch, 
> hiveconf_interned.html, hiveconf_original.html
>
>

[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2018-12-14 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Status: Open  (was: Patch Available)

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.10.patch, HIVE-20760.11.patch, 
> HIVE-20760.4.patch, HIVE-20760.5.patch, HIVE-20760.6.patch, 
> HIVE-20760.7.patch, HIVE-20760.8.patch, HIVE-20760.9.patch, HIVE-20760.patch, 
> hiveconf_interned.html, hiveconf_original.html
>
>

[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2018-12-11 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Attachment: HIVE-20760.10.patch
Status: Patch Available  (was: Open)

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.10.patch, HIVE-20760.4.patch, 
> HIVE-20760.5.patch, HIVE-20760.6.patch, HIVE-20760.7.patch, 
> HIVE-20760.8.patch, HIVE-20760.9.patch, HIVE-20760.patch, 
> hiveconf_interned.html, hiveconf_original.html
>
>
> The issue is that every Hive task has to load its own copy of
> {{HiveConf}}. When running with a large number of cores per executor (HoS),
> a significant amount of memory (~10%) is wasted due to this duplication.
> I looked into the problem and found a way to reduce the overhead caused by
> the multiple HiveConf objects.
> I've created an implementation of Properties that is somewhat similar to
> CopyOnFirstWriteProperties. CopyOnFirstWriteProperties itself can't be used
> to solve this problem, because it drops the interned Properties as soon as
> we add a new property.
> My implementation works like this:
>  * When we create a new HiveConf from an existing one (copy constructor), we
> change the properties object stored by HiveConf to the new Properties
> implementation (HiveConfProperties). There are two possible ways to do this:
> either we change the visibility of the properties field in the ancestor class
> (Configuration, which comes from Hadoop) to protected, or, more simply, we
> change the type using reflection.
>  * HiveConfProperties immediately interns the given properties. After this,
> every time we add a new property to HiveConf, we add it to an additional
> Properties object. This way, if we create multiple HiveConf objects with the
> same base properties, they share the same Properties object, but each
> session/task can still add its own unique properties.
>  * Getting a property from HiveConfProperties looks like this (the
> non-interned properties are stored in the superclass):
>                 String property = super.getProperty(key);
>                 if (property == null) property = interned.getProperty(key);
>                 return property;
> Some tests showed that the interning works (50 connections to
> HiveServer2, heap dumps taken after the sessions were created for queries):
> Overall memory:
>          original: 34,599K              interned: 20,582K
> Retained memory of HiveConfs:
>         original: 16,366K               interned: 10,804K
> I have attached the JXray reports for the heap dumps.
> What are your thoughts on this solution?
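
The layered lookup described in the last bullet can be sketched as a small
standalone class. This is only an illustration under assumptions, not the code
from the attached patches: the class name LayeredProperties, the POOL map used
for interning, and the sharedBase() accessor are hypothetical, and a plain
ConcurrentHashMap stands in for whatever interner the real HiveConfProperties
uses. Only getProperty is layered here; enumeration, size, remove and clone
would need the same treatment in a complete implementation.

```java
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch of a two-layer Properties: shared base + local additions. */
class LayeredProperties extends Properties {
    private static final long serialVersionUID = 1L;

    // Assumed interning pool: one canonical Properties object per distinct content.
    private static final ConcurrentMap<Properties, Properties> POOL =
            new ConcurrentHashMap<>();

    // Shared base layer; must be treated as read-only once interned.
    private final Properties interned;

    LayeredProperties(Properties base) {
        // Copy the base so later changes to the caller's object cannot affect the pool.
        Properties snapshot = new Properties();
        snapshot.putAll(base);
        Properties existing = POOL.putIfAbsent(snapshot, snapshot);
        this.interned = (existing != null) ? existing : snapshot;
    }

    @Override
    public String getProperty(String key) {
        // Local (non-interned) entries live in the superclass table and win.
        String property = super.getProperty(key);
        if (property == null) {
            property = interned.getProperty(key);
        }
        return property;
    }

    /** Exposed only so the sharing can be observed in the demo. */
    Properties sharedBase() {
        return interned;
    }
}

class LayeredPropertiesDemo {
    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("hive.example.shared", "v1"); // made-up key

        LayeredProperties a = new LayeredProperties(base);
        LayeredProperties b = new LayeredProperties(base);
        a.setProperty("hive.example.local", "only-in-a"); // made-up key

        System.out.println(a.sharedBase() == b.sharedBase());    // true
        System.out.println(a.getProperty("hive.example.local")); // only-in-a
        System.out.println(b.getProperty("hive.example.local")); // null
    }
}
```

In this sketch, setProperty writes only to the per-instance layer, so two
instances built from equal base properties keep pointing at one shared base
object while still diverging in their own additions.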



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2018-12-11 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Status: Open  (was: Patch Available)

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
> Issue Type: Improvement
> Components: Configuration
> Reporter: Barnabas Maidics
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.4.patch, HIVE-20760.5.patch, 
> HIVE-20760.6.patch, HIVE-20760.7.patch, HIVE-20760.8.patch, 
> HIVE-20760.9.patch, HIVE-20760.patch, hiveconf_interned.html, 
> hiveconf_original.html
>
>

[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2018-12-10 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Attachment: HIVE-20760.9.patch
Status: Patch Available  (was: Open)

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
> Issue Type: Improvement
> Components: Configuration
> Reporter: Barnabas Maidics
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.4.patch, HIVE-20760.5.patch, 
> HIVE-20760.6.patch, HIVE-20760.7.patch, HIVE-20760.8.patch, 
> HIVE-20760.9.patch, HIVE-20760.patch, hiveconf_interned.html, 
> hiveconf_original.html
>
>

[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2018-12-10 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Status: Open  (was: Patch Available)

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.4.patch, HIVE-20760.5.patch, 
> HIVE-20760.6.patch, HIVE-20760.7.patch, HIVE-20760.8.patch, 
> HIVE-20760.9.patch, HIVE-20760.patch, hiveconf_interned.html, 
> hiveconf_original.html
>
>

[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2018-11-26 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Status: Open  (was: Patch Available)

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
> Issue Type: Improvement
> Components: Configuration
> Reporter: Barnabas Maidics
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.4.patch, HIVE-20760.5.patch, 
> HIVE-20760.6.patch, HIVE-20760.7.patch, HIVE-20760.8.patch, HIVE-20760.patch, 
> hiveconf_interned.html, hiveconf_original.html
>
>

[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2018-11-26 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Attachment: HIVE-20760.8.patch
Status: Patch Available  (was: Open)

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
> Issue Type: Improvement
> Components: Configuration
> Reporter: Barnabas Maidics
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.4.patch, HIVE-20760.5.patch, 
> HIVE-20760.6.patch, HIVE-20760.7.patch, HIVE-20760.8.patch, HIVE-20760.patch, 
> hiveconf_interned.html, hiveconf_original.html
>
>

[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2018-11-19 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Attachment: HIVE-20760.7.patch
Status: Patch Available  (was: Open)

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
> Issue Type: Improvement
> Components: Configuration
> Reporter: Barnabas Maidics
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.4.patch, HIVE-20760.5.patch, 
> HIVE-20760.6.patch, HIVE-20760.7.patch, HIVE-20760.patch, 
> hiveconf_interned.html, hiveconf_original.html
>
>

[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2018-11-19 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Status: Open  (was: Patch Available)

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
> Issue Type: Improvement
> Components: Configuration
> Reporter: Barnabas Maidics
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.4.patch, HIVE-20760.5.patch, 
> HIVE-20760.6.patch, HIVE-20760.7.patch, HIVE-20760.patch, 
> hiveconf_interned.html, hiveconf_original.html
>
>

[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2018-11-19 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Attachment: HIVE-20760.6.patch
Status: Patch Available  (was: Open)

Fixed a clone problem in HiveConfProperties caused by cloning already-removed
Properties.

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
> Issue Type: Improvement
> Components: Configuration
> Reporter: Barnabas Maidics
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.4.patch, HIVE-20760.5.patch, 
> HIVE-20760.6.patch, HIVE-20760.patch, hiveconf_interned.html, 
> hiveconf_original.html
>
>

[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2018-11-19 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Status: Open  (was: Patch Available)

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
> Issue Type: Improvement
> Components: Configuration
> Reporter: Barnabas Maidics
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.4.patch, HIVE-20760.5.patch, HIVE-20760.patch, 
> hiveconf_interned.html, hiveconf_original.html
>
>

[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2018-11-13 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Attachment: HIVE-20760.5.patch
Status: Patch Available  (was: Open)

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
> Issue Type: Improvement
> Components: Configuration
> Reporter: Barnabas Maidics
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.4.patch, HIVE-20760.5.patch, HIVE-20760.patch, 
> hiveconf_interned.html, hiveconf_original.html
>
>

[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2018-11-13 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Status: Open  (was: Patch Available)

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Barnabas Maidics
>Assignee: Barnabas Maidics
>Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.4.patch, HIVE-20760.patch, 
> hiveconf_interned.html, hiveconf_original.html
>
>
> The issue is that every Hive task has to load its own copy of 
> {{HiveConf}}. When running with a large number of cores per executor (HoS), 
> a significant amount of memory (~10%) is wasted due to this 
> duplication. 
> I looked into the problem and found a way to reduce the overhead caused by 
> the multiple HiveConf objects.
> I've created an implementation of Properties, somewhat similar to 
> CopyOnFirstWriteProperties. CopyOnFirstWriteProperties can't be used to solve 
> this problem, because it drops the interned Properties as soon as we add a 
> new property.
> So my implementation looks like this:
>  * When we create a new HiveConf from an existing one (copy constructor), we 
> replace the properties object stored by HiveConf with the new Properties 
> implementation (HiveConfProperties). There are two possible ways to do this: 
> either we change the visibility of the properties field in the ancestor class 
> (Configuration, which comes from Hadoop) to protected, or, more simply, we 
> change it using reflection.
>  * HiveConfProperties immediately interns the given properties. After this, 
> every time we add a new property to HiveConf, we add it to an additional 
> Properties object. This way, if we create multiple HiveConf objects with the 
> same base properties, they share the same Properties object, but each 
> session/task can still add its own unique properties.
>  * Getting a property from HiveConfProperties would look like this (I stored 
> the non-interned properties in the superclass):
>                 String property = super.getProperty(key);
>                 if (property == null) property = interned.getProperty(key);
>                 return property;
> Running some tests showed that the interning works (with 50 connections to 
> HiveServer2; heap dumps taken after the sessions were created for queries): 
> Overall memory:
>          original: 34,599K              interned: 20,582K
> Retained memory of HiveConfs:
>         original: 16,366K               interned: 10,804K
> I have attached the JXray reports for the heap dumps.
> What are your thoughts about this solution? 
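
The lookup described in the last bullet can be sketched as a small, self-contained class. This is not the code from the attached patches: LayeredProperties is a hypothetical stand-in for HiveConfProperties, the property keys are invented, and the shared base is simply assumed never to be modified after construction.

```java
import java.util.Properties;

// Hypothetical stand-in for the HiveConfProperties class described above.
class LayeredProperties extends Properties {
    // Shared base layer; assumed to be left unmodified after construction.
    private final Properties interned;

    LayeredProperties(Properties interned) {
        this.interned = interned;
    }

    @Override
    public String getProperty(String key) {
        // Session-specific (non-interned) values live in the superclass table.
        String property = super.getProperty(key);
        if (property == null) {
            // Fall back to the shared base layer.
            property = interned.getProperty(key);
        }
        return property;
    }
}

class LayeredPropertiesDemo {
    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("example.shared", "base-value");

        // Two "sessions" share one base object but keep their own additions.
        LayeredProperties sessionA = new LayeredProperties(base);
        LayeredProperties sessionB = new LayeredProperties(base);
        sessionA.setProperty("example.session", "A");
        sessionB.setProperty("example.shared", "overridden-by-B");

        System.out.println(sessionA.getProperty("example.shared")); // base-value
        System.out.println(sessionB.getProperty("example.shared")); // overridden-by-B
        System.out.println(sessionB.getProperty("example.session")); // null
    }
}
```

Because writes go only to the per-instance table, the base object is never copied, which is the difference from the copy-on-first-write behaviour mentioned in the description.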




[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2018-11-06 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Attachment: HIVE-20760.4.patch
Status: Patch Available  (was: Open)

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
> Issue Type: Improvement
> Components: Configuration
> Reporter: Barnabas Maidics
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.4.patch, HIVE-20760.patch, 
> hiveconf_interned.html, hiveconf_original.html




[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2018-11-06 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Status: Open  (was: Patch Available)

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
> Issue Type: Improvement
> Components: Configuration
> Reporter: Barnabas Maidics
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.patch, hiveconf_interned.html, 
> hiveconf_original.html




[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2018-10-30 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Attachment: HIVE-20760-3.patch
Status: Patch Available  (was: Open)

HIVE-20760-3.patch: Fixes HiveConfProperties.size() and prevents a chain of 
HiveConfProperties objects from forming when a HiveConf is created from a conf 
whose base is already interned.
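
The patch itself is not reproduced in this message, so the following is only a guess at what the two fixes might involve. It assumes that size() should count each key once across both layers, and that wrapping an already-layered object should reuse its shared base rather than nest it. ChainFreeProperties and sharedBase() are hypothetical names, not identifiers from the patch.

```java
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical sketch; not the class from HIVE-20760-3.patch.
class ChainFreeProperties extends Properties {
    // Shared base layer; assumed to be left unmodified after construction.
    private final Properties interned;

    ChainFreeProperties(Properties base) {
        if (base instanceof ChainFreeProperties) {
            // Base is already layered: reuse its shared layer instead of
            // wrapping it, and copy only its own (non-interned) entries.
            ChainFreeProperties other = (ChainFreeProperties) base;
            this.interned = other.interned;
            synchronized (other) {
                for (Object key : other.keySet()) {
                    super.put(key, other.get(key));
                }
            }
        } else {
            this.interned = base;
        }
    }

    // Exposed only so the demo can check that no chain was built.
    Properties sharedBase() {
        return interned;
    }

    @Override
    public String getProperty(String key) {
        String property = super.getProperty(key);
        return property != null ? property : interned.getProperty(key);
    }

    @Override
    public synchronized int size() {
        // Count distinct keys across both layers; overridden keys count once.
        Set<Object> keys = new HashSet<>(interned.keySet());
        keys.addAll(super.keySet());
        return keys.size();
    }
}

class ChainFreePropertiesDemo {
    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("a", "1");
        base.setProperty("b", "2");

        ChainFreeProperties first = new ChainFreeProperties(base);
        first.setProperty("b", "override");
        first.setProperty("c", "3");

        ChainFreeProperties second = new ChainFreeProperties(first);
        System.out.println(first.size());                 // 3
        System.out.println(second.sharedBase() == base);  // true
        System.out.println(second.getProperty("b"));      // override
    }
}
```

In this sketch a lookup never has to walk more than two layers, no matter how many times a conf is copied.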

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
> Issue Type: Improvement
> Components: Configuration
> Reporter: Barnabas Maidics
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760-3.patch, HIVE-20760.patch, hiveconf_interned.html, 
> hiveconf_original.html




[jira] [Updated] (HIVE-20760) Reducing memory overhead due to multiple HiveConfs

2018-10-30 Thread Barnabas Maidics (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barnabas Maidics updated HIVE-20760:

Status: Open  (was: Patch Available)

> Reducing memory overhead due to multiple HiveConfs
> --
>
> Key: HIVE-20760
> URL: https://issues.apache.org/jira/browse/HIVE-20760
> Project: Hive
> Issue Type: Improvement
> Components: Configuration
> Reporter: Barnabas Maidics
> Assignee: Barnabas Maidics
> Priority: Major
> Attachments: HIVE-20760-1.patch, HIVE-20760-2.patch, 
> HIVE-20760.patch, hiveconf_interned.html, hiveconf_original.html



