[jira] [Updated] (HIVE-19171) Persist runtime statistics in metastore

2018-04-23 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19171:

Attachment: HIVE-19171.03.patch

> Persist runtime statistics in metastore
> ---
>
> Key: HIVE-19171
> URL: https://issues.apache.org/jira/browse/HIVE-19171
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19171.01.patch, HIVE-19171.01wip01.patch, 
> HIVE-19171.01wip02.patch, HIVE-19171.01wip03.patch, HIVE-19171.02.patch, 
> HIVE-19171.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19171) Persist runtime statistics in metastore

2018-04-23 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19171:

Attachment: (was: HIVE-19171.03.patch)

> Persist runtime statistics in metastore
> ---
>
> Key: HIVE-19171
> URL: https://issues.apache.org/jira/browse/HIVE-19171
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19171.01.patch, HIVE-19171.01wip01.patch, 
> HIVE-19171.01wip02.patch, HIVE-19171.01wip03.patch, HIVE-19171.02.patch, 
> HIVE-19171.03.patch
>
>






[jira] [Updated] (HIVE-19171) Persist runtime statistics in metastore

2018-04-23 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19171:

Attachment: HIVE-19171.03.patch

> Persist runtime statistics in metastore
> ---
>
> Key: HIVE-19171
> URL: https://issues.apache.org/jira/browse/HIVE-19171
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19171.01.patch, HIVE-19171.01wip01.patch, 
> HIVE-19171.01wip02.patch, HIVE-19171.01wip03.patch, HIVE-19171.02.patch, 
> HIVE-19171.03.patch
>
>






[jira] [Commented] (HIVE-19194) TestDruidStorageHandler fails

2018-04-23 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16447925#comment-16447925
 ] 

Zoltan Haindrich commented on HIVE-19194:
-

[~vgarg] I've added an addendum for branch-3 - there were some import problems

> TestDruidStorageHandler fails
> -
>
> Key: HIVE-19194
> URL: https://issues.apache.org/jira/browse/HIVE-19194
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Reporter: Ashutosh Chauhan
>Assignee: slim bouguerra
>Priority: Major
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-19194.patch
>
>
> This test fails randomly. If it's not reproducible locally, consider improving 
> its stability, since it does fail once in a while on Hive QA. 
> {code}
> java.lang.AssertionError: expected:<0> but was:<1> at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.failNotEquals(Assert.java:743) at 
> org.junit.Assert.assertEquals(Assert.java:118) at 
> org.junit.Assert.assertEquals(Assert.java:555) at 
> org.junit.Assert.assertEquals(Assert.java:542) at 
> org.apache.hadoop.hive.druid.TestDruidStorageHandler.testCommitMultiInsertOverwriteTable(TestDruidStorageHandler.java:414)
> {code}





[jira] [Commented] (HIVE-19131) DecimalColumnStatsMergerTest comparison review

2018-04-23 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448002#comment-16448002
 ] 

Zoltan Haindrich commented on HIVE-19131:
-

test failures are not related

> DecimalColumnStatsMergerTest comparison review
> --
>
> Key: HIVE-19131
> URL: https://issues.apache.org/jira/browse/HIVE-19131
> Project: Hive
>  Issue Type: Bug
>Reporter: Laszlo Bodor
>Assignee: Laszlo Bodor
>Priority: Major
> Attachments: HIVE-19131.01.patch
>
>
> DecimalColumnStatsMergerTest has strange comparison logic, which needs to 
> be reviewed.
> For both the low and the high value it uses compareTo in the same direction, 
> which seems to be incorrect: old.compareTo(new) > 0 -> pick the old value in both 
> cases
> {code:java}
> Decimal lowValue = aggregateData.getLowValue() != null
>     && (aggregateData.getLowValue().compareTo(newData.getLowValue()) > 0)
>     ? aggregateData.getLowValue() : newData.getLowValue();
> aggregateData.setLowValue(lowValue);
> Decimal highValue = aggregateData.getHighValue() != null
>     && (aggregateData.getHighValue().compareTo(newData.getHighValue()) > 0)
>     ? aggregateData.getHighValue() : newData.getHighValue();
> {code}
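
For illustration only, here is a minimal sketch of what a direction-aware merge of the low and high bounds could look like. It is not the patch attached to this issue: java.math.BigDecimal stands in for Hive's Decimal type, and the class and method names (DecimalRangeMerge, mergeLow, mergeHigh) are hypothetical. A null bound is treated as "unknown", which is also an assumption.

```java
import java.math.BigDecimal;

public class DecimalRangeMerge {

    // Keep the smaller of the two low bounds; a null bound is treated as unknown.
    public static BigDecimal mergeLow(BigDecimal oldLow, BigDecimal newLow) {
        if (oldLow == null) {
            return newLow;
        }
        if (newLow == null) {
            return oldLow;
        }
        return oldLow.compareTo(newLow) < 0 ? oldLow : newLow;
    }

    // Keep the larger of the two high bounds; note the opposite comparison direction.
    public static BigDecimal mergeHigh(BigDecimal oldHigh, BigDecimal newHigh) {
        if (oldHigh == null) {
            return newHigh;
        }
        if (newHigh == null) {
            return oldHigh;
        }
        return oldHigh.compareTo(newHigh) > 0 ? oldHigh : newHigh;
    }

    public static void main(String[] args) {
        System.out.println(mergeLow(new BigDecimal("3"), new BigDecimal("5")));
        System.out.println(mergeHigh(new BigDecimal("3"), new BigDecimal("5")));
    }
}
```

The point of the sketch is only that the two bounds need opposite comparison directions: the merged range should never be narrower than either input range.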





[jira] [Updated] (HIVE-19131) DecimalColumnStatsMergerTest comparison review

2018-04-23 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19131:

   Resolution: Fixed
Fix Version/s: 3.1.0
   3.0.0
   Status: Resolved  (was: Patch Available)

pushed to master and branch-3;
Thank you [~abstractdog] for fixing this!

> DecimalColumnStatsMergerTest comparison review
> --
>
> Key: HIVE-19131
> URL: https://issues.apache.org/jira/browse/HIVE-19131
> Project: Hive
>  Issue Type: Bug
>Reporter: Laszlo Bodor
>Assignee: Laszlo Bodor
>Priority: Major
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-19131.01.patch
>
>
> DecimalColumnStatsMergerTest has strange comparison logic, which needs to 
> be reviewed.
> For both the low and the high value it uses compareTo in the same direction, 
> which seems to be incorrect: old.compareTo(new) > 0 -> pick the old value in both 
> cases
> {code:java}
> Decimal lowValue = aggregateData.getLowValue() != null
>     && (aggregateData.getLowValue().compareTo(newData.getLowValue()) > 0)
>     ? aggregateData.getLowValue() : newData.getLowValue();
> aggregateData.setLowValue(lowValue);
> Decimal highValue = aggregateData.getHighValue() != null
>     && (aggregateData.getHighValue().compareTo(newData.getHighValue()) > 0)
>     ? aggregateData.getHighValue() : newData.getHighValue();
> {code}





[jira] [Assigned] (HIVE-19274) Add an OpTreeSignature persistence checker hook

2018-04-23 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-19274:
---


> Add an OpTreeSignature persistence checker hook
> ---
>
> Key: HIVE-19274
> URL: https://issues.apache.org/jira/browse/HIVE-19274
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> Adding a Hook to run during testing which checks that OpTreeSignatures are 
> working as expected would be really useful; it should run at least during 
> the PerfCliDriver 
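
As a rough illustration of what a persistence check might verify, the sketch below serializes a signature object, reads it back, and confirms that equality and hash code are preserved. Everything in it is an assumption: OpSignature is a hypothetical stand-in rather than Hive's OpTreeSignature, plain Java serialization is used only because it is self-contained, and the issue does not say which persistence format the hook actually exercises.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Objects;

public class SignaturePersistenceCheck {

    // Hypothetical stand-in for an operator signature; not a Hive class.
    public static final class OpSignature implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String operatorName;
        private final String parameters;

        public OpSignature(String operatorName, String parameters) {
            this.operatorName = operatorName;
            this.parameters = parameters;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof OpSignature)) {
                return false;
            }
            OpSignature other = (OpSignature) o;
            return operatorName.equals(other.operatorName) && parameters.equals(other.parameters);
        }

        @Override
        public int hashCode() {
            return Objects.hash(operatorName, parameters);
        }
    }

    // Returns true if the object is equal to its own serialized-and-restored copy.
    public static boolean survivesPersistence(Serializable signature) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(signature);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                Object restored = in.readObject();
                return signature.equals(restored) && signature.hashCode() == restored.hashCode();
            }
        } catch (IOException | ClassNotFoundException e) {
            // A signature that cannot be written or read back fails the check.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(survivesPersistence(new OpSignature("TS", "table=t1")));
    }
}
```

A hook built along these lines would presumably run the check for every operator of every compiled plan and fail the test on the first mismatch.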





[jira] [Updated] (HIVE-19274) Add an OpTreeSignature persistence checker hook

2018-04-23 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19274:

Status: Patch Available  (was: Open)

01wip01) also contains HIVE-19171 for now

> Add an OpTreeSignature persistence checker hook
> ---
>
> Key: HIVE-19274
> URL: https://issues.apache.org/jira/browse/HIVE-19274
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19274.01wip01.patch
>
>
> Adding a Hook to run during testing which checks that OpTreeSignatures are 
> working as expected would be really useful; it should run at least during 
> the PerfCliDriver 





[jira] [Updated] (HIVE-19274) Add an OpTreeSignature persistence checker hook

2018-04-23 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19274:

Attachment: HIVE-19274.01wip01.patch

> Add an OpTreeSignature persistence checker hook
> ---
>
> Key: HIVE-19274
> URL: https://issues.apache.org/jira/browse/HIVE-19274
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19274.01wip01.patch
>
>
> Adding a Hook to run during testing which checks that OpTreeSignatures are 
> working as expected would be really useful; it should run at least during 
> the PerfCliDriver 





[jira] [Commented] (HIVE-19166) TestMiniLlapLocalCliDriver sysdb failure

2018-04-23 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448590#comment-16448590
 ] 

Zoltan Haindrich commented on HIVE-19166:
-

Unfortunately, this test lists the tables actually present in the database, 
which might differ because of HIVE-18051.
 [~vgarg]: I hope you don't mind; I'll upload a new patch

> TestMiniLlapLocalCliDriver sysdb failure
> 
>
> Key: HIVE-19166
> URL: https://issues.apache.org/jira/browse/HIVE-19166
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-19166.1.patch, HIVE-19166.2.patch, 
> HIVE-19166.3.patch
>
>
> Broken by HIVE-18715





[jira] [Assigned] (HIVE-19166) TestMiniLlapLocalCliDriver sysdb failure

2018-04-23 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-19166:
---

Assignee: Zoltan Haindrich  (was: Vineet Garg)

> TestMiniLlapLocalCliDriver sysdb failure
> 
>
> Key: HIVE-19166
> URL: https://issues.apache.org/jira/browse/HIVE-19166
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Reporter: Vineet Garg
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19166.04.patch, HIVE-19166.1.patch, 
> HIVE-19166.2.patch, HIVE-19166.3.patch
>
>
> Broken by HIVE-18715





[jira] [Updated] (HIVE-19166) TestMiniLlapLocalCliDriver sysdb failure

2018-04-23 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19166:

Attachment: HIVE-19166.04.patch

> TestMiniLlapLocalCliDriver sysdb failure
> 
>
> Key: HIVE-19166
> URL: https://issues.apache.org/jira/browse/HIVE-19166
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Reporter: Vineet Garg
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19166.04.patch, HIVE-19166.1.patch, 
> HIVE-19166.2.patch, HIVE-19166.3.patch
>
>
> Broken by HIVE-18715





[jira] [Assigned] (HIVE-19166) TestMiniLlapLocalCliDriver sysdb failure

2018-04-23 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-19166:
---

Assignee: Vineet Garg  (was: Zoltan Haindrich)

> TestMiniLlapLocalCliDriver sysdb failure
> 
>
> Key: HIVE-19166
> URL: https://issues.apache.org/jira/browse/HIVE-19166
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-19166.04.patch, HIVE-19166.1.patch, 
> HIVE-19166.2.patch, HIVE-19166.3.patch
>
>
> Broken by HIVE-18715





[jira] [Comment Edited] (HIVE-19166) TestMiniLlapLocalCliDriver sysdb failure

2018-04-23 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448590#comment-16448590
 ] 

Zoltan Haindrich edited comment on HIVE-19166 at 4/23/18 5:59 PM:
--

Unfortunately, this test lists the tables actually present in the database, 
which might differ because of HIVE-18051.
 [~vgarg]: I hope you don't mind; I'll upload a new patch
I've just listed all datasets so that all of them are loaded; I hope this will 
make the ptests happy


was (Author: kgyrtkirk):
Unfortunately, this test lists the tables actually present in the database, 
which might differ because of HIVE-18051.
 [~vgarg]: I hope you don't mind; I'll upload a new patch

> TestMiniLlapLocalCliDriver sysdb failure
> 
>
> Key: HIVE-19166
> URL: https://issues.apache.org/jira/browse/HIVE-19166
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-19166.04.patch, HIVE-19166.1.patch, 
> HIVE-19166.2.patch, HIVE-19166.3.patch
>
>
> Broken by HIVE-18715





[jira] [Commented] (HIVE-19194) TestDruidStorageHandler fails

2018-04-23 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448745#comment-16448745
 ] 

Zoltan Haindrich commented on HIVE-19194:
-

[~bslim] it's only on branch-3: 
https://github.com/apache/hive/commit/4397f38c9da60462a8cf1bd0dd6ed8dc7e745aba

> TestDruidStorageHandler fails
> -
>
> Key: HIVE-19194
> URL: https://issues.apache.org/jira/browse/HIVE-19194
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Reporter: Ashutosh Chauhan
>Assignee: slim bouguerra
>Priority: Major
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-19194.patch
>
>
> This test fails randomly. If it's not reproducible locally, consider improving 
> its stability, since it does fail once in a while on Hive QA. 
> {code}
> java.lang.AssertionError: expected:<0> but was:<1> at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.failNotEquals(Assert.java:743) at 
> org.junit.Assert.assertEquals(Assert.java:118) at 
> org.junit.Assert.assertEquals(Assert.java:555) at 
> org.junit.Assert.assertEquals(Assert.java:542) at 
> org.apache.hadoop.hive.druid.TestDruidStorageHandler.testCommitMultiInsertOverwriteTable(TestDruidStorageHandler.java:414)
> {code}





[jira] [Resolved] (HIVE-19171) Persist runtime statistics in metastore

2018-04-23 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-19171.
-
Resolution: Fixed

The previous commit missed a file which was renamed; thank you for reverting 
it!
Pushed to master. Thank you Ashutosh for reviewing the patch!


> Persist runtime statistics in metastore
> ---
>
> Key: HIVE-19171
> URL: https://issues.apache.org/jira/browse/HIVE-19171
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-19171.01.patch, HIVE-19171.01wip01.patch, 
> HIVE-19171.01wip02.patch, HIVE-19171.01wip03.patch, HIVE-19171.02.patch, 
> HIVE-19171.03.patch
>
>






[jira] [Commented] (HIVE-19077) Handle duplicate ptests requests standing in queue at the same time

2018-04-23 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449372#comment-16449372
 ] 

Zoltan Haindrich commented on HIVE-19077:
-

I think we should be in control of our Jenkins jobs, but I don't have the rights 
to do it...


> Handle duplicate ptests requests standing in queue at the same time
> ---
>
> Key: HIVE-19077
> URL: https://issues.apache.org/jira/browse/HIVE-19077
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Blocker
> Fix For: 3.1.0
>
> Attachments: HIVE-19077.0.patch, HIVE-19077.1.patch, 
> HIVE-19077.overrideoption.patch, HIVE-19077.sslFix.patch
>
>
> I've been keeping an eye on our {{PreCommit-HIVE-Build}} job, and what I 
> noticed is that sometimes huge queues can build up that contain the same jira 
> more than once. (Yesterday I saw a queue of 40 with 31 distinct jiras.)
> A simple scenario is that I upload a patch, it gets queued for ptest (already 
> a long queue), and 3 hours later I update it, re-upload and re-queue. Now 
> the current ptest infra seems to be smart enough to always deal with the 
> latest patch, so what will happen is that the same patch will be tested 2 
> times (~3 hours apart), most probably with the same result.
> I propose we do some deduplication: if ptest starts running the request for 
> Jira X, then it can take a look at the current queue and see if X is there 
> again. If so, it can skip it for now; it will be picked up later anyway.
> In practice this means that if you reconsider your patch and update it, your 
> original place in the queue will be gone (as a penalty for changing it), 
> but overall it saves resources for the whole community.
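
The deduplication rule proposed above can be sketched in a few lines. This is only an illustration under stated assumptions, not the attached patch: the queue is modelled as an in-memory Deque of Jira keys, and the class and method names (PtestQueueDedup, drain) are hypothetical. The real ptest infrastructure would have to inspect the Jenkins build queue instead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class PtestQueueDedup {

    // Processes the queue front to back and returns the keys that were actually tested.
    public static List<String> drain(Deque<String> queue) {
        List<String> executed = new ArrayList<>();
        while (!queue.isEmpty()) {
            String jiraKey = queue.pollFirst();
            // If the same Jira is queued again further back, skip this request;
            // the later entry will test the latest patch anyway.
            if (queue.contains(jiraKey)) {
                continue;
            }
            executed.add(jiraKey);
        }
        return executed;
    }

    public static void main(String[] args) {
        Deque<String> queue =
            new ArrayDeque<>(Arrays.asList("HIVE-1", "HIVE-2", "HIVE-1", "HIVE-3"));
        System.out.println(drain(queue));
    }
}
```

With the queue HIVE-1, HIVE-2, HIVE-1, HIVE-3, the first HIVE-1 request is skipped and the run order becomes HIVE-2, HIVE-1, HIVE-3. This matches the penalty described above: the re-queued Jira loses its original place.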





[jira] [Updated] (HIVE-19274) Add an OpTreeSignature persistence checker hook

2018-04-23 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19274:

Attachment: HIVE-19274.01.patch

> Add an OpTreeSignature persistence checker hook
> ---
>
> Key: HIVE-19274
> URL: https://issues.apache.org/jira/browse/HIVE-19274
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19274.01.patch, HIVE-19274.01wip01.patch
>
>
> Adding a Hook to run during testing which checks that OpTreeSignatures are 
> working as expected would be really useful; it should run at least during 
> the PerfCliDriver 





[jira] [Commented] (HIVE-19274) Add an OpTreeSignature persistence checker hook

2018-04-23 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449396#comment-16449396
 ] 

Zoltan Haindrich commented on HIVE-19274:
-

test failures are not related; attaching patch rebased to current master

> Add an OpTreeSignature persistence checker hook
> ---
>
> Key: HIVE-19274
> URL: https://issues.apache.org/jira/browse/HIVE-19274
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19274.01.patch, HIVE-19274.01wip01.patch
>
>
> Adding a Hook to run during testing which checks that OpTreeSignatures are 
> working as expected would be really useful; it should run at least during 
> the PerfCliDriver 





[jira] [Updated] (HIVE-19171) Persist runtime statistics in metastore

2018-04-24 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19171:

Attachment: (was: HIVE-19171.01-branch-3.patch)

> Persist runtime statistics in metastore
> ---
>
> Key: HIVE-19171
> URL: https://issues.apache.org/jira/browse/HIVE-19171
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-19171.01.patch, HIVE-19171.01wip01.patch, 
> HIVE-19171.01wip02.patch, HIVE-19171.01wip03.patch, HIVE-19171.02.patch, 
> HIVE-19171.03.patch
>
>






[jira] [Updated] (HIVE-19171) Persist runtime statistics in metastore

2018-04-24 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19171:

Attachment: HIVE-19171.01-branch-3.patch

> Persist runtime statistics in metastore
> ---
>
> Key: HIVE-19171
> URL: https://issues.apache.org/jira/browse/HIVE-19171
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-19171.01.patch, HIVE-19171.01wip01.patch, 
> HIVE-19171.01wip02.patch, HIVE-19171.01wip03.patch, HIVE-19171.02.patch, 
> HIVE-19171.03.patch
>
>






[jira] [Reopened] (HIVE-19171) Persist runtime statistics in metastore

2018-04-24 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reopened HIVE-19171:
-

> Persist runtime statistics in metastore
> ---
>
> Key: HIVE-19171
> URL: https://issues.apache.org/jira/browse/HIVE-19171
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-19171.01.patch, HIVE-19171.01wip01.patch, 
> HIVE-19171.01wip02.patch, HIVE-19171.01wip03.patch, HIVE-19171.02.patch, 
> HIVE-19171.03.patch
>
>






[jira] [Updated] (HIVE-19171) Persist runtime statistics in metastore

2018-04-24 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19171:

Status: Patch Available  (was: Reopened)

> Persist runtime statistics in metastore
> ---
>
> Key: HIVE-19171
> URL: https://issues.apache.org/jira/browse/HIVE-19171
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-19171.01-branch-3.patch, HIVE-19171.01.patch, 
> HIVE-19171.01wip01.patch, HIVE-19171.01wip02.patch, HIVE-19171.01wip03.patch, 
> HIVE-19171.02.patch, HIVE-19171.03.patch
>
>






[jira] [Updated] (HIVE-19171) Persist runtime statistics in metastore

2018-04-24 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19171:

Attachment: HIVE-19171.01-branch-3.patch

> Persist runtime statistics in metastore
> ---
>
> Key: HIVE-19171
> URL: https://issues.apache.org/jira/browse/HIVE-19171
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-19171.01-branch-3.patch, HIVE-19171.01.patch, 
> HIVE-19171.01wip01.patch, HIVE-19171.01wip02.patch, HIVE-19171.01wip03.patch, 
> HIVE-19171.02.patch, HIVE-19171.03.patch
>
>






[jira] [Updated] (HIVE-19171) Persist runtime statistics in metastore

2018-04-25 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19171:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

pushed to branch-3 

> Persist runtime statistics in metastore
> ---
>
> Key: HIVE-19171
> URL: https://issues.apache.org/jira/browse/HIVE-19171
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-19171.01-branch-3.patch, HIVE-19171.01.patch, 
> HIVE-19171.01wip01.patch, HIVE-19171.01wip02.patch, HIVE-19171.01wip03.patch, 
> HIVE-19171.02.patch, HIVE-19171.03.patch
>
>






[jira] [Updated] (HIVE-17626) Query reoptimization using cached runtime statistics

2018-04-25 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-17626:

Labels: TODOC3.0  (was: )

> Query reoptimization using cached runtime statistics
> 
>
> Key: HIVE-17626
> URL: https://issues.apache.org/jira/browse/HIVE-17626
> Project: Hive
>  Issue Type: New Feature
>  Components: Logical Optimizer
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-17626.01.patch, HIVE-17626.01wip01.patch, 
> HIVE-17626.02.patch, HIVE-17626.03.patch, HIVE-17626.04.patch, 
> HIVE-17626.05.patch, HIVE-17626.06.patch, HIVE-17626.07A.patch, 
> HIVE-17626.07B.patch, HIVE-17626.08.patch, HIVE-17626.09.patch, 
> HIVE-17626.10.patch, HIVE-17626.11.patch, runtimestats.patch
>
>
> Something similar to "EXPLAIN ANALYZE" where we annotate explain plan with 
> actual and estimated statistics. The runtime stats can be cached at query 
> level and subsequent execution of the same query can make use of the cached 
> statistics from the previous run for better optimization. 
> Some use cases,
> 1) re-planning join query (mapjoin failures can be converted to shuffle joins)
> 2) better statistics for table scan operator if dynamic partition pruning is 
> involved
> 3) Better estimates for bloom filter initialization (setting expected entries 
> during merge)
> This can be extended to support wider queries by caching fragments of operator 
> plans scanning the same table(s) or matching some operator sequences.
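
To make use case 1 concrete, the sketch below caches the row count observed for an operator during one run and lets the next planning pass prefer it over the planner's estimate. It is a simplified illustration, not Hive's implementation: the string signature, the row-count threshold, and the names RuntimeStatsCache, record, rowsFor and chooseJoin are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class RuntimeStatsCache {

    // Observed row counts from earlier runs, keyed by an operator signature.
    private final Map<String, Long> actualRowsBySignature = new HashMap<>();

    // Called after execution with the row count the operator really produced.
    public void record(String operatorSignature, long actualRows) {
        actualRowsBySignature.put(operatorSignature, actualRows);
    }

    // Prefer a cached runtime value; fall back to the planner's estimate.
    public long rowsFor(String operatorSignature, long plannerEstimate) {
        return actualRowsBySignature.getOrDefault(operatorSignature, plannerEstimate);
    }

    // Toy join selection: a small build side fits a map join, a large one needs a shuffle join.
    public static String chooseJoin(long buildSideRows, long mapJoinRowLimit) {
        return buildSideRows <= mapJoinRowLimit ? "MAPJOIN" : "SHUFFLE_JOIN";
    }

    public static void main(String[] args) {
        RuntimeStatsCache cache = new RuntimeStatsCache();
        String signature = "TS[orders]-FIL[o_date > 2018]";

        // First run: only the (too low) estimate of 1000 rows is available.
        System.out.println(chooseJoin(cache.rowsFor(signature, 1_000L), 10_000L));

        // The run reveals 5 million rows; the next planning pass sees the cached value.
        cache.record(signature, 5_000_000L);
        System.out.println(chooseJoin(cache.rowsFor(signature, 1_000L), 10_000L));
    }
}
```

In this toy run the first plan picks a map join based on the estimate, and the re-planned query picks a shuffle join once the real row count is cached.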





[jira] [Updated] (HIVE-19274) Add an OpTreeSignature persistence checker hook

2018-04-25 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19274:

   Resolution: Fixed
Fix Version/s: 3.1.0
   3.0.0
   Status: Resolved  (was: Patch Available)

pushed to master, branch-3. Thank you Ashutosh for reviewing the changes!

> Add an OpTreeSignature persistence checker hook
> ---
>
> Key: HIVE-19274
> URL: https://issues.apache.org/jira/browse/HIVE-19274
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-19274.01.patch, HIVE-19274.01wip01.patch
>
>
> Adding a Hook to run during testing which checks that OpTreeSignatures are 
> working as expected would be really useful; it should run at least during 
> the PerfCliDriver 





[jira] [Assigned] (HIVE-19166) TestMiniLlapLocalCliDriver sysdb failure

2018-04-25 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-19166:
---

Assignee: Zoltan Haindrich  (was: Vineet Garg)

> TestMiniLlapLocalCliDriver sysdb failure
> 
>
> Key: HIVE-19166
> URL: https://issues.apache.org/jira/browse/HIVE-19166
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Reporter: Vineet Garg
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19166.04.patch, HIVE-19166.1.patch, 
> HIVE-19166.2.patch, HIVE-19166.3.patch
>
>
> Broken by HIVE-18715





[jira] [Commented] (HIVE-19166) TestMiniLlapLocalCliDriver sysdb failure

2018-04-25 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452582#comment-16452582
 ] 

Zoltan Haindrich commented on HIVE-19166:
-

ok :) then I'll try to push it thru the finish line
requeued for test execution

> TestMiniLlapLocalCliDriver sysdb failure
> 
>
> Key: HIVE-19166
> URL: https://issues.apache.org/jira/browse/HIVE-19166
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Reporter: Vineet Garg
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19166.04.patch, HIVE-19166.1.patch, 
> HIVE-19166.2.patch, HIVE-19166.3.patch
>
>
> Broken by HIVE-18715





[jira] [Resolved] (HIVE-19303) Fix grammar warnings

2018-04-25 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-19303.
-
Resolution: Duplicate

dup of HIVE-19278

> Fix grammar warnings
> 
>
> Key: HIVE-19303
> URL: https://issues.apache.org/jira/browse/HIVE-19303
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> It seems that something is not right around the handling of "KW_CHECK":
> https://github.com/apache/hive/blob/da10aabe56edf8fbb26d89d64bedcc4afa84a305/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g#L2376
> {code}
> warning(200): org/apache/hadoop/hive/ql/parse/HiveParser.g:2376:5: 
> Decision can match input such as "KW_CHECK {KW_EXISTS, KW_TINYINT}" using 
> multiple alternatives: 1, 2
> As a result, alternative(s) 2 were disabled for that input
> warning(200): org/apache/hadoop/hive/ql/parse/HiveParser.g:2376:5: 
> Decision can match input such as "KW_CHECK KW_STRUCT LESSTHAN" using multiple 
> alternatives: 1, 2
> As a result, alternative(s) 2 were disabled for that input
> warning(200): org/apache/hadoop/hive/ql/parse/HiveParser.g:2376:5: 
> Decision can match input such as "KW_CHECK KW_DATETIME" using multiple 
> alternatives: 1, 2
> As a result, alternative(s) 2 were disabled for that input
> warning(200): org/apache/hadoop/hive/ql/parse/HiveParser.g:2376:5: 
> Decision can match input such as "KW_CHECK KW_DATE {LPAREN, StringLiteral}" 
> using multiple alternatives: 1, 2
> As a result, alternative(s) 2 were disabled for that input
> warning(200): org/apache/hadoop/hive/ql/parse/HiveParser.g:2376:5: 
> Decision can match input such as "KW_CHECK KW_UNIONTYPE LESSTHAN" using 
> multiple alternatives: 1, 2
> {code}





[jira] [Commented] (HIVE-18739) Add support for Import/Export from Acid table

2018-04-25 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452635#comment-16452635
 ] 

Zoltan Haindrich commented on HIVE-18739:
-

[~ekoifman] It seems to me that this patch has broken a test: 
{{TestNegativeMinimrCliDriver#testCliDriver[minimr_broken_pipe]}}
I've rerun the test before and after 699c5768c88967abd507122d775bd5955ca45218 
and the failure is reproducible.
Could you please take a look?


> Add support for Import/Export from Acid table
> -
>
> Key: HIVE-18739
> URL: https://issues.apache.org/jira/browse/HIVE-18739
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-18739.01-branch-3.patch, HIVE-18739.01.patch, 
> HIVE-18739.02-branch-3.patch, HIVE-18739.04.patch, HIVE-18739.06.patch, 
> HIVE-18739.08.patch, HIVE-18739.09.patch, HIVE-18739.10.patch, 
> HIVE-18739.11.patch, HIVE-18739.12.patch, HIVE-18739.13.patch, 
> HIVE-18739.14.patch, HIVE-18739.15.patch, HIVE-18739.16.patch, 
> HIVE-18739.17.patch, HIVE-18739.19.patch, HIVE-18739.20.patch, 
> HIVE-18739.21.patch, HIVE-18739.23.patch, HIVE-18739.24.patch, 
> HIVE-18739.25.patch, HIVE-18739.26.patch
>
>






[jira] [Updated] (HIVE-18862) qfiles: prepare .q files for using datasets

2018-04-25 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-18862:

Assignee: Laszlo Bodor  (was: Zoltan Haindrich)
  Status: Patch Available  (was: Reopened)

> qfiles: prepare .q files for using datasets
> ---
>
> Key: HIVE-18862
> URL: https://issues.apache.org/jira/browse/HIVE-18862
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Laszlo Bodor
>Assignee: Laszlo Bodor
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-18862.01.patch, HIVE-18862.02.patch, 
> HIVE-18862.03.patch, HIVE-18862.04.patch, HIVE-18862.05.patch, 
> HIVE-18862.06.patch, HIVE-18862.07.patch, HIVE-18862.08.patch, 
> HIVE-18862.09-branch-3.patch, HIVE-18862.09.patch
>
>
> # Parse .q files for source table usage
>  # Add needed dataset annotations
>  # Remove create table statements from "q_test_init.sql" like files
>  # Handle oncoming issues related to dataset introduction





[jira] [Reopened] (HIVE-18862) qfiles: prepare .q files for using datasets

2018-04-25 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reopened HIVE-18862:
-



[jira] [Assigned] (HIVE-18862) qfiles: prepare .q files for using datasets

2018-04-25 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-18862:
---

Assignee: Zoltan Haindrich  (was: Laszlo Bodor)

> qfiles: prepare .q files for using datasets
> ---
>
> Key: HIVE-18862
> URL: https://issues.apache.org/jira/browse/HIVE-18862
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Laszlo Bodor
>Assignee: Zoltan Haindrich
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-18862.01.patch, HIVE-18862.02.patch, 
> HIVE-18862.03.patch, HIVE-18862.04.patch, HIVE-18862.05.patch, 
> HIVE-18862.06.patch, HIVE-18862.07.patch, HIVE-18862.08.patch, 
> HIVE-18862.09-branch-3.patch, HIVE-18862.09.patch
>
>
> # Parse .q files for source table usage
>  # Add needed dataset annotations
>  # Remove create table statements from "q_test_init.sql" like files
>  # Handle oncoming issues related to dataset introduction





[jira] [Updated] (HIVE-18862) qfiles: prepare .q files for using datasets

2018-04-25 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-18862:

Attachment: HIVE-18862.09-branch-3.patch



[jira] [Commented] (HIVE-18862) qfiles: prepare .q files for using datasets

2018-04-25 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452664#comment-16452664
 ] 

Zoltan Haindrich commented on HIVE-18862:
-

I've attached a patch for branch-3



[jira] [Commented] (HIVE-19137) orcfiledump doesn't print hive.acid.version value

2018-04-26 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453727#comment-16453727
 ] 

Zoltan Haindrich commented on HIVE-19137:
-

[~ikryvenko] [~ekoifman]: it seems this patch has broken some statistics in 
the q.out files; 
{{TestCliDriver#testCliDriver[autoColumnStats_4]}} is failing because of this 
patch.

> orcfiledump doesn't print hive.acid.version value
> -
>
> Key: HIVE-19137
> URL: https://issues.apache.org/jira/browse/HIVE-19137
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Igor Kryvenko
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-19137-branch-3.01.patch, HIVE-19137.01.patch, 
> HIVE-19137.02-branch-3.patch, HIVE-19137.02.patch, 
> HIVE-19137.03-branch-3.patch, HIVE-19137.03.patch, HIVE-19137.04.patch, 
> HIVE-19137.05.patch
>
>
> HIVE-18659 added hive.acid.version in the file footer.  
> orcfiledump prints something like 
> {noformat}
> User Metadata:
>   hive.acid.key.index=1,536870912,1;
>   hive.acid.stats=2,0,0
>   hive.acid.version=
> {noformat}
> probably because
> {noformat}
> public static void setAcidVersionInDataFile(Writer writer) {
>   //so that we know which version wrote the file
>   ByteBuffer bf = ByteBuffer.allocate(4).putInt(ORC_ACID_VERSION);
>   bf.rewind(); //don't ask - some ByteBuffer weridness. w/o this, empty buffer is written
>   writer.addUserMetadata(ACID_VERSION_KEY, bf);
> }
> {noformat}
> use 
> {{UTF8.encode())}} instead
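
A minimal Java sketch of the behavior described in the quoted issue, not the actual Hive code. It assumes the suggested fix means encoding the version as UTF-8 text through the standard java.nio charset API; the class name, the helper names and the value of ORC_ACID_VERSION are all hypothetical. It shows two things: putInt() leaves the buffer position at the end (hence the rewind()), and a raw 4-byte int decodes to control characters when read back as text, which would explain why hive.acid.version looks empty.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class AcidVersionEncoding {
    // Hypothetical stand-in for Hive's ORC_ACID_VERSION; the value is an assumption.
    static final int ORC_ACID_VERSION = 2;

    // Binary encoding, as in the quoted snippet: four raw bytes.
    static ByteBuffer encodeAsInt(int version) {
        ByteBuffer bf = ByteBuffer.allocate(4).putInt(version);
        // putInt() advanced the position to 4; rewind() resets it to 0
        // so that a reader sees all four bytes instead of none.
        bf.rewind();
        return bf;
    }

    // Text encoding (assumed reading of the suggestion): decimal digits as UTF-8.
    static ByteBuffer encodeAsText(int version) {
        return StandardCharsets.UTF_8.encode(String.valueOf(version));
    }

    // Reads the buffer as UTF-8 text without disturbing the caller's position.
    static String decodeAsText(ByteBuffer bf) {
        return StandardCharsets.UTF_8.decode(bf.duplicate()).toString();
    }

    public static void main(String[] args) {
        ByteBuffer noRewind = ByteBuffer.allocate(4).putInt(ORC_ACID_VERSION);
        // Position equals limit here, so nothing is left to read.
        System.out.println("remaining without rewind: " + noRewind.remaining());

        // Raw int bytes decode to control characters, which print as nothing visible.
        String binaryAsText = decodeAsText(encodeAsInt(ORC_ACID_VERSION));
        System.out.println("binary as text: [" + binaryAsText.trim() + "]");

        // Text-encoded version round-trips to a readable value.
        System.out.println("text: [" + decodeAsText(encodeAsText(ORC_ACID_VERSION)) + "]");
    }
}
```

With a text encoding, any tool that prints user metadata as strings can show the value directly; the cost is that readers must parse the digits back into an int.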





[jira] [Commented] (HIVE-18739) Add support for Import/Export from Acid table

2018-04-26 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453729#comment-16453729
 ] 

Zoltan Haindrich commented on HIVE-18739:
-

[~ekoifman] It seems to me that this patch has broken a test: 
TestNegativeMinimrCliDriver#testCliDriver[minimr_broken_pipe]
I've rerun the test before and after 699c5768c88967abd507122d775bd5955ca45218 
and the failure is reproducible.
Could you please take a look?



[jira] [Commented] (HIVE-19137) orcfiledump doesn't print hive.acid.version value

2018-04-26 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453730#comment-16453730
 ] 

Zoltan Haindrich commented on HIVE-19137:
-

[~ikryvenko] [~ekoifman]  TestCliDriver#testCliDriver[acid_nullscan] is also 
affected



[jira] [Commented] (HIVE-19142) Umbrella: branch-3 failing tests

2018-04-26 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453736#comment-16453736
 ] 

Zoltan Haindrich commented on HIVE-19142:
-

There are some test breakages on master that were also backported to branch-3: 
HIVE-19137 and HIVE-18739. They will probably be fixed soon.

> Umbrella: branch-3 failing tests
> 
>
> Key: HIVE-19142
> URL: https://issues.apache.org/jira/browse/HIVE-19142
> Project: Hive
>  Issue Type: Test
>Reporter: Vineet Garg
>Priority: Major
>
> This is the list [~alangates] specified on HIVE-19135 which are non-oom test 
> failures:
> *Errors*:
> TestAcidOnTez.testGetSplitsLocks
> TestJdbcWithLocalClusterSpark.testSparkQuery
> TestJdbcWithLocalClusterSpark.testTempTable
> TestJdbcWithMiniLlap.testLlapInputFormatEndToEnd
> TestMTQueries.testMTQueries1
> TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery
> TestNegativeCliDriver.alter_notnull_constraint_violation
> TestNegativeCliDriver.insert_into_acid_notnull
> TestNegativeCliDriver.insert_into_notnull_constraint
> TestNegativeCliDriver.insert_multi_into_notnull
> TestNegativeCliDriver.insert_overwrite_notnull_constraint
> TestNegativeCliDriver.update_notnull_constraint
> *Failures*:
> TestBlobstoreCliDriver.insert_into_dynamic_partitions
> TestBlobstoreCliDriver.insert_overwrite_directory
> TestBlobstoreCliDriver.insert_overwrite_dynamic_partitions
> -TestCliDriver.acid_table_stats-
> TestCliDriver.auto_sortmerge_join_2
> TestCliDriver.avro_alter_table_update_columns
> TestCliDriver.avrotblsjoin
> TestCliDriver.dbtxnmgr_showlocks
> TestCliDriver.orc_merge10
> TestCliDriver.orc_schema_evolution_float
> TestCliDriver.parquet_ppd_multifiles
> TestCliDriver.schema_evol_par_vec_table_dictionary_encoding
> TestCliDriver.schema_evol_par_vec_table_non_dictionary_encoding
> TestCliDriver.selectindate
> -TestCliDriver.statsoptimizer-
> TestCliDriver.vector_bround
> TestCliDriver.vector_case_when_1
> TestCliDriver.vector_coalesce_2
> TestCliDriver.vector_coalesce_3
> TestCliDriver.vector_interval_1
> TestCliDriver.vectorized_parquet_types
> TestMetastoreVersion.testMetastoreVersion
> TestMetastoreVersion.testVersionMatching
> TestMiniDruidCliDriver.druidkafkamini_basic
> TestMiniLlapCliDriver.llap_smb
> TestMiniLlapCliDriver.unionDistinct_1
> TestMiniTezCliDriver.explainanalyze_5
> -TestNegativeCliDriver.authorization_caseinsensitivity-
> -TestNegativeCliDriver.authorization_fail_1-
> -TestNegativeCliDriver.authorization_grant_table_dup-
> -TestNegativeCliDriver.authorization_role_case-
> -TestNegativeCliDriver.authorization_role_grant_nosuchrole-
> -TestNegativeCliDriver.authorization_table_grant_nosuchrole-
> TestNegativeCliDriver.subquery_subquery_chain
> TestSessionState.testCreatePath
> TestSessionState.testCreatePath
> TestSparkStatistics.testSparkStatistics





[jira] [Assigned] (HIVE-19314) Fix failures caused by HIVE-19137

2018-04-26 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-19314:
---


> Fix failures caused by HIVE-19137
> -
>
> Key: HIVE-19314
> URL: https://issues.apache.org/jira/browse/HIVE-19314
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Igor Kryvenko
>Priority: Major
>






[jira] [Commented] (HIVE-19137) orcfiledump doesn't print hive.acid.version value

2018-04-26 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453920#comment-16453920
 ] 

Zoltan Haindrich commented on HIVE-19137:
-

Sure, I've opened HIVE-19314 :)



[jira] [Assigned] (HIVE-19319) RuntimeStats fixes

2018-04-26 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-19319:
---


> RuntimeStats fixes
> --
>
> Key: HIVE-19319
> URL: https://issues.apache.org/jira/browse/HIVE-19319
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> there are some minor issues which were found during testing:
> * 0 sized map is persisted
> * if reoptimization occurs write happens twice
> * move entry limit to only apply to the cache
> * ensure that the executor does not get blocked





[jira] [Commented] (HIVE-19348) org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp are failing

2018-04-29 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16457956#comment-16457956
 ] 

Zoltan Haindrich commented on HIVE-19348:
-

I'm sorry to hear that... although I've written these tests, they shouldn't be 
failing; these are important cases for enabling re-optimization.
This is starting to look pretty wrong: there are people who are committing 
patches that break tests, and the people who wrote the tests are chasing those 
failures.
I'm currently unable to bisect this change (because of network issues), but 
instead of fixing these tests I would really prefer to hand it off to whoever 
broke them; or at least change the direction: if someone opens a conversation 
saying "it broke that stuff, but I'm not sure what's wrong with it", that is 
much better and may get results much faster.

I'm currently very close to having a tool that would be able to at least check 
failures like this and report back to the offending jira after ptest runs 
about the actually broken/suspicious tests...

I think for branch-3 the patches are currently just pushed, without much care 
for the state of the branch...


>  org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp are failing
> ---
>
> Key: HIVE-19348
> URL: https://issues.apache.org/jira/browse/HIVE-19348
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vineet Garg
>Assignee: Zoltan Haindrich
>Priority: Major
>
>  * org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp.testDifferentFiltersAreNotMatched  2.7 sec  7
>  * org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp.testUnrelatedFiltersAreNotMatched0  1.4 sec  7
>  * org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp.testUnrelatedFiltersAreNotMatched1  1.8 sec  7
>  * org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp.testSameFiltersMatched  1.8 sec  7
>  * org.apache.hadoop.hive.ql.plan.mapping.TestReOptimization.testNotReExecutedIfAssertionError
> {noformat}
> Error Message
> expected:<1> but was:<2>
> {noformat}





[jira] [Commented] (HIVE-19348) org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp are failing

2018-04-29 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16457958#comment-16457958
 ] 

Zoltan Haindrich commented on HIVE-19348:
-

After some examination of the latest patches, I suspect that HIVE-19269 
has broken these tests, and it seems to me that the workload manager 
tests and the jdbc driver are possibly also affected by it.



[jira] [Commented] (HIVE-19348) org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp are failing

2018-04-29 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16457959#comment-16457959
 ] 

Zoltan Haindrich commented on HIVE-19348:
-

Disabling vectorization for retry_failure*.q might not be a good idea, as 
people will most probably have vectorization enabled. Most probably it just 
works; if it doesn't, it must be fixed.
Probably worth checking: TestAcidOnTez and TestDbTxnManager2



[jira] [Commented] (HIVE-19357) vectorization breaks assert functionality

2018-04-29 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458029#comment-16458029
 ] 

Zoltan Haindrich commented on HIVE-19357:
-

fyi: [~mmccline]

> vectorization breaks assert functionality
> -
>
> Key: HIVE-19357
> URL: https://issues.apache.org/jira/browse/HIVE-19357
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Zoltan Haindrich
>Priority: Major
>
> This could be limited to assert exceptions; but might interfere with other 
> exceptions...discovered while "fixing" testreopt after HIVE-19269
> {code}
> create table tu(id_uv int,id_uw int,u int);
> create table tv(id_uv int,v int);
> create table tw(id_uw int,w int);
> insert into tu values 
> (10,10,10),(1,1,1),(2,2,2),(3,3,3),(4,4,4),(5,5,5),(6,6,6);
> insert into tv values (10,10),(1,1),(2,2),(3,3);
> insert into tw values 
> (10,10),(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9);
> set zzz=0;
> set hive.vectorized.execution.enabled=false;
> select assert_true(${hiveconf:zzz}>sum(1)) from tu join tv on 
> (tu.id_uv=tv.id_uv) where u<10 and v>1;
> -- fails as expected
> set hive.vectorized.execution.enabled=true;
> select assert_true(${hiveconf:zzz}>sum(1)) from tu join tv on 
> (tu.id_uv=tv.id_uv) where u<10 and v>1;
> -- there is a result set
> {code}





[jira] [Updated] (HIVE-19348) org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp are failing

2018-04-29 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19348:

Attachment: HIVE-19348.01.patch

>  org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp are failing
> ---
>
> Key: HIVE-19348
> URL: https://issues.apache.org/jira/browse/HIVE-19348
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vineet Garg
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19348.01.patch
>
>
>  * org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp.testDifferentFiltersAreNotMatched  2.7 sec  7
>  * org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp.testUnrelatedFiltersAreNotMatched0  1.4 sec  7
>  * org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp.testUnrelatedFiltersAreNotMatched1  1.8 sec  7
>  * org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp.testSameFiltersMatched  1.8 sec  7
>  * org.apache.hadoop.hive.ql.plan.mapping.TestReOptimization.testNotReExecutedIfAssertionError
> {noformat}
> Error Message
> expected:<1> but was:<2>
> {noformat}





[jira] [Updated] (HIVE-19319) RuntimeStats fixes

2018-04-29 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19319:

Status: Patch Available  (was: Open)

> RuntimeStats fixes
> --
>
> Key: HIVE-19319
> URL: https://issues.apache.org/jira/browse/HIVE-19319
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19319.01.patch
>
>
> there are some minor issues which were found during testing:
> * 0 sized map is persisted
> * if reoptimization occurs write happens twice
> * move entry limit to only apply to the cache
> * ensure that the executor does not get blocked





[jira] [Updated] (HIVE-19348) org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp are failing

2018-04-29 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19348:

Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-19319) RuntimeStats fixes

2018-04-29 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19319:

Attachment: HIVE-19319.01.patch



[jira] [Updated] (HIVE-19319) RuntimeStats fixes

2018-04-29 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19319:

Attachment: HIVE-19319.02.patch

> RuntimeStats fixes
> --
>
> Key: HIVE-19319
> URL: https://issues.apache.org/jira/browse/HIVE-19319
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19319.01.patch, HIVE-19319.02.patch
>
>
> there are some minor issues which were found during testing:
> * 0 sized map is persisted
> * if reoptimization occurs write happens twice
> * move entry limit to only apply to the cache
> * ensure that the executor does not get blocked





[jira] [Updated] (HIVE-19319) RuntimeStats fixes

2018-04-29 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19319:

Attachment: HIVE-19319.03.patch

> RuntimeStats fixes
> --
>
> Key: HIVE-19319
> URL: https://issues.apache.org/jira/browse/HIVE-19319
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19319.01.patch, HIVE-19319.02.patch, 
> HIVE-19319.03.patch
>
>
> there are some minor issues which were found during testing:
> * 0 sized map is persisted
> * if reoptimization occurs write happens twice
> * move entry limit to only apply to the cache
> * ensure that the executor does not get blocked





[jira] [Commented] (HIVE-19319) RuntimeStats fixes

2018-04-29 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458030#comment-16458030
 ] 

Zoltan Haindrich commented on HIVE-19319:
-

03 also contains HIVE-19348 



[jira] [Commented] (HIVE-19348) org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp are failing

2018-04-29 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458170#comment-16458170
 ] 

Zoltan Haindrich commented on HIVE-19348:
-

These tests look ok; I think I will update some of the qtests covering these 
parts to also check vectorized functions. As a matter of fact, half of these 
cases have failed because they were able to handle vectorization.

For a different vectorization-related issue, I've opened a separate ticket.

>  org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp are failing
> ---
>
> Key: HIVE-19348
> URL: https://issues.apache.org/jira/browse/HIVE-19348
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vineet Garg
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19348.01.patch
>
>
>  * org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp.testDifferentFiltersAreNotMatched 2.7 sec 7
>  * org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp.testUnrelatedFiltersAreNotMatched0 1.4 sec 7
>  * org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp.testUnrelatedFiltersAreNotMatched1 1.8 sec 7
>  * org.apache.hadoop.hive.ql.plan.mapping.TestOperatorCmp.testSameFiltersMatched 1.8 sec 7
>  * org.apache.hadoop.hive.ql.plan.mapping.TestReOptimization.testNotReExecutedIfAssertionError
> {noformat}
> Error Message
> expected:<1> but was:<2>
> {noformat}





[jira] [Resolved] (HIVE-18681) use the same jackson library consistently

2018-05-02 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-18681.
-
Resolution: Duplicate

Thank you [~janulatha]!
I'm closing this as a duplicate

> use the same jackson library consistently
> -
>
> Key: HIVE-18681
> URL: https://issues.apache.org/jira/browse/HIVE-18681
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Priority: Major
>
> currently there are uses of both:  org.codehaus.jackson and 
> com.fasterxml.jackson.core inside hive; it would be great to migrate to use 
> the latter.
> more info:
> https://stackoverflow.com/questions/30782706/org-codehaus-jackson-versus-com-fasterxml-jackson-core





[jira] [Resolved] (HIVE-19314) Fix failures caused by HIVE-19137

2018-05-02 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-19314.
-
Resolution: Resolved

yeah; it seems to be...

> Fix failures caused by HIVE-19137
> -
>
> Key: HIVE-19314
> URL: https://issues.apache.org/jira/browse/HIVE-19314
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Igor Kryvenko
>Priority: Major
>






[jira] [Commented] (HIVE-19096) query result cache interferes with explain analyze

2018-05-02 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461104#comment-16461104
 ] 

Zoltan Haindrich commented on HIVE-19096:
-

+1

> query result cache interferes with explain analyze 
> ---
>
> Key: HIVE-19096
> URL: https://issues.apache.org/jira/browse/HIVE-19096
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-19096.1.patch, HIVE-19096.2.patch
>
>
> if the result cache is active, explain analyze doesn't really return useful 
> information; even for unseen queries the result is like this:
> {code}
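> +-------------------------------------------+
> | Explain                                   |
> +-------------------------------------------+
> | Stage-0                                   |
> |   Fetch Operator                          |
> |     Cached Query Result:true,limit:-1     |
> |                                           |
> +-------------------------------------------+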
> {code}





[jira] [Commented] (HIVE-19096) query result cache interferes with explain analyze

2018-05-02 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461108#comment-16461108
 ] 

Zoltan Haindrich commented on HIVE-19096:
-

requeued for test execution

> query result cache interferes with explain analyze 
> ---
>
> Key: HIVE-19096
> URL: https://issues.apache.org/jira/browse/HIVE-19096
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-19096.1.patch, HIVE-19096.2.patch
>
>
> if the result cache is active, explain analyze doesn't really return useful 
> information; even for unseen queries the result is like this:
> {code}
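> +-------------------------------------------+
> | Explain                                   |
> +-------------------------------------------+
> | Stage-0                                   |
> |   Fetch Operator                          |
> |     Cached Query Result:true,limit:-1     |
> |                                           |
> +-------------------------------------------+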
> {code}





[jira] [Resolved] (HIVE-19238) ClassCastException: StandardStructObjectInspector cannot be cast to PrimitiveObjectInspector

2018-05-02 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-19238.
-
Resolution: Cannot Reproduce

This might have been fixed along the way; I am currently using a more recent 
version, and the problem no longer happens.

> ClassCastException: StandardStructObjectInspector cannot be cast to 
> PrimitiveObjectInspector
> 
>
> Key: HIVE-19238
> URL: https://issues.apache.org/jira/browse/HIVE-19238
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> while running tpcds#28 ; on a ~2 week old master:
> {code}
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: ClassCastException 
> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector 
> cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:308)
>  ~[hive-service-3.0.0.3.0.0.0-1075.jar:3.0.0.3.0.0.0-1075]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:196)
>  ~[hive-service-3.0.0.3.0.0.0-1075.jar:3.0.0.3.0.0.0-1075]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:257)
>  ~[hive-service-3.0.0.3.0.0.0-1075.jar:3.0.0.3.0.0.0-1075]
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:243) 
> ~[hive-service-3.0.0.3.0.0.0-1075.jar:3.0.0.3.0.0.0-1075]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:541)
>  ~[hive-service-3.0.0.3.0.0.0-1075.jar:3.0.0.3.0.0.0-1075]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:527)
>  ~[hive-service-3.0.0.3.0.0.0-1075.jar:3.0.0.3.0.0.0-1075]
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:311)
>  ~[hive-service-3.0.0.3.0.0.0-1075.jar:3.0.0.3.0.0.0-1075]
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:564)
>  [hive-service-3.0.0.3.0.0.0-1075.jar:3.0.0.3.0.0.0-1075]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
>  [hive-exec-3.0.0.3.0.0.0-1075.jar:3.0.0.3.0.0.0-1075]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
>  [hive-exec-3.0.0.3.0.0.0-1075.jar:3.0.0.3.0.0.0-1075]
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> [hive-exec-3.0.0.3.0.0.0-1075.jar:3.0.0.3.0.0.0-1075]
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> [hive-exec-3.0.0.3.0.0.0-1075.jar:3.0.0.3.0.0.0-1075]
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
>  [hive-service-3.0.0.3.0.0.0-1075.jar:3.0.0.3.0.0.0-1075]
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  [hive-exec-3.0.0.3.0.0.0-1075.jar:3.0.0.3.0.0.0-1075]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_112]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_112]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
> {code}





[jira] [Commented] (HIVE-19357) Vectorization: assert_true HiveException erroneously gets suppressed to NULL

2018-05-02 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461194#comment-16461194
 ] 

Zoltan Haindrich commented on HIVE-19357:
-

I'm not sure whether this needs to be configurable, since enabling the flag 
may suppress sanity checks and other valid HiveExceptions...
but if it's only for keeping backward compatibility: +1
I've requeued the patch for hiveqa
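
To illustrate the concern, here is a minimal sketch. It is not Hive's vectorization code: `SuppressedAssertSketch`, `assertTrue`, `evaluateRowMode` and `evaluateSuppressing` are hypothetical names, and the sketch assumes that the assertion returns NULL on success. Under those assumptions, once every exception is turned into NULL, a failed assertion can no longer be told apart from a successful one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names exist in Hive.
final class SuppressedAssertSketch {

    // Stand-in for assert_true: throws when the condition is false.
    // Returning null on success is an assumption of this sketch.
    static Boolean assertTrue(boolean condition) {
        if (!condition) {
            throw new IllegalStateException("assertion failed");
        }
        return null;
    }

    // Row-at-a-time evaluation: the first failure propagates to the caller.
    static List<Boolean> evaluateRowMode(List<Boolean> conditions) {
        List<Boolean> out = new ArrayList<>();
        for (boolean c : conditions) {
            out.add(assertTrue(c));
        }
        return out;
    }

    // Batch evaluation with blanket suppression: every exception becomes NULL.
    static List<Boolean> evaluateSuppressing(List<Boolean> conditions) {
        List<Boolean> out = new ArrayList<>();
        for (boolean c : conditions) {
            try {
                out.add(assertTrue(c));
            } catch (RuntimeException e) {
                out.add(null); // the failure is no longer visible
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Boolean> conditions = Arrays.asList(true, false);
        // Suppressing variant: a result set comes back, no error.
        System.out.println(evaluateSuppressing(conditions));
        try {
            evaluateRowMode(conditions);
        } catch (IllegalStateException e) {
            System.out.println("row mode failed: " + e.getMessage());
        }
    }
}
```

The same blanket catch would also hide any other exception thrown while evaluating a row, which is why limiting it to a compatibility flag seems safer.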

> Vectorization: assert_true HiveException erroneously gets suppressed to NULL
> 
>
> Key: HIVE-19357
> URL: https://issues.apache.org/jira/browse/HIVE-19357
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Zoltan Haindrich
>Assignee: Matt McCline
>Priority: Major
> Attachments: HIVE-19357.01.patch, HIVE-19357.02.patch
>
>
> This could be limited to assert exceptions; but might interfere with other 
> exceptions...discovered while "fixing" testreopt after HIVE-19269
> {code}
> create table tu(id_uv int,id_uw int,u int);
> create table tv(id_uv int,v int);
> create table tw(id_uw int,w int);
> insert into tu values 
> (10,10,10),(1,1,1),(2,2,2),(3,3,3),(4,4,4),(5,5,5),(6,6,6);
> insert into tv values (10,10),(1,1),(2,2),(3,3);
> insert into tw values 
> (10,10),(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9);
> set zzz=0;
> set hive.vectorized.execution.enabled=false;
> select assert_true(${hiveconf:zzz}>sum(1)) from tu join tv on 
> (tu.id_uv=tv.id_uv) where u<10 and v>1;
> -- fails as expected
> set hive.vectorized.execution.enabled=true;
> select assert_true(${hiveconf:zzz}>sum(1)) from tu join tv on 
> (tu.id_uv=tv.id_uv) where u<10 and v>1;
> -- there is a result set
> {code}





[jira] [Commented] (HIVE-19396) HiveOperation is incorrectly set for analyze statement

2018-05-02 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461937#comment-16461937
 ] 

Zoltan Haindrich commented on HIVE-19396:
-

+1 pending tests

> HiveOperation is incorrectly set for analyze statement
> --
>
> Key: HIVE-19396
> URL: https://issues.apache.org/jira/browse/HIVE-19396
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>Priority: Major
> Attachments: HIVE-19396.patch
>
>
> Because we rewrite analyze to select compute_stats(), the operation enum gets 
> set to Query.





[jira] [Assigned] (HIVE-19326) union_fast_stats MiniLlapLocal golden file has incorrect "accurate" stats

2018-05-03 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-19326:
---

Assignee: Zoltan Haindrich  (was: Ashutosh Chauhan)
Target Version/s: 3.0.0, 3.1.0
 Component/s: Statistics

This issue is present on the current master; and indeed: it causes incorrect 
results when {{hive.optimize.metadataonly}} is enabled.
Good eye [~sershe]; it seems it has been like this since 2015 :D

> union_fast_stats MiniLlapLocal golden file has incorrect "accurate" stats
> -
>
> Key: HIVE-19326
> URL: https://issues.apache.org/jira/browse/HIVE-19326
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sergey Shelukhin
>Assignee: Zoltan Haindrich
>Priority: Blocker
> Fix For: 3.0.0
>
>
> Found when investigating the results change after converting tables to MM, 
> turns out the MM result is correct but the current one is not.
> The test ends like so:
> {noformat}
> desc formatted small_alltypesorc_a;
> ANALYZE TABLE small_alltypesorc_a COMPUTE STATISTICS;
> desc formatted small_alltypesorc_a;
> insert into table small_alltypesorc_a select * from small_alltypesorc1a;
> desc formatted small_alltypesorc_a;
> {noformat}
> The results from the descs in the golden file are:
> {noformat}
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles1   
>   numRows 5   
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles1   
>   numRows 15
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles2   
>   numRows 20  
> {noformat}
> Note the result change after analyze - the original numRows is inaccurate, 
> but BASIC_STATS is set to true.
> I am assuming with metadata only optimization this can produce incorrect 
> results.





[jira] [Assigned] (HIVE-19402) Handle explain analyze for reoptimization

2018-05-03 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-19402:
---


> Handle explain analyze for reoptimization
> -
>
> Key: HIVE-19402
> URL: https://issues.apache.org/jira/browse/HIVE-19402
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> This might also make it possible to remove "explain reoptimization"





[jira] [Commented] (HIVE-19326) union_fast_stats MiniLlapLocal golden file has incorrect "accurate" stats (incorrect query results possible)

2018-05-04 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463877#comment-16463877
 ] 

Zoltan Haindrich commented on HIVE-19326:
-

actually this seems to be 2 issues:

* there was an issue which arose from the fact that all the rs-es have 
overwritten each other's output - I've fixed this, and now ctas for unions works 
as expected
* if {{hive.merge.tezfiles}} is enabled, somehow the counters also get merged 
or something...I'm not sure if this is enabled in production environments or 
not...I'll take another look

I think {{hive.optimize.metadataonly}} should be enabled for all tests...or at 
least we should check that all tests would pass with it...
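
To make the risk concrete, here is a minimal sketch with hypothetical names (`TableStats`, `countRows`); it is not how Hive implements the optimization. It assumes an optimizer that answers `count(*)` from the stored `numRows` whenever the basic stats are flagged as accurate. The values 5 and 15 mirror the first two `numRows` entries in the golden file quoted in the description.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: TableStats and countRows are illustrative names only.
final class MetadataOnlyCountSketch {

    static final class TableStats {
        final long numRows;
        final boolean basicStatsAccurate;

        TableStats(long numRows, boolean basicStatsAccurate) {
            this.numRows = numRows;
            this.basicStatsAccurate = basicStatsAccurate;
        }
    }

    // Answers count(*) from metadata when the stats claim to be accurate,
    // otherwise falls back to scanning the data.
    static long countRows(TableStats stats, LongSupplier scan) {
        return stats.basicStatsAccurate ? stats.numRows : scan.getAsLong();
    }

    public static void main(String[] args) {
        LongSupplier scan = () -> 15L; // what a real scan would count
        // Stale numRows flagged as accurate: the metadata answer wins.
        System.out.println(countRows(new TableStats(5L, true), scan));
        // Same stale value, but not flagged accurate: the scan is used.
        System.out.println(countRows(new TableStats(5L, false), scan));
    }
}
```

With the flag wrongly set to true the sketch returns the stale value, which is the kind of silent wrong answer a test run with the optimization enabled would catch.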

> union_fast_stats MiniLlapLocal golden file has incorrect "accurate" stats 
> (incorrect query results possible)
> 
>
> Key: HIVE-19326
> URL: https://issues.apache.org/jira/browse/HIVE-19326
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sergey Shelukhin
>Assignee: Zoltan Haindrich
>Priority: Blocker
> Fix For: 3.0.0
>
>
> Found when investigating the results change after converting tables to MM, 
> turns out the MM result is correct but the current one is not.
> The test ends like so:
> {noformat}
> desc formatted small_alltypesorc_a;
> ANALYZE TABLE small_alltypesorc_a COMPUTE STATISTICS;
> desc formatted small_alltypesorc_a;
> insert into table small_alltypesorc_a select * from small_alltypesorc1a;
> desc formatted small_alltypesorc_a;
> {noformat}
> The results from the descs in the golden file are:
> {noformat}
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles1   
>   numRows 5   
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles1   
>   numRows 15
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles2   
>   numRows 20  
> {noformat}
> Note the result change after analyze - the original numRows is inaccurate, 
> but BASIC_STATS is set to true.
> I am assuming with metadata only optimization this can produce incorrect 
> results.





[jira] [Commented] (HIVE-19225) Class cast exception while running certain queries with UDAF like rank on internal struct columns

2018-05-07 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16465609#comment-16465609
 ] 

Zoltan Haindrich commented on HIVE-19225:
-

[~amrk7] could you share a query which triggers this?
because I think the same problem might get triggered in different cases: I'm 
currently thinking about Map/Union...

> Class cast exception while running certain queries with UDAF like rank on 
> internal struct columns
> -
>
> Key: HIVE-19225
> URL: https://issues.apache.org/jira/browse/HIVE-19225
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.2
>Reporter: Amruth S
>Assignee: Amruth S
>Priority: Major
> Attachments: HIVE-19225.patch
>
>
> Certain queries with the rank function cause a class cast exception.
> {noformat}
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct cannot be cast to 
> org.apache.hadoop.hive.serde2.io.TimestampWritable
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector.getPrimitiveJavaObject(WritableTimestampObjectInspector.java:39)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector.getPrimitiveJavaObject(WritableTimestampObjectInspector.java:25)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:412)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRank.copyToStandardObject(GenericUDAFRank.java:219)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFRank$GenericUDAFAbstractRankEvaluator.iterate(GenericUDAFRank.java:153)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:192)
>   at 
> org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.processRow(WindowingTableFunction.java:407)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.processRow(PTFOperator.java:325)
>   at 
> org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:139)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:236)
>   ... 7 more
> 2018-03-29 09:28:43,432 INFO [main] org.apache.hadoop.mapred.Task: Runnning 
> cleanup for the task
> {noformat}
> The following change fixes this.
> The evaluator seems to skip the case where the primary object emitted is a 
> struct. I modified the code to find the field inside the struct.
> {code:java}
> diff --git 
> a/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/StandardStructObjectInspector.java
>  
> b/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/StandardStructObjectInspector.java
> index 36a500790a..e7731e99d7 100644
> --- 
> a/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/StandardStructObjectInspector.java
> +++ 
> b/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/StandardStructObjectInspector.java
> @@ -22,6 +22,7 @@
> import java.util.Arrays;
> import java.util.List;
> +import org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> @@ -171,6 +172,10 @@ public Object getStructFieldData(Object data, 
> StructField fieldRef) {
> // so we have to do differently.
> boolean isArray = data.getClass().isArray();
> if (!isArray && !(data instanceof List)) {
> + if (data instanceof LazyBinaryStruct
> + && fieldRef.getFieldObjectInspector().getCategory() == Category.PRIMITIVE) {
> + return ((LazyBinaryStruct) data).getField(((MyField) fieldRef).fieldID);
> + }
> if (!warned) {
> LOG.warn("Invalid type for struct " + data.getClass());
> LOG.warn("ignoring similar errors.");
> {code}
> Let me know your thoughts





[jira] [Commented] (HIVE-12342) Set default value of hive.optimize.index.filter to true

2018-05-08 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467602#comment-16467602
 ] 

Zoltan Haindrich commented on HIVE-12342:
-

[~ikryvenko] I see that this change has a lot of "boring" diffs...I've a tool 
which can be used to "classify" q.out changes programmatically and apply 
them...and might help you focus on the more relevant cases 
https://github.com/kgyrtkirk/hive-toolbox

it can process ptest results with something like:
{code}
build/install/toolbox/bin/toolbox 
http://104.198.109.242/logs/PreCommit-HIVE-Build-10764/test-results.tar.gz
{code}

> Set default value of hive.optimize.index.filter to true
> ---
>
> Key: HIVE-12342
> URL: https://issues.apache.org/jira/browse/HIVE-12342
> Project: Hive
>  Issue Type: Task
>  Components: Configuration
>Reporter: Ashutosh Chauhan
>Assignee: Igor Kryvenko
>Priority: Major
> Attachments: HIVE-12342.05.patch, HIVE-12342.06.patch, 
> HIVE-12342.07.patch, HIVE-12342.08.patch, HIVE-12342.09.patch, 
> HIVE-12342.1.patch, HIVE-12342.2.patch, HIVE-12342.3.patch, 
> HIVE-12342.4.patch, HIVE-12342.patch
>
>
> This configuration governs ppd for storage layer. When applicable, it will 
> always help. It should be on by default.





[jira] [Assigned] (HIVE-19460) Improve stats estimations for NOT IN operator

2018-05-08 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-19460:
---

Assignee: Zoltan Haindrich

> Improve stats estimations for NOT IN operator
> -
>
> Key: HIVE-19460
> URL: https://issues.apache.org/jira/browse/HIVE-19460
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>






[jira] [Updated] (HIVE-19166) TestMiniLlapLocalCliDriver sysdb failure

2018-05-09 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19166:

Attachment: HIVE-19166.05.patch

> TestMiniLlapLocalCliDriver sysdb failure
> 
>
> Key: HIVE-19166
> URL: https://issues.apache.org/jira/browse/HIVE-19166
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Reporter: Vineet Garg
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19166.04.patch, HIVE-19166.05.patch, 
> HIVE-19166.1.patch, HIVE-19166.2.patch, HIVE-19166.3.patch
>
>
> Broken by HIVE-18715





[jira] [Commented] (HIVE-19097) related equals and in operators may cause inaccurate stats estimations

2018-05-10 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470235#comment-16470235
 ] 

Zoltan Haindrich commented on HIVE-19097:
-

I've re-checked the current state of this fix:
 * for branch-2
 ** I would recommend staying with the "localReferencedColumns" approach that 
I started in patch 01 (without that small refactor)
 * for branch-3/master
 ** CALCITE-2247 could enable the simplification of (a=1 && (a=2 || a=1)) to 
(a=1)
 ** but since Calcite is *always* breaking up "IN" clauses during sql parsing, 
Hive would need a facility to do that; I would like to suggest adding an IN 
opener to {{HivePointLookupOptimizerRule}}, since that rule already has an IN 
closer which closes large ORs.
[link to my current wip 
tree|https://github.com/apache/hive/compare/master...kgyrtkirk:HIVE-19097-related-equals#diff-85205e52a019624957e9ce940ff7cbeeR145]
 to add the in opener to hive

[~jcamachorodriguez], [~ashutoshc]: what do you think about the above?
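
For illustration only, the opener/closer idea could be sketched as below. The sketch works on plain strings and integer literals rather than Calcite expressions, and `openIn`, `closeOr` and `simplifyEqAndIn` are hypothetical helpers, not methods of {{HivePointLookupOptimizerRule}} or of Calcite.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch: string-based stand-ins for expression rewrites.
final class InOpenerCloserSketch {

    // "Opens" col IN (v1, v2, ...) into (col = v1 OR col = v2 ...).
    static String openIn(String column, List<Integer> values) {
        StringJoiner or = new StringJoiner(" OR ", "(", ")");
        for (int v : values) {
            or.add(column + " = " + v);
        }
        return or.toString();
    }

    // "Closes" equality disjuncts on one column back into an IN list.
    static String closeOr(String column, List<Integer> values) {
        StringJoiner in = new StringJoiner(", ", column + " IN (", ")");
        for (int v : values) {
            in.add(String.valueOf(v));
        }
        return in.toString();
    }

    // Simplifies (col = eq AND col IN (values)): the conjunction is just
    // col = eq when eq is one of the values, and can never hold otherwise.
    static String simplifyEqAndIn(String column, int eq, List<Integer> values) {
        return values.contains(eq) ? column + " = " + eq : "FALSE";
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(2, 1);
        System.out.println(openIn("a", values));
        System.out.println(closeOr("a", values));
        System.out.println(simplifyEqAndIn("a", 1, values));
    }
}
```

Once the IN is opened into ORs, a simplifier of the kind mentioned for CALCITE-2247 has the shape it needs; the closer can then fold any remaining large ORs back into IN.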

> related equals and in operators may cause inaccurate stats estimations
> --
>
> Key: HIVE-19097
> URL: https://issues.apache.org/jira/browse/HIVE-19097
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19097.01.patch, HIVE-19097.partial.patch
>
>
> tpcds#74 is optimized in a way that for date_dim the condition contains IN 
> and = for the same column
> {code:java}
> | Map Operator Tree: |
> | TableScan  |
> |   alias: date_dim  |
> |   filterExpr: (((d_year) IN (2001, 2002) and (d_year = 
> 2002) and d_date_sk is not null) or ((d_year) IN (2001, 2002) and (d_year = 
> 2001) and d_date_sk is not null)) (type: boolean) |
> |   Statistics: Num rows: 73049 Data size: 876588 Basic 
> stats: COMPLETE Column stats: COMPLETE |
> |   Filter Operator  |
> | predicate: ((d_year) IN (2001, 2002) and (d_year = 
> 2002) and d_date_sk is not null) (type: boolean) |
> | Statistics: Num rows: 4 Data size: 48 Basic stats: 
> COMPLETE Column stats: COMPLETE |
> {code}
> the "real" row count will be 365
> for separate {{IN}} and {{=}} the estimation is very good; but if both are 
> present it becomes (very) underestimated.
> {code:java}
> set hive.query.results.cache.enabled=false;
> drop table if exists t1;
> drop table if exists t8;
> create table t1 (a integer,b integer);
> create table t8 like t1;
> insert into t1 values (1,1),(2,2),(3,3),(4,4),(5,5);
> insert into t8
> select * from t1 union all select * from t1 union all select * from t1 union 
> all select * from t1 union all
> select * from t1 union all select * from t1 union all select * from t1 union 
> all select * from t1
> ;
> analyze table t1 compute statistics for columns;
> analyze table t8 compute statistics for columns;
> explain analyze select sum(a) from t8 where b in (2,3) group by b;
> explain analyze select sum(a) from t8 where b=2 group by b;
> explain analyze select sum(a) from t1 where b in (2,3) and b=2 group by b;
> explain analyze select sum(a) from t8 where b in (2,3) and b=2 group by b;
> {code}





[jira] [Updated] (HIVE-19460) Improve stats estimations for NOT IN operator

2018-05-10 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19460:

Attachment: HIVE-19460.01wip01.patch

> Improve stats estimations for NOT IN operator
> -
>
> Key: HIVE-19460
> URL: https://issues.apache.org/jira/browse/HIVE-19460
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19460.01wip01.patch
>
>






[jira] [Updated] (HIVE-19460) Improve stats estimations for NOT IN operator

2018-05-10 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19460:

Status: Patch Available  (was: Open)

> Improve stats estimations for NOT IN operator
> -
>
> Key: HIVE-19460
> URL: https://issues.apache.org/jira/browse/HIVE-19460
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19460.01wip01.patch
>
>






[jira] [Updated] (HIVE-19460) Improve stats estimations for NOT IN operator

2018-05-10 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19460:

Attachment: HIVE-19460.01wip02.patch

> Improve stats estimations for NOT IN operator
> -
>
> Key: HIVE-19460
> URL: https://issues.apache.org/jira/browse/HIVE-19460
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19460.01wip01.patch, HIVE-19460.01wip02.patch
>
>






[jira] [Commented] (HIVE-19166) TestMiniLlapLocalCliDriver sysdb failure

2018-05-11 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471750#comment-16471750
 ] 

Zoltan Haindrich commented on HIVE-19166:
-

it's funny that someone always rewrites sysdb.q with some different 
crap...first HIVE-18910, now HIVE-19448...
I wonder if there will be one more change before a successful ptest run for 
this ticket...
the queue is just 32 today! maybe I'll get an answer by Monday!? :)


> TestMiniLlapLocalCliDriver sysdb failure
> 
>
> Key: HIVE-19166
> URL: https://issues.apache.org/jira/browse/HIVE-19166
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Reporter: Vineet Garg
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19166.04.patch, HIVE-19166.05.patch, 
> HIVE-19166.1.patch, HIVE-19166.2.patch, HIVE-19166.3.patch
>
>
> Broken by HIVE-18715





[jira] [Updated] (HIVE-19166) TestMiniLlapLocalCliDriver sysdb failure

2018-05-11 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19166:

Attachment: HIVE-19166.06.patch

> TestMiniLlapLocalCliDriver sysdb failure
> 
>
> Key: HIVE-19166
> URL: https://issues.apache.org/jira/browse/HIVE-19166
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Reporter: Vineet Garg
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19166.04.patch, HIVE-19166.05.patch, 
> HIVE-19166.06.patch, HIVE-19166.1.patch, HIVE-19166.2.patch, 
> HIVE-19166.3.patch
>
>
> Broken by HIVE-18715





[jira] [Commented] (HIVE-19489) Disable stats autogather for external tables

2018-05-11 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471762#comment-16471762
 ] 

Zoltan Haindrich commented on HIVE-19489:
-

I'm not sure if we should disable it globally, but there could be an option to 
do that. I think it would probably be useful to have a table-level option 
to prevent it from happening on specific tables. Without statistics the planner 
will start operating blind: I think fs-level stats are not really good, and auto 
gathering may also collect column stats, which could be very useful during 
estimations.
afaik auto gathering should not happen during LOAD DATA statements
cc: [~ashutoshc]

> Disable stats autogather for external tables
> 
>
> Key: HIVE-19489
> URL: https://issues.apache.org/jira/browse/HIVE-19489
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
>
> Hive auto-gather of table statistics can result in incorrect generation of 
> stats (and the stats being marked as accurate) in the case of external tables 
> where the data is being written by external apps.
> To avoid this issue, stats autogather will be disabled on external tables 
> when loading/inserting into a table with existing data, if 
> HIVE_DISABLE_UNSAFE_EXTERNALTABLE_OPERATIONS is enabled. In this situation, 
> users should rely on explicitly calling ANALYZE TABLE on their external 
> tables to make sure the stats are kept up-to-date.
> Autogather of stats will still be allowed to occur on external tables in the 
> case of INSERT OVERWRITE or LOAD DATA OVERWRITE, since the existing data is 
> being removed and so the stats calculated on the inserted/loaded data should 
> be accurate.
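
The rule described above can be written down as a small decision function. This is only a reading of the description, not the actual patch: `shouldAutogather` and its parameters are hypothetical, and the behaviour for an empty external table without OVERWRITE is inferred from the phrase "a table with existing data".

```java
// Hypothetical sketch of the rule in the description; not Hive code.
final class AutogatherRuleSketch {

    static boolean shouldAutogather(boolean externalTable,
                                    boolean overwrite,
                                    boolean tableHasData,
                                    boolean disableUnsafeExternalOps) {
        if (!externalTable) {
            return true; // managed tables are not affected by this rule
        }
        if (!disableUnsafeExternalOps) {
            return true; // the restriction only applies when the flag is on
        }
        if (overwrite) {
            return true; // existing data is replaced, so new stats are accurate
        }
        // Appending to existing external data: stats could be marked
        // accurate while external writers have changed the files.
        return !tableHasData;
    }

    public static void main(String[] args) {
        // INSERT INTO an external table that already has data, flag enabled.
        System.out.println(shouldAutogather(true, false, true, true));
        // INSERT OVERWRITE into the same table.
        System.out.println(shouldAutogather(true, true, true, true));
    }
}
```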





[jira] [Assigned] (HIVE-19500) Prevent multiple selectivity estimations for the same variable in conjuctions

2018-05-11 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-19500:
---


> Prevent multiple selectivity estimations for the same variable in conjuctions
> -
>
> Key: HIVE-19500
> URL: https://issues.apache.org/jira/browse/HIVE-19500
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> see HIVE-19097 for problem description
> for filters like: {{(d_year in (2001,2002) and d_year = 2001)}} the current 
> estimation is around {{(1/NDV)**2}} (iff column stats are available) 
> this patch targets branch-2
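
A short worked sketch of why the estimate collapses. It assumes the estimator multiplies per-predicate selectivities as if they were independent, that `IN` with k values has selectivity k/NDV, and that `=` has selectivity 1/NDV. The NDV of 200 for `d_year` is an assumed value chosen for illustration; the 73049 row count comes from HIVE-19097. Taking the most selective predicate per column is just one possible way to avoid the double counting, not necessarily what the patch does.

```java
// Hypothetical sketch; the formulas are assumptions stated in the text above.
final class ConjunctionSelectivitySketch {

    static double inSelectivity(int valueCount, double ndv) {
        return Math.min(1.0, valueCount / ndv);
    }

    static double eqSelectivity(double ndv) {
        return 1.0 / ndv;
    }

    // Treats both predicates on the same column as independent.
    static double naive(int valueCount, double ndv) {
        return inSelectivity(valueCount, ndv) * eqSelectivity(ndv);
    }

    // Counts the column only once by keeping the most selective predicate.
    static double deduplicated(int valueCount, double ndv) {
        return Math.min(inSelectivity(valueCount, ndv), eqSelectivity(ndv));
    }

    public static void main(String[] args) {
        long rows = 73049L;   // date_dim row count from HIVE-19097
        double ndv = 200.0;   // assumed NDV of d_year
        System.out.println(Math.round(rows * naive(2, ndv)));        // about 4 rows
        System.out.println(Math.round(rows * deduplicated(2, ndv))); // about 365 rows
    }
}
```

Under these assumptions the naive product lands near the 4 rows shown in the HIVE-19097 plan, while counting the column once lands near the real row count of 365.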





[jira] [Commented] (HIVE-19468) Add Apache license to TestTxnConcatenate

2018-05-11 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471786#comment-16471786
 ] 

Zoltan Haindrich commented on HIVE-19468:
-

+1

> Add Apache license to TestTxnConcatenate
> 
>
> Key: HIVE-19468
> URL: https://issues.apache.org/jira/browse/HIVE-19468
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Igor Kryvenko
>Assignee: Igor Kryvenko
>Priority: Major
> Attachments: HIVE-19468.01.patch
>
>






[jira] [Commented] (HIVE-13745) UDF current_date、current_timestamp、unix_timestamp NPE

2018-05-11 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471793#comment-16471793
 ] 

Zoltan Haindrich commented on HIVE-13745:
-

I don't think {{System.currentTimeMillis}} should be called from HiveConf...

I don't see what benefit there is in having it in the config, 
if it's ok to get it at the time the UDF is constructed.
I think it would be better to set a "pirate" property in Driver at the start of 
the query execution and use that; we already have a query timestamp for the 
session

> UDF current_date、current_timestamp、unix_timestamp NPE
> -
>
> Key: HIVE-13745
> URL: https://issues.apache.org/jira/browse/HIVE-13745
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Biao Wu
>Assignee: Biao Wu
>Priority: Major
> Attachments: HIVE-13745.1.patch, HIVE-13745.2-brach-2.patch, 
> HIVE-13745.patch
>
>
> NullPointerException when current_date is used in mapreduce





[jira] [Commented] (HIVE-13745) UDF current_date、current_timestamp、unix_timestamp NPE

2018-05-11 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471800#comment-16471800
 ] 

Zoltan Haindrich commented on HIVE-13745:
-

[~ychena]: [~bill] last commented on this ticket about 2 years ago; please 
assign it to yourself if you are working on it, and ask for review from someone 
else before committing changes.

> UDF current_date、current_timestamp、unix_timestamp NPE
> -
>
> Key: HIVE-13745
> URL: https://issues.apache.org/jira/browse/HIVE-13745
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Biao Wu
>Assignee: Biao Wu
>Priority: Major
> Attachments: HIVE-13745.1.patch, HIVE-13745.2-brach-2.patch, 
> HIVE-13745.patch
>
>
> NullPointerException when current_date is used in mapreduce





[jira] [Updated] (HIVE-19468) Add Apache license to TestTxnConcatenate

2018-05-11 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19468:

   Resolution: Fixed
Fix Version/s: 3.1.0
   Status: Resolved  (was: Patch Available)

pushed to master. Thank you [~ikryvenko] for fixing this!

> Add Apache license to TestTxnConcatenate
> 
>
> Key: HIVE-19468
> URL: https://issues.apache.org/jira/browse/HIVE-19468
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Igor Kryvenko
>Assignee: Igor Kryvenko
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-19468.01.patch
>
>






[jira] [Commented] (HIVE-19501) Fix HyperLogLog to be threadsafe

2018-05-11 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471867#comment-16471867
 ] 

Zoltan Haindrich commented on HIVE-19501:
-

also note that most probably the addShort / etc methods are unused

> Fix HyperLogLog to be threadsafe
> 
>
> Key: HIVE-19501
> URL: https://issues.apache.org/jira/browse/HIVE-19501
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> not sure if this is an issue in reality or not; but there are 3 static fields 
> in HyperLogLog which are rewritten during operation; if multiple 
> threads are calculating HLL in the same JVM, there is a theoretical chance 
> that they might overwrite each other's values...
> static fields:
> https://github.com/apache/hive/blob/8028ce8a4cf5a03e2998c33e032a511fae770b47/standalone-metastore/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java#L65
> usage:
> https://github.com/apache/hive/blob/8028ce8a4cf5a03e2998c33e032a511fae770b47/standalone-metastore/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java#L216
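
The concern can be sketched in isolation as follows. This is not the HyperLogLog code: the class, the method names and the FNV-1a hash are stand-ins chosen for illustration. It only shows the general pattern of a static scratch buffer that is rewritten on every call, next to a variant that keeps the scratch space local to the call.

```java
final class ScratchBufferSketch {

    // Risky pattern: one scratch buffer shared by every caller in the JVM.
    private static final byte[] SHARED = new byte[8];

    static long hashWithSharedBuffer(long value) {
        writeLong(SHARED, value);
        // Another thread may overwrite SHARED between the write above and the read below.
        return hash(SHARED);
    }

    // Safer pattern: the scratch space lives on the caller's own stack/heap allocation.
    static long hashWithLocalBuffer(long value) {
        byte[] local = new byte[8];
        writeLong(local, value);
        return hash(local);
    }

    // Writes the value into the buffer in little-endian byte order.
    private static void writeLong(byte[] buf, long v) {
        for (int i = 0; i < 8; i++) {
            buf[i] = (byte) (v >>> (8 * i));
        }
    }

    // FNV-1a, used here only as a simple stand-in for the real hash function.
    private static long hash(byte[] buf) {
        long h = 0xcbf29ce484222325L;
        for (byte b : buf) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    public static void main(String[] args) {
        // Single-threaded, both variants agree; only concurrent use of the shared one is at risk.
        System.out.println(hashWithSharedBuffer(42L) == hashWithLocalBuffer(42L));
    }
}
```

The local-buffer variant avoids synchronization entirely, at the cost of a small allocation per call.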





[jira] [Updated] (HIVE-19460) Improve stats estimations for NOT IN operator

2018-05-11 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19460:

Attachment: HIVE-19460.01wip03.patch

> Improve stats estimations for NOT IN operator
> -
>
> Key: HIVE-19460
> URL: https://issues.apache.org/jira/browse/HIVE-19460
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19460.01wip01.patch, HIVE-19460.01wip02.patch, 
> HIVE-19460.01wip03.patch
>
>






[jira] [Updated] (HIVE-19500) Prevent multiple selectivity estimations for the same variable in conjuctions

2018-05-11 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19500:

Attachment: HIVE-19500.01.patch

> Prevent multiple selectivity estimations for the same variable in conjuctions
> -
>
> Key: HIVE-19500
> URL: https://issues.apache.org/jira/browse/HIVE-19500
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19500.01.patch
>
>
> see HIVE-19097 for problem description
> for filters like: {{(d_year in (2001,2002) and d_year = 2001)}} the current 
> estimation is around {{(1/NDV)**2}} (iff column stats are available) 
> this patch targets on branch-2





[jira] [Updated] (HIVE-19500) Prevent multiple selectivity estimations for the same variable in conjuctions

2018-05-11 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19500:

Affects Version/s: 3.1.0
   3.0.0
 Target Version/s: 3.0.0, 3.1.0  (was: 2.3.2)
  Description: 
see HIVE-19097 for problem description

for filters like: {{(d_year in (2001,2002) and d_year = 2001)}} the current 
estimation is around {{(1/NDV)**2}} (iff column stats are available) 

actually the source of the problem was a small typo in HIVE-17465 

  was:
see HIVE-19097 for problem description

for filters like: {{(d_year in (2001,2002) and d_year = 2001)}} the current 
estimation is around {{(1/NDV)**2}} (iff column stats are available) 

this patch targets on branch-2


> Prevent multiple selectivity estimations for the same variable in conjuctions
> -
>
> Key: HIVE-19500
> URL: https://issues.apache.org/jira/browse/HIVE-19500
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19500.01.patch
>
>
> see HIVE-19097 for problem description
> for filters like: {{(d_year in (2001,2002) and d_year = 2001)}} the current 
> estimation is around {{(1/NDV)**2}} (iff column stats are available) 
> actually the source of the problem was a small typo in HIVE-17465 





[jira] [Updated] (HIVE-19500) Prevent multiple selectivity estimations for the same variable in conjuctions

2018-05-11 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19500:

Status: Patch Available  (was: Open)

> Prevent multiple selectivity estimations for the same variable in conjuctions
> -
>
> Key: HIVE-19500
> URL: https://issues.apache.org/jira/browse/HIVE-19500
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19500.01.patch
>
>
> see HIVE-19097 for problem description
> for filters like: {{(d_year in (2001,2002) and d_year = 2001)}} the current 
> estimation is around {{(1/NDV)**2}} (iff column stats are available) 
> actually the source of the problem was a small typo in HIVE-17465 





[jira] [Commented] (HIVE-19500) Prevent multiple selectivity estimations for the same variable in conjuctions

2018-05-11 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472350#comment-16472350
 ] 

Zoltan Haindrich commented on HIVE-19500:
-

[~vgarg] I'm not sure whether this was a typo in HIVE-17465 or part of the 
intended change

> Prevent multiple selectivity estimations for the same variable in conjuctions
> -
>
> Key: HIVE-19500
> URL: https://issues.apache.org/jira/browse/HIVE-19500
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19500.01.patch
>
>
> see HIVE-19097 for problem description
> for filters like: {{(d_year in (2001,2002) and d_year = 2001)}} the current 
> estimation is around {{(1/NDV)**2}} (iff column stats are available) 
> actually the source of the problem was a small typo in HIVE-17465 





[jira] [Commented] (HIVE-19326) union_fast_stats MiniLlapLocal golden file has incorrect "accurate" stats (incorrect query results possible)

2018-05-11 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472355#comment-16472355
 ] 

Zoltan Haindrich commented on HIVE-19326:
-

no, this is a stats optimizer bug

> union_fast_stats MiniLlapLocal golden file has incorrect "accurate" stats 
> (incorrect query results possible)
> 
>
> Key: HIVE-19326
> URL: https://issues.apache.org/jira/browse/HIVE-19326
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sergey Shelukhin
>Assignee: Zoltan Haindrich
>Priority: Blocker
> Fix For: 3.0.0
>
>
> Found when investigating the results change after converting tables to MM, 
> turns out the MM result is correct but the current one is not.
> The test ends like so:
> {noformat}
> desc formatted small_alltypesorc_a;
> ANALYZE TABLE small_alltypesorc_a COMPUTE STATISTICS;
> desc formatted small_alltypesorc_a;
> insert into table small_alltypesorc_a select * from small_alltypesorc1a;
> desc formatted small_alltypesorc_a;
> {noformat}
> The results from the descs in the golden file are:
> {noformat}
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                1
>   numRows                 5
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                1
>   numRows                 15
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                2
>   numRows                 20
> {noformat}
> Note the result change after analyze - the original numRows is inaccurate, 
> but BASIC_STATS is set to true.
> I am assuming with metadata only optimization this can produce incorrect 
> results.
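
A toy model of the risk described above, under the assumption (as in the description) that a metadata-only optimization answers count(*) from numRows whenever BASIC_STATS is flagged accurate. The class and method are hypothetical; the numbers 5 and 15 are the numRows values reported before and after ANALYZE in the golden file.

```java
final class MetadataOnlyCountSketch {

    /**
     * Illustrative only: returns the stored numRows when the stats are
     * flagged accurate, otherwise the real row count from a scan.
     */
    static long countStar(boolean basicStatsAccurate, long numRowsInMetastore, long rowsOnDisk) {
        return basicStatsAccurate ? numRowsInMetastore : rowsOnDisk;
    }

    public static void main(String[] args) {
        long storedNumRows = 5;   // numRows before ANALYZE, flagged as accurate
        long actualRows = 15;     // numRows reported once ANALYZE has run
        // With the flag wrongly set, the stale value is returned instead of the real count.
        System.out.println(countStar(true, storedNumRows, actualRows));
        System.out.println(countStar(false, storedNumRows, actualRows));
    }
}
```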





[jira] [Commented] (HIVE-19326) union_fast_stats MiniLlapLocal golden file has incorrect "accurate" stats (incorrect query results possible)

2018-05-11 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472400#comment-16472400
 ] 

Zoltan Haindrich commented on HIVE-19326:
-

I've a "half-fix": it's fixed in most cases, but if {{hive.merge.tezfiles}} is 
enabled the problem still occurs.


> union_fast_stats MiniLlapLocal golden file has incorrect "accurate" stats 
> (incorrect query results possible)
> 
>
> Key: HIVE-19326
> URL: https://issues.apache.org/jira/browse/HIVE-19326
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sergey Shelukhin
>Assignee: Zoltan Haindrich
>Priority: Blocker
> Fix For: 3.0.0
>
>
> Found when investigating the results change after converting tables to MM, 
> turns out the MM result is correct but the current one is not.
> The test ends like so:
> {noformat}
> desc formatted small_alltypesorc_a;
> ANALYZE TABLE small_alltypesorc_a COMPUTE STATISTICS;
> desc formatted small_alltypesorc_a;
> insert into table small_alltypesorc_a select * from small_alltypesorc1a;
> desc formatted small_alltypesorc_a;
> {noformat}
> The results from the descs in the golden file are:
> {noformat}
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                1
>   numRows                 5
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                1
>   numRows                 15
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                2
>   numRows                 20
> {noformat}
> Note the result change after analyze - the original numRows is inaccurate, 
> but BASIC_STATS is set to true.
> I am assuming with metadata only optimization this can produce incorrect 
> results.





[jira] [Updated] (HIVE-19326) union_fast_stats MiniLlapLocal golden file has incorrect "accurate" stats (incorrect query results possible)

2018-05-11 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19326:

Attachment: HIVE-19326.01wip01.patch

> union_fast_stats MiniLlapLocal golden file has incorrect "accurate" stats 
> (incorrect query results possible)
> 
>
> Key: HIVE-19326
> URL: https://issues.apache.org/jira/browse/HIVE-19326
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sergey Shelukhin
>Assignee: Zoltan Haindrich
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HIVE-19326.01wip01.patch
>
>
> Found when investigating the results change after converting tables to MM, 
> turns out the MM result is correct but the current one is not.
> The test ends like so:
> {noformat}
> desc formatted small_alltypesorc_a;
> ANALYZE TABLE small_alltypesorc_a COMPUTE STATISTICS;
> desc formatted small_alltypesorc_a;
> insert into table small_alltypesorc_a select * from small_alltypesorc1a;
> desc formatted small_alltypesorc_a;
> {noformat}
> The results from the descs in the golden file are:
> {noformat}
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                1
>   numRows                 5
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                1
>   numRows                 15
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                2
>   numRows                 20
> {noformat}
> Note the result change after analyze - the original numRows is inaccurate, 
> but BASIC_STATS is set to true.
> I am assuming with metadata only optimization this can produce incorrect 
> results.





[jira] [Updated] (HIVE-19326) union_fast_stats MiniLlapLocal golden file has incorrect "accurate" stats (incorrect query results possible)

2018-05-11 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19326:

Status: Patch Available  (was: Open)

> union_fast_stats MiniLlapLocal golden file has incorrect "accurate" stats 
> (incorrect query results possible)
> 
>
> Key: HIVE-19326
> URL: https://issues.apache.org/jira/browse/HIVE-19326
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sergey Shelukhin
>Assignee: Zoltan Haindrich
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HIVE-19326.01wip01.patch
>
>
> Found when investigating the results change after converting tables to MM, 
> turns out the MM result is correct but the current one is not.
> The test ends like so:
> {noformat}
> desc formatted small_alltypesorc_a;
> ANALYZE TABLE small_alltypesorc_a COMPUTE STATISTICS;
> desc formatted small_alltypesorc_a;
> insert into table small_alltypesorc_a select * from small_alltypesorc1a;
> desc formatted small_alltypesorc_a;
> {noformat}
> The results from the descs in the golden file are:
> {noformat}
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                1
>   numRows                 5
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                1
>   numRows                 15
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                2
>   numRows                 20
> {noformat}
> Note the result change after analyze - the original numRows is inaccurate, 
> but BASIC_STATS is set to true.
> I am assuming with metadata only optimization this can produce incorrect 
> results.





[jira] [Commented] (HIVE-19326) union_fast_stats MiniLlapLocal golden file has incorrect "accurate" stats (incorrect query results possible)

2018-05-11 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472436#comment-16472436
 ] 

Zoltan Haindrich commented on HIVE-19326:
-

well... I think we are better off with it than without it :)

> union_fast_stats MiniLlapLocal golden file has incorrect "accurate" stats 
> (incorrect query results possible)
> 
>
> Key: HIVE-19326
> URL: https://issues.apache.org/jira/browse/HIVE-19326
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sergey Shelukhin
>Assignee: Zoltan Haindrich
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HIVE-19326.01wip01.patch
>
>
> Found when investigating the results change after converting tables to MM, 
> turns out the MM result is correct but the current one is not.
> The test ends like so:
> {noformat}
> desc formatted small_alltypesorc_a;
> ANALYZE TABLE small_alltypesorc_a COMPUTE STATISTICS;
> desc formatted small_alltypesorc_a;
> insert into table small_alltypesorc_a select * from small_alltypesorc1a;
> desc formatted small_alltypesorc_a;
> {noformat}
> The results from the descs in the golden file are:
> {noformat}
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                1
>   numRows                 5
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                1
>   numRows                 15
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                2
>   numRows                 20
> {noformat}
> Note the result change after analyze - the original numRows is inaccurate, 
> but BASIC_STATS is set to true.
> I am assuming with metadata only optimization this can produce incorrect 
> results.





[jira] [Commented] (HIVE-13745) UDF current_date、current_timestamp、unix_timestamp NPE

2018-05-14 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474074#comment-16474074
 ] 

Zoltan Haindrich commented on HIVE-13745:
-

I don't think this patch would have any effect on the current master or 
branch-3 (probably branch-2 as well, but I'm not sure), since afaik the UDF call 
will be flattened at compile time because of its nature (a 0-argument UDF 
marked as "runtimeConstant").
[~ychena]: Could you provide a test case which is outside of that scope?

> UDF current_date、current_timestamp、unix_timestamp NPE
> -
>
> Key: HIVE-13745
> URL: https://issues.apache.org/jira/browse/HIVE-13745
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Biao Wu
>Assignee: Yongzhi Chen
>Priority: Major
> Attachments: HIVE-13745.1.patch, HIVE-13745.2-branch-2.patch, 
> HIVE-13745.patch
>
>
> NullPointerException when current_date is used in mapreduce





[jira] [Commented] (HIVE-19501) Fix HyperLogLog to be threadsafe

2018-05-14 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474085#comment-16474085
 ] 

Zoltan Haindrich commented on HIVE-19501:
-

I think adding sync-s would probably slow things down even more; Gopal's wip 
patch in HIVE-18866 also removes these fields

> Fix HyperLogLog to be threadsafe
> 
>
> Key: HIVE-19501
> URL: https://issues.apache.org/jira/browse/HIVE-19501
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Laszlo Bodor
>Priority: Major
> Attachments: HIVE-19501.01.patch
>
>
> not sure if this is an issue in reality or not; but there are 3 static fields 
> in HyperLogLog which are rewritten during operation; if multiple 
> threads are calculating HLL in the same JVM, there is a theoretical chance 
> that they might overwrite each other's values...
> static fields:
> https://github.com/apache/hive/blob/8028ce8a4cf5a03e2998c33e032a511fae770b47/standalone-metastore/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java#L65
> usage:
> https://github.com/apache/hive/blob/8028ce8a4cf5a03e2998c33e032a511fae770b47/standalone-metastore/src/main/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java#L216





[jira] [Commented] (HIVE-13745) UDF current_date、current_timestamp、unix_timestamp NPE

2018-05-14 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474176#comment-16474176
 ] 

Zoltan Haindrich commented on HIVE-13745:
-

Could you please formulate the situation in which the NPE occurs in a qtest?

> UDF current_date、current_timestamp、unix_timestamp NPE
> -
>
> Key: HIVE-13745
> URL: https://issues.apache.org/jira/browse/HIVE-13745
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Biao Wu
>Assignee: Yongzhi Chen
>Priority: Major
> Attachments: HIVE-13745.1.patch, HIVE-13745.2-branch-2.patch, 
> HIVE-13745.patch
>
>
> NullPointerException when current_date is used in mapreduce





[jira] [Updated] (HIVE-19159) TestMTQueries.testMTQueries1 failure

2018-05-14 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19159:

   Resolution: Fixed
Fix Version/s: 3.1.0
   Status: Resolved  (was: Patch Available)

TestMTQueries passing with the modifications;
pushed to master, branch-3. Thank you [~abstractdog] for fixing it!

> TestMTQueries.testMTQueries1 failure
> 
>
> Key: HIVE-19159
> URL: https://issues.apache.org/jira/browse/HIVE-19159
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Reporter: Vineet Garg
>Assignee: Laszlo Bodor
>Priority: Blocker
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-19159.01.patch, HIVE-19159.02-branch-3.patch, 
> HIVE-19159.02.patch, HIVE-19159.03-branch-3.patch, HIVE-19159.03.patch
>
>
> I have confirmed that HIVE-18051 caused this failure



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19326) union_fast_stats MiniLlapLocal golden file has incorrect "accurate" stats (incorrect query results possible)

2018-05-14 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19326:

Attachment: HIVE-19326.02.patch

> union_fast_stats MiniLlapLocal golden file has incorrect "accurate" stats 
> (incorrect query results possible)
> 
>
> Key: HIVE-19326
> URL: https://issues.apache.org/jira/browse/HIVE-19326
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sergey Shelukhin
>Assignee: Zoltan Haindrich
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HIVE-19326.01wip01.patch, HIVE-19326.02.patch
>
>
> Found when investigating the results change after converting tables to MM, 
> turns out the MM result is correct but the current one is not.
> The test ends like so:
> {noformat}
> desc formatted small_alltypesorc_a;
> ANALYZE TABLE small_alltypesorc_a COMPUTE STATISTICS;
> desc formatted small_alltypesorc_a;
> insert into table small_alltypesorc_a select * from small_alltypesorc1a;
> desc formatted small_alltypesorc_a;
> {noformat}
> The results from the descs in the golden file are:
> {noformat}
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                1
>   numRows                 5
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                1
>   numRows                 15
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                2
>   numRows                 20
> {noformat}
> Note the result change after analyze - the original numRows is inaccurate, 
> but BASIC_STATS is set to true.
> I am assuming with metadata only optimization this can produce incorrect 
> results.





[jira] [Updated] (HIVE-19326) union_fast_stats MiniLlapLocal golden file has incorrect "accurate" stats (incorrect query results possible)

2018-05-14 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19326:

Attachment: HIVE-19326.03.patch

> union_fast_stats MiniLlapLocal golden file has incorrect "accurate" stats 
> (incorrect query results possible)
> 
>
> Key: HIVE-19326
> URL: https://issues.apache.org/jira/browse/HIVE-19326
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sergey Shelukhin
>Assignee: Zoltan Haindrich
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HIVE-19326.01wip01.patch, HIVE-19326.02.patch, 
> HIVE-19326.03.patch
>
>
> Found when investigating the results change after converting tables to MM, 
> turns out the MM result is correct but the current one is not.
> The test ends like so:
> {noformat}
> desc formatted small_alltypesorc_a;
> ANALYZE TABLE small_alltypesorc_a COMPUTE STATISTICS;
> desc formatted small_alltypesorc_a;
> insert into table small_alltypesorc_a select * from small_alltypesorc1a;
> desc formatted small_alltypesorc_a;
> {noformat}
> The results from the descs in the golden file are:
> {noformat}
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                1
>   numRows                 5
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                1
>   numRows                 15
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                2
>   numRows                 20
> {noformat}
> Note the result change after analyze - the original numRows is inaccurate, 
> but BASIC_STATS is set to true.
> I am assuming with metadata only optimization this can produce incorrect 
> results.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19326) union_fast_stats MiniLlapLocal golden file has incorrect "accurate" stats (incorrect query results possible)

2018-05-14 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474308#comment-16474308
 ] 

Zoltan Haindrich commented on HIVE-19326:
-

I've added another small change, which also makes the case when 
hive.merge.tezfiles is enabled correct; I've updated the relevant q.out-s

> union_fast_stats MiniLlapLocal golden file has incorrect "accurate" stats 
> (incorrect query results possible)
> 
>
> Key: HIVE-19326
> URL: https://issues.apache.org/jira/browse/HIVE-19326
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sergey Shelukhin
>Assignee: Zoltan Haindrich
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HIVE-19326.01wip01.patch, HIVE-19326.02.patch, 
> HIVE-19326.03.patch
>
>
> Found when investigating the results change after converting tables to MM, 
> turns out the MM result is correct but the current one is not.
> The test ends like so:
> {noformat}
> desc formatted small_alltypesorc_a;
> ANALYZE TABLE small_alltypesorc_a COMPUTE STATISTICS;
> desc formatted small_alltypesorc_a;
> insert into table small_alltypesorc_a select * from small_alltypesorc1a;
> desc formatted small_alltypesorc_a;
> {noformat}
> The results from the descs in the golden file are:
> {noformat}
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                1
>   numRows                 5
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                1
>   numRows                 15
> ...
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
>   numFiles                2
>   numRows                 20
> {noformat}
> Note the result change after analyze - the original numRows is inaccurate, 
> but BASIC_STATS is set to true.
> I am assuming with metadata only optimization this can produce incorrect 
> results.





[jira] [Updated] (HIVE-19460) Improve stats estimations for NOT IN operator

2018-05-14 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19460:

Attachment: HIVE-19460.02.patch

> Improve stats estimations for NOT IN operator
> -
>
> Key: HIVE-19460
> URL: https://issues.apache.org/jira/browse/HIVE-19460
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-19460.01wip01.patch, HIVE-19460.01wip02.patch, 
> HIVE-19460.01wip03.patch, HIVE-19460.02.patch
>
>





