[jira] [Created] (HIVE-19419) SharedScanOptimizer may leave unnecessary operators in the plan

2018-05-03 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-19419:
--

 Summary: SharedScanOptimizer may leave unnecessary operators in 
the plan
 Key: HIVE-19419
 URL: https://issues.apache.org/jira/browse/HIVE-19419
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 3.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


SharedScanOptimizer may leave unnecessary operators in the plan because of its 
interaction with branches created by semijoin reduction. In turn, this can lead 
to errors such as:

{noformat}
2018-05-03T21:19:41,277 INFO  [8d6a552a-b62f-44a4-bdb4-afc2e810ae56 
HiveServer2-Handler-Pool: Thread-139]: physical.Vectorizer 
(Vectorizer.java:walkStackToFindVectorizationContext(1260)) - 
walkStackToFindVectorizationContext RS has new vectorization context Context 
name GBY, level 0, sorted projectionColumnMap {0=_col0}, scratchColumnTypeNames 
[]
2018-05-03T21:19:41,278 ERROR [8d6a552a-b62f-44a4-bdb4-afc2e810ae56 
HiveServer2-Handler-Pool: Thread-139]: ql.Driver 
(SessionState.java:printError(1220)) - FAILED: SemanticException 
org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.reflect.InvocationTargetException
org.apache.hadoop.hive.ql.parse.SemanticException: 
org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.reflect.InvocationTargetException
at 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationNodeProcessor.doVectorize(Vectorizer.java:1285)
at 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$MapWorkVectorizationNodeProcessor.process(Vectorizer.java:1346)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at 
org.apache.hadoop.hive.ql.lib.PreOrderOnceWalker.walk(PreOrderOnceWalker.java:43)
at 
org.apache.hadoop.hive.ql.lib.PreOrderOnceWalker.walk(PreOrderOnceWalker.java:54)
at 
org.apache.hadoop.hive.ql.lib.PreOrderOnceWalker.walk(PreOrderOnceWalker.java:54)
at 
org.apache.hadoop.hive.ql.lib.PreOrderOnceWalker.walk(PreOrderOnceWalker.java:54)
at 
org.apache.hadoop.hive.ql.lib.PreOrderOnceWalker.walk(PreOrderOnceWalker.java:54)
at 
org.apache.hadoop.hive.ql.lib.PreOrderOnceWalker.walk(PreOrderOnceWalker.java:54)
at 
org.apache.hadoop.hive.ql.lib.PreOrderOnceWalker.walk(PreOrderOnceWalker.java:54)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
at 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.vectorizeMapWork(Vectorizer.java:955)
at 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(Vectorizer.java:514)
at 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Vectorizer.java:485)
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180)
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125)
at 
org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(Vectorizer.java:1495)
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeTaskPlan(TezCompiler.java:644)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] hive pull request #332: HIVE-19130: NPE is thrown when REPL LOAD applied dro...

2018-05-03 Thread sankarh
Github user sankarh closed the pull request at:

https://github.com/apache/hive/pull/332


---


[GitHub] hive pull request #316: HIVE-18864: ValidWriteIdList snapshot seems incorrec...

2018-05-03 Thread sankarh
Github user sankarh closed the pull request at:

https://github.com/apache/hive/pull/316


---


[GitHub] hive pull request #331: HIVE-18988: Support bootstrap replication of ACID ta...

2018-05-03 Thread sankarh
Github user sankarh closed the pull request at:

https://github.com/apache/hive/pull/331


---


[GitHub] hive pull request #339: HIVE 18193:Migrate existing ACID tables to use write...

2018-05-03 Thread sankarh
GitHub user sankarh opened a pull request:

https://github.com/apache/hive/pull/339

HIVE-18193: Migrate existing ACID tables to use write id per table rather 
than global transaction id



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sankarh/hive HIVE-18193

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/339.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #339


commit bd41541d27a7cfe6c497a9c88097e9f0b4a905e0
Author: Sankar Hariappan 
Date:   2018-05-02T06:48:39Z

HIVE-18193: Migrate existing ACID tables to use write id per table rather 
than global transaction id.

commit 374b421b6d2f41bfa6a4b0cbc3f5068b5e55bb82
Author: Sankar Hariappan 
Date:   2018-05-03T09:23:06Z

Added NOT NULL constraint

commit 8c9f9ce989471f4d0e7e8eb48c3801baf79a58b6
Author: Sankar Hariappan 
Date:   2018-05-03T09:50:13Z

Updated upgrade scripts of all databases for write ID migration

commit 11d5510905d6566c7412c997094b01dbe6997a18
Author: Sankar Hariappan 
Date:   2018-05-03T13:46:58Z

Bug fixes with metastore install/upgrade tests




---


[jira] [Created] (HIVE-19418) add background stats updater similar to compactor

2018-05-03 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-19418:
---

 Summary: add background stats updater similar to compactor
 Key: HIVE-19418
 URL: https://issues.apache.org/jira/browse/HIVE-19418
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin


There's a JIRA HIVE-19416 to add snapshot version to stats for MM/ACID tables 
to make them usable in a transaction without breaking ACID (for metadata-only 
optimization). However, stats for ACID tables can still become unusable if e.g. 
two parallel inserts run - neither sees the data written by the other, so after 
both finish, the snapshots on either set of stats won't match the current 
snapshot and the stats will be unusable.

Additionally, for ACID and non-ACID tables alike, a lot of the stats, with some 
exceptions like numRows, cannot be aggregated (i.e. you cannot combine ndvs 
from two inserts), and for ACID even less can be aggregated (you cannot derive 
min/max if some rows are deleted but you don't scan the rest of the dataset).

Therefore we will add background logic to metastore (similar to, and partially 
inside, the ACID compactor) to update stats.
It will have 3 modes of operation.
1) Off.
2) Update only the stats that exist but are out of date (generating stats can 
be expensive, so if the user is only analyzing a subset of tables it should be 
able to only update that subset). We can simply look at existing stats and only 
analyze for the relevant partitions and columns.
3) On: 2 + create stats for all tables and columns missing stats.
There will also be a table parameter to skip stats update. 
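
The three modes and the per-table opt-out could be combined into a single decision, roughly as sketched below. This is only an illustration of the description above, not the planned implementation: the enum, the method, and the parameter name `skip.stats.autoupdate` are all hypothetical.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch of the mode logic described above; none of these names exist in Hive.
final class StatsUpdaterSketch {
  // Mode 1 = OFF, mode 2 = EXISTING_ONLY, mode 3 = ALL.
  enum Mode { OFF, EXISTING_ONLY, ALL }

  // Assumed name for the table parameter that skips the stats update.
  static final String SKIP_PARAM = "skip.stats.autoupdate";

  static boolean shouldAnalyze(Mode mode, Map<String, String> tableParams,
                               boolean hasStats, boolean statsUpToDate) {
    if (mode == Mode.OFF) {
      return false;
    }
    // A table can opt out regardless of the mode (parseBoolean(null) is false).
    if (Boolean.parseBoolean(tableParams.get(SKIP_PARAM))) {
      return false;
    }
    if (hasStats) {
      // Modes 2 and 3 both refresh stats that exist but are out of date.
      return !statsUpToDate;
    }
    // Only mode 3 creates stats that are missing entirely.
    return mode == Mode.ALL;
  }

  public static void main(String[] args) {
    Map<String, String> none = Collections.emptyMap();
    System.out.println(shouldAnalyze(Mode.EXISTING_ONLY, none, true, false));
    System.out.println(shouldAnalyze(Mode.EXISTING_ONLY, none, false, false));
    System.out.println(shouldAnalyze(Mode.ALL, none, false, false));
  }
}
```

In a real implementation the check would presumably run per partition and per column rather than per table, as mode 2 describes.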

In phase 1, the process will operate outside of compactor, and run analyze 
command on the table. The analyze command will automatically save the stats 
with ACID snapshot information if needed, based on HIVE-19416, so we don't need 
to do any special state management and this will work for all table types. 
However it's also more expensive.

In phase 2, we can explore adding stats collection during MM compaction that 
uses a temp table. If we don't have open writers during major compaction (so we 
overwrite all of the data), the temp table stats can simply be copied over to 
the main table with correct snapshot information, saving us a table scan.

In phase 3, we can add custom stats collection logic to full ACID compactor 
that is not query based, the same way as we'd do for (2). Alternatively we can 
wait for ACID compactor to become query based and just reuse (2).











[jira] [Created] (HIVE-19417) Modify metastore to have persistent tables/objects

2018-05-03 Thread Steve Yeom (JIRA)
Steve Yeom created HIVE-19417:
-

 Summary: Modify metastore to have persistent tables/objects
 Key: HIVE-19417
 URL: https://issues.apache.org/jira/browse/HIVE-19417
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Affects Versions: 3.0.0
Reporter: Steve Yeom
Assignee: Steve Yeom
 Fix For: 3.0.0








[jira] [Created] (HIVE-19416) Create single version transactional table metastore statistics for aggregation queries

2018-05-03 Thread Steve Yeom (JIRA)
Steve Yeom created HIVE-19416:
-

 Summary: Create single version transactional table metastore 
statistics for aggregation queries
 Key: HIVE-19416
 URL: https://issues.apache.org/jira/browse/HIVE-19416
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Steve Yeom
Assignee: Steve Yeom
 Fix For: 3.0.0


The system should be able to answer aggregation queries such as count on 
transactional tables using only statistics.





Re: switching HiveQA to manual submission?

2018-05-03 Thread Vihang Karajgaonkar
In case you didn't see the latest comment on the JIRA, here is the response
I got from the builds team:

> Unfortunately we've been running into some bugs with Jenkins which is
> causing us to restart the server. We're in the early stages of migrating to
> a new host which we're confident will solve most of the issues we're seeing
> now. Watch the bui...@apache.org list
> https://lists.apache.org/list.html?bui...@apache.org for more information
> about the process going forward. We'll also try and tweet about when we
> have problems and are investigating them: https://twitter.com/infrabot
>

On Thu, May 3, 2018 at 2:41 PM, Vihang Karajgaonkar 
wrote:

> I had sent an email to bui...@apache.org (not sure if that is the right
> list). Also created this JIRA https://issues.apache.
> org/jira/browse/INFRA-16478 to see if there is some response.
>
> On Thu, May 3, 2018 at 2:21 PM, Vihang Karajgaonkar 
> wrote:
>
>> I saw patches getting disappeared also. I know the patch for HIVE-17824
>> was not run since last 1 week. When we looked into this yesterday I found
>> that for some reason the jenkins server was restarted overnight and we lose
>> the pending queue. Does anyone know why this might be happening? Does
>> jenkins restart happen if the pending queue is long enough or they really
>> had some scheduled maintenance? If its a maintenance thing we should be
>> able to find out the mailing list which informs users of this.
>>
>> Even if you submit the job manually, the current ptest is only able to
>> serve one job at a time. We will still end up with long pending queue and
>> any jenkins restart will drain the queue again. I think we need general
>> improvements to Ptest to speed things up. The testing infrastructure is not
>> able to keep up with the number of patches. Any suggestions to improve this
>> situation are welcome but I don't know if submitting manually is really
>> going to help with the problem.
>>
>>
>> On Thu, May 3, 2018 at 12:53 PM, Кривенко Ігор 
>> wrote:
>>
>>> My JIRAs also got disappeared.
>>> But, what other developers which have no access to submit manually on
>>> builds.apache.org will be doing?
>>>
>>> Thanks, Igor.
>>>
>>> 2018-05-03 22:47 GMT+03:00 Prasanth Jayachandran <
>>> pjayachand...@hortonworks.com>:
>>>
>>> > My JIRAs also got disappeared mysteriously and had to manually submit
>>> the
>>> > patch.
>>> > +1 on making it manual (atleast this will make the patch not disappear
>>> and
>>> > is more predictable/no babysitting).
>>> >
>>> > Thanks
>>> > Prasanth
>>> >
>>> > > On May 3, 2018, at 12:40 PM, Sergey Shelukhin <
>>> ser...@hortonworks.com>
>>> > wrote:
>>> > >
>>> > > Ping? I just had 3-4 JIRAs again disappear from the queue
>>> mysteriously,
>>> > for the 2nd or 3rd time in one week.
>>> > >
>>> > > Given that there doesn’t appear to be major interest in improving the
>>> > script in the community (other than some minor changes), should we just
>>> > nuke it and have manual submission w/2 fields, a patch file and jira
>>> > number? That way the tracking at least can be improved. And the patch
>>> > choice to run or not run.
>>> > > This would take less time than trying to appease the fickle spirits
>>> of
>>> > HiveQA as it is now. And also probably result in somewhat shorter
>>> queues.
>>> > >
>>> > > From: Sergey Shelukhin > > ser...@hortonworks.com>>
>>> > > Date: Thursday, April 26, 2018 at 17:57
>>> > > To: "dev@hive.apache.org" <
>>> > dev@hive.apache.org>
>>> > > Subject: switching HiveQA to manual submission?
>>> > >
>>> > > Given the constant cluster with HiveQA, namely queue constantly
>>> > losing or not picking up patches for whatever reason (not limited to
>>> dedup
>>> > - I uploaded some patches today where there’s only one attached to the
>>> JIRA
>>> > and still it didn’t get in the queue or got removed) I wonder if we
>>> should
>>> > add manual submission option to HiveQA. Or make it the only option
>>> actually.
>>> > > That way one can target a specific file and get a specific job link
>>> back
>>> > that won’t got anywhere (one can hope)...
>>> >
>>> >
>>>
>>
>>
>


[jira] [Created] (HIVE-19415) Support CORS for all HS2 web endpoints

2018-05-03 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-19415:


 Summary: Support CORS for all HS2 web endpoints
 Key: HIVE-19415
 URL: https://issues.apache.org/jira/browse/HIVE-19415
 Project: Hive
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 3.0.0, 3.1.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


The HIVE-19277 changes alone are not sufficient to support CORS. A 
CrossOriginFilter has to be added to Jetty so that it serves the appropriate 
response to OPTIONS pre-flight requests. 







[jira] [Created] (HIVE-19414) Cover partitioned table stats update/retrieve cases

2018-05-03 Thread Steve Yeom (JIRA)
Steve Yeom created HIVE-19414:
-

 Summary: Cover partitioned table stats update/retrieve cases 
 Key: HIVE-19414
 URL: https://issues.apache.org/jira/browse/HIVE-19414
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Steve Yeom
 Fix For: 3.0.0








[jira] [Created] (HIVE-19413) Add/use writeId and validWriteIdList during update and for retrieve.

2018-05-03 Thread Steve Yeom (JIRA)
Steve Yeom created HIVE-19413:
-

 Summary: Add/use writeId and validWriteIdList during update and 
for retrieve. 
 Key: HIVE-19413
 URL: https://issues.apache.org/jira/browse/HIVE-19413
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Steve Yeom
Assignee: Steve Yeom
 Fix For: 3.0.0








[jira] [Created] (HIVE-19412) Possible invalid Full ACID stats

2018-05-03 Thread Steve Yeom (JIRA)
Steve Yeom created HIVE-19412:
-

 Summary: Possible invalid Full ACID stats
 Key: HIVE-19412
 URL: https://issues.apache.org/jira/browse/HIVE-19412
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Steve Yeom
 Fix For: 3.0.0


https://issues.apache.org/jira/browse/HIVE-19411?filter=-1





Re: switching HiveQA to manual submission?

2018-05-03 Thread Vihang Karajgaonkar
I had sent an email to bui...@apache.org (not sure if that is the right
list). Also created this JIRA
https://issues.apache.org/jira/browse/INFRA-16478 to see if there is some
response.

On Thu, May 3, 2018 at 2:21 PM, Vihang Karajgaonkar 
wrote:

> I saw patches getting disappeared also. I know the patch for HIVE-17824
> was not run since last 1 week. When we looked into this yesterday I found
> that for some reason the jenkins server was restarted overnight and we lose
> the pending queue. Does anyone know why this might be happening? Does
> jenkins restart happen if the pending queue is long enough or they really
> had some scheduled maintenance? If its a maintenance thing we should be
> able to find out the mailing list which informs users of this.
>
> Even if you submit the job manually, the current ptest is only able to
> serve one job at a time. We will still end up with long pending queue and
> any jenkins restart will drain the queue again. I think we need general
> improvements to Ptest to speed things up. The testing infrastructure is not
> able to keep up with the number of patches. Any suggestions to improve this
> situation are welcome but I don't know if submitting manually is really
> going to help with the problem.
>
>
> On Thu, May 3, 2018 at 12:53 PM, Кривенко Ігор 
> wrote:
>
>> My JIRAs also got disappeared.
>> But, what other developers which have no access to submit manually on
>> builds.apache.org will be doing?
>>
>> Thanks, Igor.
>>
>> 2018-05-03 22:47 GMT+03:00 Prasanth Jayachandran <
>> pjayachand...@hortonworks.com>:
>>
>> > My JIRAs also got disappeared mysteriously and had to manually submit
>> the
>> > patch.
>> > +1 on making it manual (atleast this will make the patch not disappear
>> and
>> > is more predictable/no babysitting).
>> >
>> > Thanks
>> > Prasanth
>> >
>> > > On May 3, 2018, at 12:40 PM, Sergey Shelukhin > >
>> > wrote:
>> > >
>> > > Ping? I just had 3-4 JIRAs again disappear from the queue
>> mysteriously,
>> > for the 2nd or 3rd time in one week.
>> > >
>> > > Given that there doesn’t appear to be major interest in improving the
>> > script in the community (other than some minor changes), should we just
>> > nuke it and have manual submission w/2 fields, a patch file and jira
>> > number? That way the tracking at least can be improved. And the patch
>> > choice to run or not run.
>> > > This would take less time than trying to appease the fickle spirits of
>> > HiveQA as it is now. And also probably result in somewhat shorter
>> queues.
>> > >
>> > > From: Sergey Shelukhin  > ser...@hortonworks.com>>
>> > > Date: Thursday, April 26, 2018 at 17:57
>> > > To: "dev@hive.apache.org" <
>> > dev@hive.apache.org>
>> > > Subject: switching HiveQA to manual submission?
>> > >
>> > > Given the constant cluster with HiveQA, namely queue constantly
>> > losing or not picking up patches for whatever reason (not limited to
>> dedup
>> > - I uploaded some patches today where there’s only one attached to the
>> JIRA
>> > and still it didn’t get in the queue or got removed) I wonder if we
>> should
>> > add manual submission option to HiveQA. Or make it the only option
>> actually.
>> > > That way one can target a specific file and get a specific job link
>> back
>> > that won’t got anywhere (one can hope)...
>> >
>> >
>>
>
>


Re: switching HiveQA to manual submission?

2018-05-03 Thread Vihang Karajgaonkar
I also saw patches disappear. I know the patch for HIVE-17824 has not been
run for the last week. When we looked into this yesterday I found that
for some reason the Jenkins server was restarted overnight and we lost the
pending queue. Does anyone know why this might be happening? Does a Jenkins
restart happen if the pending queue gets too long, or did they really have some
scheduled maintenance? If it's a maintenance thing we should be able to find
the mailing list which informs users of this.

Even if you submit the job manually, the current ptest is only able to
serve one job at a time. We will still end up with a long pending queue and
any Jenkins restart will drain the queue again. I think we need general
improvements to Ptest to speed things up. The testing infrastructure is not
able to keep up with the number of patches. Any suggestions to improve this
situation are welcome, but I don't know if submitting manually is really
going to help with the problem.


On Thu, May 3, 2018 at 12:53 PM, Кривенко Ігор 
wrote:

> My JIRAs also got disappeared.
> But, what other developers which have no access to submit manually on
> builds.apache.org will be doing?
>
> Thanks, Igor.
>
> 2018-05-03 22:47 GMT+03:00 Prasanth Jayachandran <
> pjayachand...@hortonworks.com>:
>
> > My JIRAs also got disappeared mysteriously and had to manually submit the
> > patch.
> > +1 on making it manual (atleast this will make the patch not disappear
> and
> > is more predictable/no babysitting).
> >
> > Thanks
> > Prasanth
> >
> > > On May 3, 2018, at 12:40 PM, Sergey Shelukhin 
> > wrote:
> > >
> > > Ping? I just had 3-4 JIRAs again disappear from the queue mysteriously,
> > for the 2nd or 3rd time in one week.
> > >
> > > Given that there doesn’t appear to be major interest in improving the
> > script in the community (other than some minor changes), should we just
> > nuke it and have manual submission w/2 fields, a patch file and jira
> > number? That way the tracking at least can be improved. And the patch
> > choice to run or not run.
> > > This would take less time than trying to appease the fickle spirits of
> > HiveQA as it is now. And also probably result in somewhat shorter queues.
> > >
> > > From: Sergey Shelukhin  ser...@hortonworks.com>>
> > > Date: Thursday, April 26, 2018 at 17:57
> > > To: "dev@hive.apache.org" <
> > dev@hive.apache.org>
> > > Subject: switching HiveQA to manual submission?
> > >
> > > Given the constant cluster with HiveQA, namely queue constantly
> > losing or not picking up patches for whatever reason (not limited to
> dedup
> > - I uploaded some patches today where there’s only one attached to the
> JIRA
> > and still it didn’t get in the queue or got removed) I wonder if we
> should
> > add manual submission option to HiveQA. Or make it the only option
> actually.
> > > That way one can target a specific file and get a specific job link
> back
> > that won’t got anywhere (one can hope)...
> >
> >
>


[jira] [Created] (HIVE-19411) Full-ACID table stats may not be valid. E.g., delete/insert

2018-05-03 Thread Steve Yeom (JIRA)
Steve Yeom created HIVE-19411:
-

 Summary: Full-ACID table stats may not be valid. E.g., 
delete/insert
 Key: HIVE-19411
 URL: https://issues.apache.org/jira/browse/HIVE-19411
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Steve Yeom
 Fix For: 0.10.1


E.g., per Sergey, updating a row can end up adding +2 rows instead of +0, 
since the update is translated to a delete and an insert and the physical 
writer may just add the number of operations. 
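
The arithmetic can be illustrated with a small sketch. It is not taken from the Hive writer code; the class, enum, and method names are hypothetical, and it only contrasts counting operations with counting the net change in rows.

```java
// Hypothetical illustration of the row-count drift described above; not Hive code.
final class AcidRowCountSketch {
  enum Op { INSERT, DELETE }

  // What a writer does if it simply adds the number of operations to numRows.
  static long naiveDelta(Op[] ops) {
    return ops.length;
  }

  // The net change in live rows: inserts add a row, deletes remove one.
  static long netDelta(Op[] ops) {
    long delta = 0;
    for (Op op : ops) {
      delta += (op == Op.INSERT) ? 1 : -1;
    }
    return delta;
  }

  public static void main(String[] args) {
    // A single-row UPDATE is translated to a DELETE followed by an INSERT.
    Op[] update = { Op.DELETE, Op.INSERT };
    System.out.println(naiveDelta(update)); // prints 2
    System.out.println(netDelta(update));   // prints 0
  }
}
```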






[jira] [Created] (HIVE-19410) don't create serde reader in LLAP if there's no cache

2018-05-03 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-19410:
---

 Summary: don't create serde reader in LLAP if there's no cache
 Key: HIVE-19410
 URL: https://issues.apache.org/jira/browse/HIVE-19410
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin


Seems to crop up in some tests.





Re: switching HiveQA to manual submission?

2018-05-03 Thread Кривенко Ігор
My JIRAs also disappeared.
But what will other developers, who have no access to submit manually on
builds.apache.org, do?

Thanks, Igor.

2018-05-03 22:47 GMT+03:00 Prasanth Jayachandran <
pjayachand...@hortonworks.com>:

> My JIRAs also got disappeared mysteriously and had to manually submit the
> patch.
> +1 on making it manual (atleast this will make the patch not disappear and
> is more predictable/no babysitting).
>
> Thanks
> Prasanth
>
> > On May 3, 2018, at 12:40 PM, Sergey Shelukhin 
> wrote:
> >
> > Ping? I just had 3-4 JIRAs again disappear from the queue mysteriously,
> for the 2nd or 3rd time in one week.
> >
> > Given that there doesn’t appear to be major interest in improving the
> script in the community (other than some minor changes), should we just
> nuke it and have manual submission w/2 fields, a patch file and jira
> number? That way the tracking at least can be improved. And the patch
> choice to run or not run.
> > This would take less time than trying to appease the fickle spirits of
> HiveQA as it is now. And also probably result in somewhat shorter queues.
> >
> > From: Sergey Shelukhin >
> > Date: Thursday, April 26, 2018 at 17:57
> > To: "dev@hive.apache.org" <
> dev@hive.apache.org>
> > Subject: switching HiveQA to manual submission?
> >
> > Given the constant cluster with HiveQA, namely queue constantly
> losing or not picking up patches for whatever reason (not limited to dedup
> - I uploaded some patches today where there’s only one attached to the JIRA
> and still it didn’t get in the queue or got removed) I wonder if we should
> add manual submission option to HiveQA. Or make it the only option actually.
> > That way one can target a specific file and get a specific job link back
> that won’t got anywhere (one can hope)...
>
>


Re: switching HiveQA to manual submission?

2018-05-03 Thread Prasanth Jayachandran
My JIRAs also disappeared mysteriously and I had to manually submit the 
patch. 
+1 on making it manual (at least this will keep the patch from disappearing and 
is more predictable/no babysitting).

Thanks
Prasanth

> On May 3, 2018, at 12:40 PM, Sergey Shelukhin  wrote:
> 
> Ping? I just had 3-4 JIRAs again disappear from the queue mysteriously, for 
> the 2nd or 3rd time in one week.
> 
> Given that there doesn’t appear to be major interest in improving the script 
> in the community (other than some minor changes), should we just nuke it and 
> have manual submission w/2 fields, a patch file and jira number? That way the 
> tracking at least can be improved. And the patch choice to run or not run.
> This would take less time than trying to appease the fickle spirits of HiveQA 
> as it is now. And also probably result in somewhat shorter queues.
> 
> From: Sergey Shelukhin >
> Date: Thursday, April 26, 2018 at 17:57
> To: "dev@hive.apache.org" 
> >
> Subject: switching HiveQA to manual submission?
> 
> Given the constant cluster with HiveQA, namely queue constantly losing or 
> not picking up patches for whatever reason (not limited to dedup - I uploaded 
> some patches today where there’s only one attached to the JIRA and still it 
> didn’t get in the queue or got removed) I wonder if we should add manual 
> submission option to HiveQA. Or make it the only option actually.
> That way one can target a specific file and get a specific job link back that 
> won’t got anywhere (one can hope)...



[jira] [Created] (HIVE-19409) Disable incremental rewriting with outdated materialized views

2018-05-03 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-19409:
--

 Summary: Disable incremental rewriting with outdated materialized 
views
 Key: HIVE-19409
 URL: https://issues.apache.org/jira/browse/HIVE-19409
 Project: Hive
  Issue Type: Sub-task
  Components: Materialized views
Affects Versions: 3.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


Add an option to disable incremental rewriting with outdated materialized 
views. Incremental rewriting with outdated views will be disabled by default, 
making it an opt-in feature.





[jira] [Created] (HIVE-19408) Improve show materialized views statement to show more information about invalidation

2018-05-03 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-19408:
--

 Summary: Improve show materialized views statement to show more 
information about invalidation
 Key: HIVE-19408
 URL: https://issues.apache.org/jira/browse/HIVE-19408
 Project: Hive
  Issue Type: Sub-task
  Components: Materialized views
Affects Versions: 3.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


We should show more useful information in addition to materialized view name. 
For instance, information about whether the materialized view contents are 
up-to-date or not, and which table(s) have changed.





Re: switching HiveQA to manual submission?

2018-05-03 Thread Sergey Shelukhin
Ping? I just had 3-4 JIRAs again disappear from the queue mysteriously, for the 
2nd or 3rd time in one week.

Given that there doesn’t appear to be major interest in improving the script in 
the community (other than some minor changes), should we just nuke it and have 
manual submission w/2 fields, a patch file and jira number? That way the 
tracking at least can be improved. And the patch choice to run or not run.
This would take less time than trying to appease the fickle spirits of HiveQA 
as it is now. And also probably result in somewhat shorter queues.

From: Sergey Shelukhin
Date: Thursday, April 26, 2018 at 17:57
To: "dev@hive.apache.org"
Subject: switching HiveQA to manual submission?

Given the constant problems with HiveQA, namely the queue constantly losing or 
not picking up patches for whatever reason (not limited to dedup - I uploaded 
some patches today where there’s only one attached to the JIRA and still it 
didn’t get in the queue or got removed), I wonder if we should add a manual 
submission option to HiveQA. Or make it the only option, actually.
That way one can target a specific file and get a specific job link back that 
won’t go anywhere (one can hope)...


[jira] [Created] (HIVE-19407) Only support materialized views stored either as ACID or in selected custom storage handlers

2018-05-03 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-19407:
--

 Summary: Only support materialized views stored either as ACID or 
in selected custom storage handlers
 Key: HIVE-19407
 URL: https://issues.apache.org/jira/browse/HIVE-19407
 Project: Hive
  Issue Type: Sub-task
  Components: Materialized views
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


We should not support, e.g., external HDFS tables. Storage handlers such as 
Druid should be fine. We will limit the support for sources that are actually 
not handled by Hive, which will in turn produce more predictable behavior.





[jira] [Created] (HIVE-19406) HiveKVResultCache.setupOutput hangs when the file creation failed

2018-05-03 Thread John Doe (JIRA)
John Doe created HIVE-19406:
---

 Summary: HiveKVResultCache.setupOutput hangs when the file 
creation failed
 Key: HIVE-19406
 URL: https://issues.apache.org/jira/browse/HIVE-19406
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.2
Reporter: John Doe


The while loop in the HiveKVResultCache.setupOutput function hangs endlessly 
when the file creation fails, which causes delete() to return false.

The file creation failure can be caused by different reasons, e.g., disk full.

Here is the code snippet.

 
{code:java}
  private void setupOutput() throws IOException {
    if (parentFile == null) {
      while (true) {
        parentFile = File.createTempFile("hive-resultcache", "");
        if (parentFile.delete() && parentFile.mkdir()) {
          parentFile.deleteOnExit();
          break;
        }
        if (LOG.isDebugEnabled()) {
          LOG.debug("Retry creating tmp result-cache directory...");
        }
      }
    }
    ...
  }
{code}

A similar case is [HIVE-19391|https://issues.apache.org/jira/browse/HIVE-19391]
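
One possible way to avoid the unbounded loop is to cap the number of attempts and fail with an exception once the cap is reached. The sketch below is an assumption about what such a fix could look like, not the patch for this issue: the class name, the helper method, and the attempt limit are hypothetical, and it uses the standard `java.nio.file.Files.createTempDirectory` call instead of the create-then-delete-then-mkdir sequence.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical sketch of a bounded retry; not the actual HiveKVResultCache code.
final class BoundedTempDirSketch {
  // Assumed limit; the real value would be a design decision.
  static final int MAX_ATTEMPTS = 5;

  static File createTempDir(String prefix, int maxAttempts) throws IOException {
    IOException lastFailure = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        // Creates the directory in one step, so there is no delete()/mkdir() race.
        File dir = Files.createTempDirectory(prefix).toFile();
        dir.deleteOnExit();
        return dir;
      } catch (IOException e) {
        // Remember the failure (for example, disk full) and try again.
        lastFailure = e;
      }
    }
    // Give up instead of spinning forever.
    throw new IOException(
        "Could not create temp directory after " + maxAttempts + " attempts", lastFailure);
  }

  public static void main(String[] args) throws IOException {
    File dir = createTempDir("hive-resultcache", MAX_ATTEMPTS);
    System.out.println(dir.isDirectory());
  }
}
```

With a cap in place, a persistent failure such as a full disk surfaces as an IOException to the caller rather than as a hang.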






Re: Review Request 66935: HIVE-18977: Listing partitions returns different results with JDO and direct SQL

2018-05-03 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66935/#review202350
---


Ship it!




Ship It!

- Peter Vary


On May 3, 2018, 1:53 p.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66935/
> ---
> 
> (Updated May 3, 2018, 1:53 p.m.)
> 
> 
> Review request for hive, Alan Gates and Peter Vary.
> 
> 
> Bugs: HIVE-18977
> https://issues.apache.org/jira/browse/HIVE-18977
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Some of the test cases in TestListPartitions fail when directSQL is disabled.
> 
> 
> Diffs
> -
> 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 4601e09
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 6645e55
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java d608e50
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestListPartitions.java a8b6e31
> 
> 
> Diff: https://reviews.apache.org/r/66935/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>



[jira] [Created] (HIVE-19405) AddPartitionDesc should intern its fields

2018-05-03 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-19405:
---

 Summary: AddPartitionDesc should intern its fields
 Key: HIVE-19405
 URL: https://issues.apache.org/jira/browse/HIVE-19405
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Sahil Takiar
Assignee: Sahil Takiar


A lot of heap is wasted on duplicate strings because we accumulate tons of 
{{AddPartitionDesc}} objects during operations such as msck.
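
As a rough illustration of the idea, the sketch below interns repeated string fields in the constructor so that many descriptor objects share one String instance per distinct value. The class PartitionDescSketch and its fields are hypothetical and do not mirror the real {{AddPartitionDesc}} layout; only String.intern() is standard Java.

```java
// Hypothetical stand-in for a descriptor that is created many times
// with the same database and table names.
final class PartitionDescSketch {
  private final String dbName;
  private final String tableName;

  PartitionDescSketch(String dbName, String tableName) {
    // Interning maps equal strings to one canonical instance,
    // so duplicates become eligible for garbage collection.
    this.dbName = intern(dbName);
    this.tableName = intern(tableName);
  }

  // Null-safe wrapper around String.intern().
  private static String intern(String s) {
    return s == null ? null : s.intern();
  }

  String getDbName() {
    return dbName;
  }

  String getTableName() {
    return tableName;
  }
}
```

Two instances built from separately allocated but equal strings then hold the same reference, which is where the heap saving comes from. Interning is mainly worthwhile for values that repeat often; fields that are unique per partition gain nothing from it.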





[jira] [Created] (HIVE-19404) Revise DDL Task Result Logging

2018-05-03 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-19404:
--

 Summary: Revise DDL Task Result Logging
 Key: HIVE-19404
 URL: https://issues.apache.org/jira/browse/HIVE-19404
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 3.0.0, 2.4.0
Reporter: BELUGA BEHR


There is some logging in {{DDLTask}} that can be made better:

{code}
2018-05-03 03:08:32,524 INFO  hive.ql.exec.DDLTask: 
[HiveServer2-Background-Pool: Thread-101980]: results : 706
{code}

This logging should either be demoted to _debug_ level or be given additional 
context (or both), for example:

{code}
2018-05-03 03:08:32,524 INFO  hive.ql.exec.DDLTask: 
[HiveServer2-Background-Pool: Thread-101980]: Found 706 tables that match the 
SHOW DATABASE statement
{code}





Review Request 66935: HIVE-18977: Listing partitions returns different results with JDO and direct SQL

2018-05-03 Thread Marta Kuczora via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66935/
---

Review request for hive, Alan Gates and Peter Vary.


Bugs: HIVE-18977
https://issues.apache.org/jira/browse/HIVE-18977


Repository: hive-git


Description
---

Some of the test cases in TestListPartitions fail when directSQL is disabled.


Diffs
-

  
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 4601e09
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java 6645e55
  standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java d608e50
  standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/client/TestListPartitions.java a8b6e31


Diff: https://reviews.apache.org/r/66935/diff/1/


Testing
---


Thanks,

Marta Kuczora



[jira] [Created] (HIVE-19403) Demote 'Pattern' Logging

2018-05-03 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created HIVE-19403:
--

 Summary: Demote 'Pattern' Logging
 Key: HIVE-19403
 URL: https://issues.apache.org/jira/browse/HIVE-19403
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 3.0.0, 2.4.0
Reporter: BELUGA BEHR


In the {{DDLTask}} class, there is some logging that is not helpful to a 
cluster admin and should be demoted to _debug_ level logging.  In fact, in one 
place in the code, it already is.

{code}
LOG.info("pattern: {}", showDatabasesDesc.getPattern());
LOG.debug("pattern: {}", pattern);
LOG.info("pattern: {}", showFuncs.getPattern());
LOG.info("pattern: {}", showTblStatus.getPattern());
{code}

Here is an example. As an admin, I can already see what the pattern is from the 
logged command, so I do not need this extra logging. It provides no additional context.

{code:java|title=Example}
2018-05-03 03:08:26,354 INFO  org.apache.hadoop.hive.ql.Driver: 
[HiveServer2-Background-Pool: Thread-101980]: Executing 
command(queryId=hive_20180503030808_e53c26ef-2280-4eca-929b-668503105e2e): SHOW 
TABLE EXTENDED FROM my_db LIKE '*'
2018-05-03 03:08:26,355 INFO  hive.ql.exec.DDLTask: 
[HiveServer2-Background-Pool: Thread-101980]: pattern: *
{code}





[jira] [Created] (HIVE-19402) Handle explain analyze for reoptimization

2018-05-03 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created HIVE-19402:
---

 Summary: Handle explain analyze for reoptimization
 Key: HIVE-19402
 URL: https://issues.apache.org/jira/browse/HIVE-19402
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


This might also make it possible to remove "explain reoptimization".





[jira] [Created] (HIVE-19401) Correct Queue Name for LLAP should be updated in ATSHook of hive

2018-05-03 Thread anishek (JIRA)
anishek created HIVE-19401:
--

 Summary: Correct Queue Name for LLAP should be updated in ATSHook 
of hive
 Key: HIVE-19401
 URL: https://issues.apache.org/jira/browse/HIVE-19401
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 3.1.0
Reporter: anishek


By default, the queue name reported by the ATSHook in Hive is currently taken 
from the *mapreduce.job.queuename* config parameter. 
For LLAP, however, this is incorrect. With workload management, which uses 
*hive.server2.tez.interactive.queue*, the problem becomes slightly more 
complicated, since we don't show the LLAP queues to users in the YARN queues.





[jira] [Created] (HIVE-19400) Adjust Hive 1.0 to 2.0 conversion utility to the upgrade

2018-05-03 Thread Miklos Gergely (JIRA)
Miklos Gergely created HIVE-19400:
-

 Summary: Adjust Hive 1.0 to 2.0 conversion utility to the upgrade
 Key: HIVE-19400
 URL: https://issues.apache.org/jira/browse/HIVE-19400
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Affects Versions: 3.0.0
Reporter: Miklos Gergely
Assignee: Miklos Gergely
 Fix For: 3.0.0


Conversion utility should allow specification of the output dir, and create 
files only if there is actually something to do.





[jira] [Created] (HIVE-19399) Download cast generating incorrect value for vectorization

2018-05-03 Thread Haifeng Chen (JIRA)
Haifeng Chen created HIVE-19399:
---

 Summary: Download cast generating incorrect value for vectorization
 Key: HIVE-19399
 URL: https://issues.apache.org/jira/browse/HIVE-19399
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Affects Versions: 3.1.0
Reporter: Haifeng Chen


 

The following SQL script generates different results with vectorization 
disabled and enabled:
  drop table test_schema;
  create table test_schema (f int) stored as parquet;
  insert into test_schema values ('9');
  select cast(f as tinyint) + 1 from test_schema;

With vectorization disabled the result is -96, while with vectorization 
enabled it is 10.

 





[jira] [Created] (HIVE-19398) hive overwrite table1 fail insert

2018-05-03 Thread liaozhenjiang (JIRA)
liaozhenjiang created HIVE-19398:


 Summary: hive overwrite table1 fail insert 
 Key: HIVE-19398
 URL: https://issues.apache.org/jira/browse/HIVE-19398
 Project: Hive
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: liaozhenjiang
 Fix For: 1.1.0








[jira] [Created] (HIVE-19397) IndexOutOfBoundsException throws when invoke next() method in HiveQueryResultSet

2018-05-03 Thread Chen Lantian (JIRA)
Chen Lantian created HIVE-19397:
---

 Summary: IndexOutOfBoundsException throws when invoke next() 
method in HiveQueryResultSet
 Key: HIVE-19397
 URL: https://issues.apache.org/jira/browse/HIVE-19397
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, JDBC
Affects Versions: 2.1.1
 Environment: CentOS-7.2.1511, Hadoop-2.7.3, Hive-2.1.1
Reporter: Chen Lantian
Assignee: Chen Lantian


When exporting data with the HiveQueryResultSet.next() method, an 
IndexOutOfBoundsException is sometimes thrown when the data is very large. The 
exception I caught is below:

java.sql.SQLException: Error retrieving next row
 at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:396)
 ...
Caused by: java.lang.IndexOutOfBoundsException: Index: 1000, Size: 1000
 at java.util.ArrayList.rangeCheck(ArrayList.java:653)
 at java.util.ArrayList.get(ArrayList.java:429)
 at org.apache.hadoop.hive.serde2.thrift.ColumnBuffer.get(ColumnBuffer.java:292)
 at org.apache.hive.service.cli.ColumnBasedSet$1.next(ColumnBasedSet.java:186)
 at org.apache.hive.service.cli.ColumnBasedSet$1.next(ColumnBasedSet.java:173)
 at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:382)
 ... 32 more


