Re: [VOTE] Apache Hive 3.0.0 Release Candidate 0

2018-05-17 Thread Prasanth Jayachandran
Could you please increment the RC version when rolling a new RC? That would make 
it easy to verify the latest RC version.
Also, the email in your public key seems to have a typo: 
vg...@apche.org.

Thanks
Prasanth

On May 17, 2018, at 9:33 PM, Vineet Garg wrote:

standalone-metastore/pom.xml’s dependency on storage-api wasn’t updated in this 
RC. I have fixed that and I am working on creating a new RC. I’ll send another 
email for voting soon.

On May 15, 2018, at 5:56 PM, Vineet Garg wrote:

Apache Hive 3.0.0 Release Candidate 0 is available here:

http://people.apache.org/~vgarg/apache-hive-3.0.0-rc-0


Tag: https://github.com/apache/hive/tree/release-3.0.0-rc0


Voting will conclude in 72 hours.

Hive PMC Members: Please test and vote.

Thanks.




Re: Review Request 67197: HIVE-19588: Several invocation of file listing when creating VectorizedOrcAcidRowBatchReader

2018-05-17 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67197/
---

(Updated May 18, 2018, 5:45 a.m.)


Review request for hive and Eugene Koifman.


Changes
---

Removed VectorizedOrcAcidRowBatchReader creation from inner loop for LLAP.


Bugs: HIVE-19588
https://issues.apache.org/jira/browse/HIVE-19588


Repository: hive-git


Description
---

HIVE-19588: Several invocation of file listing when creating 
VectorizedOrcAcidRowBatchReader


Diffs (updated)
-

  
llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java
 7451ea4 
  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 183515a 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 2337a35 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 5655ee9 
  
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
 8caa265 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 
b28c126 
  
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedOrcAcidRowBatchReader.java
 3acc085 


Diff: https://reviews.apache.org/r/67197/diff/2/

Changes: https://reviews.apache.org/r/67197/diff/1-2/


Testing
---


Thanks,

Prasanth_J



[VOTE] Apache Hive 3.0.0 Release Candidate 0

2018-05-17 Thread Vineet Garg
Apache Hive 3.0.0 Release Candidate 0 is available here:

http://people.apache.org/~vgarg/apache-hive-3.0.0-rc-0


Tag: https://github.com/apache/hive/tree/release-3.0.0-rc0

My public key is available at 
hkps.pool.sks-keyservers.net (Lookup using 
‘vgarg’).

Voting will conclude in 72 hours.

Hive PMC Members: Please test and vote.

Thanks.


Re: [VOTE] Apache Hive 3.0.0 Release Candidate 0

2018-05-17 Thread Vineet Garg
standalone-metastore/pom.xml’s dependency on storage-api wasn’t updated in this 
RC. I have fixed that and I am working on creating a new RC. I’ll send another 
email for voting soon.

> On May 15, 2018, at 5:56 PM, Vineet Garg  wrote:
> 
> Apache Hive 3.0.0 Release Candidate 0 is available here:
> 
> http://people.apache.org/~vgarg/apache-hive-3.0.0-rc-0
> 
> 
> Tag: https://github.com/apache/hive/tree/release-3.0.0-rc0
> 
> 
> Voting will conclude in 72 hours.
> 
> Hive PMC Members: Please test and vote.
> 
> Thanks.



[jira] [Created] (HIVE-19602) Refactor inplace progress code in Hive-on-spark progress monitor to use InplaceUpdate

2018-05-17 Thread Bharathkrishna Guruvayoor Murali (JIRA)
Bharathkrishna Guruvayoor Murali created HIVE-19602:
---

 Summary: Refactor inplace progress code in Hive-on-spark progress 
monitor to use InplaceUpdate
 Key: HIVE-19602
 URL: https://issues.apache.org/jira/browse/HIVE-19602
 Project: Hive
  Issue Type: Bug
Reporter: Bharathkrishna Guruvayoor Murali
Assignee: Bharathkrishna Guruvayoor Murali


We can refactor the HOS inplace progress monitor code to use InplaceUpdate.

This would be similar to :

[https://github.com/apache/hive/blob/0b6bea89f74b607299ad944b37e4b62c711aaa69/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/RenderStrategy.java#L181]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19601) Unsupported Post join function

2018-05-17 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19601:
-

 Summary: Unsupported Post join function
 Key: HIVE-19601
 URL: https://issues.apache.org/jira/browse/HIVE-19601
 Project: Hive
  Issue Type: Sub-task
Reporter: slim bouguerra


h1. As part of trying to use the Calcite rule {code} 
org.apache.calcite.rel.rules.AggregateExpandDistinctAggregatesRule#JOIN {code}
I got the following Calcite plan 
{code}
2018-05-17T09:26:02,781 DEBUG [80d6d405-ed78-4f60-bd93-b3e08e424f73 main] 
translator.PlanModifierForASTConv: Final plan after modifier
 HiveProject(_c0=[$1], _c1=[$2])
  HiveProject(zone=[$0], $f1=[$1], $f2=[$3])
HiveJoin(condition=[IS NOT DISTINCT FROM($0, $2)], joinType=[inner], 
algorithm=[none], cost=[not available])
  HiveProject(zone=[$0], $f1=[$1])
HiveAggregate(group=[{0}], agg#0=[count($1)])
  HiveProject(zone=[$0], interval_marker=[$1])
HiveAggregate(group=[{0, 1}])
  HiveProject(zone=[$3], interval_marker=[$1])
HiveTableScan(table=[[druid_test_dst.test_base_table]], 
table:alias=[test_base_table])
  HiveProject(zone=[$0], $f1=[$1])
HiveAggregate(group=[{0}], agg#0=[count($1)])
  HiveProject(zone=[$0], dim=[$1])
HiveAggregate(group=[{0, 1}])
  HiveProject(zone=[$3], dim=[$4])
HiveTableScan(table=[[druid_test_dst.test_base_table]], 
table:alias=[test_base_table])
{code}
I ran into this issue
{code} 
2018-05-17T09:26:02,876 ERROR [80d6d405-ed78-4f60-bd93-b3e08e424f73 main] 
parse.CalcitePlanner: CBO failed, skipping CBO.
org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid function 
'IS NOT DISTINCT FROM'
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:1069)
 ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1464)
 ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
{code}
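
For context, IS NOT DISTINCT FROM is null-safe equality; the HiveQL spelling of it 
is the <=> operator. A minimal sketch of the kind of join the failing expression 
corresponds to (agg_a/agg_b are illustrative stand-ins for the two aggregate 
subqueries in the plan above):
{code}
-- Null-safe join: rows match when both zone values are equal, or both are NULL.
SELECT a.zone, a.cnt, b.cnt
FROM agg_a a
JOIN agg_b b ON a.zone <=> b.zone;
{code}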



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19600) Hive and Calcite have different semantics for Grouping sets

2018-05-17 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19600:
-

 Summary: Hive and Calcite have different semantics for Grouping 
sets
 Key: HIVE-19600
 URL: https://issues.apache.org/jira/browse/HIVE-19600
 Project: Hive
  Issue Type: Sub-task
Reporter: slim bouguerra
 Fix For: 3.1.0


h1. Issue:
Tried to use the Calcite rule {code} 
org.apache.calcite.rel.rules.AggregateExpandDistinctAggregatesRule#AggregateExpandDistinctAggregatesRule(java.lang.Class, boolean, 
org.apache.calcite.tools.RelBuilderFactory) {code} to replace the current rule used 
by Hive {code} 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveExpandDistinctAggregatesRule#HiveExpandDistinctAggregatesRule
{code}
But I got an exception when generating the operator tree out of the Calcite plan.
This is the Calcite plan 
{code} 
HiveProject.HIVE.[](input=rel#50:HiveAggregate.HIVE.[](input=rel#48:HiveProject.HIVE.[](input=rel#44:HiveAggregate.HIVE.[](input=rel#38:HiveProject.HIVE.[](input=rel#0:HiveTableScan.HIVE.[]
(table=[druid_test_dst.test_base_table],table:alias=test_base_table)[false],$f0=$3,$f1=$1,$f2=$4),group={0,
 1, 2},groups=[{0, 1}, {0, 2}],$g=GROUPING($0, $1, 
$2)),$f0=$0,$f1=$1,$f2=$2,$g_1==($3, 1),$g_2==($3, 
2)),group={0},agg#0=count($1) FILTER $3,agg#1=count($2) FILTER 
$4),_o__c0=$1,_o__c1=$2)
{code}

This is the exception stack 
{code} 
2018-05-17T08:46:48,604 ERROR [649a61b0-d8c7-45d8-962d-b1d38397feb4 main] 
ql.Driver: FAILED: SemanticException Line 0:-1 Argument type mismatch 'zone': 
The first argument to grouping() must be an int/long. Got: STRING
org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Argument type 
mismatch 'zone': The first argument to grouping() must be an int/long. Got: 
STRING
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1467)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at 
org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:239)
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:185)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:12566)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:12521)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4525)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4298)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10487)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10426)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11339)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11196)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11223)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11209)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:517)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12074)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:330)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288)
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:164)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:643)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1686)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1633)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1628)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
at 

Review Request 67197: HIVE-19588: Several invocation of file listing when creating VectorizedOrcAcidRowBatchReader

2018-05-17 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67197/
---

Review request for hive and Eugene Koifman.


Bugs: HIVE-19588
https://issues.apache.org/jira/browse/HIVE-19588


Repository: hive-git


Description
---

HIVE-19588: Several invocation of file listing when creating 
VectorizedOrcAcidRowBatchReader


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 
183515a0ed2a0f48740825a4737d4bee63d8a90a 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 
2337a350e6bd44e52826b46707fb8727974e00bf 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 
5655ee9407d78ebc28a6e2b9c39ba55efc603a54 
  
ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
 8caa265e8b9bd5674a35083ccd81b873c9fc2c70 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 
b28c126dbca8f1796dae5e8c5c5e256d18b1f86d 
  
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedOrcAcidRowBatchReader.java
 3acc0855fb07b51377ef665eea9abf9927c260c6 


Diff: https://reviews.apache.org/r/67197/diff/1/


Testing
---


Thanks,

Prasanth_J



[jira] [Created] (HIVE-19599) Release Notes : Highlighting backwards incompatible changes

2018-05-17 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-19599:
-

 Summary: Release Notes : Highlighting backwards incompatible 
changes
 Key: HIVE-19599
 URL: https://issues.apache.org/jira/browse/HIVE-19599
 Project: Hive
  Issue Type: Bug
  Components: Documentation
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Vineet Garg


We need to highlight backwards incompatible changes. A list of Jira titles won't 
be sufficient.

For example, tables with Acid V1 (pre-3.0) data have to be major compacted 
before the upgrade and must not process any update/delete/merge until after the 
upgrade. Not doing so may result in data corruption/loss.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19598) Acid V1 to V2 upgrade

2018-05-17 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-19598:
-

 Summary: Acid V1 to V2 upgrade
 Key: HIVE-19598
 URL: https://issues.apache.org/jira/browse/HIVE-19598
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


The on-disk layout for full acid (transactional) tables has changed in 3.0.

Any transactional table that has update/delete events in deltas that have not 
been major compacted must go through a major compaction before upgrading to 
3.0. No further update/delete/merge should be run during or after that major 
compaction.

Not doing so will result in data corruption/loss.
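
As a rough sketch of that pre-upgrade step in HiveQL (table and partition names 
are illustrative):
{code}
-- Queue a major compaction for each Acid V1 table (per partition if partitioned)
ALTER TABLE acid_tbl COMPACT 'major';
ALTER TABLE acid_part_tbl PARTITION (ds='2018-05-17') COMPACT 'major';

-- Confirm the compactions have finished before starting the upgrade
SHOW COMPACTIONS;
{code}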

 

Need to create a utility tool to help with this process.  HIVE-19233 started 
this but it needs more work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19597) TestWorkloadManager sometimes hangs

2018-05-17 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-19597:
---

 Summary: TestWorkloadManager sometimes hangs
 Key: HIVE-19597
 URL: https://issues.apache.org/jira/browse/HIVE-19597
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin


Seems like the tests randomly get stuck after lines like
{noformat}
2018-05-17T01:54:27,111  INFO [Workload management master] tez.WorkloadManager: 
Processing current events
2018-05-17T01:54:27,603  INFO [TriggerValidator] 
tez.PerPoolTriggerValidatorRunnable: Creating trigger validator for pool: llap
2018-05-17T01:54:37,090 DEBUG [Thread-28] conf.HiveConf: Found metastore URI of 
null
{noformat}

Then they get killed by the timeout. This happens to random tests across a few separate runs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 67186: HIVE-19585: Add UNKNOWN to PrincipalType

2018-05-17 Thread Sergio Pena via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67186/#review203369
---




standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/PrincipalType.java
Lines 18 (patched)


This file is auto-generated by Thrift. To add a new field, you need to edit 
the hive_metastore.thrift file and generate the new thrift files.

Btw, this might add new behavior to all the authorization commands, e.g.: 
ALTER TABLE ... SET OWNER UNKNOWN 

Do we want to support that? Do we need UNKNOWN on the PrincipalType?
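
For reference, a hedged HiveQL sketch of the owner syntax this enum value would 
sit next to (table and principal names are illustrative; only the USER/ROLE forms 
are assumed to exist today):

    -- existing forms (illustrative)
    ALTER TABLE t SET OWNER USER alice;
    ALTER TABLE t SET OWNER ROLE etl_role;
    -- adding UNKNOWN to PrincipalType could make the following parseable:
    -- ALTER TABLE t SET OWNER UNKNOWN ...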


- Sergio Pena


On May 17, 2018, 3:19 p.m., Arjun Mishra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67186/
> ---
> 
> (Updated May 17, 2018, 3:19 p.m.)
> 
> 
> Review request for hive and Sergio Pena.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> We need to include type UNKNOWN to PrincipalType to match with 
> HivePrincipal.HivePrincipalType.UKNOWN
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/PrincipalType.java
>  82eb8fd700 
> 
> 
> Diff: https://reviews.apache.org/r/67186/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Arjun Mishra
> 
>



Re: [VOTE] Stricter commit guidelines

2018-05-17 Thread Кривенко Ігор
+1 for stricter commit guidelines.
I have run into some flaky tests, mainly TestTezPerfCliTests and some of the
LlapLocalCliDriver ones.

Thanks, Igor.

On Thu, May 17, 2018, 22:54 Mithun RK  wrote:

> bq. run tests on 10 “noop" patches
>
> Good idea, I think. I've upped NOOP patches on some of my JIRAs before,
> only to establish the "acceptable" baseline of failing tests. One problem
> was that failing-tests tended to change over time, so this needed
> repeating.:/
>
> The tighter commit rules are a welcome change.
>
> Mithun
>
> On Thu, May 17, 2018 at 12:31 PM Sergey Shelukhin 
> wrote:
>
> > I am actually hitting all kinds of test failures clearly unrelated to my
> > patches now…
> > Should we create 10 jiras and run tests on 10 “noop" patches to find
> which
> > tests are flaky?
> >
> > On 18/5/16, 22:58, "Jesus Camacho Rodriguez" 
> wrote:
> >
> > >The vote passes with 19 +1s. Thanks for voting and supporting the
> > >initiative, it has been encouraging to see this reaction from the
> > >community.
> > >
> > >I have changed the committers guide as agreed [2]. We do not have
> > >consistent clean runs yet, hence we have more work ahead. Please, get
> > >involved identifying and fixing those flaky tests so we can move to
> > >normal development speed as soon as possible.
> > >
> > >From now on, no commits should happen without a clean run, every
> > >committer should enforce this policy.
> > >
> > >Thanks,
> > >-Jesús
> > >
> > >
> > >On 5/16/18, 3:58 PM, "Mithun RK"  wrote:
> > >
> > >+1
> > >
> > >On Wed, May 16, 2018 at 1:40 PM Yongzhi Chen 
> > >wrote:
> > >
> > >> +1
> > >>
> > >> On Tue, May 15, 2018 at 9:59 PM, Siddharth Seth  >
> > >wrote:
> > >>
> > >> > +1
> > >> >
> > >> > On Mon, May 14, 2018 at 10:44 PM, Jesus Camacho Rodriguez <
> > >> > jcama...@apache.org> wrote:
> > >> >
> > >> > > After work has been done to ignore most of the tests that were
> > >failing
> > >> > > consistently/intermittently [1], I wanted to start this vote
> to
> > >gather
> > >> > > support from the community to be stricter wrt committing
> > >patches to
> > >> Hive.
> > >> > > The committers guide [2] already specifies that a +1 should be
> > >obtained
> > >> > > before committing, but there is another clause that allows
> > >committing
> > >> > under
> > >> > > the presence of flaky tests (clause 4). Flaky tests are as
> good
> > >as
> > >> having
> > >> > > no tests, hence I propose to remove clause 4 and enforce the
> +1
> > >from
> > >> > > testing infra before committing.
> > >> > >
> > >> > >
> > >> > >
> > >> > > As I see it, by enforcing that we always get a +1 from the
> > >testing
> > >> infra
> > >> > > before committing, 1) we will have a more stable project, and
> > >2) we
> > >> will
> > >> > > have another incentive as a community to create a more robust
> > >testing
> > >> > > infra, e.g., replacing flaky tests for similar unit tests that
> > >are not
> > >> > > flaky, trying to decrease running time for tests, etc.
> > >> > >
> > >> > >
> > >> > >
> > >> > > Please, share your thoughts about this.
> > >> > >
> > >> > >
> > >> > >
> > >> > > Here is my +1.
> > >> > >
> > >> > >
> > >> > >
> > >> > > Thanks,
> > >> > >
> > >> > > Jesús
> > >> > >
> > >> > >
> > >> > >
> > >> > > [1] http://mail-archives.apache.org/mod_mbox/hive-dev/201805.
> > >> > > mbox/%3C63023673-AEE5-41A9-BA52-5A5DFB2078B6%40apache.org%3E
> > >> > >
> > >> > > [2] https://cwiki.apache.org/confluence/display/Hive/
> > >> > > HowToCommit#HowToCommit-PreCommitruns,andcommittingpatches
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> > >
> >
> >
>


Re: [VOTE] Stricter commit guidelines

2018-05-17 Thread Mithun RK
bq. run tests on 10 “noop" patches

Good idea, I think. I've upped NOOP patches on some of my JIRAs before,
only to establish the "acceptable" baseline of failing tests. One problem
was that failing-tests tended to change over time, so this needed
repeating.:/

The tighter commit rules are a welcome change.

Mithun

On Thu, May 17, 2018 at 12:31 PM Sergey Shelukhin 
wrote:

> I am actually hitting all kinds of test failures clearly unrelated to my
> patches now…
> Should we create 10 jiras and run tests on 10 “noop" patches to find which
> tests are flaky?
>
> On 18/5/16, 22:58, "Jesus Camacho Rodriguez"  wrote:
>
> >The vote passes with 19 +1s. Thanks for voting and supporting the
> >initiative, it has been encouraging to see this reaction from the
> >community.
> >
> >I have changed the committers guide as agreed [2]. We do not have
> >consistent clean runs yet, hence we have more work ahead. Please, get
> >involved identifying and fixing those flaky tests so we can move to
> >normal development speed as soon as possible.
> >
> >From now on, no commits should happen without a clean run, every
> >committer should enforce this policy.
> >
> >Thanks,
> >-Jesús
> >
> >
> >On 5/16/18, 3:58 PM, "Mithun RK"  wrote:
> >
> >+1
> >
> >On Wed, May 16, 2018 at 1:40 PM Yongzhi Chen 
> >wrote:
> >
> >> +1
> >>
> >> On Tue, May 15, 2018 at 9:59 PM, Siddharth Seth 
> >wrote:
> >>
> >> > +1
> >> >
> >> > On Mon, May 14, 2018 at 10:44 PM, Jesus Camacho Rodriguez <
> >> > jcama...@apache.org> wrote:
> >> >
> >> > > After work has been done to ignore most of the tests that were
> >failing
> >> > > consistently/intermittently [1], I wanted to start this vote to
> >gather
> >> > > support from the community to be stricter wrt committing
> >patches to
> >> Hive.
> >> > > The committers guide [2] already specifies that a +1 should be
> >obtained
> >> > > before committing, but there is another clause that allows
> >committing
> >> > under
> >> > > the presence of flaky tests (clause 4). Flaky tests are as good
> >as
> >> having
> >> > > no tests, hence I propose to remove clause 4 and enforce the +1
> >from
> >> > > testing infra before committing.
> >> > >
> >> > >
> >> > >
> >> > > As I see it, by enforcing that we always get a +1 from the
> >testing
> >> infra
> >> > > before committing, 1) we will have a more stable project, and
> >2) we
> >> will
> >> > > have another incentive as a community to create a more robust
> >testing
> >> > > infra, e.g., replacing flaky tests for similar unit tests that
> >are not
> >> > > flaky, trying to decrease running time for tests, etc.
> >> > >
> >> > >
> >> > >
> >> > > Please, share your thoughts about this.
> >> > >
> >> > >
> >> > >
> >> > > Here is my +1.
> >> > >
> >> > >
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Jesús
> >> > >
> >> > >
> >> > >
> >> > > [1] http://mail-archives.apache.org/mod_mbox/hive-dev/201805.
> >> > > mbox/%3C63023673-AEE5-41A9-BA52-5A5DFB2078B6%40apache.org%3E
> >> > >
> >> > > [2] https://cwiki.apache.org/confluence/display/Hive/
> >> > > HowToCommit#HowToCommit-PreCommitruns,andcommittingpatches
> >> > >
> >> > >
> >> > >
> >> > >
> >> >
> >>
> >
> >
> >
>
>


Re: [VOTE] Stricter commit guidelines

2018-05-17 Thread Sergey Shelukhin
https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20tex
t%20~%20%22%5C%22NOOP%20jira%5C%22%22

Anything that fails on these JIRAs should be fixed or disabled :)

On 18/5/17, 12:32, "Sergey Shelukhin"  wrote:

>Let me just do that...
>
>On 18/5/17, 12:31, "Sergey Shelukhin"  wrote:
>
>>I am actually hitting all kinds of test failures clearly unrelated to my
>>patches now…
>>Should we create 10 jiras and run tests on 10 “noop" patches to find
>>which
>>tests are flaky?
>>
>>On 18/5/16, 22:58, "Jesus Camacho Rodriguez"  wrote:
>>
>>>The vote passes with 19 +1s. Thanks for voting and supporting the
>>>initiative, it has been encouraging to see this reaction from the
>>>community.
>>>
>>>I have changed the committers guide as agreed [2]. We do not have
>>>consistent clean runs yet, hence we have more work ahead. Please, get
>>>involved identifying and fixing those flaky tests so we can move to
>>>normal development speed as soon as possible.
>>>
>>>From now on, no commits should happen without a clean run, every
>>>committer should enforce this policy.
>>>
>>>Thanks,
>>>-Jesús
>>>
>>>
>>>On 5/16/18, 3:58 PM, "Mithun RK"  wrote:
>>>
>>>+1
>>>
>>>On Wed, May 16, 2018 at 1:40 PM Yongzhi Chen 
>>>wrote:
>>>
>>>> +1
>>>>
>>>> On Tue, May 15, 2018 at 9:59 PM, Siddharth Seth 
>>>wrote:
>>>>
>>>> > +1
>>>> >
>>>> > On Mon, May 14, 2018 at 10:44 PM, Jesus Camacho Rodriguez <
>>>> > jcama...@apache.org> wrote:
>>>> >
>>>> > > After work has been done to ignore most of the tests that were
>>>failing
>>>> > > consistently/intermittently [1], I wanted to start this vote
>>>to
>>>gather
>>>> > > support from the community to be stricter wrt committing
>>>patches to
>>>> Hive.
>>>> > > The committers guide [2] already specifies that a +1 should be
>>>obtained
>>>> > > before committing, but there is another clause that allows
>>>committing
>>>> > under
>>>> > > the presence of flaky tests (clause 4). Flaky tests are as
>>>good
>>>as
>>>> having
>>>> > > no tests, hence I propose to remove clause 4 and enforce the
>>>+1
>>>from
>>>> > > testing infra before committing.
>>>> > >
>>>> > >
>>>> > >
>>>> > > As I see it, by enforcing that we always get a +1 from the
>>>testing
>>>> infra
>>>> > > before committing, 1) we will have a more stable project, and
>>>2) we
>>>> will
>>>> > > have another incentive as a community to create a more robust
>>>testing
>>>> > > infra, e.g., replacing flaky tests for similar unit tests that
>>>are not
>>>> > > flaky, trying to decrease running time for tests, etc.
>>>> > >
>>>> > >
>>>> > >
>>>> > > Please, share your thoughts about this.
>>>> > >
>>>> > >
>>>> > >
>>>> > > Here is my +1.
>>>> > >
>>>> > >
>>>> > >
>>>> > > Thanks,
>>>> > >
>>>> > > Jesús
>>>> > >
>>>> > >
>>>> > >
>>>> > > [1] http://mail-archives.apache.org/mod_mbox/hive-dev/201805.
>>>> > > mbox/%3C63023673-AEE5-41A9-BA52-5A5DFB2078B6%40apache.org%3E
>>>> > >
>>>> > > [2] https://cwiki.apache.org/confluence/display/Hive/
>>>> > > HowToCommit#HowToCommit-PreCommitruns,andcommittingpatches
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> >
>>>>
>>>
>>>
>>>
>>
>



[jira] [Created] (HIVE-19596) CLONE - CLONE - CLONE - CLONE - CLONE - CLONE - NOOP jira to see which tests are flaky on HiveQA

2018-05-17 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-19596:
---

 Summary: CLONE - CLONE - CLONE - CLONE - CLONE - CLONE - NOOP jira 
to see which tests are flaky on HiveQA
 Key: HIVE-19596
 URL: https://issues.apache.org/jira/browse/HIVE-19596
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19594) CLONE - CLONE - CLONE - CLONE - NOOP jira to see which tests are flaky on HiveQA

2018-05-17 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-19594:
---

 Summary: CLONE - CLONE - CLONE - CLONE - NOOP jira to see which 
tests are flaky on HiveQA
 Key: HIVE-19594
 URL: https://issues.apache.org/jira/browse/HIVE-19594
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19592) CLONE - CLONE - NOOP jira to see which tests are flaky on HiveQA

2018-05-17 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-19592:
---

 Summary: CLONE - CLONE - NOOP jira to see which tests are flaky on 
HiveQA
 Key: HIVE-19592
 URL: https://issues.apache.org/jira/browse/HIVE-19592
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19593) CLONE - CLONE - CLONE - NOOP jira to see which tests are flaky on HiveQA

2018-05-17 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-19593:
---

 Summary: CLONE - CLONE - CLONE - NOOP jira to see which tests are 
flaky on HiveQA
 Key: HIVE-19593
 URL: https://issues.apache.org/jira/browse/HIVE-19593
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19591) CLONE - NOOP jira to see which tests are flaky on HiveQA

2018-05-17 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-19591:
---

 Summary: CLONE - NOOP jira to see which tests are flaky on HiveQA
 Key: HIVE-19591
 URL: https://issues.apache.org/jira/browse/HIVE-19591
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19595) CLONE - CLONE - CLONE - CLONE - CLONE - NOOP jira to see which tests are flaky on HiveQA

2018-05-17 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-19595:
---

 Summary: CLONE - CLONE - CLONE - CLONE - CLONE - NOOP jira to see 
which tests are flaky on HiveQA
 Key: HIVE-19595
 URL: https://issues.apache.org/jira/browse/HIVE-19595
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19589) NOOP jira to see which tests are flaky on HiveQA

2018-05-17 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-19589:
---

 Summary: NOOP jira to see which tests are flaky on HiveQA
 Key: HIVE-19589
 URL: https://issues.apache.org/jira/browse/HIVE-19589
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19590) CLONE - NOOP jira to see which tests are flaky on HiveQA

2018-05-17 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-19590:
---

 Summary: CLONE - NOOP jira to see which tests are flaky on HiveQA
 Key: HIVE-19590
 URL: https://issues.apache.org/jira/browse/HIVE-19590
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] Stricter commit guidelines

2018-05-17 Thread Sergey Shelukhin
Let me just do that...

On 18/5/17, 12:31, "Sergey Shelukhin"  wrote:

>I am actually hitting all kinds of test failures clearly unrelated to my
>patches now…
>Should we create 10 jiras and run tests on 10 “noop" patches to find which
>tests are flaky?
>
>On 18/5/16, 22:58, "Jesus Camacho Rodriguez"  wrote:
>
>>The vote passes with 19 +1s. Thanks for voting and supporting the
>>initiative, it has been encouraging to see this reaction from the
>>community.
>>
>>I have changed the committers guide as agreed [2]. We do not have
>>consistent clean runs yet, hence we have more work ahead. Please, get
>>involved identifying and fixing those flaky tests so we can move to
>>normal development speed as soon as possible.
>>
>>From now on, no commits should happen without a clean run, every
>>committer should enforce this policy.
>>
>>Thanks,
>>-Jesús
>>
>>
>>On 5/16/18, 3:58 PM, "Mithun RK"  wrote:
>>
>>+1
>>
>>On Wed, May 16, 2018 at 1:40 PM Yongzhi Chen 
>>wrote:
>>
>>> +1
>>>
>>> On Tue, May 15, 2018 at 9:59 PM, Siddharth Seth 
>>wrote:
>>>
>>> > +1
>>> >
>>> > On Mon, May 14, 2018 at 10:44 PM, Jesus Camacho Rodriguez <
>>> > jcama...@apache.org> wrote:
>>> >
>>> > > After work has been done to ignore most of the tests that were
>>failing
>>> > > consistently/intermittently [1], I wanted to start this vote to
>>gather
>>> > > support from the community to be stricter wrt committing
>>patches to
>>> Hive.
>>> > > The committers guide [2] already specifies that a +1 should be
>>obtained
>>> > > before committing, but there is another clause that allows
>>committing
>>> > under
>>> > > the presence of flaky tests (clause 4). Flaky tests are as good
>>as
>>> having
>>> > > no tests, hence I propose to remove clause 4 and enforce the +1
>>from
>>> > > testing infra before committing.
>>> > >
>>> > >
>>> > >
>>> > > As I see it, by enforcing that we always get a +1 from the
>>testing
>>> infra
>>> > > before committing, 1) we will have a more stable project, and
>>2) we
>>> will
>>> > > have another incentive as a community to create a more robust
>>testing
>>> > > infra, e.g., replacing flaky tests for similar unit tests that
>>are not
>>> > > flaky, trying to decrease running time for tests, etc.
>>> > >
>>> > >
>>> > >
>>> > > Please, share your thoughts about this.
>>> > >
>>> > >
>>> > >
>>> > > Here is my +1.
>>> > >
>>> > >
>>> > >
>>> > > Thanks,
>>> > >
>>> > > Jesús
>>> > >
>>> > >
>>> > >
>>> > > [1] http://mail-archives.apache.org/mod_mbox/hive-dev/201805.
>>> > > mbox/%3C63023673-AEE5-41A9-BA52-5A5DFB2078B6%40apache.org%3E
>>> > >
>>> > > [2] https://cwiki.apache.org/confluence/display/Hive/
>>> > > HowToCommit#HowToCommit-PreCommitruns,andcommittingpatches
>>> > >
>>> > >
>>> > >
>>> > >
>>> >
>>>
>>
>>
>>
>



Re: [VOTE] Stricter commit guidelines

2018-05-17 Thread Sergey Shelukhin
I am actually hitting all kinds of test failures clearly unrelated to my
patches now…
Should we create 10 jiras and run tests on 10 “noop" patches to find which
tests are flaky?

On 18/5/16, 22:58, "Jesus Camacho Rodriguez"  wrote:

>The vote passes with 19 +1s. Thanks for voting and supporting the
>initiative, it has been encouraging to see this reaction from the
>community.
>
>I have changed the committers guide as agreed [2]. We do not have
>consistent clean runs yet, hence we have more work ahead. Please, get
>involved identifying and fixing those flaky tests so we can move to
>normal development speed as soon as possible.
>
>From now on, no commits should happen without a clean run, every
>committer should enforce this policy.
>
>Thanks,
>-Jesús
>
>
>On 5/16/18, 3:58 PM, "Mithun RK"  wrote:
>
>+1
>
>On Wed, May 16, 2018 at 1:40 PM Yongzhi Chen 
>wrote:
>
>> +1
>>
>> On Tue, May 15, 2018 at 9:59 PM, Siddharth Seth 
>wrote:
>>
>> > +1
>> >
>> > On Mon, May 14, 2018 at 10:44 PM, Jesus Camacho Rodriguez <
>> > jcama...@apache.org> wrote:
>> >
>> > > After work has been done to ignore most of the tests that were
>failing
>> > > consistently/intermittently [1], I wanted to start this vote to
>gather
>> > > support from the community to be stricter wrt committing
>patches to
>> Hive.
>> > > The committers guide [2] already specifies that a +1 should be
>obtained
>> > > before committing, but there is another clause that allows
>committing
>> > under
>> > > the presence of flaky tests (clause 4). Flaky tests are as good
>as
>> having
>> > > no tests, hence I propose to remove clause 4 and enforce the +1
>from
>> > > testing infra before committing.
>> > >
>> > >
>> > >
>> > > As I see it, by enforcing that we always get a +1 from the
>testing
>> infra
>> > > before committing, 1) we will have a more stable project, and
>2) we
>> will
>> > > have another incentive as a community to create a more robust
>testing
>> > > infra, e.g., replacing flaky tests for similar unit tests that
>are not
>> > > flaky, trying to decrease running time for tests, etc.
>> > >
>> > >
>> > >
>> > > Please, share your thoughts about this.
>> > >
>> > >
>> > >
>> > > Here is my +1.
>> > >
>> > >
>> > >
>> > > Thanks,
>> > >
>> > > Jesús
>> > >
>> > >
>> > >
>> > > [1] http://mail-archives.apache.org/mod_mbox/hive-dev/201805.
>> > > mbox/%3C63023673-AEE5-41A9-BA52-5A5DFB2078B6%40apache.org%3E
>> > >
>> > > [2] https://cwiki.apache.org/confluence/display/Hive/
>> > > HowToCommit#HowToCommit-PreCommitruns,andcommittingpatches
>> > >
>> > >
>> > >
>> > >
>> >
>>
>
>
>



[jira] [Created] (HIVE-19588) Several invocation of file listing when creating VectorizedOrcAcidRowBatchReader

2018-05-17 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-19588:


 Summary: Several invocation of file listing when creating 
VectorizedOrcAcidRowBatchReader
 Key: HIVE-19588
 URL: https://issues.apache.org/jira/browse/HIVE-19588
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.1.0
Reporter: Nita Dembla
Assignee: Prasanth Jayachandran
 Attachments: Screen Shot 2018-05-16 at 2.23.25 PM.png

Looks like we are doing file listing several times when creating one instance 
of VectorizedOrcAcidRowBatchReader.
AcidUtils.parseBaseOrDeltaBucketFilename() does a full file listing (when there 
are files with the bucket_* prefix) just to get a single file out of a path to 
figure out whether it has the ACID schema (as part of 
https://issues.apache.org/jira/browse/HIVE-18190).
There is also a full file listing when we populate:
1) ColumnizedDeleteEventRegistry
2) SortMergedDeleteEventRegistry
3) Twice in computeOffsetAndBucket()

 

Attaching profiles which [~gopalv] took while debugging. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re:Re: hive throw error "Invalid input path"

2018-05-17 Thread 赵杨
To whom it may concern
I set the parameter "hive.exec.mode.local.auto" to true. Then I used JDBC mode 
to submit sql to hiveserver2 in a multithreaded manner. Sometimes exceptions 
are thrown when several tasks run in parallel in the local mode.

"Invalid input path xxx"
But this path is not the input path of the current SQL, it is another SQL input 
path.
So I suspect that one of the SQL covers some information about another SQL.
So I looked up the source code and found a parameter, and when I set the 
"iocontext.input.name" to the unique value, I didn't get it wrong.
I don't know if this is a bug.
I modified the ExecDriver.execute method to add the following line.
Random rand = new Random();  
job.set(Utilities.INPUT_NAME, "mr_"+rand.nextInt(10));


Thank you for reading this email. This question has been bothering me for a 
long time. Your early reply will be appreciated.


Thanks 

Pull Request for Proxy Settings

2018-05-17 Thread Ekrem YILMAZ
Hi guys,

I created a pull request for getting proxy settings from env or system
properties.
All descriptions are available on the pull request.

https://github.com/apache/hive/pull/338

Do you know when this pull request will be handled or if I have forgotten
some steps during pull request process ?

Thanks a lot.

Ekrem


RE: May 2018 Hive User Group Meeting

2018-05-17 Thread roberto.tardio
Hi,

 

If you have recorded the meeting, please share the link. I could not follow it 
online because of the schedule (I live in Spain).

 

Kind Regards,

 

 

From: Luis Figueroa [mailto:lef...@outlook.com] 
Sent: miércoles, 9 de mayo de 2018 18:01
To: u...@hive.apache.org
Cc: dev@hive.apache.org
Subject: Re: May 2018 Hive User Group Meeting

 

Hey everyone,  

 

Was the meeting recorded by any chance? 

Luis


On May 8, 2018, at 5:31 PM, Sahil Takiar wrote:

Hey Everyone, 

 

Almost time for the meetup! The live stream can be viewed on this link: 
https://live.lifesizecloud.com/extension/2000992219?token=067078ac-a8df-45bc-b84c-4b371ecbc719
 


The stream won't be live until the meetup starts.

For those attending in person, there will be guest wifi:

Login: HiveMeetup
Password: ClouderaHive

 

On Mon, May 7, 2018 at 12:48 PM, Sahil Takiar wrote:

Hey Everyone, 

 

The meetup is only a day away! Here is a link to all the abstracts we have 
compiled thus far. Several of you have 
asked about event streaming and recordings. The meetup will be both streamed 
live and recorded. We will post the links on this thread and on the meetup link 
tomorrow closer to the start of the meetup.

 

The meetup will be at Cloudera HQ - 395 Page Mill Rd. If you have any trouble 
getting into the building, feel free to post on the meetup link.

 

Meetup Link: https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/

 

On Wed, May 2, 2018 at 7:48 AM, Sahil Takiar wrote:

Hey Everyone,

 

The agenda for the meetup has been set and I'm excited to say we have lots of 
interesting talks scheduled! Below is final agenda, the full list of abstracts 
will be sent out soon. If you are planning to attend, please RSVP on the meetup 
link so we can get an accurate headcount of attendees 
(https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/).


6:30 - 7:00 PM Networking and Refreshments 

7:00PM - 8:20 PM Lightning Talks (10 min each) - 8 talks total

* What's new in Hive 3.0.0 - Ashutosh Chauhan

* Hive-on-Spark at Uber: Efficiency & Scale - Xuefu Zhang

* Hive-on-S3 Performance: Past, Present, and Future - Sahil Takiar

* Dali: Data Access Layer at LinkedIn - Adwait Tumbde

* Parquet Vectorization in Hive - Vihang Karajgaonkar

* ORC Column Level Encryption - Owen O’Malley

* Running Hive at Scale @ Lyft - Sharanya Santhanam, Rohit Menon

* Materialized Views in Hive - Jesus Camacho Rodriguez

8:30 PM - 9:00 PM Hive Metastore Panel

* Moderator: Vihang Karajgaonkar

* Participants:

  o Daniel Dai - Hive Metastore Caching

  o Alan Gates - Hive Metastore Separation

  o Rituparna Agrawal - Customer Use Cases & Pain Points of (Big) Metadata

The Metastore panel will consist of a short presentation by each panelist 
followed by a Q&A session driven by the moderator.

 

On Tue, Apr 24, 2018 at 2:53 PM, Sahil Takiar wrote:

We still have a few slots open for lightning talks, so if anyone is interested 
in giving a presentation don't hesitate to reach out! 

 

If you are planning to attend the meetup, please RSVP on the Meetup link 
(https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/) so that we 
can get an accurate headcount for food.

 

Thanks!

 

--Sahil

 

On Wed, Apr 11, 2018 at 5:08 PM, Sahil Takiar wrote:

Hi all,

I'm happy to announce that the Hive community is organizing a Hive user group 
meeting in the Bay Area next month. The details can be found at 
https://www.meetup.com/Hive-User-Group-Meeting/events/249641278/ 


The format of this meetup will be slightly different from previous ones. There 
will be one hour dedicated to lightning talks, followed by a group discussion 
on the future of the Hive Metastore.

We are inviting talk proposals from Hive users as well as developers at this 
time. Please contact either myself (takiar.sa...@gmail.com), Vihang Karajgaonkar 
(vih...@cloudera.com), or Peter Vary (pv...@cloudera.com) with proposals. We 
currently have 5 openings.

Please let me know if you have any questions or suggestions.

Thanks,
Sahil





 

-- 

Sahil Takiar

Software Engineer
takiar.sa...@gmail.com   | (510) 673-0309





 

-- 

Sahil Takiar

Software 

[jira] [Created] (HIVE-19587) HeartBeat thread uses cancelled delegation token while connecting to meta on KERBEROS cluster

2018-05-17 Thread Oleksiy Sayankin (JIRA)
Oleksiy Sayankin created HIVE-19587:
---

 Summary: HeartBeat thread uses cancelled delegation token while 
connecting to meta on KERBEROS cluster
 Key: HIVE-19587
 URL: https://issues.apache.org/jira/browse/HIVE-19587
 Project: Hive
  Issue Type: Bug
Reporter: Oleksiy Sayankin
Assignee: Oleksiy Sayankin


*STEP 1. Create test data*

{code}
create table t1 (id int) ROW FORMAT DELIMITED FIELDS TERMINATED BY  ",";
create table t2 (id int) ROW FORMAT DELIMITED FIELDS TERMINATED BY  ",";
{code}

Generate 10 000 000 lines of random data

{code}
package com.test.app;

import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.concurrent.ThreadLocalRandom;

public class App {
  public static void main(String[] args) throws FileNotFoundException {
try (PrintWriter out = new PrintWriter("table.data");) {
  int min = 0;
  int max = 10_000;
  int numRows = 10_000_000;
  for (int i = 0; i <= numRows - 1; i++){
int randomNum = ThreadLocalRandom.current().nextInt(min, max + 1);
out.println(randomNum);
  }
}
  }
}
{code}

Upload data to Hive tables

{code}
load data local inpath '/home/myuser/table.data' into table t1;
load data local inpath '/home/myuser/table.data' into table t2;
{code}

*STEP 2. Configure transactions in hive-site.xml*

{code}


   
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.enforce.bucketing</name>
  <value>true</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>
{code}

*STEP 3. Configure hive.txn.timeout in hive-site.xml*

{code}

   
<property>
  <name>hive.txn.timeout</name>
  <value>10s</value>
</property>
{code}

*STEP 4. Connect via beeline to HS2 with KERBEROS*

{code}
!connect 
jdbc:hive2://node8.cluster:1/default;principal=myuser/node8.cluster@NODE8;ssl=true;sslTrustStore=/opt/myuser/conf/ssl_truststore
{code}

{code}
select count(*) from t1;
{code}

*STEP 5. Close connection and reconnect*

{code}
!close
{code}

{code}
!connect 
jdbc:hive2://node8.cluster:1/default;principal=myuser/node8.cluster@NODE8;ssl=true;sslTrustStore=/opt/myuser/conf/ssl_truststore
{code}

*STEP 6. Perform long playing query*

This query lasts about 600s

{code}
select count(*) from t1 join t2 on t1.id = t2.id;
{code}


*EXPECTED RESULT*

Query finishes successfully

*ACTUAL RESULT*

{code}
2018-05-17T13:54:54,921 ERROR [pool-7-thread-10] transport.TSaslTransport: SASL 
negotiation failure
javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password
at 
com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598)
 
at 
com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
at 
org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:539)
 
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283)
at 
org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
 
at 
org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
 
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:663)
 
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:660)
 
at java.security.AccessController.doPrivileged(Native Method) 
at javax.security.auth.Subject.doAs(Subject.java:360)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1613)
 
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:660)
 
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: token 
expired or does not exist: owner=myuser, renewer=myuser, realUser=, 
issueDate=1526565229297, maxDate=1527170029297, sequenceNumber=1, masterKeyId=1
at 
org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:104)
 
at 
org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:56)
at 

Review Request 67186: HIVE-19585: Add UNKNOWN to PrincipalType

2018-05-17 Thread Arjun Mishra via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67186/
---

Review request for hive and Sergio Pena.


Repository: hive-git


Description
---

We need to include type UNKNOWN to PrincipalType to match with 
HivePrincipal.HivePrincipalType.UKNOWN


Diffs
-

  
standalone-metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/PrincipalType.java
 82eb8fd700 


Diff: https://reviews.apache.org/r/67186/diff/1/


Testing
---


Thanks,

Arjun Mishra



[jira] [Created] (HIVE-19586) Optimize Count(distinct X) pushdown based on the storage capabilities

2018-05-17 Thread slim bouguerra (JIRA)
slim bouguerra created HIVE-19586:
-

 Summary: Optimize Count(distinct X) pushdown based on the storage 
capabilities 
 Key: HIVE-19586
 URL: https://issues.apache.org/jira/browse/HIVE-19586
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration, Logical Optimizer
Reporter: slim bouguerra
Assignee: slim bouguerra
 Fix For: 3.0.0


h1. Goal
Provide a way to rewrite queries that combine COUNT(DISTINCT) and aggregates 
like SUM as a series of GROUP BYs.
This can be useful for pushing down to Druid queries like 
{code}
 select count(DISTINCT interval_marker), count (distinct dim), sum(num_l) FROM 
druid_test_table GROUP  BY `__time`, `zone` ;
{code}
In general this can be useful in cases where storage handlers cannot perform 
count(distinct column).
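
As a rough illustration of the rewrite, a sketch for the single-distinct case only 
(with several distinct columns the rule joins multiple such subqueries):
{code}
-- count(DISTINCT dim), sum(num_l) grouped by __time, zone, expressed as two GROUP BYs
SELECT `__time`, `zone`, count(dim) AS dim_cnt, sum(partial_sum) AS num_l_sum
FROM (
  SELECT `__time`, `zone`, dim, sum(num_l) AS partial_sum
  FROM druid_test_table
  GROUP BY `__time`, `zone`, dim
) t
GROUP BY `__time`, `zone`;
{code}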

h1. How to do it.
Use the Calcite rule {code} 
org.apache.calcite.rel.rules.AggregateExpandDistinctAggregatesRule{code} that 
breaks down COUNT(DISTINCT) into a single GROUP BY with grouping sets, or into 
multiple series of GROUP BYs that may be linked with joins if multiple counts 
are present.
FYI, today Hive does have a similar rule {code} 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveExpandDistinctAggregatesRule{code},
 but it only provides a rewrite to a grouping-sets-based plan.
I am planning to use the actual Calcite rule. [~ashutoshc], any concerns or 
caveats to be aware of?

h2. Concerns/questions
Need a way to switch between grouping sets and a simple chained group by based 
on plan cost. For instance, for a Druid-based scan it always makes sense (at 
least today) to push down a series of GROUP BYs and stitch the result sets 
together in Hive later (as opposed to scanning everything).
But this might not be true for other storage handlers: if a handler can handle 
grouping sets, it is better to push down the grouping sets as one table scan.
I am still unsure how I can lean on the cost optimizer to select the best plan, 
[~ashutoshc]/[~jcamachorodriguez] any inputs?






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19585) Add UNKNOWN to PrincipalType

2018-05-17 Thread Arjun Mishra (JIRA)
Arjun Mishra created HIVE-19585:
---

 Summary: Add UNKNOWN to PrincipalType
 Key: HIVE-19585
 URL: https://issues.apache.org/jira/browse/HIVE-19585
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.3.2
Reporter: Arjun Mishra
Assignee: Arjun Mishra
 Fix For: 2.3.2






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19584) Dictionary encoding for string types

2018-05-17 Thread Teddy Choi (JIRA)
Teddy Choi created HIVE-19584:
-

 Summary: Dictionary encoding for string types
 Key: HIVE-19584
 URL: https://issues.apache.org/jira/browse/HIVE-19584
 Project: Hive
  Issue Type: Sub-task
Reporter: Teddy Choi
Assignee: Teddy Choi






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 67073: HIVE-19370 : Retain time part in add_months function on timestamp datatype fields in hive

2018-05-17 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67073/#review203340
---


Ship it!




+1 pending tests

- Peter Vary


On May 16, 2018, 5:48 p.m., Bharathkrishna Guruvayoor Murali wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67073/
> ---
> 
> (Updated May 16, 2018, 5:48 p.m.)
> 
> 
> Review request for hive, Peter Vary, Sahil Takiar, and Vihang Karajgaonkar.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Adding support to retain the time part (HH:mm:ss) for add_months UDF when the 
> input is given as timestamp format.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFAddMonths.java 
> dae4b97b4a17e98122431e5fda655fd9f873fdb5 
>   
> ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFAddMonths.java
>  af9b6c43c7dafc69c4944eab02894786af306f35 
>   ql/src/test/queries/clientpositive/udf_add_months.q 
> 0b8b444fd1657117dec18c2b4e7173767617 
>   ql/src/test/results/clientpositive/udf_add_months.q.out 
> 5ba720ae85d30c0da7f94f377d6b324bce850907 
> 
> 
> Diff: https://reviews.apache.org/r/67073/diff/3/
> 
> 
> Testing
> ---
> 
> Added unit tests.
> 
> 
> Thanks,
> 
> Bharathkrishna Guruvayoor Murali
> 
>



[jira] [Created] (HIVE-19583) Some yetus working dirs are left on hivepest-server-upstream disk after test

2018-05-17 Thread Adam Szita (JIRA)
Adam Szita created HIVE-19583:
-

 Summary: Some yetus working dirs are left on 
hivepest-server-upstream disk after test
 Key: HIVE-19583
 URL: https://issues.apache.org/jira/browse/HIVE-19583
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Reporter: Adam Szita
Assignee: Adam Szita


PTest's PrepPhase creates a Yetus working folder for each build after checking 
out the source code. The source code is then copied into that folder for Yetus. 
This folder is cleaned up after the test has executed, so if that doesn't happen, 
e.g. because the patch is not applicable, the folder is left on the disk. We need 
to remove it in this case too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19582) NPE during CREATE ROLE using SQL Standard Based Hive Authorization

2018-05-17 Thread Alexey Vakulenchuk (JIRA)
Alexey Vakulenchuk created HIVE-19582:
-

 Summary: NPE during CREATE ROLE using SQL Standard Based Hive 
Authorization
 Key: HIVE-19582
 URL: https://issues.apache.org/jira/browse/HIVE-19582
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.3.3
Reporter: Alexey Vakulenchuk


*General Info*

Hive version : 2.3.3
{code:java}
commit 3f7dde31aed44b5440563d3f9d8a8887beccf0be
Author: Daniel Dai 
Date:   Wed Mar 28 16:46:29 2018 -0700

Preparing for 2.3.3 release

{code}
Hadoop version: 2.7.2.

Engine
{code:java}
hive> set hive.execution.engine;
hive.execution.engine=mr{code}
Used official documentation:
 [SQL Standard Based Hive 
Authorization|https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+based+hive+authorization#SQLStandardBasedHiveAuthorization-SetRole]

*Step 1. Configure hive-site.xml*
{code}

<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>

<property>
  <name>hive.users.in.admin.role</name>
  <value>admin</value>
</property>

<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.MetaStoreAuthzAPIAuthorizerEmbedOnly</value>
</property>

<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory</value>
</property>

<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>

<property>
  <name>hive.security.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
</property>

{code}
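
Note: per the referenced documentation, role DDL also requires the admin role to 
be activated in the session first; a minimal sketch (assuming the connecting user 
is listed in hive.users.in.admin.role):
{code}
SET ROLE admin;     -- activate the admin role for this session
CREATE ROLE role1;  -- role DDL is only allowed while the admin role is active
SHOW ROLES;
{code}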

*Step 2. Create ROLE*

hive> CREATE ROLE role1;
Actual result:
{code}
org.apache.hadoop.hive.ql.metadata.HiveException: Error while invoking 
FailureHook. hooks: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.reexec.ReExecutionOverlayPlugin$LocalHook.run(ReExecutionOverlayPlugin.java:45)
at 
org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296)
at 
org.apache.hadoop.hive.ql.HookRunner.runFailureHooks(HookRunner.java:283)
at org.apache.hadoop.hive.ql.Driver.invokeFailureHooks(Driver.java:2353)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2123)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1756)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1497)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1491)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:204)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
at org.apache.hadoop.util.RunJar.main(RunJar.java:222)

FAILED: Hive Internal Error: 
org.apache.hadoop.hive.ql.metadata.HiveException(Error while invoking 
FailureHook. hooks: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.reexec.ReExecutionOverlayPlugin$LocalHook.run(ReExecutionOverlayPlugin.java:45)
at 
org.apache.hadoop.hive.ql.HookRunner.invokeGeneralHook(HookRunner.java:296)
at 
org.apache.hadoop.hive.ql.HookRunner.runFailureHooks(HookRunner.java:283)
at org.apache.hadoop.hive.ql.Driver.invokeFailureHooks(Driver.java:2353)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2123)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1756)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1497)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1491)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:204)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 

[jira] [Created] (HIVE-19581) view do not support unicode characters well

2018-05-17 Thread kai (JIRA)
kai created HIVE-19581:
--

 Summary: view do not support unicode characters well
 Key: HIVE-19581
 URL: https://issues.apache.org/jira/browse/HIVE-19581
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: kai
 Attachments: explain.png, metastore.png

create table t_test (name string);
insert into table t_test VALUES ('李四');
create view t_view_test as select * from t_test where name='李四';

When running select * from t_view_test, no records are returned.
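
A quick way to check whether the Unicode literal survives the metastore round trip 
(a diagnostic sketch, not part of the original report):
{code}
-- If the stored view text shows the literal mangled (e.g. '??'), the problem is in
-- how the view definition is persisted, not in query execution.
SHOW CREATE TABLE t_view_test;
{code}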



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19580) Hive 2.3.2 with ORC files stored on S3 are case sensitive

2018-05-17 Thread Arthur Baudry (JIRA)
Arthur Baudry created HIVE-19580:


 Summary: Hive 2.3.2 with ORC files stored on S3 are case sensitive
 Key: HIVE-19580
 URL: https://issues.apache.org/jira/browse/HIVE-19580
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.3.2
 Environment: AWS S3 to store files

Spark 2.3 but also true for lower versions

Hive 2.3.2
Reporter: Arthur Baudry
 Fix For: 2.3.2


Original file is csv:

COL1,COL2
1,2

ORC file are created with Spark 2.3:

scala> val df = spark.read.option("header","true").csv("/user/hadoop/file")

scala> df.printSchema
root
|-- COL1: string (nullable = true)
|-- COL2: string (nullable = true)

scala> df.write.orc("s3://bucket/prefix")

In Hive:

hive> CREATE EXTERNAL TABLE test_orc(COL1 STRING, COL2 STRING) STORED AS ORC 
LOCATION 's3://bucket/prefix';

hive> SELECT * FROM test_orc;
OK
NULL NULL

*Every field is null. However, if the fields are generated using lower case in 
the Spark schema then everything works.*

The reason why I'm raising this bug is that we have customers using Hive 2.3.2 
to read files we generate through Spark and all our code base is addressing 
fields using upper case while this is incompatible with their Hive instance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)