[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251327#comment-15251327
 ] 

Rui Li commented on HIVE-13572:
---

Thanks Ashutosh for the review! Another way to solve this might be to set the 
full status on each individual file rather than on the destination folder, 
which could be done concurrently in the copy threads. I'll run some tests to 
see which performs better.

> Redundant setting full file status in Hive::copyFiles
> -
>
> Key: HIVE-13572
> URL: https://issues.apache.org/jira/browse/HIVE-13572
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-13572.1.patch
>
>
> We set full file status in each copy-file thread. I think it's redundant and 
> hurts performance when we have multiple files to copy.
> {code}
> if (inheritPerms) {
>   ShimLoader.getHadoopShims().setFullFileStatus(conf, 
> fullDestStatus, destFs, destf);
> }
> {code}
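The redundancy can be illustrated with a self-contained sketch. This uses plain java.util.concurrent stand-ins, not the actual Hive shim API: `setFullFileStatus` here is a hypothetical counter, not Hadoop's, and `CopyFilesSketch` is an invented name.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CopyFilesSketch {
    // Stand-in for the expensive setFullFileStatus call (perms/group/ACLs).
    static final AtomicInteger statusCalls = new AtomicInteger();

    static void setFullFileStatus(String path) {
        statusCalls.incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> files = List.of("f1", "f2", "f3", "f4");

        // Current behavior: every copy-file task re-applies the destination
        // folder's status, so the same work is repeated once per file.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String f : files) {
            pool.submit(() -> setFullFileStatus("destDir"));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("per-thread calls: " + statusCalls.get());

        // Sketch of the fix: apply the status once after all copies complete
        // (or once per copied file, concurrently, as suggested above).
        statusCalls.set(0);
        setFullFileStatus("destDir");
        System.out.println("hoisted calls: " + statusCalls.get());
    }
}
```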



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251306#comment-15251306
 ] 

Ashutosh Chauhan commented on HIVE-13572:
-

+1






[jira] [Commented] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-20 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251214#comment-15251214
 ] 

Rui Li commented on HIVE-13572:
---

[~ashutoshc], would you mind taking a look at this? Thanks.






[jira] [Updated] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-20 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-13572:
--
Status: Patch Available  (was: Open)






[jira] [Updated] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-20 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-13572:
--
Attachment: HIVE-13572.1.patch






[jira] [Updated] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-20 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-13572:
--
Description: 
We set full file status in each copy-file thread. I think it's redundant and 
hurts performance when we have multiple files to copy.
{code}
if (inheritPerms) {
  ShimLoader.getHadoopShims().setFullFileStatus(conf, 
fullDestStatus, destFs, destf);
}
{code}






[jira] [Updated] (HIVE-12159) Create vectorized readers for the complex types

2016-04-20 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-12159:
-
Attachment: HIVE-12159.patch

OK, this is rebased to master, which took a fair amount of work because of 
HIVE-13523. In particular, I've removed the DataReaderFactory and 
MetadataReaderFactory, because they weren't providing any abstraction: they 
were baked into ReaderImpl with final fields. I also merged DataReader and 
MetadataReader so that RecordReaderImpl has just a single connection to HDFS 
open rather than two.

> Create vectorized readers for the complex types
> ---
>
> Key: HIVE-12159
> URL: https://issues.apache.org/jira/browse/HIVE-12159
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-12159.patch, HIVE-12159.patch, HIVE-12159.patch, 
> HIVE-12159.patch, HIVE-12159.patch, HIVE-12159.patch, HIVE-12159.patch
>
>
> We need vectorized readers for the complex types.





[jira] [Commented] (HIVE-12851) Add slider security setting support to LLAP packager

2016-04-20 Thread Andrew Sears (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251102#comment-15251102
 ] 

Andrew Sears commented on HIVE-12851:
-

Thanks Gopal. Do you think this can be included as part of Hive packaging/LLAP 
setup, or the argparse dependency removed? It likely won't be an issue for 
customers running later versions of Python, though there are probably many who 
aren't and who are in locked-down environments.

Here's some information on the recommended way of installing argparse from CA:
https://docops.ca.com/ca-advanced-authentication/8-1/EN/upgrading/verify-prerequisties/install-argparse-without-pip

I think the wiki needs some documentation around LLAP setup & dependencies.

> Add slider security setting support to LLAP packager
> 
>
> Key: HIVE-12851
> URL: https://issues.apache.org/jira/browse/HIVE-12851
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.0.0, 2.1.0
>
> Attachments: HIVE-12851.2.patch, HIVE-12851.patch
>
>
> {noformat}
> "slider.hdfs.keytab.dir": "...",
> "slider.am.login.keytab.name": "...",
> "slider.keytab.principal.name": "..."
> {noformat}
> should be emitted into appConfig.json for Slider AM. Right now, they have to 
> be added manually on a secure cluster.





[jira] [Commented] (HIVE-12851) Add slider security setting support to LLAP packager

2016-04-20 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251091#comment-15251091
 ] 

Gopal V commented on HIVE-12851:


Pip is probably not a repeatable install; using the python-argparse RPM is 
probably better:

{code}
Installed Packages
Name: python-argparse
Arch: noarch
Version : 1.2.1
Release : 2.el6
Size: 232 k
Repo: installed
From repo    : epel
Summary : Optparse inspired command line parser for Python
URL : http://code.google.com/p/argparse/
License : Python
{code}






[jira] [Commented] (HIVE-12851) Add slider security setting support to LLAP packager

2016-04-20 Thread Andrew Sears (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251081#comment-15251081
 ] 

Andrew Sears commented on HIVE-12851:
-

With a CentOS 6.5 minimal install and HDP 2.4 / Python 2.6.6, you need to run 
the following to get argparse installed:

yum -y install epel-release
pip install argparse








[jira] [Updated] (HIVE-11550) ACID queries pollute HiveConf

2016-04-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11550:
--
Description: 
HiveConf is a SessionState-level object. Some ACID-related logic makes changes 
to it that are meant to be per-query but become per-SessionState.

See SemanticAnalyzer.checkAcidConstraints()
Also note   HiveConf.setVar(conf, 
HiveConf.ConfVars.DYNAMICPARTITIONINGMODE, "nonstrict");
in UpdateDeleteSemanticAnalyzer

[~alangates], do you know of other cases or ideas on how to deal with this 
differently?


_SortedDynPartitionOptimizer.process()_ is the place to have the logic to do 
_conf.setBoolVar(ConfVars.HIVEOPTSORTDYNAMICPARTITION, false);_ on per query 
basis



  was:
HiveConf is a SessionState level object.  Some ACID related logic makes changes 
to it (which are meant to be per query) but become per SessionState.

See SemanticAnalyzer.checkAcidConstraints()
Also note   HiveConf.setVar(conf, 
HiveConf.ConfVars.DYNAMICPARTITIONINGMODE, "nonstrict");
in UpdateDeleteSemanticAnalyzer

[~alangates], do you know of other cases or ideas on how to deal with this 
differently?


_SortedDynPartitionOptimizer.process()_ is the place to have the logic to do 
_conf.setBoolVar(ConfVars.HIVEOPTSORTDYNAMICPARTITION, false);_ on per query 
basis


> ACID queries pollute HiveConf
> -
>
> Key: HIVE-11550
> URL: https://issues.apache.org/jira/browse/HIVE-11550
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> HiveConf is a SessionState level object.  Some ACID related logic makes 
> changes to it (which are meant to be per query) but become per SessionState.
> See SemanticAnalyzer.checkAcidConstraints()
> Also note   HiveConf.setVar(conf, 
> HiveConf.ConfVars.DYNAMICPARTITIONINGMODE, "nonstrict");
> in UpdateDeleteSemanticAnalyzer
> [~alangates], do you know of other cases or ideas on how to deal with this 
> differently?
> _SortedDynPartitionOptimizer.process()_ is the place to have the logic to do 
> _conf.setBoolVar(ConfVars.HIVEOPTSORTDYNAMICPARTITION, false);_ on per query 
> basis
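A minimal sketch of the per-query scoping idea described above, using a plain Map as a stand-in for HiveConf (the real fix would presumably copy the conf per query, e.g. via a HiveConf copy constructor, or restore the value after the query; the class and key names here are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class ConfScopeSketch {
    public static void main(String[] args) {
        // Stand-in for the session-level HiveConf.
        Map<String, String> sessionConf = new HashMap<>();
        sessionConf.put("hive.exec.dynamic.partition.mode", "strict");

        // Problem: mutating sessionConf directly for one query leaks the
        // change into every later query in the same session.
        // Sketch of a fix: give the query its own copy and mutate that.
        Map<String, String> queryConf = new HashMap<>(sessionConf);
        queryConf.put("hive.exec.dynamic.partition.mode", "nonstrict");

        // The session-level setting is unaffected.
        System.out.println(sessionConf.get("hive.exec.dynamic.partition.mode"));
    }
}
```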





[jira] [Updated] (HIVE-13570) Some query with Union all fails when CBO is off

2016-04-20 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-13570:

Status: Patch Available  (was: Open)

Need code review.

> Some query with Union all fails when CBO is off
> ---
>
> Key: HIVE-13570
> URL: https://issues.apache.org/jira/browse/HIVE-13570
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-13570.1.PATCH
>
>
> Some queries with union all throw an IndexOutOfBoundsException when:
> set hive.cbo.enable=false;
> set hive.ppd.remove.duplicatefilters=true;
> The stack trace is:
> {noformat}
> java.lang.IndexOutOfBoundsException: Index: 67, Size: 67 
> at java.util.ArrayList.rangeCheck(ArrayList.java:635) 
> at java.util.ArrayList.get(ArrayList.java:411) 
> at 
> org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcCtx.genColLists(ColumnPrunerProcCtx.java:161)
>  
> at 
> org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcCtx.handleFilterUnionChildren(ColumnPrunerProcCtx.java:273)
>  
> at 
> org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcFactory$ColumnPrunerFilterProc.process(ColumnPrunerProcFactory.java:108)
>  
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>  
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
>  
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
>  
> at 
> org.apache.hadoop.hive.ql.optimizer.ColumnPruner$ColumnPrunerWalker.walk(ColumnPruner.java:172)
>  
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
>  
> at 
> org.apache.hadoop.hive.ql.optimizer.ColumnPruner.transform(ColumnPruner.java:135)
>  
> at 
> org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:198) 
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10327)
>  
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192)
>  
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
>  
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:432) 
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305) 
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1119) 
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1167) 
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1055) 
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) 
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207) 
> at 
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) 
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) 
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305) 
> at 
> org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:403) 
> at 
> org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:419) 
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:708) 
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) 
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) 
> {noformat}





[jira] [Updated] (HIVE-13570) Some query with Union all fails when CBO is off

2016-04-20 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-13570:

Attachment: HIVE-13570.1.PATCH






[jira] [Commented] (HIVE-13570) Some query with Union all fails when CBO is off

2016-04-20 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251042#comment-15251042
 ] 

Yongzhi Chen commented on HIVE-13570:
-

Using genColLists(curOp, child) to get the Union's prune list caused the 
issue; it should use genColLists(child).
Attaching patch 1 with the fix and the test.
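A toy illustration of the failure shape, not the actual Hive code: pruning indices computed against the wrong (larger) operator schema can exceed the Union child's column count, which matches the reported IndexOutOfBoundsException. All names below are invented for the sketch.

```java
import java.util.List;

public class PruneListSketch {
    // Look up needed columns by index in a child schema. If the indices were
    // computed against a different, wider operator schema, they can overflow.
    static void prune(List<String> childSchema, List<Integer> needed) {
        for (int i : needed) {
            System.out.println(childSchema.get(i));  // throws when i >= size
        }
    }

    public static void main(String[] args) {
        List<String> childSchema = List.of("c0", "c1", "c2");  // 3 columns
        try {
            // Indices derived from the wrong operator, analogous to calling
            // genColLists(curOp, child) instead of genColLists(child).
            prune(childSchema, List.of(0, 2, 3));
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IndexOutOfBoundsException: " + e.getMessage());
        }
    }
}
```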






[jira] [Updated] (HIVE-13571) LLAP: Cannot find a jar for org.apache.hive.hcatalog.data.JsonSerDe

2016-04-20 Thread Andrew Sears (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sears updated HIVE-13571:

Description: 
hive --service llap -i 1 
produces an error.

Cannot find a jar for [org.apache.hive.hcatalog.data.JsonSerDe] due to an 
exception (org.apache.hive.hcatalog.data.JsonSerDe); not packaging the jar.

Does not impact creation of LLAP Slider package.

Fixed by copying 
http://central.maven.org/maven2/org/apache/hive/hcatalog/hive-hcatalog-core/2.0.0/hive-hcatalog-core-2.0.0.jar
 to hive/lib folder.

  was:
hive --service llap -i 1 
produces an error.

Cannot find a jar for [org.apache.hive.hcatalog.data.JsonSerDe] due to an 
exception (org.apache.hive.hcatalog.data.JsonSerDe); not packaging the jar.

Does not impact creation of LLAP Slider package.

Fixed by copying hive-hcatalog-core-1.2.1.2.3.4.0-3485.jar to hive/lib folder.


> LLAP: Cannot find a jar for org.apache.hive.hcatalog.data.JsonSerDe
> ---
>
> Key: HIVE-13571
> URL: https://issues.apache.org/jira/browse/HIVE-13571
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.0.0
>Reporter: Andrew Sears
>Priority: Minor
>
> hive --service llap -i 1 
> produces an error.
> Cannot find a jar for [org.apache.hive.hcatalog.data.JsonSerDe] due to an 
> exception (org.apache.hive.hcatalog.data.JsonSerDe); not packaging the jar.
> Does not impact creation of LLAP Slider package.
> Fixed by copying 
> http://central.maven.org/maven2/org/apache/hive/hcatalog/hive-hcatalog-core/2.0.0/hive-hcatalog-core-2.0.0.jar
>  to hive/lib folder.





[jira] [Updated] (HIVE-13258) LLAP: Add hdfs bytes read and spilled bytes to tez print summary

2016-04-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13258:
-
Attachment: llap-fs-counters-full-cache-hit.png

Adding an image showing counters after a 100% data and metadata cache hit.

> LLAP: Add hdfs bytes read and spilled bytes to tez print summary
> 
>
> Key: HIVE-13258
> URL: https://issues.apache.org/jira/browse/HIVE-13258
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13258.1.patch, llap-fs-counters-full-cache-hit.png, 
> llap-fs-counters.png
>
>
> When printing counters to the console it will be useful to print hdfs bytes 
> read and spilled bytes, which will help with debugging issues faster.





[jira] [Updated] (HIVE-13570) Some query with Union all fails when CBO is off

2016-04-20 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-13570:

Description: 
Some queries with union all throw an IndexOutOfBoundsException when:
set hive.cbo.enable=false;
set hive.ppd.remove.duplicatefilters=true;
The stack trace is:
{noformat}
java.lang.IndexOutOfBoundsException: Index: 67, Size: 67 
at java.util.ArrayList.rangeCheck(ArrayList.java:635) 
at java.util.ArrayList.get(ArrayList.java:411) 
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcCtx.genColLists(ColumnPrunerProcCtx.java:161)
 
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcCtx.handleFilterUnionChildren(ColumnPrunerProcCtx.java:273)
 
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcFactory$ColumnPrunerFilterProc.process(ColumnPrunerProcFactory.java:108)
 
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
 
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
 
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPruner$ColumnPrunerWalker.walk(ColumnPruner.java:172)
 
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
 
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPruner.transform(ColumnPruner.java:135)
 
at 
org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:198) 
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10327)
 
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192)
 
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
 
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:432) 
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305) 
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1119) 
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1167) 
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1055) 
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045) 
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207) 
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) 
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) 
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305) 
at 
org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:403) 
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:419) 
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:708) 
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) 
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) 
{noformat}

  was:
Some queries with union all throws IndexOutOfBoundsException
when:
set hive.cbo.enable=false;
set hive.ppd.remove.duplicatefilters=true;
The stack is as:
{noformat}
{code} 
java.lang.IndexOutOfBoundsException: Index: 67, Size: 67 
at java.util.ArrayList.rangeCheck(ArrayList.java:635) 
at java.util.ArrayList.get(ArrayList.java:411) 
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcCtx.genColLists(ColumnPrunerProcCtx.java:161)
 
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcCtx.handleFilterUnionChildren(ColumnPrunerProcCtx.java:273)
 
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcFactory$ColumnPrunerFilterProc.process(ColumnPrunerProcFactory.java:108)
 
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
 
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
 
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPruner$ColumnPrunerWalker.walk(ColumnPruner.java:172)
 
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
 
at 
org.apache.hadoop.hive.ql.optimizer.ColumnPruner.transform(ColumnPruner.java:135)
 
at 
org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:198) 
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10327)
 
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192)
 
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
 
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:432) 
at org.apache.hadoop.hive.ql.Dri

[jira] [Commented] (HIVE-13569) Add test for llap file system counters after updating to tez 0.8.3

2016-04-20 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251032#comment-15251032
 ] 

Prasanth Jayachandran commented on HIVE-13569:
--

[~sseth] fyi..

> Add test for llap file system counters after updating to tez 0.8.3
> --
>
> Key: HIVE-13569
> URL: https://issues.apache.org/jira/browse/HIVE-13569
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> Use post hook to print llap counters for *llap.q tests after tez 0.8.3 
> upgrade.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13258) LLAP: Add hdfs bytes read and spilled bytes to tez print summary

2016-04-20 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13258:
-
Attachment: llap-fs-counters.png
HIVE-13258.1.patch

Initial patch and a sample output image showing fs counters on the console.
I have also added post-hook changes to print these counters so they can be used 
in tests, but that will not work until we move to tez 0.8.3. After upgrading 
the tez version we can change orc_llap.q to use the post hook for printing the 
counters. [~sseth] Can you please review this patch?

> LLAP: Add hdfs bytes read and spilled bytes to tez print summary
> 
>
> Key: HIVE-13258
> URL: https://issues.apache.org/jira/browse/HIVE-13258
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-13258.1.patch, llap-fs-counters.png
>
>
> When printing counters to console it will be useful to print hdfs bytes read 
> and spilled bytes which will help with debugging issues faster. 





[jira] [Updated] (HIVE-11550) ACID queries pollute HiveConf

2016-04-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11550:
--
Description: 
HiveConf is a SessionState-level object. Some ACID-related logic makes changes 
to it that are meant to be per query but instead persist for the SessionState.

See SemanticAnalyzer.checkAcidConstraints()
Also note   HiveConf.setVar(conf, 
HiveConf.ConfVars.DYNAMICPARTITIONINGMODE, "nonstrict");
in UpdateDeleteSemanticAnalyzer

[~alangates], do you know of other cases or ideas on how to deal with this 
differently?


_SortedDynPartitionOptimizer.process()_ is the place to have the logic to do 
_conf.setBoolVar(ConfVars.HIVEOPTSORTDYNAMICPARTITION, false);_ on per query 
basis

  was:
HiveConf is a SessionState level object.  Some ACID related logic makes changes 
to it (which are meant to be per query) but become permanent.

See SemanticAnalyzer.checkAcidConstraints()
Also note   HiveConf.setVar(conf, 
HiveConf.ConfVars.DYNAMICPARTITIONINGMODE, "nonstrict");
in UpdateDeleteSemancitAnalzyer

[~alangates], do you know of other cases or ideas on how to deal with this 
differently?


> ACID queries pollute HiveConf
> -
>
> Key: HIVE-11550
> URL: https://issues.apache.org/jira/browse/HIVE-11550
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> HiveConf is a SessionState-level object. Some ACID-related logic makes 
> changes to it that are meant to be per query but instead persist for the 
> SessionState.
> See SemanticAnalyzer.checkAcidConstraints()
> Also note   HiveConf.setVar(conf, 
> HiveConf.ConfVars.DYNAMICPARTITIONINGMODE, "nonstrict");
> in UpdateDeleteSemanticAnalyzer
> [~alangates], do you know of other cases or ideas on how to deal with this 
> differently?
> _SortedDynPartitionOptimizer.process()_ is the place to have the logic to do 
> _conf.setBoolVar(ConfVars.HIVEOPTSORTDYNAMICPARTITION, false);_ on per query 
> basis



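The leak described in HIVE-11550 — a per-query override of a session-scoped HiveConf that silently persists across queries — is typically fixed by snapshotting the setting before the override and restoring it in a finally block. A minimal sketch of that pattern, using `java.util.Properties` as a stand-in for HiveConf; the method and key names here are illustrative, not Hive's actual API:

```java
import java.util.Properties;

public class PerQueryConf {
    // Stand-in for a HiveConf key; in Hive the conf object is session-scoped.
    static final String DYN_PART_MODE = "hive.exec.dynamic.partition.mode";

    // Apply a per-query override, let "compilation" see it, then restore the
    // session-level value so the change does not leak into later queries.
    static String compileWithOverride(Properties sessionConf, String queryMode) {
        String saved = sessionConf.getProperty(DYN_PART_MODE);
        sessionConf.setProperty(DYN_PART_MODE, queryMode);
        try {
            // Query compilation runs here and observes the override.
            return sessionConf.getProperty(DYN_PART_MODE);
        } finally {
            if (saved == null) {
                sessionConf.remove(DYN_PART_MODE);
            } else {
                sessionConf.setProperty(DYN_PART_MODE, saved);
            }
        }
    }
}
```

The key point is the finally block: even if compilation throws, the session-level value is restored, so the override stays per-query.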


[jira] [Work started] (HIVE-13568) Add UDFs to support column-masking

2016-04-20 Thread Madhan Neethiraj (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-13568 started by Madhan Neethiraj.
---
> Add UDFs to support column-masking
> --
>
> Key: HIVE-13568
> URL: https://issues.apache.org/jira/browse/HIVE-13568
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Madhan Neethiraj
>Assignee: Madhan Neethiraj
>
> HIVE-13125 added support to provide column-masking and row-filtering during 
> select via HiveAuthorizer interface. This JIRA is track addition of UDFs that 
> can be used by HiveAuthorizer implementations to mask column values.





[jira] [Assigned] (HIVE-13566) enable merging of bit vectors for insert into

2016-04-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-13566:
--

Assignee: Pengcheng Xiong

> enable merging of bit vectors for insert into
> -
>
> Key: HIVE-13566
> URL: https://issues.apache.org/jira/browse/HIVE-13566
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>






[jira] [Updated] (HIVE-13564) Deprecate HIVE_STATS_COLLECT_RAWDATASIZE

2016-04-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13564:
---
Summary: Deprecate HIVE_STATS_COLLECT_RAWDATASIZE  (was: Depreciate 
HIVE_STATS_COLLECT_RAWDATASIZE)

> Deprecate HIVE_STATS_COLLECT_RAWDATASIZE
> 
>
> Key: HIVE-13564
> URL: https://issues.apache.org/jira/browse/HIVE-13564
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Statistics
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>Priority: Minor
>
> Reasons: (1) it is only used in stats20.q; (2) we already have a 
> "HIVESTATSAUTOGATHER" configuration to tell if we are going to collect 
> rawDataSize and #rows.





[jira] [Updated] (HIVE-13341) Stats state is not captured correctly: differentiate load table and create table

2016-04-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13341:
---
Status: Patch Available  (was: Open)

> Stats state is not captured correctly: differentiate load table and create 
> table
> 
>
> Key: HIVE-13341
> URL: https://issues.apache.org/jira/browse/HIVE-13341
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Statistics
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13341.01.patch, HIVE-13341.02.patch, 
> HIVE-13341.03.patch, HIVE-13341.04.patch, HIVE-13341.05.patch, 
> HIVE-13341.06.patch
>
>






[jira] [Updated] (HIVE-13341) Stats state is not captured correctly: differentiate load table and create table

2016-04-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13341:
---
Attachment: HIVE-13341.06.patch

> Stats state is not captured correctly: differentiate load table and create 
> table
> 
>
> Key: HIVE-13341
> URL: https://issues.apache.org/jira/browse/HIVE-13341
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Statistics
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13341.01.patch, HIVE-13341.02.patch, 
> HIVE-13341.03.patch, HIVE-13341.04.patch, HIVE-13341.05.patch, 
> HIVE-13341.06.patch
>
>






[jira] [Updated] (HIVE-13341) Stats state is not captured correctly: differentiate load table and create table

2016-04-20 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13341:
---
Status: Open  (was: Patch Available)

> Stats state is not captured correctly: differentiate load table and create 
> table
> 
>
> Key: HIVE-13341
> URL: https://issues.apache.org/jira/browse/HIVE-13341
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Statistics
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13341.01.patch, HIVE-13341.02.patch, 
> HIVE-13341.03.patch, HIVE-13341.04.patch, HIVE-13341.05.patch, 
> HIVE-13341.06.patch
>
>






[jira] [Commented] (HIVE-13562) Enable vector bridge for all non-vectorized udfs

2016-04-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250906#comment-15250906
 ] 

Ashutosh Chauhan commented on HIVE-13562:
-

I think it will be a good idea, because udfs may now get inserted into the operator 
pipeline even when the user has not specified any in her query text, e.g. 1) in a 
security-enabled deployment, we may insert udfs for column masking; 2) when 
column-stats autogather is on, the stats-collector udf will show up in the pipeline. 
So I think it's better to always have vectorization on.

> Enable vector bridge for all non-vectorized udfs
> 
>
> Key: HIVE-13562
> URL: https://issues.apache.org/jira/browse/HIVE-13562
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Ashutosh Chauhan
>
> Mechanism already exists for this via {{VectorUDFAdaptor}}, but we have 
> arbitrarily hand-picked a few udfs to go through it. I think we should enable 
> this by default for all udfs.



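The bridge mechanism this issue proposes to generalize — {{VectorUDFAdaptor}} wrapping a row-mode UDF so it can run inside vectorized execution — boils down to applying the scalar function once per row of a column batch. A minimal sketch of that idea; Hive's real adaptor operates on VectorizedRowBatch and object inspectors, so the names and types below are illustrative only:

```java
import java.util.function.LongUnaryOperator;

// Bridge sketch: adapt a one-row-at-a-time UDF to batch (vectorized) execution
// by looping over the column and invoking the row-mode function per element.
public class RowUdfBridge {
    static long[] evaluateBatch(long[] inputColumn, LongUnaryOperator rowUdf) {
        long[] out = new long[inputColumn.length];
        for (int i = 0; i < inputColumn.length; i++) {
            // One row-mode call per row: correct but slower than a true
            // vectorized expression, which is why the adaptor is a fallback.
            out[i] = rowUdf.applyAsLong(inputColumn[i]);
        }
        return out;
    }
}
```

This also shows the trade-off behind the discussion: the adaptor preserves batch semantics but not the per-element speedups of hand-written vectorized expressions, so it is a fallback rather than a replacement.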


[jira] [Issue Comment Deleted] (HIVE-13366) Add data/conf/hive-site.xml file to test resources for metastore/service/ql components

2016-04-20 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-13366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-13366:
---
Comment: was deleted

(was: 

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12795554/HIVE-13366.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 9871 tests executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-vector_coalesce.q-auto_sortmerge_join_7.q-dynamic_partition_pruning.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not 
produce a TEST-*.xml file
org.apache.hive.jdbc.TestSchedulerQueue.testFairSchedulerPrimaryQueueMapping
org.apache.hive.jdbc.TestSchedulerQueue.testFairSchedulerSecondaryQueueMapping
org.apache.hive.service.auth.TestCustomAuthentication.org.apache.hive.service.auth.TestCustomAuthentication
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7401/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7401/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7401/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12795554 - PreCommit-HIVE-TRUNK-Build)

> Add data/conf/hive-site.xml file to test resources for metastore/service/ql 
> components
> --
>
> Key: HIVE-13366
> URL: https://issues.apache.org/jira/browse/HIVE-13366
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Minor
> Attachments: HIVE-13366.1.patch, HIVE-13366.2.patch, 
> HIVE-13366.3.patch
>
>
> The {{hive-site.xml}} file should be added to the test JAR (..-tests.jar) to 
> allow running tests in environments where {{mvn surefire:test}} is executed, 
> and only sources and test JAR exist.
> So far, only metastore, service and ql components need this.





[jira] [Commented] (HIVE-13562) Enable vector bridge for all non-vectorized udfs

2016-04-20 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250888#comment-15250888
 ] 

Matt McCline commented on HIVE-13562:
-

Not that I am aware of.  Because LLAP is so performance sensitive when we don't 
vectorize, I think we should vectorize for all UDFs (I recall Gopal saying this 
recently).

> Enable vector bridge for all non-vectorized udfs
> 
>
> Key: HIVE-13562
> URL: https://issues.apache.org/jira/browse/HIVE-13562
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Ashutosh Chauhan
>
> Mechanism already exists for this via {{VectorUDFAdaptor}}, but we have 
> arbitrarily hand-picked a few udfs to go through it. I think we should enable 
> this by default for all udfs.





[jira] [Commented] (HIVE-13467) Show llap info on hs2 ui when available

2016-04-20 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250883#comment-15250883
 ] 

Gunther Hagleitner commented on HIVE-13467:
---

Sure, will do when I commit.

> Show llap info on hs2 ui when available
> ---
>
> Key: HIVE-13467
> URL: https://issues.apache.org/jira/browse/HIVE-13467
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-13467.1.patch, HIVE-13467.2.patch, 
> HIVE-13467.3.patch, HIVE-13467.4.patch, HIVE-13467.5.patch, 
> screen-shot-llap.png, screen.png
>
>
> When llap is on and hs2 is configured with access to an llap cluster, HS2 UI 
> should show some status of the daemons and provide a mechanism to click 
> through to their respective UIs.





[jira] [Commented] (HIVE-13562) Enable vector bridge for all non-vectorized udfs

2016-04-20 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250881#comment-15250881
 ] 

Gunther Hagleitner commented on HIVE-13562:
---

Makes sense to me. [~mmccline] any reason we didn't enable this for all udfs?

> Enable vector bridge for all non-vectorized udfs
> 
>
> Key: HIVE-13562
> URL: https://issues.apache.org/jira/browse/HIVE-13562
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Ashutosh Chauhan
>
> Mechanism already exists for this via {{VectorUDFAdaptor}}, but we have 
> arbitrarily hand-picked a few udfs to go through it. I think we should enable 
> this by default for all udfs.





[jira] [Commented] (HIVE-13480) Add hadoop2 metrics reporter for Codahale metrics

2016-04-20 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250864#comment-15250864
 ] 

Sushanth Sowmyan commented on HIVE-13480:
-

And Thejas, I agree, I'll work on another patch to try to set this 
automatically rather than via config.

> Add hadoop2 metrics reporter for Codahale metrics
> -
>
> Key: HIVE-13480
> URL: https://issues.apache.org/jira/browse/HIVE-13480
> Project: Hive
>  Issue Type: Bug
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-13480.2.patch, HIVE-13480.3.patch, 
> HIVE-13480.4.patch, HIVE-13480.patch
>
>
> Multiple other apache components allow sending metrics over to Hadoop2 
> metrics, which allow for monitoring solutions like Ambari Metrics Server to 
> work against that to show metrics for components in one place. Our Codahale 
> metrics works very well, so ideally, we would like to bridge the two, to 
> allow Codahale to add a Hadoop2 reporter that enables us to continue to use 
> Codahale metrics (i.e. not write another custom metrics impl) but report 
> using Hadoop2.
> Apache Phoenix also had such a recent usecase and were in the process of 
> adding in a stub piece that allows this forwarding. We should use the same 
> reporter to minimize redundancy while pushing metrics to a centralized 
> solution like Hadoop2 Metrics/AMS.



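The bridging idea in HIVE-13480 — keep one metrics registry and forward its snapshots to a second monitoring system through a thin reporter — can be sketched with plain stdlib types. Here a counter map stands in for Codahale's MetricRegistry and a callback stands in for a Hadoop2 metrics sink; all names are illustrative, not the actual Codahale or Hadoop2 APIs:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Reporter-bridge sketch: metrics are recorded once, in one registry, and a
// periodic report() pushes the current values to a downstream sink, avoiding
// a second, parallel metrics implementation.
public class MetricsBridge {
    private final Map<String, Long> registry = new ConcurrentHashMap<>();

    public void inc(String name) {
        registry.merge(name, 1L, Long::sum);
    }

    // One "report" tick: forward every metric to the downstream system.
    public void report(BiConsumer<String, Long> downstreamSink) {
        registry.forEach(downstreamSink);
    }
}
```

In the real patch the forwarding runs on a schedule (Codahale's reporters poll the registry periodically); the sketch shows only the single-tick data flow.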


[jira] [Updated] (HIVE-13480) Add hadoop2 metrics reporter for Codahale metrics

2016-04-20 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-13480:

Attachment: HIVE-13480.4.patch

Attached update to reword HiveConf description per Lefty's comment.

> Add hadoop2 metrics reporter for Codahale metrics
> -
>
> Key: HIVE-13480
> URL: https://issues.apache.org/jira/browse/HIVE-13480
> Project: Hive
>  Issue Type: Bug
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-13480.2.patch, HIVE-13480.3.patch, 
> HIVE-13480.4.patch, HIVE-13480.patch
>
>
> Multiple other apache components allow sending metrics over to Hadoop2 
> metrics, which allow for monitoring solutions like Ambari Metrics Server to 
> work against that to show metrics for components in one place. Our Codahale 
> metrics works very well, so ideally, we would like to bridge the two, to 
> allow Codahale to add a Hadoop2 reporter that enables us to continue to use 
> Codahale metrics (i.e. not write another custom metrics impl) but report 
> using Hadoop2.
> Apache Phoenix also had such a recent usecase and were in the process of 
> adding in a stub piece that allows this forwarding. We should use the same 
> reporter to minimize redundancy while pushing metrics to a centralized 
> solution like Hadoop2 Metrics/AMS.





[jira] [Updated] (HIVE-13249) Hard upper bound on number of open transactions

2016-04-20 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13249:
-
Status: Patch Available  (was: Open)

> Hard upper bound on number of open transactions
> ---
>
> Key: HIVE-13249
> URL: https://issues.apache.org/jira/browse/HIVE-13249
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13249.1.patch, HIVE-13249.2.patch, 
> HIVE-13249.3.patch
>
>
> We need to have a safeguard by adding an upper bound for open transactions to 
> avoid huge number of open-transaction requests, usually due to improper 
> configuration of clients such as Storm.
> Once that limit is reached, clients will start failing.



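The safeguard described in HIVE-13249 — a hard cap on concurrently open transactions, with clients failing fast once the limit is reached — is essentially an atomic counter with a bounded increment. A minimal sketch under that reading; the class and method names are illustrative, not Hive's actual TxnHandler API:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hard upper bound on open transactions: open requests beyond the limit are
// rejected instead of piling up (e.g. from a misconfigured Storm client).
public class OpenTxnLimiter {
    private final int maxOpenTxns;
    private final AtomicInteger openTxns = new AtomicInteger();

    public OpenTxnLimiter(int maxOpenTxns) {
        this.maxOpenTxns = maxOpenTxns;
    }

    // Returns true if a txn slot was acquired; false means the client's
    // open-transaction request fails. CAS loop keeps the check-and-increment
    // race-free without a lock.
    public boolean tryOpenTxn() {
        while (true) {
            int cur = openTxns.get();
            if (cur >= maxOpenTxns) {
                return false;
            }
            if (openTxns.compareAndSet(cur, cur + 1)) {
                return true;
            }
        }
    }

    public void closeTxn() {
        openTxns.decrementAndGet();
    }
}
```

Closing (committing or aborting) a transaction frees a slot, so a well-behaved client is unaffected while a runaway one hits the ceiling.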


[jira] [Updated] (HIVE-13249) Hard upper bound on number of open transactions

2016-04-20 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13249:
-
Attachment: HIVE-13249.3.patch

[~ekoifman] Thanks for the detailed review. The comments make sense to me.
Attaching patch 3 for testing.

> Hard upper bound on number of open transactions
> ---
>
> Key: HIVE-13249
> URL: https://issues.apache.org/jira/browse/HIVE-13249
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13249.1.patch, HIVE-13249.2.patch, 
> HIVE-13249.3.patch
>
>
> We need to have a safeguard by adding an upper bound for open transactions to 
> avoid huge number of open-transaction requests, usually due to improper 
> configuration of clients such as Storm.
> Once that limit is reached, clients will start failing.





[jira] [Commented] (HIVE-13562) Enable vector bridge for all non-vectorized udfs

2016-04-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250826#comment-15250826
 ] 

Ashutosh Chauhan commented on HIVE-13562:
-

cc: [~mmccline] , [~hagleitn] [~gopalv] thoughts ?

> Enable vector bridge for all non-vectorized udfs
> 
>
> Key: HIVE-13562
> URL: https://issues.apache.org/jira/browse/HIVE-13562
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Ashutosh Chauhan
>
> Mechanism already exists for this via {{VectorUDFAdaptor}}, but we have 
> arbitrarily hand-picked a few udfs to go through it. I think we should enable 
> this by default for all udfs.





[jira] [Commented] (HIVE-13548) hive-jdbc isn't escaping slashes during PreparedStatement

2016-04-20 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250823#comment-15250823
 ] 

Hive QA commented on HIVE-13548:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12799539/HIVE-13548.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 38 failed/errored test(s), 9484 tests 
executed
*Failed tests:*
{noformat}
TestCliDriver-alter_table_not_sorted.q-cbo_udf_max.q-udf_equal.q-and-12-more - 
did not produce a TEST-*.xml file
TestCliDriver-bool_literal.q-authorization_cli_createtab.q-explain_ddl.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-bucketmapjoin3.q-vector_partition_diff_num_cols.q-stats2.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-cbo_rp_join1.q-union_top_level.q-insert_update_delete.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-create_func1.q-enforce_order.q-interval_comparison.q-and-12-more 
- did not produce a TEST-*.xml file
TestCliDriver-cte_mat_1.q-groupby_sort_6.q-udf_regexp.q-and-12-more - did not 
produce a TEST-*.xml file
TestCliDriver-describe_xpath.q-autogen_colalias.q-skewjoinopt3.q-and-12-more - 
did not produce a TEST-*.xml file
TestCliDriver-encryption_join_with_different_encryption_keys.q-bucketcontext_3.q-udf_divide.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-groupby4.q-convert_enum_to_string.q-load_dyn_part3.q-and-12-more 
- did not produce a TEST-*.xml file
TestCliDriver-groupby_complex_types.q-auto_join9.q-vector_decimal_round.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-groupby_map_ppr_multi_distinct.q-vectorization_16.q-multi_insert_mixed.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-infer_bucket_sort_multi_insert.q-vector_custom_udf_configure.q-udf4.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-lateral_view_noalias.q-input11_limit.q-orc_llap.q-and-12-more - 
did not produce a TEST-*.xml file
TestCliDriver-metadataonly1.q-union13.q-udf1.q-and-12-more - did not produce a 
TEST-*.xml file
TestCliDriver-orc_merge10.q-groupby8_map.q-exim_14_managed_location_over_existing.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-order_null.q-part_inherit_tbl_props_with_star.q-join_filters.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-parquet_ppd_decimal.q-vector_complex_join.q-cluster.q-and-12-more 
- did not produce a TEST-*.xml file
TestCliDriver-ppd_union.q-udf_round_3.q-groupby12.q-and-12-more - did not 
produce a TEST-*.xml file
TestCliDriver-ptf_general_queries.q-unionDistinct_1.q-groupby1_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-rcfile_merge1.q-multigroupby_singlemr.q-vectorization_limit.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-rename_column.q-index_compact.q-merge_dynamic_partition2.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-sample_islocalmode_hook_use_metadata.q-cbo_rp_semijoin.q-custom_input_output_format.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-showparts.q-skewjoinopt21.q-udaf_percentile_approx_20.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-smb_mapjoin_4.q-udf_to_unix_timestamp.q-tez_union.q-and-12-more - 
did not produce a TEST-*.xml file
TestCliDriver-stats13.q-join_parse.q-sort_merge_join_desc_2.q-and-12-more - did 
not produce a TEST-*.xml file
TestCliDriver-stats_publisher_error_1.q-auto_join1.q-cast_to_int.q-and-12-more 
- did not produce a TEST-*.xml file
TestCliDriver-tez_joins_explain.q-varchar_serde.q-ivyDownload.q-and-12-more - 
did not produce a TEST-*.xml file
TestCliDriver-tez_smb_empty.q-char_2.q-udf_date_sub.q-and-12-more - did not 
produce a TEST-*.xml file
TestCliDriver-udf_asin.q-windowing_multipartitioning.q-bucketcontext_1.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-udf_double.q-join11.q-join18.q-and-12-more - did not produce a 
TEST-*.xml file
TestCliDriver-udf_locate.q-join32_lessSize.q-correlationoptimizer8.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-udf_to_float.q-decimal_precision2.q-ppd_gby_join.q-and-12-more - 
did not produce a TEST-*.xml file
TestCliDriver-udtf_posexplode.q-udf_exp.q-alter_numbuckets_partitioned_table_h23.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-unicode_notation.q-gen_udf_example_add10.q-ppd_join4.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-vector_distinct_2.q-update_after_multiple_inserts_special_characters.q-nullscript.q-and-12-more
 - did not produce a TEST-*.xml file
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hive.beeline.TestSchemaTool.testSchemaInit
{noformat}

Test results: 
http://ec2-174-129-184-35.co

[jira] [Commented] (HIVE-13561) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used

2016-04-20 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250816#comment-15250816
 ] 

Vaibhav Gumashta commented on HIVE-13561:
-

[~tleftwich] Thanks a lot for looking into this.

> HiveServer2 is leaking ClassLoaders when add jar / temporary functions are 
> used
> ---
>
> Key: HIVE-13561
> URL: https://issues.apache.org/jira/browse/HIVE-13561
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.2.0, 1.2.1, 2.0.0
>Reporter: Trystan Leftwich
>Assignee: Trystan Leftwich
> Attachments: HIVE-13561-branch-1.2.patch
>
>
> I can repro this on branch-1.2 and branch-2.0.
> It looks to be the same issue as HIVE-11408.
> The patch from HIVE-11408 looks to fix the issue as well.
> I've updated the patch from HIVE-11408 to be aligned with branch-1.2 and 
> master.





[jira] [Commented] (HIVE-13561) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used

2016-04-20 Thread Trystan Leftwich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250810#comment-15250810
 ] 

Trystan Leftwich commented on HIVE-13561:
-

Ok, sounds like a better option, I'll go get that working and update the patch. 
Thanks

> HiveServer2 is leaking ClassLoaders when add jar / temporary functions are 
> used
> ---
>
> Key: HIVE-13561
> URL: https://issues.apache.org/jira/browse/HIVE-13561
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.2.0, 1.2.1, 2.0.0
>Reporter: Trystan Leftwich
>Assignee: Trystan Leftwich
> Attachments: HIVE-13561-branch-1.2.patch
>
>
> I can repro this on branch-1.2 and branch-2.0.
> It looks to be the same issue as HIVE-11408.
> The patch from HIVE-11408 looks to fix the issue as well.
> I've updated the patch from HIVE-11408 to be aligned with branch-1.2 and 
> master.





[jira] [Commented] (HIVE-13561) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used

2016-04-20 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250803#comment-15250803
 ] 

Vaibhav Gumashta commented on HIVE-13561:
-

[~tleftwich] Thanks for the pointer. Indeed, there is still a dependence on 
org.apache.hadoop.util.ReflectionUtils. I think a better idea would be to use 
https://github.com/apache/hive/blob/branch-1.2/common/src/java/org/apache/hive/common/util/ReflectionUtil.java
 and remove hadoop's ReflectionUtils. What do you think?

> HiveServer2 is leaking ClassLoaders when add jar / temporary functions are 
> used
> ---
>
> Key: HIVE-13561
> URL: https://issues.apache.org/jira/browse/HIVE-13561
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.2.0, 1.2.1, 2.0.0
>Reporter: Trystan Leftwich
>Assignee: Trystan Leftwich
> Attachments: HIVE-13561-branch-1.2.patch
>
>
> I can repro this on branch-1.2 and branch-2.0.
> It looks to be the same issue as HIVE-11408.
> The patch from HIVE-11408 looks to fix the issue as well.
> I've updated the patch from HIVE-11408 to be aligned with branch-1.2 and 
> master.





[jira] [Commented] (HIVE-13561) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used

2016-04-20 Thread Trystan Leftwich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250795#comment-15250795
 ] 

Trystan Leftwich commented on HIVE-13561:
-

It looks like when CREATE TEMPORARY FUNCTION is called via JDBC through 
HiveServer2, the function is still created through the registerGenericUD*F 
methods:
https://github.com/apache/hive/blob/branch-1.2/ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java#L141

which still use the org.apache.hadoop.util.ReflectionUtils class.

> HiveServer2 is leaking ClassLoaders when add jar / temporary functions are 
> used
> ---
>
> Key: HIVE-13561
> URL: https://issues.apache.org/jira/browse/HIVE-13561
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.2.0, 1.2.1, 2.0.0
>Reporter: Trystan Leftwich
>Assignee: Trystan Leftwich
> Attachments: HIVE-13561-branch-1.2.patch
>
>
> I can repro this on branch-1.2 and branch-2.0.
> It looks to be the same issue as HIVE-11408.
> The patch from HIVE-11408 looks to fix the issue as well.
> I've updated the patch from HIVE-11408 to be aligned with branch-1.2 and 
> master.





[jira] [Updated] (HIVE-10329) Hadoop reflectionutils has issues

2016-04-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-10329:

Fix Version/s: 2.0.0

> Hadoop reflectionutils has issues
> -
>
> Key: HIVE-10329
> URL: https://issues.apache.org/jira/browse/HIVE-10329
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: llap, 1.2.0, 2.0.0
>
> Attachments: HIVE-10329.patch
>
>
> 1) Constructor cache leaks classes and their attendant static overhead 
> forever.
> 2) Class cache inside conf used when getting JobConfigurable classes has an 
> epic lock.
> Both bugs are filed in Hadoop but will hardly ever be fixed at this rate. 
> This version avoids both problems.





[jira] [Commented] (HIVE-13561) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used

2016-04-20 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250788#comment-15250788
 ] 

Vaibhav Gumashta commented on HIVE-13561:
-

[~tleftwich] Can you elaborate a bit more? Looks like branch-1.2 and branch-2.0 
don't use Hadoop's reflection utils (replaced by Hive's reflection utils here: 
HIVE-10329). The patch for HIVE-11408 was targeted at pre-1.2 releases on 
branch-1.

> HiveServer2 is leaking ClassLoaders when add jar / temporary functions are 
> used
> ---
>
> Key: HIVE-13561
> URL: https://issues.apache.org/jira/browse/HIVE-13561
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.2.0, 1.2.1, 2.0.0
>Reporter: Trystan Leftwich
>Assignee: Trystan Leftwich
> Attachments: HIVE-13561-branch-1.2.patch
>
>
> I can reproduce this on branch-1.2 and branch-2.0.
> It looks to be the same issue as: HIVE-11408
> The patch from HIVE-11408 looks to fix the issue as well.
> I've updated the patch from HIVE-11408 to be aligned with branch-1.2 and 
> master.





[jira] [Updated] (HIVE-13561) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used

2016-04-20 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-13561:

Assignee: Trystan Leftwich  (was: Vaibhav Gumashta)

> HiveServer2 is leaking ClassLoaders when add jar / temporary functions are 
> used
> ---
>
> Key: HIVE-13561
> URL: https://issues.apache.org/jira/browse/HIVE-13561
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.2.0, 1.2.1, 2.0.0
>Reporter: Trystan Leftwich
>Assignee: Trystan Leftwich
> Attachments: HIVE-13561-branch-1.2.patch
>
>
> I can reproduce this on branch-1.2 and branch-2.0.
> It looks to be the same issue as: HIVE-11408
> The patch from HIVE-11408 looks to fix the issue as well.
> I've updated the patch from HIVE-11408 to be aligned with branch-1.2 and 
> master.





[jira] [Updated] (HIVE-13561) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used

2016-04-20 Thread Trystan Leftwich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trystan Leftwich updated HIVE-13561:

Attachment: HIVE-13561-branch-1.2.patch

> HiveServer2 is leaking ClassLoaders when add jar / temporary functions are 
> used
> ---
>
> Key: HIVE-13561
> URL: https://issues.apache.org/jira/browse/HIVE-13561
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.2.0, 1.2.1, 2.0.0
>Reporter: Trystan Leftwich
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-13561-branch-1.2.patch
>
>
> I can reproduce this on branch-1.2 and branch-2.0.
> It looks to be the same issue as: HIVE-11408
> The patch from HIVE-11408 looks to fix the issue as well.
> I've updated the patch from HIVE-11408 to be aligned with branch-1.2 and 
> master.





[jira] [Updated] (HIVE-13561) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used

2016-04-20 Thread Trystan Leftwich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trystan Leftwich updated HIVE-13561:

Description: 
I can reproduce this on branch-1.2 and branch-2.0.

It looks to be the same issue as: HIVE-11408

The patch from HIVE-11408 looks to fix the issue as well.

I've updated the patch from HIVE-11408 to be aligned with branch-1.2 and master



  was:
I can repo this on branch-1.2 and branch-2.0.

It looks to be the same issues as:
https://issues.apache.org/jira/browse/HIVE-11408

The patch from HIVE-11408 looks to fix the issue as well.

I've updated the patch from HIVE-11408 to be aligned with branch-1.2 and master




> HiveServer2 is leaking ClassLoaders when add jar / temporary functions are 
> used
> ---
>
> Key: HIVE-13561
> URL: https://issues.apache.org/jira/browse/HIVE-13561
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.2.0, 1.2.1, 2.0.0
>Reporter: Trystan Leftwich
>Assignee: Vaibhav Gumashta
>
> I can reproduce this on branch-1.2 and branch-2.0.
> It looks to be the same issue as: HIVE-11408
> The patch from HIVE-11408 looks to fix the issue as well.
> I've updated the patch from HIVE-11408 to be aligned with branch-1.2 and 
> master.





[jira] [Updated] (HIVE-13458) Heartbeater doesn't fail query when heartbeat fails

2016-04-20 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13458:
-
Attachment: HIVE-13458.4.patch

Had an offline discussion with Eugene. Patch 4 addresses the review comments.

> Heartbeater doesn't fail query when heartbeat fails
> ---
>
> Key: HIVE-13458
> URL: https://issues.apache.org/jira/browse/HIVE-13458
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13458.1.patch, HIVE-13458.2.patch, 
> HIVE-13458.3.patch, HIVE-13458.4.patch
>
>
> When a heartbeat fails to locate a lock, it should fail the current query. 
> That doesn't happen, which is a bug.
> Another thing is, we need to make sure stopHeartbeat really stops the 
> heartbeat, i.e. no additional heartbeat will be sent, since that will break 
> the assumption and cause the query to fail.
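The two requirements in the description — surface a heartbeat failure so the query can be failed, and guarantee that stopping the heartbeater really prevents any further sends — can be sketched deterministically as below. The names are illustrative, not Hive's actual Heartbeater API:

```java
public class HeartbeaterSketch {
    private boolean failed;
    private boolean stopped;

    // One heartbeat tick; in the real Heartbeater this runs on a timer thread.
    void tick(Runnable sendHeartbeat) {
        if (stopped) return;              // stopHeartbeat must prevent further sends
        try {
            sendHeartbeat.run();
        } catch (RuntimeException e) {
            failed = true;                // recorded so the driver can fail the query
        }
    }

    void stopHeartbeat() { stopped = true; }

    // The driver polls this and fails the query instead of silently losing the lock.
    boolean hasFailed() { return failed; }
}
```

The point of the flag is that a heartbeat failure (e.g. "lock not found") is no longer swallowed inside the heartbeat thread; the query path can observe it and abort.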





[jira] [Commented] (HIVE-13560) Adding Omid as connection manager for HBase Metastore

2016-04-20 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250682#comment-15250682
 ] 

Daniel Dai commented on HIVE-13560:
---

The patch also changes TestMiniTezCliDriver to use Omid for testing.

> Adding Omid as connection manager for HBase Metastore
> -
>
> Key: HIVE-13560
> URL: https://issues.apache.org/jira/browse/HIVE-13560
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-13560.1.patch
>
>
> Adding Omid as a transaction manager to HBase Metastore. 





[jira] [Updated] (HIVE-13560) Adding Omid as connection manager for HBase Metastore

2016-04-20 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-13560:
--
Status: Patch Available  (was: Open)

> Adding Omid as connection manager for HBase Metastore
> -
>
> Key: HIVE-13560
> URL: https://issues.apache.org/jira/browse/HIVE-13560
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-13560.1.patch
>
>
> Adding Omid as a transaction manager to HBase Metastore. 





[jira] [Updated] (HIVE-13560) Adding Omid as connection manager for HBase Metastore

2016-04-20 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-13560:
--
Attachment: HIVE-13560.1.patch

> Adding Omid as connection manager for HBase Metastore
> -
>
> Key: HIVE-13560
> URL: https://issues.apache.org/jira/browse/HIVE-13560
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-13560.1.patch
>
>
> Adding Omid as a transaction manager to HBase Metastore. 





[jira] [Commented] (HIVE-13349) Metastore Changes : API calls for retrieving primary keys and foreign keys information

2016-04-20 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250587#comment-15250587
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-13349:
--

[~alangates]
1. Agreed that the naming for the structs should have been better: SQLPrimaryKey 
should really be SQLPrimaryKeyColumn, and SQLForeignKey should be 
SQLForeignKeyColumn. They are passed as lists since there can be multiple 
columns.
Agreed about the redundancy here; it is a trade-off for creating an 
intermediate structure that is mapped to an MConstraint on the server 
side.
The columns can be distinguished using the key_seq values. We expect the 
client to:
1. Send the key_seq correctly (1, 2, 3, ...) for the 1st, 2nd, 3rd, etc. columns 
in the primary key. This position is important while retrieving the constraints.
2. Send the key_seq in order while creating the table. For example, for foreign 
keys f1 = (t1.a, t1.b) and f2 = (t1.c, t1.d), we expect the (column name, 
key_seq) list to be ((t1.a, 1), (t1.b, 2), (t1.c, 1), (t1.d, 2)). This way each 
"composite foreign key" is a sublist starting with 1 as the key 
sequence.

For multiple foreign keys on the same table, the above two restrictions are 
used to determine which foreign key each struct belongs to while creating the 
table. While retrieving the keys, the constraint name can be used to make the 
same distinction.

Although in some of my initial patches the keys were part of the table 
struct, they were moved to a separate MConstraint model to avoid a join between 
the Table and Constraint tables each time a Table is retrieved. Hence the need 
to create create_table_with_constraints as a separate API.

Thanks
Hari
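The key_seq convention above can be sketched as a small grouping routine: a new composite key starts whenever key_seq resets to 1. The struct name below is a hypothetical stand-in for the thrift SQLForeignKey fields, not the metastore's actual code:

```java
import java.util.ArrayList;
import java.util.List;

public class FkGrouping {
    // Stand-in for the per-column foreign-key struct: column name + key_seq.
    static class KeyCol {
        final String column;
        final int keySeq;
        KeyCol(String column, int keySeq) { this.column = column; this.keySeq = keySeq; }
    }

    // Rebuild composite foreign keys from a flat, in-order list: each composite
    // key is the maximal sublist beginning with key_seq == 1. Assumes the client
    // obeyed the ordering contract described above (first element has key_seq 1).
    static List<List<String>> group(List<KeyCol> cols) {
        List<List<String>> keys = new ArrayList<>();
        for (KeyCol kc : cols) {
            if (kc.keySeq == 1) {
                keys.add(new ArrayList<>());   // key_seq reset: a new composite key starts
            }
            keys.get(keys.size() - 1).add(kc.column);
        }
        return keys;
    }
}
```

With the ((t1.a, 1), (t1.b, 2), (t1.c, 1), (t1.d, 2)) input from the comment, this recovers f1 = (t1.a, t1.b) and f2 = (t1.c, t1.d).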

> Metastore Changes : API calls for retrieving primary keys and foreign keys 
> information
> --
>
> Key: HIVE-13349
> URL: https://issues.apache.org/jira/browse/HIVE-13349
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Logical Optimizer
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Fix For: 2.1.0
>
> Attachments: 13449.2.patch, HIVE-13349.1.patch, HIVE-13349.3.patch, 
> HIVE-13349.4.patch, HIVE-13349.5.patch, HIVE-13349.6.patch, HIVE-13349.7.patch
>
>






[jira] [Resolved] (HIVE-13558) Update LlapDump

2016-04-20 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere resolved HIVE-13558.
---
   Resolution: Fixed
Fix Version/s: llap

Committed to llap branch

> Update LlapDump
> ---
>
> Key: HIVE-13558
> URL: https://issues.apache.org/jira/browse/HIVE-13558
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap, p
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: llap
>
> Attachments: HIVE-13558.1.patch
>
>






[jira] [Commented] (HIVE-13490) Change itests to be part of the main Hive build

2016-04-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-13490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250537#comment-15250537
 ] 

Sergio Peña commented on HIVE-13490:


I think "itests" should be kept separate from the root project to 
distinguish real 'unit' tests from 'integration' tests.

Some developers would like to run only unit tests, and they can execute {{mvn 
test}} from the root directory. But with this change, running {{mvn test}} 
will execute all "itests" as well, and this can run for hours.

Sorry, but I'll have to give a -1 until we decide what's better. Btw, I like 
your idea to manage IntelliJ, but I'm afraid this will cause other people's 
testing infra to run much longer.

> Change itests to be part of the main Hive build
> ---
>
> Key: HIVE-13490
> URL: https://issues.apache.org/jira/browse/HIVE-13490
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-13490.01.patch, HIVE-13490.02.patch
>
>
> Instead of having to build Hive and then itests separately.
> With IntelliJ, this ends up being loaded as two separate dependencies, and 
> there are a lot of hops involved to make changes.
> Does anyone know why these have been kept separate?





[jira] [Updated] (HIVE-13559) Pass exception to failure hooks

2016-04-20 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-13559:
---
Status: Patch Available  (was: Open)

> Pass exception to failure hooks
> ---
>
> Key: HIVE-13559
> URL: https://issues.apache.org/jira/browse/HIVE-13559
> Project: Hive
>  Issue Type: Improvement
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Attachments: HIVE-13559.1.patch
>
>
> Pass exception to failure hooks so that they know more about the failure.
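The idea above — handing the causing exception to failure hooks through their context object — can be sketched as follows. The names are illustrative stand-ins, not Hive's actual HookContext API:

```java
public class FailureHookSketch {
    // Hypothetical hook context: the new field carries the exception that
    // caused the failure so hooks can log or classify it.
    static class HookContext {
        final String queryId;
        final Throwable exception;   // previously unavailable to failure hooks
        HookContext(String queryId, Throwable exception) {
            this.queryId = queryId;
            this.exception = exception;
        }
    }

    interface FailureHook {
        void run(HookContext ctx);
    }

    static String lastReport;   // captures what the hook saw, for illustration

    // On query failure, the driver builds a context including the exception
    // and invokes each registered failure hook with it.
    static void onFailure(String queryId, Throwable t, FailureHook hook) {
        hook.run(new HookContext(queryId, t));
    }
}
```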





[jira] [Updated] (HIVE-13559) Pass exception to failure hooks

2016-04-20 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-13559:
---
Attachment: HIVE-13559.1.patch

> Pass exception to failure hooks
> ---
>
> Key: HIVE-13559
> URL: https://issues.apache.org/jira/browse/HIVE-13559
> Project: Hive
>  Issue Type: Improvement
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Attachments: HIVE-13559.1.patch
>
>
> Pass exception to failure hooks so that they know more about the failure.





[jira] [Commented] (HIVE-13349) Metastore Changes : API calls for retrieving primary keys and foreign keys information

2016-04-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250500#comment-15250500
 ] 

Alan Gates commented on HIVE-13349:
---

Sorry, I should have reviewed this sooner, but your message structure seems 
fundamentally busted.  A table can only have one primary key and a primary key 
can have multiple columns.  But your SQLPrimaryKey struct only has one column, 
and then in your new create_table_with_constraints method you pass a list of 
primary keys.  This is confusing.  And redundant, since all but the column name 
in those structs will be the same.

The situation is even worse with foreign keys, since you can have more than one 
of those per table.  How are you going to distinguish which foreign key each of 
the structs belongs to?

And why did you create a new "create_table_with_constraints" method?  We should 
not be proliferating methods in the thrift interface.  Instead primary key and 
foreign key should be added as optional fields in the table struct so that the 
code can continue to use the existing create_table methods.

> Metastore Changes : API calls for retrieving primary keys and foreign keys 
> information
> --
>
> Key: HIVE-13349
> URL: https://issues.apache.org/jira/browse/HIVE-13349
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Logical Optimizer
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Fix For: 2.1.0
>
> Attachments: 13449.2.patch, HIVE-13349.1.patch, HIVE-13349.3.patch, 
> HIVE-13349.4.patch, HIVE-13349.5.patch, HIVE-13349.6.patch, HIVE-13349.7.patch
>
>






[jira] [Updated] (HIVE-13558) Update LlapDump

2016-04-20 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-13558:
--
Attachment: HIVE-13558.1.patch

- Use the row input format
- Add option to set conf settings

> Update LlapDump
> ---
>
> Key: HIVE-13558
> URL: https://issues.apache.org/jira/browse/HIVE-13558
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap, p
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-13558.1.patch
>
>






[jira] [Updated] (HIVE-4806) Add more implementations of JDBC API methods to Hive and Hive2 drivers

2016-04-20 Thread Matt Burgess (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated HIVE-4806:
---
Assignee: (was: Matt Burgess)

> Add more implementations of JDBC API methods to Hive and Hive2 drivers
> --
>
> Key: HIVE-4806
> URL: https://issues.apache.org/jira/browse/HIVE-4806
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 0.11.0
>Reporter: Matt Burgess
> Attachments: HIVE-4806.patch
>
>
> Third-party client software such as Pentaho Data Integration (PDI) uses many 
> different JDBC API calls when interacting with JDBC data sources. Several of 
> these calls have not yet been implemented in the Hive and Hive 2 drivers and 
> by default will throw "Method not supported" SQLExceptions when there could 
> be default implementations instead.





[jira] [Reopened] (HIVE-13507) Improved logging for ptest

2016-04-20 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-13507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña reopened HIVE-13507:


I just reverted the patch.

> Improved logging for ptest
> --
>
> Key: HIVE-13507
> URL: https://issues.apache.org/jira/browse/HIVE-13507
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Sergio Peña
> Fix For: 2.1.0
>
> Attachments: HIVE-13507.01.patch
>
>
> Include information about batch runtimes, outlier lists, host completion 
> times, etc. Try identifying tests which cause the build to take a long time 
> while holding onto resources.





[jira] [Commented] (HIVE-13507) Improved logging for ptest

2016-04-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-13507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250463#comment-15250463
 ] 

Sergio Peña commented on HIVE-13507:


hi [~sseth], I will need to revert this patch as it is causing some issues with 
the ptest infra.
While I was running some tests, I found that ptest is spinning up a lot of 
instances due to an exception:

{noformat}
2016-04-20 13:01:25 INFO  CloudExecutionContextProvider:213 - Attempting to 
create 12 nodes
2016-04-20 13:02:34 INFO  CloudExecutionContextProvider:281 - Verify number of 
hots: 1
2016-04-20 13:02:34 INFO  CloudExecutionContextProvider:291 - Verifying node: 
{id=us-west-1/i-b245ef07, providerId=i-b245ef07, 
name=spena-hive-spark-ptest-slaves-b245ef07, location={scope=ZONE, 
id=us-west-1c, description=us-west-1c, parent=us-west-1, iso3166Codes=[US-CA]}, 
group=spena-hive-spark-ptest-slaves, imageId=us-west-1/ami-1ac6dc5f, 
os={family=unrecognized, arch=paravirtual, version=, 
description=360379543683/hive-spark-ptest-7, is64Bit=true}, 
status=RUNNING[running], loginPort=22, hostname=ip-10-236-128-180, 
privateAddresses=[10.236.128.180], publicAddresses=[54.241.234.115], 
hardware={id=c3.2xlarge, providerId=c3.2xlarge, processors=[{cores=8.0, 
speed=3.5}], ram=15360, volumes=[{type=LOCAL, size=80.0, device=/dev/sdb, 
bootDevice=false, durable=false}, {type=LOCAL, size=80.0, device=/dev/sdc, 
bootDevice=false, durable=false}, {id=vol-df82d662, type=SAN, device=/dev/sda1, 
bootDevice=true, durable=true}], hypervisor=xen, 
supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit())},
 loginUser=root, tags=[group=spena-hive-spark-ptest-slaves], 
userMetadata={owner=sergio.pena, Name=spena-hive-spark-ptest-slaves-b245ef07}}
2016-04-20 13:02:34 INFO  CloudExecutionContextProvider:45 - Starting 
LocalCommandId=ssh -v -i /home/hiveptest/.ssh/hive-ptest-user-key  -l hiveptest 
54.241.234.115 'pkill -f java': {}1
2016-04-20 13:02:35 INFO  CloudExecutionContextProvider:60 - Finished 
LocalCommandId=1. ElapsedTime(seconds)=0
2016-04-20 13:02:35 ERROR CloudExecutionContextProvider:296 - Node 
{id=us-west-1/i-b245ef07, providerId=i-b245ef07, 
name=spena-hive-spark-ptest-slaves-b245ef07, location={scope=ZONE, 
id=us-west-1c, description=us-west-1c, parent=us-west-1, iso3166Codes=[US-CA]}, 
group=spena-hive-spark-ptest-slaves, imageId=us-west-1/ami-1ac6dc5f, 
os={family=unrecognized, arch=paravirtual, version=, 
description=360379543683/hive-spark-ptest-7, is64Bit=true}, 
status=RUNNING[running], loginPort=22, hostname=ip-10-236-128-180, 
privateAddresses=[10.236.128.180], publicAddresses=[54.241.234.115], 
hardware={id=c3.2xlarge, providerId=c3.2xlarge, processors=[{cores=8.0, 
speed=3.5}], ram=15360, volumes=[{type=LOCAL, size=80.0, device=/dev/sdb, 
bootDevice=false, durable=false}, {type=LOCAL, size=80.0, device=/dev/sdc, 
bootDevice=false, durable=false}, {id=vol-df82d662, type=SAN, device=/dev/sda1, 
bootDevice=true, durable=true}], hypervisor=xen, 
supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit())},
 loginUser=root, tags=[group=spena-hive-spark-ptest-slaves], 
userMetadata={owner=sergio.pena, Name=spena-hive-spark-ptest-slaves-b245ef07}} 
is bad on startup
java.lang.IllegalStateException: This stopwatch is already stopped.
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:150) 
~[guava-15.0.jar:?]
at com.google.common.base.Stopwatch.stop(Stopwatch.java:177) 
~[guava-15.0.jar:?]
at 
org.apache.hive.ptest.execution.LocalCommand.getExitCode(LocalCommand.java:59) 
~[LocalCommand.class:?]
at 
org.apache.hive.ptest.execution.ssh.SSHCommandExecutor.execute(SSHCommandExecutor.java:72)
 ~[SSHCommandExecutor.class:?]
at 
org.apache.hive.ptest.execution.context.CloudExecutionContextProvider$3.run(CloudExecutionContextProvider.java:293)
 [CloudExecutionContextProvider$3.class:?]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
[?:1.7.0_45]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_45]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[?:1.7.0_45]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[?:1.7.0_45]
at java.lang.Thread.run(Thread.java:744) [?:1.7.0_45]
2016-04-20 13:02:35 INFO  CloudExecutionContextProvider:354 - Submitting 
termination for {id=us-west-1/i-b245ef07, providerId=i-b245ef07, 
name=spena-hive-spark-ptest-slaves-b245ef07, location={scope=ZONE, 
id=us-west-1c, description=us-west-1c, parent=us-west-1, iso3166Codes=[US-CA]}, 
group=spena-hive-spark-ptest-slaves, imageId=us-west-1/ami-1ac6dc5f, 
os={family=unrecognized, arch=paravirtual, version=, 
description=360379543683/hive-spark-ptest-7, is64Bit=true}, 
status=RUNNIN
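The stack trace above shows Stopwatch.stop() being invoked on an already-stopped stopwatch, which Guava rejects with IllegalStateException ("This stopwatch is already stopped."). A minimal stand-in (not Guava's Stopwatch, and not the actual LocalCommand code) showing the guard that avoids the double-stop:

```java
public class GuardedStopwatch {
    private boolean running = true;
    private long startNanos = System.nanoTime();
    private long elapsedNanos;

    // Guava's stop() throws if called twice; guarding on the running flag
    // (or stopping in exactly one place) makes a second call a no-op.
    void stopIfRunning() {
        if (running) {
            elapsedNanos += System.nanoTime() - startNanos;
            running = false;
        }
    }

    long elapsedNanos() {
        return running ? elapsedNanos + (System.nanoTime() - startNanos)
                       : elapsedNanos;
    }
}
```

In the real code the equivalent fix is to ensure getExitCode() stops the stopwatch at most once per command, instead of on every call.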

[jira] [Assigned] (HIVE-13507) Improved logging for ptest

2016-04-20 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-13507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña reassigned HIVE-13507:
--

Assignee: Sergio Peña  (was: Siddharth Seth)

> Improved logging for ptest
> --
>
> Key: HIVE-13507
> URL: https://issues.apache.org/jira/browse/HIVE-13507
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Sergio Peña
> Fix For: 2.1.0
>
> Attachments: HIVE-13507.01.patch
>
>
> Include information about batch runtimes, outlier lists, host completion 
> times, etc. Try identifying tests which cause the build to take a long time 
> while holding onto resources.





[jira] [Commented] (HIVE-13395) Lost Update problem in ACID

2016-04-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250422#comment-15250422
 ] 

Alan Gates commented on HIVE-13395:
---

I agree we need something that works with multi-statement transactions.  But I 
think we'll find in that case that we cannot clean until all open transactions 
that could potentially see a set of changes (not just the txns with read locks) 
have closed.  That is, if I have a partition with base_10 and delta_11_20 and 
then compact so that I now have base_20, I can't clean up after that compaction 
until all transactions < 20 have committed or aborted.  Otherwise one of those 
transactions could try to read this partition and get the wrong version.  Since 
you have to remember all this, I think it will force us to keep the necessary 
information in COMPLETED_TXN_COMPONENTS long enough.

AFAICT you have a different way of remembering all the same information, which 
is completely fine.  As long as we agree on what has to be remembered for how 
long I'm fine with doing it in a new WRITE_SET table and dropping the 
TXN_COMPONENTS table.
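The cleaning rule described above — after compacting to base_20, the old base and delta files cannot be removed while any transaction below the compaction's high watermark is still open — reduces to a simple check. This is a sketch of the rule only, not Hive's actual Cleaner code:

```java
public class CompactionCleanerSketch {
    // A compaction that produced base_N may be cleaned only once every
    // transaction with txnid < N has committed or aborted; otherwise an open
    // reader with an older snapshot could still need the pre-compaction files.
    static boolean canClean(long compactionHighWatermark, long[] openTxnIds) {
        for (long txn : openTxnIds) {
            if (txn < compactionHighWatermark) {
                return false;   // an older txn might still read the old base/deltas
            }
        }
        return true;
    }
}
```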



> Lost Update problem in ACID
> ---
>
> Key: HIVE-13395
> URL: https://issues.apache.org/jira/browse/HIVE-13395
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13395.6.patch, HIVE-13395.7.patch
>
>
> ACID users can run into Lost Update problem.
> In Hive 1.2, Driver.recordValidTxns() (which records the snapshot to use for 
> the query) is called in Driver.compile().
> Now suppose two concurrent "update T set x = x + 1" are executed.  (for 
> simplicity assume there is exactly 1 row in T)
> What can happen is that both compile at the same time (more precisely before 
> acquireLocksAndOpenTxn() in runInternal() is called) and thus will lock in 
> the same snapshot, say the value of x = 7 in this snapshot.
> Now 1 will get the lock on the row, the second will block.  
> Now 1, makes x = 8 and commits.
> Now 2 proceeds and makes x = 8 again since in its snapshot x is still 7.
> This specific issue is solved in Hive 1.3/2.0 (HIVE-11077 which is a large 
> patch that deals with multi-statement txns) by moving recordValidTxns() after 
> locks are acquired which reduces the likelihood of this but doesn't eliminate 
> the problem.
> 
> Even in 1.3 version of the code, you could have the same issue.  Assume the 
> same 2 queries:
> Both start a txn, say txnid 9 and 10.  Say 10 gets the lock first, 9 blocks.
> 10 updates the row (so x = 8) and thus ReaderKey.currentTransactionId=10.
> 10 commits.
> Now 9 can proceed and it will get a snapshot that includes 10, i.e. it will 
> see x = 8 and it will write x = 9, but it will set 
> ReaderKey.currentTransactionId = 9.  Thus when merge logic runs, it will see 
> x = 8 is the later version of this row, i.e. lost update.
> The problem is that locks alone are insufficient for MVCC architecture.  
> 
> At lower level Row ID has (originalTransactionId, rowid, bucket id, 
> currentTransactionId) and since on update/delete we do a table scan, we could 
> check that we are about to write a row with currentTransactionId < 
> (currentTransactionId of row we've read) and fail the query.  Currently, 
> currentTransactionId is not surfaced at higher level where this check can be 
> made.
> This would not work (efficiently) longer term where we want to support fast 
> update on user-defined PK via streaming ingest.
> Also, this would not work with multi statement txns since in that case we'd 
> lock in the snapshot at the start of the txn, but then 2nd, 3rd etc queries 
> would use the same snapshot and the locks for these queries would be acquired 
> after the snapshot is locked in so this would be the same situation as pre 
> HIVE-11077.
> 
>  
> A more robust solution (commonly used with MVCC) is to keep track of the 
> start and commit time (a logical counter) of each transaction to detect if 
> two txns overlap.  The 2nd part is to keep track of the write-set, i.e. which 
> data (rows, partitions, whatever the appropriate level of granularity is) 
> were modified by any txn, and if 2 txns overlap in time and wrote the same 
> element, abort the later one.  This is called the first-committer-wins rule.  
> This requires a metastore DB schema change.
> It would be most convenient to use the same sequence for txnId, start and 
> commit time (in which case txnid=start time).  In this case we'd need to add 
> 1 field to the TXNS table.  The complication here is that we'll be using 
> elements of the sequence faster, and they are used as part of the file names 
> of delta and base dirs and are currently limited to 7 digits, which can be 
> exceeded.  So this would require some thought to han
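The first-committer-wins rule from the description can be sketched as an overlap-plus-write-set-intersection check, using one logical counter for start and commit times as suggested. This is an illustration of the rule, not the patch's implementation:

```java
import java.util.HashSet;
import java.util.Set;

public class FirstCommitterWins {
    // Two txns overlap if each started before the other committed.
    static boolean overlap(long start, long commit, long otherStart, long otherCommit) {
        return otherCommit > start && commit > otherStart;
    }

    // At commit time, a txn must abort if an overlapping, already-committed
    // txn wrote any of the same elements (rows, partitions, ...).
    static boolean mustAbort(long start, long commit, Set<String> writeSet,
                             long otherStart, long otherCommit, Set<String> otherWrites) {
        if (!overlap(start, commit, otherStart, otherCommit)) {
            return false;   // serial execution: no conflict possible
        }
        Set<String> common = new HashSet<>(writeSet);
        common.retainAll(otherWrites);
        return !common.isEmpty();
    }
}
```

With Eugene's later example T[10,70] and S[35,36], if both wrote X then T must abort at commit, because S committed inside T's lifetime and their write-sets intersect.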

[jira] [Commented] (HIVE-13467) Show llap info on hs2 ui when available

2016-04-20 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250392#comment-15250392
 ] 

Vikram Dixit K commented on HIVE-13467:
---

Nit: Can you add the apache header for some of the new files or eliminate them 
from the rat check? Otherwise LGTM.

> Show llap info on hs2 ui when available
> ---
>
> Key: HIVE-13467
> URL: https://issues.apache.org/jira/browse/HIVE-13467
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-13467.1.patch, HIVE-13467.2.patch, 
> HIVE-13467.3.patch, HIVE-13467.4.patch, HIVE-13467.5.patch, 
> screen-shot-llap.png, screen.png
>
>
> When llap is on and hs2 is configured with access to an llap cluster, HS2 UI 
> should show some status of the daemons and provide a mechanism to click 
> through to their respective UIs.





[jira] [Commented] (HIVE-13395) Lost Update problem in ACID

2016-04-20 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250391#comment-15250391
 ] 

Eugene Koifman commented on HIVE-13395:
---

I wanted to have a solution that can be extended to multi-statement 
transactions.
In general, you have to keep WriteSet info past transaction commit, which 
means it can't be cleaned immediately.
For example, take T[10,70] and S[35,36].  If T decides to write X after S 
commits, you still need to know whether S wrote X.

> Lost Update problem in ACID
> ---
>
> Key: HIVE-13395
> URL: https://issues.apache.org/jira/browse/HIVE-13395
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13395.6.patch, HIVE-13395.7.patch
>
>
> ACID users can run into Lost Update problem.
> In Hive 1.2, Driver.recordValidTxns() (which records the snapshot to use for 
> the query) is called in Driver.compile().
> Now suppose two concurrent "update T set x = x + 1" statements are executed.  (for 
> simplicity assume there is exactly 1 row in T)
> What can happen is that both compile at the same time (more precisely before 
> acquireLocksAndOpenTxn() in runInternal() is called) and thus will lock in 
> the same snapshot, say the value of x = 7 in this snapshot.
> Now 1 will get the lock on the row, the second will block.  
> Now 1, makes x = 8 and commits.
> Now 2 proceeds and makes x = 8 again since in its snapshot x is still 7.
> This specific issue is solved in Hive 1.3/2.0 (HIVE-11077 which is a large 
> patch that deals with multi-statement txns) by moving recordValidTxns() after 
> locks are acquired which reduces the likelihood of this but doesn't eliminate 
> the problem.
> 
> Even in 1.3 version of the code, you could have the same issue.  Assume the 
> same 2 queries:
> Both start a txn, say txnid 9 and 10.  Say 10 gets the lock first, 9 blocks.
> 10 updates the row (so x = 8) and thus ReaderKey.currentTransactionId=10.
> 10 commits.
> Now 9 can proceed and it will get a snapshot that includes 10, i.e. it will 
> see x = 8 and it will write x = 9, but it will set 
> ReaderKey.currentTransactionId = 9.  Thus when merge logic runs, it will see 
> x = 8 is the later version of this row, i.e. lost update.
> The problem is that locks alone are insufficient for MVCC architecture.  
> 
> At lower level Row ID has (originalTransactionId, rowid, bucket id, 
> currentTransactionId) and since on update/delete we do a table scan, we could 
> check that we are about to write a row with currentTransactionId < 
> (currentTransactionId of row we've read) and fail the query.  Currently, 
> currentTransactionId is not surfaced at higher level where this check can be 
> made.
> This would not work (efficiently) longer term where we want to support fast 
> update on a user-defined PK via streaming ingest.
> Also, this would not work with multi statement txns since in that case we'd 
> lock in the snapshot at the start of the txn, but then 2nd, 3rd etc queries 
> would use the same snapshot and the locks for these queries would be acquired 
> after the snapshot is locked in so this would be the same situation as pre 
> HIVE-11077.
> 
>  
> A more robust solution (commonly used with MVCC) is to keep track of the start 
> and commit time (a logical counter) of each transaction to detect if two txns 
> overlap.  The 2nd part is to keep track of the write-set, i.e. which data (rows, 
> partitions, whatever the appropriate level of granularity is) were modified by 
> any txn, and if 2 txns overlap in time and wrote the same element, abort the later 
> one.  This is called the first-committer-wins rule.  This requires a MS DB schema 
> change.
> It would be most convenient to use the same sequence for txnId, start and 
> commit time (in which case txnid=start time).  In this case we'd need to add 
> 1 field to the TXNS table.  The complication here is that we'll be using elements 
> of the sequence faster and they are used as part of file name of delta and 
> base dir and currently limited to 7 digits which can be exceeded.  So this 
> would require some thought to handling upgrade/migration.
> Also, write-set tracking requires either additional metastore table or 
> keeping info in HIVE_LOCKS around longer with new state.
> 
> In the short term, on SQL side of things we could (in auto commit mode only)
> acquire the locks first and then open the txn AND update these locks with txn 
> id.
> This implies another Thrift change to pass in lockId to openTxn.
> The same would not work for Streaming API since it opens several txns at once 
> and then acquires locks for each.
> (Not sure if that's an issue or not since Streaming only does Insert).
> Either way this feels hacky.
> 
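The first-committer-wins rule described above can be sketched as a small simulation. All names here are invented for illustration; this is not Hive's actual conflict-detection code, and a real implementation would also skip pairs where the earlier txn was itself aborted:

```python
def overlap(txn_a, txn_b):
    """Two txns overlap if neither committed before the other started."""
    (start_a, commit_a), (start_b, commit_b) = txn_a, txn_b
    return start_a < commit_b and start_b < commit_a

def first_committer_wins(txns, write_sets):
    """Abort the later committer of any overlapping pair that wrote the same element.

    txns: name -> (start, commit) logical timestamps
    write_sets: name -> set of elements (rows/partitions) the txn wrote
    """
    aborted = set()
    by_commit = sorted(txns, key=lambda name: txns[name][1])  # earlier committer first
    for i, later in enumerate(by_commit):
        for earlier in by_commit[:i]:
            if overlap(txns[earlier], txns[later]) and write_sets[earlier] & write_sets[later]:
                aborted.add(later)
    return aborted

# Eugene's example: T=[10,70] overlaps S=[35,36]; both wrote X, so the
# later committer (T) must abort -- which is why S's write-set must be
# retained past S's commit.
txns = {"T": (10, 70), "S": (35, 36), "U": (40, 50)}
write_sets = {"T": {"X"}, "S": {"X"}, "U": {"Y"}}
assert first_committer_wins(txns, write_sets) == {"T"}
```

U commits inside T's window but touches a different element, so only T is aborted: overlap alone is not enough, the write-sets must intersect.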

[jira] [Commented] (HIVE-11160) Auto-gather column stats

2016-04-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250376#comment-15250376
 ] 

Ashutosh Chauhan commented on HIVE-11160:
-

I think it would be good to separate the metastore thrift changes into a different 
jira, so that we are committing a smaller, contained changeset. 

> Auto-gather column stats
> 
>
> Key: HIVE-11160
> URL: https://issues.apache.org/jira/browse/HIVE-11160
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11160.01.patch, HIVE-11160.02.patch, 
> HIVE-11160.03.patch, HIVE-11160.04.patch, HIVE-11160.05.patch, 
> HIVE-11160.06.patch, HIVE-11160.07.patch, HIVE-11160.08.patch, 
> HIVE-11160.09.patch
>
>
> Hive will collect table stats when hive.stats.autogather=true is set during the 
> INSERT OVERWRITE command. The users then need to collect the column stats 
> themselves using the "Analyze" command. In this patch, the column stats will also 
> be collected automatically. More specifically, INSERT OVERWRITE will 
> automatically create new column stats. INSERT INTO will automatically merge 
> new column stats with existing ones.





[jira] [Commented] (HIVE-10176) skip.header.line.count causes values to be skipped when performing insert values

2016-04-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250361#comment-15250361
 ] 

Ashutosh Chauhan commented on HIVE-10176:
-

Can you create a ReviewBoard entry for this? Also, can you comment on why you 
chose to create a new temp file and write data into it? This doesn't look like an 
efficient way of dealing with the problem.

> skip.header.line.count causes values to be skipped when performing insert 
> values
> 
>
> Key: HIVE-10176
> URL: https://issues.apache.org/jira/browse/HIVE-10176
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.0.0, 1.2.1
>Reporter: Wenbo Wang
>Assignee: Vladyslav Pavlenko
> Fix For: 2.0.0
>
> Attachments: HIVE-10176.1.patch, HIVE-10176.10.patch, 
> HIVE-10176.11.patch, HIVE-10176.2.patch, HIVE-10176.3.patch, 
> HIVE-10176.4.patch, HIVE-10176.5.patch, HIVE-10176.6.patch, 
> HIVE-10176.7.patch, HIVE-10176.8.patch, HIVE-10176.9.patch, data
>
>
> When inserting values into tables with TBLPROPERTIES 
> ("skip.header.line.count"="1"), the first value listed is also skipped. 
> create table test (row int, name string) TBLPROPERTIES 
> ("skip.header.line.count"="1"); 
> load data local inpath '/root/data' into table test;
> insert into table test values (1, 'a'), (2, 'b'), (3, 'c');
> (1, 'a') isn't inserted into the table. 
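The failure mode can be reproduced with a toy model of per-file header skipping. The names here are made up for illustration and this is not Hive's actual read path; the point is that a skip configured for loaded files is also applied to files that INSERT writes, which have no header:

```python
def read_table(files, skip_header_lines=1):
    """Toy table reader: the header skip is applied to *every* file in the
    table directory, regardless of how the file was produced."""
    rows = []
    for f in files:
        rows.extend(f[skip_header_lines:])  # drop the "header" from each file
    return rows

loaded_file = ["row,name", "0,zero"]   # file from LOAD DATA: really has a header
insert_file = ["1,a", "2,b", "3,c"]    # file written by INSERT VALUES: no header

# (1, 'a') vanishes because the skip was applied to a file without a header.
assert read_table([loaded_file, insert_file]) == ["0,zero", "2,b", "3,c"]
```

This is why the bug surfaces only on the first row of each INSERT: the reader silently treats it as a header line.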





[jira] [Commented] (HIVE-13395) Lost Update problem in ACID

2016-04-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250354#comment-15250354
 ] 

Alan Gates commented on HIVE-13395:
---

bq. TXN_COMPONENTS/COMPLETED_TXN_COMPONENTS have different retention policies 
[than WRITE_SET]. These are governed by compaction rather than transaction 
"liveness", which don't necessarily match.
Agreed, but the retention of TXN_COMPONENTS/COMPLETED_TXN_COMPONENTS > 
WRITE_SET (because you can't clean the compaction until all the readers are 
done).  I know in HIVE-13497 you propose to eliminate TXN_COMPONENTS.  So is 
your plan to have WRITE_SET and COMPLETED_TXN_COMPONENTS as the two tables?  
That seems fine, as long as the upgrade path isn't hard on users.

> Lost Update problem in ACID
> ---
>
> Key: HIVE-13395
> URL: https://issues.apache.org/jira/browse/HIVE-13395
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-13395.6.patch, HIVE-13395.7.patch
>
>
> ACID users can run into the Lost Update problem.
> In Hive 1.2, Driver.recordValidTxns() (which records the snapshot to use for 
> the query) is called in Driver.compile().
> Now suppose two concurrent "update T set x = x + 1" statements are executed.  (for 
> simplicity assume there is exactly 1 row in T)
> What can happen is that both compile at the same time (more precisely before 
> acquireLocksAndOpenTxn() in runInternal() is called) and thus will lock in 
> the same snapshot, say the value of x = 7 in this snapshot.
> Now 1 will get the lock on the row, the second will block.  
> Now 1, makes x = 8 and commits.
> Now 2 proceeds and makes x = 8 again since in its snapshot x is still 7.
> This specific issue is solved in Hive 1.3/2.0 (HIVE-11077 which is a large 
> patch that deals with multi-statement txns) by moving recordValidTxns() after 
> locks are acquired which reduces the likelihood of this but doesn't eliminate 
> the problem.
> 
> Even in 1.3 version of the code, you could have the same issue.  Assume the 
> same 2 queries:
> Both start a txn, say txnid 9 and 10.  Say 10 gets the lock first, 9 blocks.
> 10 updates the row (so x = 8) and thus ReaderKey.currentTransactionId=10.
> 10 commits.
> Now 9 can proceed and it will get a snapshot that includes 10, i.e. it will 
> see x = 8 and it will write x = 9, but it will set 
> ReaderKey.currentTransactionId = 9.  Thus when merge logic runs, it will see 
> x = 8 is the later version of this row, i.e. lost update.
> The problem is that locks alone are insufficient for MVCC architecture.  
> 
> At lower level Row ID has (originalTransactionId, rowid, bucket id, 
> currentTransactionId) and since on update/delete we do a table scan, we could 
> check that we are about to write a row with currentTransactionId < 
> (currentTransactionId of row we've read) and fail the query.  Currently, 
> currentTransactionId is not surfaced at higher level where this check can be 
> made.
> This would not work (efficiently) longer term where we want to support fast 
> update on a user-defined PK via streaming ingest.
> Also, this would not work with multi statement txns since in that case we'd 
> lock in the snapshot at the start of the txn, but then 2nd, 3rd etc queries 
> would use the same snapshot and the locks for these queries would be acquired 
> after the snapshot is locked in so this would be the same situation as pre 
> HIVE-11077.
> 
>  
> A more robust solution (commonly used with MVCC) is to keep track of the start 
> and commit time (a logical counter) of each transaction to detect if two txns 
> overlap.  The 2nd part is to keep track of the write-set, i.e. which data (rows, 
> partitions, whatever the appropriate level of granularity is) were modified by 
> any txn, and if 2 txns overlap in time and wrote the same element, abort the later 
> one.  This is called the first-committer-wins rule.  This requires a MS DB schema 
> change.
> It would be most convenient to use the same sequence for txnId, start and 
> commit time (in which case txnid=start time).  In this case we'd need to add 
> 1 field to the TXNS table.  The complication here is that we'll be using elements 
> of the sequence faster and they are used as part of file name of delta and 
> base dir and currently limited to 7 digits which can be exceeded.  So this 
> would require some thought to handling upgrade/migration.
> Also, write-set tracking requires either additional metastore table or 
> keeping info in HIVE_LOCKS around longer with new state.
> 
> In the short term, on SQL side of things we could (in auto commit mode only)
> acquire the locks first and then open the txn AND update these locks with txn 
> id.
> This implies another Thrift change to 

[jira] [Commented] (HIVE-10293) enabling travis-ci build?

2016-04-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250350#comment-15250350
 ] 

Ashutosh Chauhan commented on HIVE-10293:
-

Is it possible to use a different version of mvn on travis-ci? If so, we 
can use 3.0.5 on travis-ci and then it should work.

> enabling travis-ci build?
> -
>
> Key: HIVE-10293
> URL: https://issues.apache.org/jira/browse/HIVE-10293
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Gabor Liptak
>Assignee: Gabor Liptak
>Priority: Minor
> Attachments: HIVE-10293.1.patch, HIVE-10293.2.diff
>
>
> I would like to contribute a .travis.yml for Hive.
> In particular, this would allow contributors working through Github, to 
> validate their own commits on their own branches.
> Please comment.
> Thanks





[jira] [Commented] (HIVE-13429) Tool to remove dangling scratch dir

2016-04-20 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250349#comment-15250349
 ] 

Daniel Dai commented on HIVE-13429:
---

Made a minor change to the wiki, looks good now.

hive.start.cleanup.scratchdir is for HS2. Once set, HS2 will drop the scratch dir 
on start. This is not an option for a multi-user environment since it will 
accidentally remove scratch dirs that are in use. We might be able to adopt the same 
logic here to remove only stale scratch dirs, but in the end we decided to make it a 
standalone tool first for simplicity.

> Tool to remove dangling scratch dir
> ---
>
> Key: HIVE-13429
> URL: https://issues.apache.org/jira/browse/HIVE-13429
> Project: Hive
>  Issue Type: Improvement
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>  Labels: TODOC1.3, TODOC2.1
> Fix For: 1.3.0, 2.1.0
>
> Attachments: HIVE-13429.1.patch, HIVE-13429.2.patch, 
> HIVE-13429.3.patch, HIVE-13429.4.patch, HIVE-13429.5.patch, 
> HIVE-13429.branch-1.patch
>
>
> We have seen in some cases that users leave scratch dirs behind, which 
> eventually eat up hdfs storage. This could happen when the vm restarts and leaves 
> no chance for Hive to run its shutdown hook. This is applicable for both HiveCli 
> and HiveServer2. Here we provide an external tool to clear dead scratch dirs 
> as needed.
> We need a way to identify which scratch dir is in use. We will rely on the HDFS 
> write lock for that. Here is how the HDFS write lock works:
> 1. An HDFS client opens an HDFS file for write and only closes it at the time of 
> shutdown
> 2. The cleanup process can try to open the HDFS file for write. If the client holding 
> this file is still running, we will get an exception. Otherwise, we know the 
> client is dead
> 3. If the HDFS client dies without closing the HDFS file, the NN will reclaim the 
> lease after 10 min, ie, the HDFS file held by the dead client is writable 
> again after 10 min
> So here is how we remove a dangling scratch directory in Hive:
> 1. HiveCli/HiveServer2 opens a well-named lock file in the scratch directory and 
> only closes it when it is about to drop the scratch directory
> 2. A command line tool, cleardanglingscratchdir, will check every scratch 
> directory and try to open the lock file for write. If it does not get an exception, 
> the owner is dead and we can safely remove the scratch directory
> 3. The 10 min window means it is possible a HiveCli/HiveServer2 is dead but 
> we still cannot reclaim the scratch directory for another 10 min. But this 
> should be tolerable
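The liveness check in step 2 can be simulated on a local filesystem, with POSIX advisory locks standing in for the HDFS write lease. This is a sketch under that assumption only; the real tool relies on HDFS open-for-write semantics, not flock, and all names below are invented:

```python
import fcntl
import os
import tempfile

def owner_is_dead(lock_path):
    """Try to take the lock ourselves; success means the previous owner is gone."""
    fd = os.open(lock_path, os.O_WRONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # raises if still held
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True       # nobody held it: safe to remove the scratch dir
    except BlockingIOError:
        return False      # still held: owner alive, leave the scratch dir alone
    finally:
        os.close(fd)

# A "session" (stand-in for HiveCli/HS2) creates the lock file and holds it.
lock_path = tempfile.NamedTemporaryFile(suffix=".lock", delete=False).name
session_fd = os.open(lock_path, os.O_WRONLY)
fcntl.flock(session_fd, fcntl.LOCK_EX)

alive = owner_is_dead(lock_path)   # False: lock held, do not clean up
os.close(session_fd)               # session "dies"; the kernel releases the lock
dead = owner_is_dead(lock_path)    # True: dangling, safe to clean up
os.remove(lock_path)
```

The local analogue has no 10-minute lease window: the kernel releases the flock the moment the holder's descriptor closes, whereas HDFS only makes the file writable again after NN lease recovery.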





[jira] [Commented] (HIVE-12634) Add command to kill an ACID transacton

2016-04-20 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250292#comment-15250292
 ] 

Wei Zheng commented on HIVE-12634:
--

Got an answer from the thrift-user mailing list. It's consistent with my original 
guess.
{code}
There is a built-in mechanism for temporary variables. It basically relies
on a special prefix plus an incremented counter. The numbers are incremented
to generate variable names that do not produce collisions.
{code}
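The scheme described in the quoted answer (a reserved prefix plus a monotonically increasing counter) can be sketched in a few lines. The class name and prefix here are invented for illustration and are not Thrift's actual generator:

```python
import itertools

class TempVarNamer:
    """Collision-free temp names: a reserved prefix plus an incrementing counter.

    As long as user-visible identifiers never start with the prefix, every
    generated name is guaranteed unique and cannot collide with user names.
    """
    def __init__(self, prefix="_tmp"):
        self._prefix = prefix
        self._counter = itertools.count()  # 0, 1, 2, ...

    def next_name(self):
        return f"{self._prefix}{next(self._counter)}"

namer = TempVarNamer()
assert namer.next_name() == "_tmp0"
assert namer.next_name() == "_tmp1"   # counter alone guarantees uniqueness
```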

> Add command to kill an ACID transacton
> --
>
> Key: HIVE-12634
> URL: https://issues.apache.org/jira/browse/HIVE-12634
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
> Attachments: HIVE-12634.1.patch, HIVE-12634.2.patch, 
> HIVE-12634.3.patch
>
>
> Should add a CLI command to abort a (runaway) transaction.
> This should clean up all state related to this txn.
> The initiator of this (if still alive) will get an error trying to 
> heartbeat/commit, i.e. will become aware that the txn is dead.





[jira] [Commented] (HIVE-6476) Support Append with Dynamic Partitioning

2016-04-20 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250277#comment-15250277
 ] 

Mariappan Asokan commented on HIVE-6476:


Sushanth, thank you.  This is very helpful.  I will dig into the code and if I 
have any questions I will let you know.  Is FileOutputCommitterContainer.java a 
good place to start? Can I assign this Jira to myself?



> Support Append with Dynamic Partitioning
> 
>
> Key: HIVE-6476
> URL: https://issues.apache.org/jira/browse/HIVE-6476
> Project: Hive
>  Issue Type: Sub-task
>  Components: HCatalog, Metastore, Query Processor, Thrift API
>Reporter: Sushanth Sowmyan
>
> Currently, we do not support mixing dynamic partitioning and append in the 
> same job. One reason is that we need exhaustive testing of corner cases for 
> that, and a second reason is the behaviour of add_partitions. To support 
> dynamic partitioning with append, we'd have to have a 
> add_partitions_if_not_exist call, rather than an add_partitions call.
> Thus, the current implementation in HIVE-6475 assumes immutability for all 
> dynamic partitioning jobs, irrespective of whether the table is marked 
> as mutable.





[jira] [Commented] (HIVE-13510) Dynamic partitioning doesn’t work when remote metastore is used

2016-04-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250257#comment-15250257
 ] 

Ashutosh Chauhan commented on HIVE-13510:
-

+1

> Dynamic partitioning doesn’t work when remote metastore is used
> ---
>
> Key: HIVE-13510
> URL: https://issues.apache.org/jira/browse/HIVE-13510
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.0
> Environment: Hadoop 2.7.1
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
>Priority: Critical
> Attachments: HIVE-13510.1.patch
>
>
> *Steps to reproduce:*
> # Configure remote metastore (hive.metastore.uris)
> # Create table t1 (a string);
> # Create table t2 (a string) partitioned by (b string);
> # set hive.exec.dynamic.partition.mode=nonstrict;
> # Insert overwrite table t2 partition (b) select a,a from t1;
> *Result:*
> {noformat}
> FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: getMetaConf failed: unknown result
> 16/04/13 15:04:51 [c679e424-2501-4347-8146-cf1b1cae217c main]: ERROR 
> ql.Driver: FAILED: SemanticException 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: getMetaConf failed: unknown result
> org.apache.hadoop.hive.ql.parse.SemanticException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: getMetaConf failed: unknown result
> at 
> org.apache.hadoop.hive.ql.plan.DynamicPartitionCtx.<init>(DynamicPartitionCtx.java:84)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6550)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:9315)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9204)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10071)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9949)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10607)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:358)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10618)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:233)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:245)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:476)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:318)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1192)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1287)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1118)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1106)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:236)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:339)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:748)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:721)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:648)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: getMetaConf failed: unknown result
> at org.apache.hadoop.hive.ql.metadata.Hive.getMetaConf(Hive.java:3493)
> at 
> org.apache.hadoop.hive.ql.plan.DynamicPartitionCtx.<init>(DynamicPartitionCtx.java:82)
> ... 29 more
> Caused by: org.apache.thrift.TApplicationException: getMetaConf failed: 
> unknown result
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_getMetaConf(ThriftHiveMetastore.java:666)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.getMetaConf(ThriftHiveMetastore.

[jira] [Updated] (HIVE-13510) Dynamic partitioning doesn’t work when remote metastore is used

2016-04-20 Thread Illya Yalovyy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Illya Yalovyy updated HIVE-13510:
-
Status: Patch Available  (was: Open)

> Dynamic partitioning doesn’t work when remote metastore is used
> ---
>
> Key: HIVE-13510
> URL: https://issues.apache.org/jira/browse/HIVE-13510
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.0
> Environment: Hadoop 2.7.1
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
>Priority: Critical
> Attachments: HIVE-13510.1.patch
>
>
> *Steps to reproduce:*
> # Configure remote metastore (hive.metastore.uris)
> # Create table t1 (a string);
> # Create table t2 (a string) partitioned by (b string);
> # set hive.exec.dynamic.partition.mode=nonstrict;
> # Insert overwrite table t2 partition (b) select a,a from t1;
> *Result:*
> {noformat}
> FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: getMetaConf failed: unknown result
> 16/04/13 15:04:51 [c679e424-2501-4347-8146-cf1b1cae217c main]: ERROR 
> ql.Driver: FAILED: SemanticException 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: getMetaConf failed: unknown result
> org.apache.hadoop.hive.ql.parse.SemanticException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: getMetaConf failed: unknown result
> at 
> org.apache.hadoop.hive.ql.plan.DynamicPartitionCtx.<init>(DynamicPartitionCtx.java:84)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6550)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:9315)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9204)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10071)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9949)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10607)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:358)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10618)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:233)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:245)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:476)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:318)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1192)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1287)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1118)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1106)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:236)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:339)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:748)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:721)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:648)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: getMetaConf failed: unknown result
> at org.apache.hadoop.hive.ql.metadata.Hive.getMetaConf(Hive.java:3493)
> at 
> org.apache.hadoop.hive.ql.plan.DynamicPartitionCtx.<init>(DynamicPartitionCtx.java:82)
> ... 29 more
> Caused by: org.apache.thrift.TApplicationException: getMetaConf failed: 
> unknown result
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_getMetaConf(ThriftHiveMetastore.java:666)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.getMetaConf(ThriftHiveMetastore.java:646)
> at 
> 

[jira] [Updated] (HIVE-13510) Dynamic partitioning doesn’t work when remote metastore is used

2016-04-20 Thread Illya Yalovyy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Illya Yalovyy updated HIVE-13510:
-
Attachment: HIVE-13510.1.patch

> Dynamic partitioning doesn’t work when remote metastore is used
> ---
>
> Key: HIVE-13510
> URL: https://issues.apache.org/jira/browse/HIVE-13510
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.0
> Environment: Hadoop 2.7.1
>Reporter: Illya Yalovyy
>Assignee: Illya Yalovyy
>Priority: Critical
> Attachments: HIVE-13510.1.patch
>
>
> *Steps to reproduce:*
> # Configure remote metastore (hive.metastore.uris)
> # Create table t1 (a string);
> # Create table t2 (a string) partitioned by (b string);
> # set hive.exec.dynamic.partition.mode=nonstrict;
> # Insert overwrite table t2 partition (b) select a,a from t1;
> *Result:*
> {noformat}
> FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: getMetaConf failed: unknown result
> 16/04/13 15:04:51 [c679e424-2501-4347-8146-cf1b1cae217c main]: ERROR 
> ql.Driver: FAILED: SemanticException 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: getMetaConf failed: unknown result
> org.apache.hadoop.hive.ql.parse.SemanticException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: getMetaConf failed: unknown result
> at 
> org.apache.hadoop.hive.ql.plan.DynamicPartitionCtx.<init>(DynamicPartitionCtx.java:84)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6550)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:9315)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9204)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10071)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9949)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10607)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:358)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10618)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:233)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:245)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:476)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:318)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1192)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1287)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1118)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1106)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:236)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:339)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:748)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:721)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:648)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.thrift.TApplicationException: getMetaConf failed: unknown result
> at org.apache.hadoop.hive.ql.metadata.Hive.getMetaConf(Hive.java:3493)
> at 
> org.apache.hadoop.hive.ql.plan.DynamicPartitionCtx.<init>(DynamicPartitionCtx.java:82)
> ... 29 more
> Caused by: org.apache.thrift.TApplicationException: getMetaConf failed: 
> unknown result
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_getMetaConf(ThriftHiveMetastore.java:666)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.getMetaConf(ThriftHiveMetastore.java:646)
> at 
> org.ap

[jira] [Commented] (HIVE-13523) Fix connection leak in ORC RecordReader and refactor for unit testing

2016-04-20 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250218#comment-15250218
 ] 

Thomas Poepping commented on HIVE-13523:


I do pass the options object down from ReaderImpl. I could have kept the 
RecordReaderImpl constructor with its 11 arguments, but I thought a builder 
would be more readable. RecordReaderImpl needs more than just the Options 
object passed from ReaderImpl, such as fileSystem, path, and codec.

If you mean I could have passed a properties object for RecordReaderImpl from 
ReaderImpl, I could have created one (the way I did for DataReader and 
MetadataReader), but I thought a builder made more sense and was easier to use.
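As a generic illustration of that trade-off (hypothetical fields, not the actual RecordReaderImpl API), a builder names every argument at the call site and lets omitted ones fall back to defaults, which a long positional constructor cannot:

```java
// Hypothetical builder sketch -- field names are illustrative, not ORC's actual API.
public class ReaderOptsDemo {
    static class Reader {
        final String path;
        final int bufferSize;
        final boolean zeroCopy;

        private Reader(Builder b) {
            path = b.path; bufferSize = b.bufferSize; zeroCopy = b.zeroCopy;
        }

        static Builder builder() { return new Builder(); }

        static class Builder {
            private String path;
            private int bufferSize = 256 * 1024;  // default kept in one place
            private boolean zeroCopy;

            Builder path(String p)      { this.path = p; return this; }
            Builder bufferSize(int n)   { this.bufferSize = n; return this; }
            Builder zeroCopy(boolean z) { this.zeroCopy = z; return this; }
            Reader build()              { return new Reader(this); }
        }
    }

    public static void main(String[] args) {
        // Each argument is named at the call site, unlike an 11-slot positional list.
        Reader r = Reader.builder().path("/data/file.orc").zeroCopy(true).build();
        System.out.println(r.path + " " + r.bufferSize + " " + r.zeroCopy);
    }
}
```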

> Fix connection leak in ORC RecordReader and refactor for unit testing
> -
>
> Key: HIVE-13523
> URL: https://issues.apache.org/jira/browse/HIVE-13523
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.0.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Fix For: 2.1.0, 2.0.1
>
> Attachments: HIVE-13523.patch
>
>
> In RecordReaderImpl, a MetadataReaderImpl object was being created (opening a 
> file), but never closed, causing a leak. This change closes the Metadata 
> object in RecordReaderImpl, and does substantial refactoring to make 
> RecordReaderImpl testable:
>  * Created DataReaderFactory and MetadataReaderFactory (plus default 
> implementations) so that the create() methods can be mocked to verify that 
> the objects are actually closed in RecordReaderImpl.close()
>  * Created MetadataReaderProperties and DataReaderProperties to clean up 
> argument lists, making code more readable
>  * Created a builder() for RecordReaderImpl to make the code more readable
>  * DataReader and MetadataReader now extend Closeable (there was no reason 
> for them not to in the first place), so I can use the guava Closer class: 
> http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/io/Closer.html
>  * Use the Closer interface to guarantee that regardless of if either close() 
> call fails, both will be attempted (preventing further potential leaks)
>  * Create builders for MetadataReaderProperties, DataReaderProperties, and 
> RecordReaderImpl to help with code readability
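The close-both guarantee that the guava Closer provides can be sketched with the JDK alone (an illustrative stand-in, not Hive's actual code): every close() is attempted even if an earlier one throws, and later failures are attached to the first as suppressed exceptions.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in (not Hive code) for the property guava's Closer gives
// RecordReaderImpl.close(): every close() is attempted regardless of failures.
public class CloseAllDemo {
    static final List<String> closed = new ArrayList<>();

    // Build a fake resource that records its name on close and optionally fails.
    static Closeable named(String name, boolean failing) {
        return () -> {
            closed.add(name);
            if (failing) throw new IOException(name + " failed");
        };
    }

    // Attempt every close; rethrow the first failure with later ones suppressed.
    static void closeAll(Closeable... resources) throws IOException {
        IOException first = null;
        for (Closeable c : resources) {
            try {
                c.close();
            } catch (IOException e) {
                if (first == null) first = e; else first.addSuppressed(e);
            }
        }
        if (first != null) throw first;
    }

    public static void main(String[] args) {
        try {
            closeAll(named("metadataReader", true), named("dataReader", false));
        } catch (IOException expected) {
            // metadataReader's failure is reported, but dataReader was still closed
        }
        System.out.println(closed);
    }
}
```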



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HIVE-13352) Seems unnecessary for HBase tests to call QTestUtil.tearDown to close zookeeper and others.

2016-04-20 Thread Balint Molnar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-13352 started by Balint Molnar.

> Seems unnecessary for HBase tests to call QTestUtil.tearDown to close 
> zookeeper and others.
> ---
>
> Key: HIVE-13352
> URL: https://issues.apache.org/jira/browse/HIVE-13352
> Project: Hive
>  Issue Type: Improvement
>  Components: Test
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Balint Molnar
>
> The HBase tests in TestHBaseCliDriver.java currently call QTestUtil.tearDown to 
> shut down ZooKeeper and other services after each test. It seems we could reuse 
> them across all the tests and just clear the test data, similar to TestCliDriver.
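The proposed structure can be sketched with a plain-Java stand-in (QTestUtil, ZooKeeper, and the test names here are simulated, not Hive's actual classes): shared services start and stop once per suite, and only test data is cleared between tests.

```java
import java.util.ArrayList;
import java.util.List;

// Simulated suite lifecycle (not Hive code): start shared services once,
// clear only test data between tests, shut everything down once at the end.
public class SharedServiceDemo {
    static final List<String> events = new ArrayList<>();

    static void startZooKeeper() { events.add("start"); }  // once per suite
    static void clearTestData()  { events.add("clear"); }  // cheap, between tests
    static void stopZooKeeper()  { events.add("stop"); }   // once per suite

    static void runSuite(int tests) {
        startZooKeeper();
        for (int i = 0; i < tests; i++) {
            events.add("test" + i);
            clearTestData();  // instead of a full tearDown after every test
        }
        stopZooKeeper();
    }

    public static void main(String[] args) {
        runSuite(3);
        System.out.println(events);
    }
}
```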



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13523) Fix connection leak in ORC RecordReader and refactor for unit testing

2016-04-20 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250162#comment-15250162
 ] 

Owen O'Malley commented on HIVE-13523:
--

Why did you make a builder for RecordReaderImpl, which is an internal class, 
rather than just passing down the options object from the ReaderImpl?

> Fix connection leak in ORC RecordReader and refactor for unit testing
> -
>
> Key: HIVE-13523
> URL: https://issues.apache.org/jira/browse/HIVE-13523
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.0.0
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Fix For: 2.1.0, 2.0.1
>
> Attachments: HIVE-13523.patch
>
>
> In RecordReaderImpl, a MetadataReaderImpl object was being created (opening a 
> file), but never closed, causing a leak. This change closes the Metadata 
> object in RecordReaderImpl, and does substantial refactoring to make 
> RecordReaderImpl testable:
>  * Created DataReaderFactory and MetadataReaderFactory (plus default 
> implementations) so that the create() methods can be mocked to verify that 
> the objects are actually closed in RecordReaderImpl.close()
>  * Created MetadataReaderProperties and DataReaderProperties to clean up 
> argument lists, making code more readable
>  * Created a builder() for RecordReaderImpl to make the code more readable
>  * DataReader and MetadataReader now extend Closeable (there was no reason 
> for them not to in the first place), so I can use the guava Closer class: 
> http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/io/Closer.html
>  * Use the Closer interface to guarantee that regardless of if either close() 
> call fails, both will be attempted (preventing further potential leaks)
>  * Create builders for MetadataReaderProperties, DataReaderProperties, and 
> RecordReaderImpl to help with code readability



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-13520) Don't allow any test to run for longer than 45minutes in the ptest setup

2016-04-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250102#comment-15250102
 ] 

Ashutosh Chauhan edited comment on HIVE-13520 at 4/20/16 3:35 PM:
--

Let's do a 1 hour timeout for now. 


was (Author: ashutoshc):
Lets do 1 hours timeout for now. 

> Don't allow any test to run for longer than 45minutes in the ptest setup
> 
>
> Key: HIVE-13520
> URL: https://issues.apache.org/jira/browse/HIVE-13520
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-13520.01.txt, HIVE-13520.02.txt
>
>
> The current timeout for batches is 2 hours. This needs to be lowered; 1 hour may 
> be too much as well. We can start with that and reduce timeouts further.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13520) Don't allow any test to run for longer than 45minutes in the ptest setup

2016-04-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250102#comment-15250102
 ] 

Ashutosh Chauhan commented on HIVE-13520:
-

Lets do 1 hours timeout for now. 

> Don't allow any test to run for longer than 45minutes in the ptest setup
> 
>
> Key: HIVE-13520
> URL: https://issues.apache.org/jira/browse/HIVE-13520
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-13520.01.txt, HIVE-13520.02.txt
>
>
> The current timeout for batches is 2 hours. This needs to be lowered; 1 hour may 
> be too much as well. We can start with that and reduce timeouts further.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13490) Change itests to be part of the main Hive build

2016-04-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250100#comment-15250100
 ] 

Ashutosh Chauhan commented on HIVE-13490:
-

+1

> Change itests to be part of the main Hive build
> ---
>
> Key: HIVE-13490
> URL: https://issues.apache.org/jira/browse/HIVE-13490
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-13490.01.patch, HIVE-13490.02.patch
>
>
> Instead of having to build Hive and then itests separately.
> With IntelliJ, this ends up being loaded as two separate dependencies, and 
> there are a lot of hops involved to make changes.
> Does anyone know why these have been kept separate?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13511) Run clidriver tests from within the qtest dir for the precommit tests

2016-04-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250087#comment-15250087
 ] 

Ashutosh Chauhan commented on HIVE-13511:
-

This patch fails to compile when I ran {{ mvn clean install -DskipTests}} from 
within the ptest2 dir, with the following trace:
{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile 
(default-testCompile) on project hive-ptest: Compilation failure: Compilation 
failure:
[ERROR] 
/Users/ashutosh/workspace/apache-master/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/conf/TestQFileTestBatch.java:[45,28]
 constructor QFileTestBatch in class 
org.apache.hive.ptest.execution.conf.QFileTestBatch cannot be applied to given 
types;
[ERROR] required: 
java.lang.String,java.lang.String,java.lang.String,java.util.Set,boolean,java.lang.String
[ERROR] found: 
java.lang.String,java.lang.String,java.lang.String,java.util.Set,boolean
[ERROR] reason: actual and formal argument lists differ in length
[ERROR] 
/Users/ashutosh/workspace/apache-master/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/conf/TestQFileTestBatch.java:[55,28]
 constructor QFileTestBatch in class 
org.apache.hive.ptest.execution.conf.QFileTestBatch cannot be applied to given 
types;
[ERROR] required: 
java.lang.String,java.lang.String,java.lang.String,java.util.Set,boolean,java.lang.String
[ERROR] found: 
java.lang.String,java.lang.String,java.lang.String,java.util.Set,boolean
[ERROR] reason: actual and formal argument lists differ in length
[ERROR] 
/Users/ashutosh/workspace/apache-master/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/conf/TestQFileTestBatch.java:[61,28]
 constructor QFileTestBatch in class 
org.apache.hive.ptest.execution.conf.QFileTestBatch cannot be applied to given 
types;
[ERROR] required: 
java.lang.String,java.lang.String,java.lang.String,java.util.Set,boolean,java.lang.String
[ERROR] found: 
java.lang.String,java.lang.String,java.lang.String,java.util.Set,boolean
[ERROR] reason: actual and formal argument lists differ in length
[ERROR] 
/Users/ashutosh/workspace/apache-master/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.java:[72,17]
 constructor QFileTestBatch in class 
org.apache.hive.ptest.execution.conf.QFileTestBatch cannot be applied to given 
types;
[ERROR] required: 
java.lang.String,java.lang.String,java.lang.String,java.util.Set,boolean,java.lang.String
[ERROR] found: 
java.lang.String,java.lang.String,java.lang.String,java.util.HashSet,boolean
[ERROR] reason: actual and formal argument lists differ in length
{code}
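The errors all say the same thing: the QFileTestBatch constructor gained a trailing String parameter, so the five-argument call sites in the tests no longer compile. A self-contained illustration of that kind of mismatch and its fix (class and parameter names are hypothetical, not Hive's actual signature):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration (not Hive's actual classes) of the compile error above:
// the constructor gained a trailing String, so older five-argument calls fail with
// "actual and formal argument lists differ in length".
public class ArityDemo {
    static class Batch {
        final String moduleName;

        // New signature: a sixth parameter was appended.
        Batch(String outputDir, String testProperty, String driver,
              Set<String> tests, boolean isParallel, String moduleName) {
            this.moduleName = moduleName;
        }
    }

    public static void main(String[] args) {
        Set<String> tests = new HashSet<>();
        // Old call site -- would no longer compile:
        // Batch b = new Batch("out", "qfile", "TestCliDriver", tests, true);
        // Fixed call site supplies the new trailing argument:
        Batch b = new Batch("out", "qfile", "TestCliDriver", tests, true, "itests/qtest");
        System.out.println(b.moduleName);
    }
}
```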

> Run clidriver tests from within the qtest dir for the precommit tests
> -
>
> Key: HIVE-13511
> URL: https://issues.apache.org/jira/browse/HIVE-13511
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-13511.01.patch, HIVE-13511.02.patch, 
> example_maven-test.txt, example_testExecution.txt
>
>
> The tests are currently run from the itests directory, which means there's the 
> additional overhead of at least checking whether files have changed. Will attach 
> a sample output; this adds up to 40+ seconds per batch. Getting rid of this 
> should be a reasonable saving overall.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-13511) Run clidriver tests from within the qtest dir for the precommit tests

2016-04-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250087#comment-15250087
 ] 

Ashutosh Chauhan edited comment on HIVE-13511 at 4/20/16 3:30 PM:
--

This patch fails to compile when I ran {{mvn clean install -DskipTests}} from 
within the ptest2 dir, with the following trace:
{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile 
(default-testCompile) on project hive-ptest: Compilation failure: Compilation 
failure:
[ERROR] 
/Users/ashutosh/workspace/apache-master/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/conf/TestQFileTestBatch.java:[45,28]
 constructor QFileTestBatch in class 
org.apache.hive.ptest.execution.conf.QFileTestBatch cannot be applied to given 
types;
[ERROR] required: 
java.lang.String,java.lang.String,java.lang.String,java.util.Set,boolean,java.lang.String
[ERROR] found: 
java.lang.String,java.lang.String,java.lang.String,java.util.Set,boolean
[ERROR] reason: actual and formal argument lists differ in length
[ERROR] 
/Users/ashutosh/workspace/apache-master/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/conf/TestQFileTestBatch.java:[55,28]
 constructor QFileTestBatch in class 
org.apache.hive.ptest.execution.conf.QFileTestBatch cannot be applied to given 
types;
[ERROR] required: 
java.lang.String,java.lang.String,java.lang.String,java.util.Set,boolean,java.lang.String
[ERROR] found: 
java.lang.String,java.lang.String,java.lang.String,java.util.Set,boolean
[ERROR] reason: actual and formal argument lists differ in length
[ERROR] 
/Users/ashutosh/workspace/apache-master/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/conf/TestQFileTestBatch.java:[61,28]
 constructor QFileTestBatch in class 
org.apache.hive.ptest.execution.conf.QFileTestBatch cannot be applied to given 
types;
[ERROR] required: 
java.lang.String,java.lang.String,java.lang.String,java.util.Set,boolean,java.lang.String
[ERROR] found: 
java.lang.String,java.lang.String,java.lang.String,java.util.Set,boolean
[ERROR] reason: actual and formal argument lists differ in length
[ERROR] 
/Users/ashutosh/workspace/apache-master/testutils/ptest2/src/test/java/org/apache/hive/ptest/execution/TestExecutionPhase.java:[72,17]
 constructor QFileTestBatch in class 
org.apache.hive.ptest.execution.conf.QFileTestBatch cannot be applied to given 
types;
[ERROR] required: 
java.lang.String,java.lang.String,java.lang.String,java.util.Set,boolean,java.lang.String
[ERROR] found: 
java.lang.String,java.lang.String,java.lang.String,java.util.HashSet,boolean
[ERROR] reason: actual and formal argument lists differ in length
{code}


was (Author: ashutoshc):
This patch fails to compile when I ran {{ mvn clean install -DskipTests}} from 
within ptest2 dir, with the same trace as above.

[jira] [Commented] (HIVE-13557) Make interval keyword optional while specifying DAY in interval arithmetic

2016-04-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250074#comment-15250074
 ] 

Ashutosh Chauhan commented on HIVE-13557:
-

[~cartershanklin] noted that we do allow the {{select date '2012-01-01' + (-30) 
days;}} syntax, so I think we are good here. The last point is to allow {{DAY}} 
wherever {{DAYS}} is accepted, if the standard supports it. MySQL supports 
{{DAY}}; Postgres doesn't.

> Make interval keyword optional while specifying DAY in interval arithmetic
> --
>
> Key: HIVE-13557
> URL: https://issues.apache.org/jira/browse/HIVE-13557
> Project: Hive
>  Issue Type: Sub-task
>  Components: Types
>Reporter: Ashutosh Chauhan
>
> Currently we support expressions like: {code}
> WHERE SOLD_DATE BETWEEN ((DATE('2000-01-31'))  - INTERVAL '30' DAY) AND 
> DATE('2000-01-31')
> {code}
> We should support:
> {code}
> WHERE SOLD_DATE BETWEEN ((DATE('2000-01-31')) + (-30) DAY) AND 
> DATE('2000-01-31')
> {code}
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-13548) hive-jdbc isn't escaping slashes during PreparedStatement

2016-04-20 Thread Nasron Cheong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nasron Cheong reassigned HIVE-13548:


Assignee: Nasron Cheong  (was: Vaibhav Gumashta)

> hive-jdbc isn't escaping slashes during PreparedStatement
> -
>
> Key: HIVE-13548
> URL: https://issues.apache.org/jira/browse/HIVE-13548
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Reporter: Nasron Cheong
>Assignee: Nasron Cheong
> Attachments: HIVE-13548.patch
>
>
> Calling setString on a prepared statement with a string containing a '\' will 
> cause the SQL construction to fail.
> I believe the backslash should be escaped by the setString function.
> There may be other characters that require escaping in the same call.
> Failure from the unittest without the patch:
> {code}
> Running org.apache.hive.jdbc.TestJdbcDriver2
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 9.738 sec <<< 
> FAILURE! - in org.apache.hive.jdbc.TestJdbcDriver2
> testSlashPreparedStatement(org.apache.hive.jdbc.TestJdbcDriver2)  Time 
> elapsed: 3.867 sec  <<< FAILURE!
> java.lang.AssertionError: java.lang.StringIndexOutOfBoundsException: String 
> index out of range: -1
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hive.jdbc.TestJdbcDriver2.testSlashPreparedStatement(TestJdbcDriver2.java:522)
> Results :
> Failed tests: 
>   TestJdbcDriver2.testSlashPreparedStatement:522 
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
> {code}
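A minimal sketch of the kind of escaping the patch implies (assumed behaviour, not the actual hive-jdbc implementation): double each backslash, and escape single quotes, before splicing the value into the SQL text.

```java
// Assumed escaping behaviour, not the real hive-jdbc setString implementation:
// escape characters that would break a string literal spliced into SQL text.
public class EscapeDemo {
    static String escapeStringLiteral(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': out.append("\\\\"); break;  // the case HIVE-13548 hits
                case '\'': out.append("\\'");  break;  // keep quotes from ending the literal
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeStringLiteral("C:\\temp\\file"));  // C:\\temp\\file
    }
}
```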



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-13352) Seems unnecessary for HBase tests to call QTestUtil.tearDown to close zookeeper and others.

2016-04-20 Thread Balint Molnar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balint Molnar reassigned HIVE-13352:


Assignee: Balint Molnar

> Seems unnecessary for HBase tests to call QTestUtil.tearDown to close 
> zookeeper and others.
> ---
>
> Key: HIVE-13352
> URL: https://issues.apache.org/jira/browse/HIVE-13352
> Project: Hive
>  Issue Type: Improvement
>  Components: Test
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Balint Molnar
>
> The HBase tests in TestHBaseCliDriver.java currently call QTestUtil.tearDown to 
> shut down ZooKeeper and other services after each test. It seems we could reuse 
> them across all the tests and just clear the test data, similar to TestCliDriver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13539) HiveHFileOutputFormat searching the wrong directory for HFiles

2016-04-20 Thread Tim Robertson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Robertson updated HIVE-13539:
-
Status: Patch Available  (was: Open)

Patch is visible at
https://github.com/apache/hive/pull/74

> HiveHFileOutputFormat searching the wrong directory for HFiles
> --
>
> Key: HIVE-13539
> URL: https://issues.apache.org/jira/browse/HIVE-13539
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 1.1.0
> Environment: Built into CDH 5.4.7
>Reporter: Tim Robertson
>Assignee: Tim Robertson
>Priority: Blocker
>
> When creating HFiles for a bulkload in HBase I believe it is looking in the 
> wrong directory to find the HFiles, resulting in the following exception:
> {code}
> Error: java.lang.RuntimeException: Hive Runtime Error while closing 
> operators: java.io.IOException: Multiple family directories found in 
> hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: Multiple family directories found in 
> hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:188)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:958)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
>   ... 7 more
> Caused by: java.io.IOException: Multiple family directories found in 
> hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
>   at 
> org.apache.hadoop.hive.hbase.HiveHFileOutputFormat$1.close(HiveHFileOutputFormat.java:158)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:185)
>   ... 11 more
> {code}
> The issue is that it looks for the HFiles in 
> {{hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary}}
>  when I believe it should be looking in the task attempt subfolder, such as 
> {{hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary/attempt_1461004169450_0002_r_00_1000}}.
> This can be reproduced in any HFile creation such as:
> {code:sql}
> CREATE TABLE coords_hbase(id INT, x DOUBLE, y DOUBLE)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
>   'hbase.columns.mapping' = ':key,o:x,o:y',
>   'hbase.table.default.storage.type' = 'binary');
> SET hfile.family.path=/tmp/coords_hfiles/o; 
> SET hive.hbase.generatehfiles=true;
> INSERT OVERWRITE TABLE coords_hbase 
> SELECT id, decimalLongitude, decimalLatitude
> FROM source
> CLUSTER BY id; 
> {code}
> Any advice greatly appreciated



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13539) HiveHFileOutputFormat searching the wrong directory for HFiles

2016-04-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249878#comment-15249878
 ] 

ASF GitHub Bot commented on HIVE-13539:
---

GitHub user timrobertson100 opened a pull request:

https://github.com/apache/hive/pull/74

HIVE-13539: HiveHFileOutputFormat searching the wrong directory for H…

I believe this is a fix for https://issues.apache.org/jira/browse/HIVE-13539

When there are several reducers (or speculative execution), multiple output 
directories are created for the task attempts. The previous behaviour 
incorrectly threw an exception because it assumed multiple HFiles.

Here, I am attempting to start the descending directory search from the 
task attempt folder rather than from the table's higher-level directory.
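A rough sketch of why the scoping matters, with java.io.File standing in for the HDFS FileSystem API (directory names hypothetical): counting family directories from the shared _temporary parent sees every attempt's output, while counting from a single attempt's folder sees exactly one.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// java.io.File stand-in for the HDFS API (directory names hypothetical):
// searching from the shared parent counts every task attempt's directory,
// which triggers the spurious "Multiple family directories" error.
public class FamilyDirDemo {
    static List<String> familyDirs(File root) {
        List<String> found = new ArrayList<>();
        File[] children = root.listFiles(File::isDirectory);
        if (children != null) {
            for (File c : children) found.add(c.getName());
        }
        return found;
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"), "hfile-demo");
        new File(tmp, "attempt_0/o").mkdirs();  // family dir written by attempt 0
        new File(tmp, "attempt_1/o").mkdirs();  // speculative attempt 1

        // From the shared parent: two directories -> looks like multiple families.
        System.out.println(familyDirs(tmp).size());                        // 2
        // From one attempt's folder: exactly one family directory.
        System.out.println(familyDirs(new File(tmp, "attempt_0")).size()); // 1
    }
}
```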

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/timrobertson100/hive master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/74.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #74


commit a4abe6bec0c0141bddcbfa75408b49123af65066
Author: timrobertson100 
Date:   2016-04-20T13:23:24Z

HIVE-13539: HiveHFileOutputFormat searching the wrong directory for HFiles




> HiveHFileOutputFormat searching the wrong directory for HFiles
> --
>
> Key: HIVE-13539
> URL: https://issues.apache.org/jira/browse/HIVE-13539
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 1.1.0
> Environment: Built into CDH 5.4.7
>Reporter: Tim Robertson
>Assignee: Tim Robertson
>Priority: Blocker
>
> When creating HFiles for a bulkload in HBase I believe it is looking in the 
> wrong directory to find the HFiles, resulting in the following exception:
> {code}
> Error: java.lang.RuntimeException: Hive Runtime Error while closing 
> operators: java.io.IOException: Multiple family directories found in 
> hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: Multiple family directories found in 
> hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:188)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:958)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
>   ... 7 more
> Caused by: java.io.IOException: Multiple family directories found in 
> hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
>   at 
> org.apache.hadoop.hive.hbase.HiveHFileOutputFormat$1.close(HiveHFileOutputFormat.java:158)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:185)
>   ... 11 more
> {code}
> The issue is that it looks for the HFiles in 
> {{hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary}}
>  when I believe it should be looking in the task attempt subfolder, such as 
> {{hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary/attempt_1461004169450_0002_r_00_1000}}.
> This can be reproduced in any HFile creation such as:
> {code:sql}
> CREATE TABLE coords_hbase(id INT, x DOUBLE, y DOUBLE)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
>   'hbase.columns.mapping' = ':key,o:x,o:y',
>   'hbase.table.default.storage.type' = 'binary');
> SET hfile.family.path=/tmp/coords_hfiles/o; 
> SET hive.hbase.generatehfiles=true;
> INSERT OVERWRITE TABLE coords_hbase 
> SELECT id, decimalLongitude, decimalLatitude
> FROM source
> CLUSTER BY id; 
> {code}
> Any advice greatly appre

[jira] [Updated] (HIVE-13539) HiveHFileOutputFormat searching the wrong directory for HFiles

2016-04-20 Thread Tim Robertson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Robertson updated HIVE-13539:
-
Summary: HiveHFileOutputFormat searching the wrong directory for HFiles  
(was: HiveHFileOutputFormat searching the wrong directory for HFiles?)

> HiveHFileOutputFormat searching the wrong directory for HFiles
> --
>
> Key: HIVE-13539
> URL: https://issues.apache.org/jira/browse/HIVE-13539
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 1.1.0
> Environment: Built into CDH 5.4.7
>Reporter: Tim Robertson
>Assignee: Tim Robertson
>Priority: Blocker
>
> When creating HFiles for a bulkload in HBase I believe it is looking in the 
> wrong directory to find the HFiles, resulting in the following exception:
> {code}
> Error: java.lang.RuntimeException: Hive Runtime Error while closing 
> operators: java.io.IOException: Multiple family directories found in 
> hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: Multiple family directories found in 
> hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:188)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:958)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
>   ... 7 more
> Caused by: java.io.IOException: Multiple family directories found in 
> hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
>   at 
> org.apache.hadoop.hive.hbase.HiveHFileOutputFormat$1.close(HiveHFileOutputFormat.java:158)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:185)
>   ... 11 more
> {code}
> The issue is that it looks for the HFiles in 
> {{hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary}}
>  when I believe it should be looking in the task attempt subfolder, such as 
> {{hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary/attempt_1461004169450_0002_r_00_1000}}.
> This can be reproduced in any HFile creation such as:
> {code:sql}
> CREATE TABLE coords_hbase(id INT, x DOUBLE, y DOUBLE)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
>   'hbase.columns.mapping' = ':key,o:x,o:y',
>   'hbase.table.default.storage.type' = 'binary');
> SET hfile.family.path=/tmp/coords_hfiles/o; 
> SET hive.hbase.generatehfiles=true;
> INSERT OVERWRITE TABLE coords_hbase 
> SELECT id, decimalLongitude, decimalLatitude
> FROM source
> CLUSTER BY id; 
> {code}
> Any advice is greatly appreciated
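To illustrate the suspected layout problem, here is a small standalone sketch (plain {{java.nio}} on a local temp directory, not Hive's actual HiveHFileOutputFormat code): the writer should descend into the task-attempt subdirectory ({{attempt_*}}) before asserting that exactly one column-family directory exists. The {{findFamilyDir}} helper is hypothetical, written only to show the intended traversal.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FamilyDirProbe {
    /**
     * Finds the single column-family directory, descending into a
     * task-attempt subdirectory (attempt_*) first if one is present.
     */
    static Path findFamilyDir(Path taskTmp) throws IOException {
        try (Stream<Path> s = Files.list(taskTmp)) {
            List<Path> children =
                s.filter(Files::isDirectory).collect(Collectors.toList());
            // Descend into a lone attempt_* directory instead of treating
            // it as a family directory (the behavior the report suggests
            // is missing).
            if (children.size() == 1
                    && children.get(0).getFileName().toString().startsWith("attempt_")) {
                return findFamilyDir(children.get(0));
            }
            if (children.size() != 1) {
                throw new IOException(
                    "Multiple family directories found in " + taskTmp);
            }
            return children.get(0);
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate _temporary/2/_temporary/attempt_.../o on the local disk.
        Path tmp = Files.createTempDirectory("hfile");
        Files.createDirectories(
            tmp.resolve("attempt_1461004169450_0002_r_00_1000").resolve("o"));
        System.out.println(findFamilyDir(tmp).getFileName());
    }
}
```

With the attempt-directory hop in place, the probe resolves the single family directory {{o}} instead of failing on the multiple {{attempt_*}} siblings.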



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13539) HiveHFileOutputFormat searching the wrong directory for HFiles?

2016-04-20 Thread Lars Francke (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Francke updated HIVE-13539:

Assignee: Tim Robertson  (was: Sushanth Sowmyan)

> HiveHFileOutputFormat searching the wrong directory for HFiles?
> ---
>
> Key: HIVE-13539
> URL: https://issues.apache.org/jira/browse/HIVE-13539
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 1.1.0
> Environment: Built into CDH 5.4.7
>Reporter: Tim Robertson
>Assignee: Tim Robertson
>Priority: Blocker
>
> When creating HFiles for an HBase bulk load, I believe Hive is looking in 
> the wrong directory to find the HFiles, resulting in the following exception:
> {code}
> Error: java.lang.RuntimeException: Hive Runtime Error while closing 
> operators: java.io.IOException: Multiple family directories found in 
> hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: Multiple family directories found in 
> hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:188)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:958)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
>   at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
>   ... 7 more
> Caused by: java.io.IOException: Multiple family directories found in 
> hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
>   at 
> org.apache.hadoop.hive.hbase.HiveHFileOutputFormat$1.close(HiveHFileOutputFormat.java:158)
>   at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:185)
>   ... 11 more
> {code}
> The issue is that it looks for the HFiles in 
> {{hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary}}
>  when I believe it should be looking in the task attempt subfolder, such as 
> {{hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary/attempt_1461004169450_0002_r_00_1000}}.
> This can be reproduced in any HFile creation such as:
> {code:sql}
> CREATE TABLE coords_hbase(id INT, x DOUBLE, y DOUBLE)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
>   'hbase.columns.mapping' = ':key,o:x,o:y',
>   'hbase.table.default.storage.type' = 'binary');
> SET hfile.family.path=/tmp/coords_hfiles/o; 
> SET hive.hbase.generatehfiles=true;
> INSERT OVERWRITE TABLE coords_hbase 
> SELECT id, decimalLongitude, decimalLatitude
> FROM source
> CLUSTER BY id; 
> {code}
> Any advice is greatly appreciated



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13429) Tool to remove dangling scratch dir

2016-04-20 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249397#comment-15249397
 ] 

Lefty Leverenz commented on HIVE-13429:
---

[~sladymon] documented the *cleardanglingscratchdir* tool in Setting Up 
HiveServer2, and the CLI doc has a link to it:

* [Setting Up HiveServer2 -- Scratch Directory Management | 
https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-ScratchDirectoryManagement]
** [ClearDanglingScratchDirTool | 
https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-ClearDanglingScratchDirTool]
* [Hive CLI -- Tool to Clear Dangling Scratch Directories | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-TooltoClearDanglingScratchDirectories]

[~daijy], please review.  If it's okay, we can remove the TODOC labels.  
Otherwise we'll make revisions as needed.

Question about another scratchdir parameter:  Is 
*hive.start.cleanup.scratchdir* just for the original Hive server, or is it 
also for HiveServer2?  In either case, its description should be updated to 
make this clear.

* [Configuration Properties -- hive.start.cleanup.scratchdir | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.start.cleanup.scratchdir]

> Tool to remove dangling scratch dir
> ---
>
> Key: HIVE-13429
> URL: https://issues.apache.org/jira/browse/HIVE-13429
> Project: Hive
>  Issue Type: Improvement
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>  Labels: TODOC1.3, TODOC2.1
> Fix For: 1.3.0, 2.1.0
>
> Attachments: HIVE-13429.1.patch, HIVE-13429.2.patch, 
> HIVE-13429.3.patch, HIVE-13429.4.patch, HIVE-13429.5.patch, 
> HIVE-13429.branch-1.patch
>
>
> We have seen cases where users leave the scratch dir behind, eventually 
> eating up HDFS storage. This can happen when a VM restarts and leaves Hive 
> no chance to run its shutdown hook. This applies to both HiveCli and 
> HiveServer2. Here we provide an external tool to clear dead scratch dirs 
> as needed.
> We need a way to identify which scratch dirs are in use. We will rely on 
> the HDFS write lock for that. Here is how the HDFS write lock works:
> 1. An HDFS client opens an HDFS file for write and only closes it at the 
> time of shutdown
> 2. A cleanup process can try to open the HDFS file for write. If the client 
> holding the file is still running, we will get an exception. Otherwise, we 
> know the client is dead
> 3. If the HDFS client dies without closing the HDFS file, the NN will 
> reclaim the lease after 10 min, i.e., the HDFS file held by the dead client 
> is writable again after 10 min
> So here is how we remove dangling scratch directories in Hive:
> 1. HiveCli/HiveServer2 opens a well-named lock file in the scratch 
> directory and only closes it when we are about to drop the scratch directory
> 2. A command line tool, cleardanglingscratchdir, will check every scratch 
> directory and try to open the lock file for write. If it does not get an 
> exception, the owner is dead and we can safely remove the scratch directory
> 3. The 10 min window means it is possible that a HiveCli/HiveServer2 is 
> dead but we still cannot reclaim its scratch directory for another 10 min. 
> But this should be tolerable
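The liveness probe described above can be imitated locally with {{java.nio}} file locks. This is a minimal standalone sketch of the idea (acquire-or-fail on a lock file the owner holds for its lifetime), not the actual cleardanglingscratchdir code, which works against the HDFS {{FileSystem}} API and lease recovery rather than local locks:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LockProbeDemo {
    /** Returns true if the lock file appears to be held by a live owner. */
    static boolean ownerAlive(Path lockFile) throws IOException {
        try (FileChannel probe =
                 FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
            FileLock lock = probe.tryLock();
            if (lock == null) {
                return true;   // held by another process: owner still alive
            }
            lock.release();
            return false;      // we acquired the lock: owner is gone
        } catch (OverlappingFileLockException e) {
            return true;       // held within this JVM: owner still alive
        }
    }

    public static void main(String[] args) throws IOException {
        Path lockFile = Files.createTempFile("scratchdir", ".lock");
        // The "HiveServer2" side holds the lock for its whole lifetime.
        try (FileChannel owner =
                 FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
            FileLock held = owner.lock();
            System.out.println("owner alive: " + ownerAlive(lockFile));
            held.release();
        }
        // Owner has "shut down"; the probe now succeeds, so the cleanup
        // tool would be free to remove the scratch directory.
        System.out.println("owner alive: " + ownerAlive(lockFile));
        Files.delete(lockFile);
    }
}
```

The design choice mirrors step 2 above: the prober never reads any state out of the file; the only signal is whether the exclusive open/lock succeeds.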



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)