[jira] [Updated] (HIVE-14436) Hive 1.2.1/Hitting "ql.Driver: FAILED: IllegalArgumentException Error: , expected at the end of 'decimal(9'" after enabling hive.optimize.skewjoin and with MR engine

2016-08-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-14436:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
   Fix Version/s: 2.2.0
  1.3.0
Target Version/s: 1.3.0, 2.2.0  (was: 2.2.0)
  Status: Resolved  (was: Patch Available)

Patch pushed to master and branch-1. Thanks Gopal, Ashutosh!

> Hive 1.2.1/Hitting "ql.Driver: FAILED: IllegalArgumentException Error: , 
> expected at the end of 'decimal(9'" after enabling hive.optimize.skewjoin and 
> with MR engine
> -
>
> Key: HIVE-14436
> URL: https://issues.apache.org/jira/browse/HIVE-14436
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
> Environment: HDP 2.4.2/ Hive 1.2.1
>Reporter: Ratish Maruthiyodan
>Assignee: Daniel Dai
>  Labels: code
> Fix For: 1.3.0, 2.2.0
>
> Attachments: HIVE-14436.1.patch, HIVE-14436.2.patch
>
>
> PROBLEM:
> The following query, run with the MapReduce engine and "hive.optimize.skewjoin = 
> true", fails with the error:
> "FAILED: IllegalArgumentException Error: , expected at the end of 
> 'decimal(9'" 
> > SELECT a.col1 FROM db.tableA a  INNER JOIN  db.tableB b  ON b.key=a.key 
> > limit 5;
> FAILED: IllegalArgumentException Error: , expected at the end of 'decimal(9'
> 16/08/04 12:47:50 [main]: ERROR ql.Driver: FAILED: IllegalArgumentException 
> Error: , expected at the end of 'decimal(9'
> java.lang.IllegalArgumentException: Error: , expected at the end of 
> 'decimal(9'
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:336)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseParams(TypeInfoUtils.java:378)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parsePrimitiveParts(TypeInfoUtils.java:518)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.parsePrimitiveParts(TypeInfoUtils.java:533)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.createPrimitiveTypeInfo(TypeInfoFactory.java:136)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.getPrimitiveTypeInfo(TypeInfoFactory.java:109)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.GenMRSkewJoinProcessor.processSkewJoin(GenMRSkewJoinProcessor.java:214)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.SkewJoinProcFactory$SkewJoinJoinProcessor.process(SkewJoinProcFactory.java:60)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:133)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.SkewJoinResolver$SkewJoinTaskDispatcher.dispatch(SkewJoinResolver.java:100)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:133)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.SkewJoinResolver.resolve(SkewJoinResolver.java:55)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:107)
>   at 
> org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:270)
>   at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:227)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10219)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:211)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:459)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:316)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1189)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1237)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126)
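
For context, a minimal sketch of why this parse fails (illustrative only, not the HIVE-14436 fix itself): parameterized type names such as "decimal(9,0)" contain a comma, so if a comma-separated list of join key types is split naively, fragments like "decimal(9" reach the TypeInfo parser, which rejects them with exactly the error above.

{code}
// Minimal repro sketch, assuming only the hive-serde module on the classpath.
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;

public class DecimalTypeParseRepro {
  public static void main(String[] args) {
    // A complete type string parses fine:
    System.out.println(TypeInfoUtils.getTypeInfoFromTypeString("decimal(9,2)"));
    // A truncated fragment fails with
    // "IllegalArgumentException: Error: , expected at the end of 'decimal(9'":
    System.out.println(TypeInfoUtils.getTypeInfoFromTypeString("decimal(9"));
  }
}
{code}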

[jira] [Commented] (HIVE-14270) Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-09 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414740#comment-15414740
 ] 

Lefty Leverenz commented on HIVE-14270:
---

I requested a punctuation change on RB for the 
*hive.blobstore.use.blobstore.as.scratchdir* description.

> Write temporary data to HDFS when doing inserts on tables located on S3
> ---
>
> Key: HIVE-14270
> URL: https://issues.apache.org/jira/browse/HIVE-14270
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-14270.1.patch, HIVE-14270.2.patch, 
> HIVE-14270.3.patch, HIVE-14270.4.patch, HIVE-14270.5.patch
>
>
> Currently, when doing INSERT statements on tables located on S3, Hive writes 
> and reads temporary (or intermediate) files on S3 as well. 
> If HDFS is still the default filesystem for Hive, then we can keep such 
> temporary files on HDFS to keep things running faster.
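
A hedged sketch of how this switch might look from configuration; the property name *hive.blobstore.use.blobstore.as.scratchdir* is taken from the review comment above, while the exact semantics and default shown here are assumptions of this sketch:

{code}
import org.apache.hadoop.hive.conf.HiveConf;

public class ScratchDirSketch {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    // Assumed semantics: false => intermediate data for INSERTs into
    // blobstore-backed tables is staged on the default FS (typically HDFS)
    // rather than on S3.
    conf.setBoolean("hive.blobstore.use.blobstore.as.scratchdir", false);
    System.out.println(conf.get("hive.blobstore.use.blobstore.as.scratchdir"));
  }
}
{code}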





[jira] [Commented] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414674#comment-15414674
 ] 

Hive QA commented on HIVE-14448:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12822658/HIVE-14448.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10460 tests 
executed
*Failed tests:*
{noformat}
TestMsgBusConnection - did not produce a TEST-*.xml file
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/835/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/835/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-835/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12822658 - PreCommit-HIVE-MASTER-Build

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug will usually be 
> exposed when working with data at scale, because otherwise in most cases only 
> the BI split strategy is chosen. My guess is that this is happening because the 
> correct readerSchema is not being picked up when we try to extract the SARG 
> column names.
> The quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>   @Test
>   public void testETLSplitStrategyForACID() throws Exception {
>     hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
>     hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
>     runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
>     runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
>     runWorker(hiveConf);
>     List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL + " where a = 1");
>     int[][] resultData = new int[][] {{1,2}};
>     Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 
> 'java.lang.RuntimeException(ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   

[jira] [Commented] (HIVE-14063) beeline to auto connect to the HiveServer2

2016-08-09 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414639#comment-15414639
 ] 

Vihang Karajgaonkar commented on HIVE-14063:


I had an offline discussion with [~dlo]. Copying his comment here for the record.

{quote}
Why is it a problem for there to be more properties in hive-site.xml than you 
need? Can't you just add what you need to hive-site.xml? Today, hive-site.xml 
already serves different use-cases that use different subsets of properties, 
such as the old hive CLI, HS2, and the Hive Metastore Server. It's not a 
problem that there are HS2-specific properties in that file. 

The most logical place for these configs is to add them to hive-site.xml - the 
existing standard client config file for hive. I'm sure some users are going to 
put configs into hive-site.xml and be confused when they have to actually go 
into this new config file.

When you say a "properties file", you mean it'll be like a java properties 
file, and NOT a Hadoop XML file? I strongly recommend against this. Hadoop XML 
is the standard config format, and using something else means you can't 
leverage standard Hadoop libraries like the Hadoop credential provider.
{quote}

> beeline to auto connect to the HiveServer2
> --
>
> Key: HIVE-14063
> URL: https://issues.apache.org/jira/browse/HIVE-14063
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: beeline.conf.template
>
>
> Currently one has to give a jdbc:hive2 url in order for Beeline to connect to a 
> hiveserver2 instance. It would be great if Beeline could get the info somehow 
> (from a properties file at a well-known location?) and connect automatically 
> if the user doesn't specify such a url. If the properties file is not present, 
> then beeline would expect the user to provide the url and credentials using 
> !connect or ./beeline -u .. commands.
> While Beeline is flexible (being a mere JDBC client), most environments would 
> have just a single HS2. Having users manually connect to it via either 
> "beeline ~/.propsfile" or -u or !connect statements degrades the user 
> experience.
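
To make the format debate concrete, here is a hedged sketch of reading a connection URL from each candidate format; the file locations and the *beeline.hs2.jdbc.url* key are hypothetical, invented for illustration:

{code}
import java.io.FileReader;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class ClientConfigFormats {
  public static void main(String[] args) throws Exception {
    // Option A: plain Java properties file (the proposal):
    Properties props = new Properties();
    props.load(new FileReader(System.getProperty("user.home") + "/.beeline/beeline.conf"));
    System.out.println(props.getProperty("beeline.hs2.jdbc.url")); // hypothetical key

    // Option B: Hadoop XML (hive-site.xml), which can reuse standard Hadoop
    // machinery such as the credential provider, per the quoted objection:
    Configuration conf = new Configuration(false);
    conf.addResource(new Path("/etc/hive/conf/hive-site.xml")); // hypothetical path
    System.out.println(conf.get("beeline.hs2.jdbc.url"));       // hypothetical key
  }
}
{code}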





[jira] [Commented] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-09 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414611#comment-15414611
 ] 

Eugene Koifman commented on HIVE-14448:
---

cc [~mmccline]  Matt probably knows most about this

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug will usually be 
> exposed when working with data at scale, because otherwise in most cases only 
> the BI split strategy is chosen. My guess is that this is happening because the 
> correct readerSchema is not being picked up when we try to extract the SARG 
> column names.
> The quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>   @Test
>   public void testETLSplitStrategyForACID() throws Exception {
>     hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
>     hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
>     runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
>     runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
>     runWorker(hiveConf);
>     List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL + " where a = 1");
>     int[][] resultData = new int[][] {{1,2}};
>     Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 
> 'java.lang.RuntimeException(ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119)
>   at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292)
>   at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID(TestTxnCommands2.java:280)
>   at 

[jira] [Commented] (HIVE-14453) refactor physical writing of ORC data and metadata to FS from the logical writers

2016-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414597#comment-15414597
 ] 

Hive QA commented on HIVE-14453:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12822852/HIVE-14453.01.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10459 tests 
executed
*Failed tests:*
{noformat}
TestMsgBusConnection - did not produce a TEST-*.xml file
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testConnection
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValid
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValidNeg
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeTokenAuth
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testProxyAuth
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testTokenAuth
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/834/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/834/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-834/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12822852 - PreCommit-HIVE-MASTER-Build

> refactor physical writing of ORC data and metadata to FS from the logical 
> writers
> -
>
> Key: HIVE-14453
> URL: https://issues.apache.org/jira/browse/HIVE-14453
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14453.01.patch, HIVE-14453.patch
>
>
> ORC data doesn't have to go directly into an HDFS stream via buffers; it can 
> go somewhere else (e.g. a write-thru cache, or an addressable system that 
> doesn't require the stream blocks to be held in memory before writing them 
> all together).
> To that effect, it would be nice to separate the creation of the data 
> block/metadata structures from the physical file concerns.
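
A hedged interface sketch of the abstraction the description asks for; the name and method set below are hypothetical, not the committed API:

{code}
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical seam between the logical ORC writers and the byte sink. A
// default implementation would wrap an HDFS output stream; alternatives could
// target a write-through cache or another addressable store.
public interface PhysicalSink {
  void writeHeader() throws IOException;
  void writeStripe(ByteBuffer data) throws IOException;          // data blocks
  void writeMetadata(ByteBuffer footerAndPostscript) throws IOException;
  void close() throws IOException;
}
{code}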





[jira] [Commented] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414595#comment-15414595
 ] 

Sergey Shelukhin commented on HIVE-14448:
-

Grrr... this code is now totally confusing. The reader schema has 3 user cols but 
the file schema has 6 acid and 2 user cols. The code that tries to map that to get 
fileIncluded basically creates a completely bogus mapping. I'd like to read 
the code there more to clean up what is where and make it more explicit, rather 
than trying another band-aid fix. I wonder why the reader schema for ACID is like 
that; and if it should indeed be like that, one would need to check whether the 
SchemaEvolution class is correct there, and whether it's being used correctly by 
split generation.
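
For reference, the standard ORC ACID file layout wraps the user columns inside a nested "row" struct, which is the shape mismatch being described (the column lists here are illustrative):

{noformat}
file schema (ACID wrapper, user columns nested under "row"):
  struct<operation:int,originalTransaction:bigint,bucket:int,
         rowId:bigint,currentTransaction:bigint,row:struct<...user cols...>>
reader schema (user columns only):
  struct<...user cols...>
{noformat}

Mapping SARG column names against the wrong one of these two shapes would produce exactly the kind of bogus fileIncluded mapping mentioned above.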

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug will usually be 
> exposed when working with data at scale, because otherwise in most cases only 
> the BI split strategy is chosen. My guess is that this is happening because the 
> correct readerSchema is not being picked up when we try to extract the SARG 
> column names.
> The quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>   @Test
>   public void testETLSplitStrategyForACID() throws Exception {
>     hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
>     hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
>     runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
>     runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
>     runWorker(hiveConf);
>     List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL + " where a = 1");
>     int[][] resultData = new int[][] {{1,2}};
>     Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 
> 'java.lang.RuntimeException(ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
>   at 

[jira] [Commented] (HIVE-14412) Add a timezone-aware timestamp

2016-08-09 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414594#comment-15414594
 ] 

Rui Li commented on HIVE-14412:
---

The error is {{Invalid JDK version in profile 'doclint-java8-disable': 
Unbounded range [1.8, for project net.schmizz:sshj}}. It happens when we run
{code}
cd hive/testutils/ptest2
mvn clean package -DskipTests -Drat.numUnapprovedLicenses=1000 
-Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-HIVE-MASTER-Build/.m2
{code}
I found the following in the pom of sshj-0.10.1-SNAPSHOT
{code}
<profile>
  <id>doclint-java8-disable</id>
  <activation>
    <jdk>[1.8,</jdk>
  </activation>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-javadoc-plugin</artifactId>
        <configuration>
          <additionalparam>-Xdoclint:none</additionalparam>
        </configuration>
      </plugin>
    </plugins>
  </build>
</profile>
{code}

But like I said, I can't reproduce the error locally with maven 3.3.9.

> Add a timezone-aware timestamp
> --
>
> Key: HIVE-14412
> URL: https://issues.apache.org/jira/browse/HIVE-14412
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-14412.1.patch, HIVE-14412.1.patch
>
>
> Java's Timestamp stores the time elapsed since the epoch. While it's 
> unambiguous by itself, ambiguity arises when we parse a string into a timestamp, 
> or convert a timestamp to a string, causing problems like HIVE-14305.
> To solve the issue, I think we should make timestamp aware of timezone.
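
A small illustration of that ambiguity (a sketch assuming a standard JVM; the printed values are what java.sql.Timestamp renders in each default zone):

{code}
import java.sql.Timestamp;
import java.util.TimeZone;

public class TimestampZoneAmbiguity {
  public static void main(String[] args) {
    long epoch = 0L;  // 1970-01-01T00:00:00 UTC, an unambiguous instant
    TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
    System.out.println(new Timestamp(epoch));  // 1970-01-01 00:00:00.0
    TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"));
    System.out.println(new Timestamp(epoch));  // 1969-12-31 16:00:00.0
    // Same instant, two different strings: round-tripping through text is
    // only well-defined once a time zone is attached.
  }
}
{code}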





[jira] [Commented] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-09 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414583#comment-15414583
 ] 

Eugene Koifman commented on HIVE-14448:
---

[~sershe], [~prasanth_j], is the issue related to schema evolution something 
you can address here as well?  It's blocking some cases related to HIVE-14035

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug will usually be 
> exposed when working with data at scale, because otherwise in most cases only 
> the BI split strategy is chosen. My guess is that this is happening because the 
> correct readerSchema is not being picked up when we try to extract the SARG 
> column names.
> The quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>   @Test
>   public void testETLSplitStrategyForACID() throws Exception {
>     hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
>     hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
>     runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
>     runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
>     runWorker(hiveConf);
>     List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL + " where a = 1");
>     int[][] resultData = new int[][] {{1,2}};
>     Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 
> 'java.lang.RuntimeException(ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119)
>   at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292)
>   at 
> 

[jira] [Updated] (HIVE-14428) HadoopMetrics2Reporter leaks memory if the metrics sink is not configured correctly

2016-08-09 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-14428:
-
   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Patch committed to branch-2.1 and master.
Thanks for the review [~sseth]!


> HadoopMetrics2Reporter leaks memory if the metrics sink is not configured 
> correctly
> ---
>
> Key: HIVE-14428
> URL: https://issues.apache.org/jira/browse/HIVE-14428
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Siddharth Seth
>Assignee: Thejas M Nair
>Priority: Critical
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14428.1.patch
>
>
> About 80MB held after 7 hours of running. Metrics2Collector aggregates only 
> when it's invoked by the Hadoop sink.
> Options - the first one is better IMO:
> 1. Fix Metrics2Collector to aggregate more often, and fix the dependency in 
> Hive accordingly.
> 2. Don't set up the metrics sub-system if a sink is not configured.





[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests

2016-08-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414566#comment-15414566
 ] 

Sergey Shelukhin commented on HIVE-14502:
-

MiniTez runs the HBase metastore; MiniLlap doesn't. That probably isn't what 
causes the slowdown, but we might want to switch MiniLlap to it to retain coverage.

> Convert MiniTez tests to MiniLlap tests
> ---
>
> Key: HIVE-14502
> URL: https://issues.apache.org/jira/browse/HIVE-14502
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster 
> than MiniTezCliDriver because of threaded executors and caching. 
> MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To 
> cut down this test time significantly, it makes sense to move the mini tez 
> tests over to mini llap tests.





[jira] [Updated] (HIVE-14433) refactor LLAP plan cache avoidance and fix issue in merge processor

2016-08-09 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14433:

Attachment: HIVE-14433.02.patch

HiveQA failed for some stupid reason...

> refactor LLAP plan cache avoidance and fix issue in merge processor
> ---
>
> Key: HIVE-14433
> URL: https://issues.apache.org/jira/browse/HIVE-14433
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.1, 2.2.0, 2.1.1
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14433.01.patch, HIVE-14433.02.patch, 
> HIVE-14433.patch
>
>
> Map and reduce processors do this:
> {noformat}
> if (LlapProxy.isDaemon()) {
>   cache = new org.apache.hadoop.hive.ql.exec.mr.ObjectCache(); // do not cache plan
> ...
> {noformat}
> but the merge processor just gets the plan. If it runs in LLAP, it can get a 
> cached plan. We need to move this logic into ObjectCache itself, via an isPlan 
> arg or something. That will also fix this issue for the merge processor.
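
A hedged sketch of what that refactor could look like; the factory shape and the boolean are hypothetical, following the description above rather than the committed patch:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.llap.io.api.LlapProxy;
import org.apache.hadoop.hive.ql.exec.ObjectCache;

// Hypothetical factory shape: the cache layer, not each processor, decides
// whether a plan may be served from the daemon-level cache.
public final class ObjectCacheSketch {
  public static ObjectCache getCache(Configuration conf, boolean isPlan) {
    if (isPlan && LlapProxy.isDaemon()) {
      // plans must not be reused across fragments inside the LLAP daemon,
      // so fall back to the non-caching MR implementation
      return new org.apache.hadoop.hive.ql.exec.mr.ObjectCache();
    }
    return new org.apache.hadoop.hive.ql.exec.tez.ObjectCache(); // per-query cache
  }
}
{code}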





[jira] [Updated] (HIVE-14418) Hive config validation prevents unsetting the settings

2016-08-09 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14418:

Attachment: HIVE-14418.01.patch

> Hive config validation prevents unsetting the settings
> --
>
> Key: HIVE-14418
> URL: https://issues.apache.org/jira/browse/HIVE-14418
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14418.01.patch, HIVE-14418.patch
>
>
> {noformat}
> hive> set hive.tez.task.scale.memory.reserve.fraction.max=;
> Query returned non-zero code: 1, cause: 'SET 
> hive.tez.task.scale.memory.reserve.fraction.max=' FAILED because 
> hive.tez.task.scale.memory.reserve.fraction.max expects FLOAT type value.
> hive> set hive.tez.task.scale.memory.reserve.fraction.max=null;
> Query returned non-zero code: 1, cause: 'SET 
> hive.tez.task.scale.memory.reserve.fraction.max=null' FAILED because 
> hive.tez.task.scale.memory.reserve.fraction.max expects FLOAT type value.
> {noformat}
> unset also doesn't work.





[jira] [Commented] (HIVE-14418) Hive config validation prevents unsetting the settings

2016-08-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414557#comment-15414557
 ] 

Sergey Shelukhin commented on HIVE-14418:
-

The patch works:
{noformat}
hive> set hive.execution.mode;
hive.execution.mode=llap
hive> set hive.execution.mode=;
hive> set hive.execution.mode;
hive.execution.mode=container
hive> set hive.execution.mode=foo;
Query returned non-zero code: 1, cause: 'SET hive.execution.mode=foo' FAILED in 
validation : Invalid value.. expects one of [container, llap].
{noformat}

Will reattach for HiveQA

> Hive config validation prevents unsetting the settings
> --
>
> Key: HIVE-14418
> URL: https://issues.apache.org/jira/browse/HIVE-14418
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14418.patch
>
>
> {noformat}
> hive> set hive.tez.task.scale.memory.reserve.fraction.max=;
> Query returned non-zero code: 1, cause: 'SET 
> hive.tez.task.scale.memory.reserve.fraction.max=' FAILED because 
> hive.tez.task.scale.memory.reserve.fraction.max expects FLOAT type value.
> hive> set hive.tez.task.scale.memory.reserve.fraction.max=null;
> Query returned non-zero code: 1, cause: 'SET 
> hive.tez.task.scale.memory.reserve.fraction.max=null' FAILED because 
> hive.tez.task.scale.memory.reserve.fraction.max expects FLOAT type value.
> {noformat}
> unset also doesn't work.





[jira] [Commented] (HIVE-14376) Schema evolution tests takes a long time

2016-08-09 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414552#comment-15414552
 ] 

Prasanth Jayachandran commented on HIVE-14376:
--

[~sseth] Can you please review this patch?

> Schema evolution tests takes a long time
> 
>
> Key: HIVE-14376
> URL: https://issues.apache.org/jira/browse/HIVE-14376
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0, 2.1.1
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
> Attachments: HIVE-14376.1.patch
>
>
> schema_evol_* tests (14 tests) take 47 mins in tez, 40 mins in TestCliDriver. 
> The same set of tests is added to llap as well in HIVE-14355, which will add some 
> more time. Most tests use INSERT into table, which can be slow and is not 
> needed. Even some individual tests take about 5 mins to run.





[jira] [Updated] (HIVE-14376) Schema evolution tests takes a long time

2016-08-09 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14376:
-
Status: Patch Available  (was: Open)

> Schema evolution tests takes a long time
> 
>
> Key: HIVE-14376
> URL: https://issues.apache.org/jira/browse/HIVE-14376
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0, 2.1.1
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
> Attachments: HIVE-14376.1.patch
>
>
> schema_evol_* tests (14 tests) take 47 mins in tez, 40 mins in TestCliDriver. 
> The same set of tests is added to llap as well in HIVE-14355, which will add some 
> more time. Most tests use INSERT into table, which can be slow and is not 
> needed. Even some individual tests take about 5 mins to run.





[jira] [Updated] (HIVE-14376) Schema evolution tests takes a long time

2016-08-09 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14376:
-
Attachment: HIVE-14376.1.patch

This patch also removes all schema_evol* files from MiniTez. The schema evolution 
code path should not depend on the engine. MiniTez tests took 1.5 hrs on 
my laptop but MiniLlap took only 14 mins. It also removes the explicit order by 
from the queries.

> Schema evolution tests takes a long time
> 
>
> Key: HIVE-14376
> URL: https://issues.apache.org/jira/browse/HIVE-14376
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0, 2.1.1
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
> Attachments: HIVE-14376.1.patch
>
>
> schema_evol_* tests (14 tests) take 47 mins in tez, 40 mins in TestCliDriver. 
> The same set of tests is added to llap as well in HIVE-14355, which will add some 
> more time. Most tests use INSERT into table, which can be slow and is not 
> needed. Even some individual tests take about 5 mins to run.





[jira] [Commented] (HIVE-13913) LLAP: introduce backpressure to recordreader

2016-08-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414513#comment-15414513
 ] 

Sergey Shelukhin commented on HIVE-13913:
-

la la la...

> LLAP: introduce backpressure to recordreader
> 
>
> Key: HIVE-13913
> URL: https://issues.apache.org/jira/browse/HIVE-13913
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13913.01.patch, HIVE-13913.02.patch, 
> HIVE-13913.03.patch, HIVE-13913.patch
>
>






[jira] [Updated] (HIVE-14189) backport HIVE-13945 to branch-1

2016-08-09 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14189:

Attachment: HIVE-14189.03-branch-1.patch

The same patch again...

> backport HIVE-13945 to branch-1
> ---
>
> Key: HIVE-14189
> URL: https://issues.apache.org/jira/browse/HIVE-14189
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>  Labels: TODOC1.3
> Attachments: HIVE-14189-branch-1.patch, HIVE-14189.01-branch-1.patch, 
> HIVE-14189.02-branch-1.patch, HIVE-14189.03-branch-1.patch
>
>






[jira] [Updated] (HIVE-14509) AvroSerde mutates tinyint and smallint columns when specifying native columns

2016-08-09 Thread Mark Wagner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Wagner updated HIVE-14509:
---
Attachment: avro-tinyint.demo.patch

Here's a patch for a q test which demonstrates the problem. The flow goes like 
this:
* AvroSerde is initialized with the right columns
* Avro doesn't have a tinyint type, so it is stored as int. Internally, the 
AvroSerde translates the column types to an Avro schema, then generates an OI 
from that. This OI then has an IntegerColumnInspector in it instead of a 
TinyintColumnInspector.
* Table.getColsInternal gets to this section:
{code}
try {
  // Do the lightweight check for general case.
  if (hasMetastoreBasedSchema(SessionState.getSessionConf(), serializationLib)) {
    return tTable.getSd().getCols();
  } else if (forMs && !shouldStoreFieldsInMetastore(
      SessionState.getSessionConf(), serializationLib, tTable.getParameters())) {
    return Hive.getFieldsFromDeserializerForMsStorage(this, getDeserializer());
  } else {
    return MetaStoreUtils.getFieldsFromDeserializer(getTableName(), getDeserializer());
  }
{code}
which dutifully sets the columns according to the OI returned by the Serde.
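
A hedged sketch of the round trip in pseudocode (the API names are illustrative, not exact signatures):

{code}
// Hive column types -> Avro schema -> ObjectInspector -> Hive column types:
//   declared hive types:  [tinyint, smallint]
//   generated Avro schema: ["int", "int"]      // Avro has no byte/short types
//   regenerated OI reports: [int, int]
// so getFieldsFromDeserializer() hands the metastore int columns, mutating
// the originally declared tinyint/smallint.
{code}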

> AvroSerde mutates tinyint and smallint columns when specifying native columns
> -
>
> Key: HIVE-14509
> URL: https://issues.apache.org/jira/browse/HIVE-14509
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 2.2.0
>Reporter: Mark Wagner
> Attachments: avro-tinyint.demo.patch
>
>
> tinyint and smallint go in, int comes out:
> {noformat}
> string1 string  
> int1int 
> tinyint1int 
> smallint1   int 
> bigint1 bigint  
> boolean1boolean 
> float1  float   
> double1 double  
> list1   array   
> map1map 
> struct1 struct  
>   
> enum1   string  
> nullableint int 
> bytes1  binary  
> fixed1  binary
> {noformat}





[jira] [Commented] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414488#comment-15414488
 ] 

Sergey Shelukhin commented on HIVE-14233:
-

Some comments on RB

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation existed because the ACID 
> insert/update/delete events from various delta files needed to be 
> merged together before the actual version of a given row was determined. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that keeps track of delete events (found in the 
> delete_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations that 
> can be built on top of it in the future.
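
A hedged pseudocode sketch of that read path (all identifiers hypothetical):

{code}
// Read whole ORC row batches from the base/delta split, then knock out rows
// that a registry built from the delete_delta files marks as deleted.
while (orcBatchReader.nextBatch(batch)) {
  for (int i = 0; i < batch.size; i++) {
    // key of row i: (originalTransactionId, bucket, rowId)
    if (deleteEventRegistry.isDeleted(origTxn(batch, i), bucket(batch, i), rowId(batch, i))) {
      deselectRow(batch, i);  // e.g. via the batch's selected[] vector
    }
  }
  forward(batch);  // pass the whole batch up the operator pipeline
}
{code}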





[jira] [Commented] (HIVE-14479) Add some join tests for acid table

2016-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414460#comment-15414460
 ] 

Hive QA commented on HIVE-14479:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12822853/HIVE-14479.2.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10461 tests 
executed
*Failed tests:*
{noformat}
TestMsgBusConnection - did not produce a TEST-*.xml file
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_acid_mapjoin
org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testDelayedLocalityNodeCommErrorImmediateAllocation
org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testForcedLocalityMultiplePreemptionsSameHost2
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/833/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/833/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-833/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12822853 - PreCommit-HIVE-MASTER-Build

> Add some join tests for acid table
> --
>
> Key: HIVE-14479
> URL: https://issues.apache.org/jira/browse/HIVE-14479
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-14479.1.patch, HIVE-14479.2.patch
>
>






[jira] [Updated] (HIVE-12181) Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver

2016-08-09 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-12181:

Attachment: HIVE-12181.15.patch

> Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver
> ---
>
> Key: HIVE-12181
> URL: https://issues.apache.org/jira/browse/HIVE-12181
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-12181.1.patch, HIVE-12181.10.patch, 
> HIVE-12181.12.patch, HIVE-12181.13.patch, HIVE-12181.15.patch, 
> HIVE-12181.2.patch, HIVE-12181.3.patch, HIVE-12181.4.patch, 
> HIVE-12181.7.patch, HIVE-12181.8.patch, HIVE-12181.9.patch, HIVE-12181.patch, 
> HIVE-12181.patch
>
>
> There was a performance concern earlier, but HIVE-7587 has fixed that. We can 
> change the default to true now.





[jira] [Updated] (HIVE-12181) Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver

2016-08-09 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-12181:

Status: Patch Available  (was: Open)

> Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver
> ---
>
> Key: HIVE-12181
> URL: https://issues.apache.org/jira/browse/HIVE-12181
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-12181.1.patch, HIVE-12181.10.patch, 
> HIVE-12181.12.patch, HIVE-12181.13.patch, HIVE-12181.15.patch, 
> HIVE-12181.2.patch, HIVE-12181.3.patch, HIVE-12181.4.patch, 
> HIVE-12181.7.patch, HIVE-12181.8.patch, HIVE-12181.9.patch, HIVE-12181.patch, 
> HIVE-12181.patch
>
>
> There was a performance concern earlier, but HIVE-7587 has fixed that. We can 
> change the default to true now.





[jira] [Updated] (HIVE-12181) Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver

2016-08-09 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-12181:

Status: Open  (was: Patch Available)

> Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver
> ---
>
> Key: HIVE-12181
> URL: https://issues.apache.org/jira/browse/HIVE-12181
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-12181.1.patch, HIVE-12181.10.patch, 
> HIVE-12181.12.patch, HIVE-12181.13.patch, HIVE-12181.2.patch, 
> HIVE-12181.3.patch, HIVE-12181.4.patch, HIVE-12181.7.patch, 
> HIVE-12181.8.patch, HIVE-12181.9.patch, HIVE-12181.patch, HIVE-12181.patch
>
>
> There was a performance concern earlier, but HIVE-7587 has fixed that. We can 
> change the default to true now.





[jira] [Commented] (HIVE-14412) Add a timezone-aware timestamp

2016-08-09 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414388#comment-15414388
 ] 

Sergio Peña commented on HIVE-14412:


[~lirui] The 'console output' link does not work anymore. Could you paste the 
error you're seeing instead?

> Add a timezone-aware timestamp
> --
>
> Key: HIVE-14412
> URL: https://issues.apache.org/jira/browse/HIVE-14412
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-14412.1.patch, HIVE-14412.1.patch
>
>
> Java's Timestamp stores the time elapsed since the epoch. While it's 
> unambiguous by itself, ambiguity arises when we parse a string into a timestamp, 
> or convert a timestamp to a string, causing problems like HIVE-14305.
> To solve the issue, I think we should make timestamp aware of timezone.





[jira] [Commented] (HIVE-14457) Partitions in encryption zone are still trashed though an exception is returned

2016-08-09 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414385#comment-15414385
 ] 

Sergio Peña commented on HIVE-14457:


LGTM +1

> Partitions in encryption zone are still trashed though an exception is 
> returned
> ---
>
> Key: HIVE-14457
> URL: https://issues.apache.org/jira/browse/HIVE-14457
> Project: Hive
>  Issue Type: Bug
>  Components: Encryption, Metastore
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-14457.patch
>
>
> drop_partition_common in HiveMetaStore still drops partitions in an encryption 
> zone without PURGE even though it returns an exception. 





[jira] [Commented] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-09 Thread Saket Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414383#comment-15414383
 ] 

Saket Saurabh commented on HIVE-14035:
--

[~sershe] Thanks for the comments on RB. I am working on fixing those. No, the 
last run for patch 13 did not have split-update enabled by default. There are 
many tests that assert on the number of files and the directory layout that would 
anyway fail in PTest if we ran them without modification. However, excluding 
those assert failures, when I ran these locally, the only other failures were 
NegativeArraySizeException & IndexOutOfBoundsException caused by HIVE-14448 and 
not related to this patch. However, I have created a subclass TestTxnCommands3 
that should ideally mimic this behavior with split-update enabled by default 
for a large number of ACID scenarios.

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.patch
>
>
> In the current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, which can enable predicate pushdown to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist.
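
A hedged sketch of the split-update idea (the directory names follow the delta/delete_delta convention; the event shapes are illustrative, not the exact ACID record layout):

{noformat}
before: delta_.../bucket_0        contains  UPDATE(origTxn=7, rowId=3, newRow={...})
after:  delete_delta_.../bucket_0 contains  DELETE(origTxn=7, rowId=3)
        delta_.../bucket_0        contains  INSERT(currentTxn=8, row={...})
{noformat}

Since no delta file then carries an in-place overwrite, predicate pushdown can be applied to every delta file.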



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14439) LlapTaskScheduler should try scheduling tasks when a node is disabled

2016-08-09 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14439:
--
   Resolution: Fixed
Fix Version/s: 2.1.1
   Status: Resolved  (was: Patch Available)

> LlapTaskScheduler should try scheduling tasks when a node is disabled
> -
>
> Key: HIVE-14439
> URL: https://issues.apache.org/jira/browse/HIVE-14439
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Fix For: 2.1.1
>
> Attachments: HIVE-14439.01.patch, HIVE-14439.02.patch, 
> HIVE-14439.03.patch
>
>
> When a node is disabled - try scheduling pending tasks. Tasks which may have 
> been waiting for the node to become available could become candidates for 
> scheduling on alternate nodes depending on the locality delay and disable 
> duration.
> This is what is causing an occasional timeout on 
> testDelayedLocalityNodeCommErrorImmediateAllocation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-09 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414374#comment-15414374
 ] 

Eugene Koifman commented on HIVE-14035:
---


Can't make this the default yet because it will break RU (the downgrade part 
specifically).

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.patch
>
>
> In the current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, which can enable predicate pushdown to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3

2016-08-09 Thread Abdullah Yousufi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdullah Yousufi updated HIVE-14373:

Attachment: HIVE-14373.patch

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Abdullah Yousufi
> Attachments: HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests cannot be executed by HiveQA because they need Amazon 
> credentials. We need to write a suite based on ideas from the Hadoop project, 
> where:
> - an xml file is provided with S3 credentials (sketched below)
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure
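
For reference, the Hadoop convention linked above keeps credentials in an uncommitted XML file; a Hive equivalent would presumably look something like this (the file name and the use of the S3A property names are assumptions, not part of this patch):

{code:xml}
<!-- auth-keys.xml: kept out of version control; loaded only for manual runs -->
<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_AWS_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
  </property>
</configuration>
{code}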



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3

2016-08-09 Thread Abdullah Yousufi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdullah Yousufi updated HIVE-14373:

Attachment: (was: HIVE-14373.patch)

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Abdullah Yousufi
> Attachments: HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests cannot be executed by HiveQA because they need Amazon 
> credentials. We need to write a suite based on ideas from the Hadoop project, 
> where:
> - an xml file is provided with S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14439) LlapTaskScheduler should try scheduling tasks when a node is disabled

2016-08-09 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414369#comment-15414369
 ] 

Siddharth Seth commented on HIVE-14439:
---

Thanks for the reviews. Committing.

> LlapTaskScheduler should try scheduling tasks when a node is disabled
> -
>
> Key: HIVE-14439
> URL: https://issues.apache.org/jira/browse/HIVE-14439
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14439.01.patch, HIVE-14439.02.patch, 
> HIVE-14439.03.patch
>
>
> When a node is disabled - try scheduling pending tasks. Tasks which may have 
> been waiting for the node to become available could become candidates for 
> scheduling on alternate nodes depending on the locality delay and disable 
> duration.
> This is what is causing an occasional timeout on 
> testDelayedLocalityNodeCommErrorImmediateAllocation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414362#comment-15414362
 ] 

Sergey Shelukhin commented on HIVE-14035:
-

A few minor comments on RB. Was the last HiveQA run with split-update enabled? 
Actually, should we have it on by default?
cc [~gopalv]
It probably still needs review from someone familiar with ACID.

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.patch
>
>
> In the current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, which can enable predicate pushdown to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3

2016-08-09 Thread Abdullah Yousufi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdullah Yousufi updated HIVE-14373:

Attachment: HIVE-14373.patch

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Abdullah Yousufi
> Attachments: HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests cannot be executed by HiveQA because they need Amazon 
> credentials. We need to write a suite based on ideas from the Hadoop project, 
> where:
> - an xml file is provided with S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14439) LlapTaskScheduler should try scheduling tasks when a node is disabled

2016-08-09 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414358#comment-15414358
 ] 

Prasanth Jayachandran commented on HIVE-14439:
--

New changes lgtm +1

> LlapTaskScheduler should try scheduling tasks when a node is disabled
> -
>
> Key: HIVE-14439
> URL: https://issues.apache.org/jira/browse/HIVE-14439
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14439.01.patch, HIVE-14439.02.patch, 
> HIVE-14439.03.patch
>
>
> When a node is disabled - try scheduling pending tasks. Tasks which may have 
> been waiting for the node to become available could become candidates for 
> scheduling on alternate nodes depending on the locality delay and disable 
> duration.
> This is what is causing an occasional timeout on 
> testDelayedLocalityNodeCommErrorImmediateAllocation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14373) Add integration tests for hive on S3

2016-08-09 Thread Abdullah Yousufi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdullah Yousufi updated HIVE-14373:

Status: Patch Available  (was: Open)

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Abdullah Yousufi
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests cannot be executed by HiveQA because they need Amazon 
> credentials. We need to write a suite based on ideas from the Hadoop project, 
> where:
> - an xml file is provided with S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and HiveQA should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14460) AccumuloCliDriver migration to junit4

2016-08-09 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-14460:

Attachment: HIVE-14460.1.patch

Updated the test to junit4 + migrated it to the HIVE-1 style

> AccumuloCliDriver migration to junit4
> -
>
> Key: HIVE-14460
> URL: https://issues.apache.org/jira/browse/HIVE-14460
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Attachments: HIVE-14460.1.patch
>
>
> This test has been left behind in HIVE-1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13756) Map failure attempts to delete reducer _temporary directory on multi-query pig query

2016-08-09 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414342#comment-15414342
 ] 

Mithun Radhakrishnan commented on HIVE-13756:
-

+1. 

> Map failure attempts to delete reducer _temporary directory on multi-query 
> pig query
> 
>
> Key: HIVE-13756
> URL: https://issues.apache.org/jira/browse/HIVE-13756
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-13756-branch-1.patch, HIVE-13756.1-branch-1.patch, 
> HIVE-13756.1.patch, HIVE-13756.patch
>
>
> A Pig script, executed with multi-query enabled, that reads the source data 
> and writes it as-is into TABLE_A, as well as performing a group-by operation 
> on the data which is written into TABLE_B, can produce erroneous results if 
> any map fails. This plan results in a single MR job that writes the map output 
> to a scratch directory relative to TABLE_A and the reducer output to a scratch 
> directory relative to TABLE_B.
> If one or more maps fail, the job will delete the attempt data relative to 
> TABLE_A, but it also deletes the _temporary directory relative to TABLE_B. 
> This has the unintended side effect of preventing subsequent maps from 
> committing their data. This means that any maps which successfully completed 
> before the first map failure will have their data committed as expected, while 
> later maps will not, resulting in an incomplete result set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14439) LlapTaskScheduler should try scheduling tasks when a node is disabled

2016-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414271#comment-15414271
 ] 

Hive QA commented on HIVE-14439:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12822655/HIVE-14439.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10441 tests 
executed
*Failed tests:*
{noformat}
TestMsgBusConnection - did not produce a TEST-*.xml file
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/832/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/832/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-832/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12822655 - PreCommit-HIVE-MASTER-Build

> LlapTaskScheduler should try scheduling tasks when a node is disabled
> -
>
> Key: HIVE-14439
> URL: https://issues.apache.org/jira/browse/HIVE-14439
> Project: Hive
>  Issue Type: Bug
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14439.01.patch, HIVE-14439.02.patch, 
> HIVE-14439.03.patch
>
>
> When a node is disabled - try scheduling pending tasks. Tasks which may have 
> been waiting for the node to become available could become candidates for 
> scheduling on alternate nodes depending on the locality delay and disable 
> duration.
> This is what is causing an occasional timeout on 
> testDelayedLocalityNodeCommErrorImmediateAllocation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14415) Upgrade qtest execution framework to junit4 - TestPerfCliDriver

2016-08-09 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-14415:

Resolution: Resolved
Status: Resolved  (was: Patch Available)

I'm resolving this issue because these changes are included in HIVE-1

> Upgrade qtest execution framework to junit4 - TestPerfCliDriver
> ---
>
> Key: HIVE-14415
> URL: https://issues.apache.org/jira/browse/HIVE-14415
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Attachments: HIVE-14415.1.patch, HIVE-14415.2.patch
>
>
> I would like to upgrade the current maven+ant+velocimacro+junit4 qtest 
> generation framework to use only junit4 - while (trying) to keep 
> all the existing features it provides.
> What I can't really do with the current one: easily execute a single qtest 
> from an IDE (as a matter of fact I can...but it's way too complicated; after 
> this it won't be a cake-walk either...but it will be a step closer ;)
> I think this change will make it clearer how these tests are configured 
> and executed.
> I will do this in two phases; currently I will only change 
> {{TestPerfCliDriver}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-09 Thread Saket Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414224#comment-15414224
 ] 

Saket Saurabh commented on HIVE-14233:
--

Thanks [~sershe] for pointing that out. Have attached the link to review board 
for this JIRA. https://reviews.apache.org/r/50934/

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation existed because the ACID 
> insert/update/delete events from various delta files needed to be merged 
> together before the actual version of a given row could be determined. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that keeps track of delete events (found in the deleted_delta 
> files). This will lead to a large performance gain when reading ACID files in 
> vectorized fashion, while enabling further optimizations that can be built on 
> top of that in the future.
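
A sketch of the delete cross-referencing step (VectorizedRowBatch is the real Hive class; the DeleteEventRegistry interface and the surrounding wiring are simplified assumptions for illustration):

{code}
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

class DeleteMasker {
  // Hypothetical lookup structure built from the deleted_delta files.
  interface DeleteEventRegistry {
    boolean isDeleted(long rowId);
  }

  // Instead of stitching row by row, read the batch straight from ORC and then
  // compact the batch's selected[] vector so deleted rows are skipped downstream.
  static void applyDeletes(VectorizedRowBatch batch, long[] rowIds,
                           DeleteEventRegistry deletes) {
    int newSize = 0;
    for (int i = 0; i < batch.size; i++) {
      if (!deletes.isDeleted(rowIds[i])) {
        batch.selected[newSize++] = i;  // keep this row
      }
    }
    batch.size = newSize;
    batch.selectedInUse = true;
  }
}
{code}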



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-09 Thread Saket Saurabh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Attachment: HIVE-14233.06.patch

This patch disallows VectorizedRowBatchReader creation on original files.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching limitation existed because the ACID 
> insert/update/delete events from various delta files needed to be merged 
> together before the actual version of a given row could be determined. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to solve this earlier 
> bottleneck in the vectorized code path for ACID by directly reading row 
> batches from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing them against a data 
> structure that keeps track of delete events (found in the deleted_delta 
> files). This will lead to a large performance gain when reading ACID files in 
> vectorized fashion, while enabling further optimizations that can be built on 
> top of that in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14376) Schema evolution tests takes a long time

2016-08-09 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14376:
-
Issue Type: Sub-task  (was: Bug)
Parent: HIVE-13503

> Schema evolution tests takes a long time
> 
>
> Key: HIVE-14376
> URL: https://issues.apache.org/jira/browse/HIVE-14376
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0, 2.1.1
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
>
> schema_evol_* tests (14 tests) take 47 mins in tez and 40 mins in TestCliDriver. 
> The same set of tests was added to llap as well in HIVE-14355, which will add 
> some more time. Most tests use INSERT INTO a table, which can be slow and is 
> not needed. Even some individual tests take about 5 mins to run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-14376) Schema evolution tests takes a long time

2016-08-09 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-14376:


Assignee: Prasanth Jayachandran

> Schema evolution tests takes a long time
> 
>
> Key: HIVE-14376
> URL: https://issues.apache.org/jira/browse/HIVE-14376
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 2.2.0, 2.1.1
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
>
> schema_evol_* tests (14 tests) take 47 mins in tez and 40 mins in TestCliDriver. 
> The same set of tests was added to llap as well in HIVE-14355, which will add 
> some more time. Most tests use INSERT INTO a table, which can be slow and is 
> not needed. Even some individual tests take about 5 mins to run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14501) MiniTez test for union_type_chk.q is slow

2016-08-09 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414159#comment-15414159
 ] 

Prasanth Jayachandran edited comment on HIVE-14501 at 8/9/16 8:13 PM:
--

[~sseth] Can you please take a look? Runtime is now <3.5 mins for this test


was (Author: prasanth_j):
[~sseth] Can you please take a look? Runtime is now <3 mins for this test

> MiniTez test for union_type_chk.q is slow
> -
>
> Key: HIVE-14501
> URL: https://issues.apache.org/jira/browse/HIVE-14501
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14501.1.patch
>
>
> union_type_chk.q runs on minimr and minitez, but the test itself explicitly 
> sets the execution engine to mr. It takes around 10 mins to run this test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14501) MiniTez test for union_type_chk.q is slow

2016-08-09 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14501:
-
Status: Patch Available  (was: Open)

> MiniTez test for union_type_chk.q is slow
> -
>
> Key: HIVE-14501
> URL: https://issues.apache.org/jira/browse/HIVE-14501
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14501.1.patch
>
>
> union_type_chk.q runs on minimr and minitez, but the test itself explicitly 
> sets the execution engine to mr. It takes around 10 mins to run this test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14501) MiniTez test for union_type_chk.q is slow

2016-08-09 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14501:
-
Attachment: HIVE-14501.1.patch

[~sseth] Can you please take a look? Runtime is now <3 mins for this test

> MiniTez test for union_type_chk.q is slow
> -
>
> Key: HIVE-14501
> URL: https://issues.apache.org/jira/browse/HIVE-14501
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14501.1.patch
>
>
> union_type_chk.q runs on minimr and minitez, but the test itself explicitly 
> sets the execution engine to mr. It takes around 10 mins to run this test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests

2016-08-09 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414156#comment-15414156
 ] 

Prasanth Jayachandran commented on HIVE-14502:
--

Sorry posted the patch to wrong jira.

> Convert MiniTez tests to MiniLlap tests
> ---
>
> Key: HIVE-14502
> URL: https://issues.apache.org/jira/browse/HIVE-14502
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster 
> than MiniTezCliDriver because of threaded executors and caching. 
> MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To 
> cut down this test time significantly, it makes sense to move the mini tez 
> tests over to mini llap tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-14502) Convert MiniTez tests to MiniLlap tests

2016-08-09 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14502:
-
Comment: was deleted

(was: Runtime now becomes <4 mins. [~sseth] Can you please review this patch?)

> Convert MiniTez tests to MiniLlap tests
> ---
>
> Key: HIVE-14502
> URL: https://issues.apache.org/jira/browse/HIVE-14502
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster 
> than MiniTezCliDriver because of threaded executors and caching. 
> MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To 
> cut down this test time significantly, it makes sense to move the mini tez 
> tests over to mini llap tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-14502) Convert MiniTez tests to MiniLlap tests

2016-08-09 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14502:
-
Comment: was deleted

(was: Sorry posted the patch to wrong jira.)

> Convert MiniTez tests to MiniLlap tests
> ---
>
> Key: HIVE-14502
> URL: https://issues.apache.org/jira/browse/HIVE-14502
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster 
> than MiniTezCliDriver because of threaded executors and caching. 
> MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To 
> cut down this test time significantly, it makes sense to move the mini tez 
> tests over to mini llap tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests

2016-08-09 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414147#comment-15414147
 ] 

Prasanth Jayachandran commented on HIVE-14502:
--

Not sure if this needs a full precommit test run. 

> Convert MiniTez tests to MiniLlap tests
> ---
>
> Key: HIVE-14502
> URL: https://issues.apache.org/jira/browse/HIVE-14502
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14501.1.patch
>
>
> Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster 
> than MiniTezCliDriver because of threaded executors and caching. 
> MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To 
> cut down this test time significantly, it makes sense to move the mini tez 
> tests over to mini llap tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14502) Convert MiniTez tests to MiniLlap tests

2016-08-09 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14502:
-
Attachment: HIVE-14501.1.patch

Runtime now becomes <4 mins. [~sseth] Can you please review this patch?

> Convert MiniTez tests to MiniLlap tests
> ---
>
> Key: HIVE-14502
> URL: https://issues.apache.org/jira/browse/HIVE-14502
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14501.1.patch
>
>
> Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster 
> than MiniTezCliDriver because of threaded executors and caching. 
> MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To 
> cut down this test time significantly, it makes sense to move the mini tez 
> tests over to mini llap tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14270) Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-09 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414136#comment-15414136
 ] 

Sergio Peña commented on HIVE-14270:


[~ashutoshc] I attached a new patch (it is on RB too) that adds a new flag 
variable to use the table's blobstore location as the scratch directory, in case 
the user does not want to use this improvement. 

You can take a look at the new changes at:
- Context.java
- SemanticAnalyzer.java
- GenMapRedUtils.java
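
Based on the description above, usage would presumably reduce to toggling a single property; the property name below is an assumption for illustration (as are the table names), not necessarily what the patch uses:

{code}
-- true  => keep scratch data on the table's blobstore location (old behavior)
-- false => write intermediate data to the default FS (HDFS), per this JIRA
SET hive.blobstore.use.blobstore.as.scratchdir=false;
INSERT INTO TABLE s3_table SELECT * FROM staging_table;
{code}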

> Write temporary data to HDFS when doing inserts on tables located on S3
> ---
>
> Key: HIVE-14270
> URL: https://issues.apache.org/jira/browse/HIVE-14270
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-14270.1.patch, HIVE-14270.2.patch, 
> HIVE-14270.3.patch, HIVE-14270.4.patch, HIVE-14270.5.patch
>
>
> Currently, when doing INSERT statements on tables located on S3, Hive writes 
> and reads temporary (or intermediate) files on S3 as well. 
> If HDFS is still the default filesystem on Hive, then we can keep such 
> temporary files on HDFS to keep things running faster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14270) Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-09 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-14270:
---
Attachment: HIVE-14270.5.patch

> Write temporary data to HDFS when doing inserts on tables located on S3
> ---
>
> Key: HIVE-14270
> URL: https://issues.apache.org/jira/browse/HIVE-14270
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-14270.1.patch, HIVE-14270.2.patch, 
> HIVE-14270.3.patch, HIVE-14270.4.patch, HIVE-14270.5.patch
>
>
> Currently, when doing INSERT statements on tables located on S3, Hive writes 
> and reads temporary (or intermediate) files on S3 as well. 
> If HDFS is still the default filesystem on Hive, then we can keep such 
> temporary files on HDFS to keep things running faster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7239) Fix bug in HiveIndexedInputFormat implementation that causes incorrect query result when input backed by Sequence/RC files

2016-08-09 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7239:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Illya!

> Fix bug in HiveIndexedInputFormat implementation that causes incorrect query 
> result when input backed by Sequence/RC files
> --
>
> Key: HIVE-7239
> URL: https://issues.apache.org/jira/browse/HIVE-7239
> Project: Hive
>  Issue Type: Bug
>  Components: Indexing
>Affects Versions: 2.1.0
>Reporter: Sumit Kumar
>Assignee: Illya Yalovyy
> Fix For: 2.2.0
>
> Attachments: HIVE-7239.2.patch, HIVE-7239.3.patch, HIVE-7239.4.patch, 
> HIVE-7239.patch
>
>
> In the case of sequence files, it's crucial that splits are calculated around 
> the boundaries enforced by the input sequence file. However, by default Hadoop 
> creates input splits based on configuration parameters, which may not match 
> the boundaries of the input sequence file. Hive provides 
> HiveIndexedInputFormat, which adds extra logic and recalculates the split 
> boundaries for each split based on the sequence file's boundaries.
> However, we noticed this behavior of "over" reporting from data backed by 
> sequence files. We have sample data on which we experimented and fixed this 
> bug, and we have verified the fix by comparing the query output for input in 
> sequence file format, RC file format, and the regular format. However, we have 
> not been able to find the right place to include this as a unit test that 
> would execute as part of the Hive tests. We tried writing a "clientpositive" 
> test as part of the ql module, but the output seems quite verbose and I 
> couldn't interpret it that well. Can someone please review this change and 
> guide us on how to write a test that will execute as part of Hive testing?
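
The sync-marker mechanics behind this description, sketched with the plain SequenceFile reader API (a standalone illustration, not code from the attached patches):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

class SplitAlignment {
  // A raw split offset may land mid-record; the reader must first advance to
  // the next sync marker, otherwise records straddling the offset are read
  // twice (the "over" reporting above) or attributed to the wrong split.
  static long alignedStart(FileSystem fs, Path file, long rawSplitStart,
                           Configuration conf) throws java.io.IOException {
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf);
    try {
      reader.sync(rawSplitStart);    // seek to the next sync point past the offset
      return reader.getPosition();   // records for this split begin here
    } finally {
      reader.close();
    }
  }
}
{code}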



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-09 Thread Saket Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414106#comment-15414106
 ] 

Saket Saurabh commented on HIVE-14448:
--

While I was investigating test failures for some other scenarios, I realized 
that schema evolution also breaks when the ETL strategy is chosen, which I 
believe might be related to this JIRA. I think it would be good to add another 
test that evolves the schema, as [~prasanth_j] suggested.

Even with the current patch, the following schema evolution test still fails 
with IndexOutOfBoundsException:

{code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
@Test
public void testAcidWithSchemaEvolution() throws Exception {
  hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
  String tblName = "acidTblWithSchemaEvol";
  runStatementOnDriver("drop table if exists " + tblName);
  runStatementOnDriver("CREATE TABLE " + tblName + "(a INT, b STRING) " +
      " CLUSTERED BY(a) INTO 2 BUCKETS" + // currently ACID requires the table to be bucketed
      " STORED AS ORC TBLPROPERTIES ('transactional'='true')");

  runStatementOnDriver("INSERT INTO " + tblName + " VALUES (1, 'foo'), (2, 'bar')");

  // Major compact to create a base that has the ACID schema.
  runStatementOnDriver("ALTER TABLE " + tblName + " COMPACT 'MAJOR'");
  runWorker(hiveConf);

  // Alter the table to perform schema evolution.
  runStatementOnDriver("ALTER TABLE " + tblName + " ADD COLUMNS(c int)");

  // Validate there is an added NULL for column c.
  List<String> rs = runStatementOnDriver("SELECT * FROM " + tblName + " ORDER BY a");
  String[] expectedResult = { "1\tfoo\tNULL", "2\tbar\tNULL" };
  Assert.assertEquals(Arrays.asList(expectedResult), rs);
}
{code}

Here is the back-trace for the failed test:
{code}
exec.Task: Job Submission failed with exception 'java.lang.RuntimeException(ORC 
split generation failed with exception: java.lang.IndexOutOfBoundsException: 
Index: 9, Size: 9)'
java.lang.RuntimeException: ORC split generation failed with exception: 
java.lang.IndexOutOfBoundsException: Index: 9, Size: 9
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1576)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1662)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
at 
org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
at 
org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
at 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1983)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1674)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1410)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1134)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1122)
at 
org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1315)
at 
org.apache.hadoop.hive.ql.TestTxnCommands2.testAcidWithSchemaEvolution(TestTxnCommands2.java:177)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 

[jira] [Commented] (HIVE-12181) Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver

2016-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414083#comment-15414083
 ] 

Hive QA commented on HIVE-12181:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12822672/HIVE-12181.13.patch

{color:green}SUCCESS:{color} +1 due to 11 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10441 tests 
executed
*Failed tests:*
{noformat}
TestMsgBusConnection - did not produce a TEST-*.xml file
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_smb_main
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dynpart_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform_ppr1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_mapjoin
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/831/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/831/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-831/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12822672 - PreCommit-HIVE-MASTER-Build

> Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver
> ---
>
> Key: HIVE-12181
> URL: https://issues.apache.org/jira/browse/HIVE-12181
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-12181.1.patch, HIVE-12181.10.patch, 
> HIVE-12181.12.patch, HIVE-12181.13.patch, HIVE-12181.2.patch, 
> HIVE-12181.3.patch, HIVE-12181.4.patch, HIVE-12181.7.patch, 
> HIVE-12181.8.patch, HIVE-12181.9.patch, HIVE-12181.patch, HIVE-12181.patch
>
>
> There was a performance concern earlier, but HIVE-7587 has fixed that. We can 
> change the default to true now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-14474) Create datasource in Druid from Hive

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-14474:
--

Assignee: Jesus Camacho Rodriguez

> Create datasource in Druid from Hive
> 
>
> Key: HIVE-14474
> URL: https://issues.apache.org/jira/browse/HIVE-14474
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> We need to implement a DruidOutputFormat that can create Druid segments from 
> the output of the Hive query and store them directly in Druid.
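
The target usage would presumably be a CTAS of roughly this shape (the table and column names are invented; the storage handler class and the druid.datasource property follow the Druid integration work, but treat the details as assumptions):

{code}
CREATE TABLE druid_pageviews
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "pageviews")
AS
SELECT `__time`, page, SUM(added) AS added
FROM src_events
GROUP BY `__time`, page;
{code}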



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14499) Add HMS metrics for materialized views

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14499:
---
Description: 
As in HIVE-10761/HIVE-12499.

We should be able to show some metrics related to materialized views, such as 
the number of materialized views, size of the materialized views, number of 
accesses, etc.

  was:
Related to HIVE-10761/HIVE-12499.

We should be able to show some metrics related to materialized views, such as 
the number of materialized views, size of the materialized views, number of 
accesses, etc.


> Add HMS metrics for materialized views
> --
>
> Key: HIVE-14499
> URL: https://issues.apache.org/jira/browse/HIVE-14499
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>
> As in HIVE-10761/HIVE-12499.
> We should be able to show some metrics related to materialized views, such as 
> the number of materialized views, size of the materialized views, number of 
> accesses, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14499) Add HMS metrics for materialized views

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14499:
---
Description: 
Related to HIVE-10761/HIVE-12499.

We should be able to show some metrics related to materialized views, such as 
the number of materialized views, size of the materialized views, number of 
accesses, etc.

  was:
Related to HIVE-10761.

We should be able to show some metrics related to materialized views, such as 
the number of materialized views, size of the materialized views, number of 
accesses, etc.


> Add HMS metrics for materialized views
> --
>
> Key: HIVE-14499
> URL: https://issues.apache.org/jira/browse/HIVE-14499
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>
> Related to HIVE-10761/HIVE-12499.
> We should be able to show some metrics related to materialized views, such as 
> the number of materialized views, size of the materialized views, number of 
> accesses, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13936) Add streaming support for row_number

2016-08-09 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414030#comment-15414030
 ] 

Yongzhi Chen commented on HIVE-13936:
-

The failures are not related.

> Add streaming support for row_number
> 
>
> Key: HIVE-13936
> URL: https://issues.apache.org/jira/browse/HIVE-13936
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Johndee Burks
>Assignee: Yongzhi Chen
> Attachments: HIVE-13936.1.patch
>
>
> Without this support, row_number will cause heap issues in reducers. The 
> example query below, run against 10 million records, will cause a failure. 
> {code}
> select a, row_number() over (partition by a order by a desc) as row_num from 
> j100mil;
> {code}
> Same issue, different function, in HIVE-7062



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14498) Freshness period for query rewriting using materialized views

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14498:
---
Summary: Freshness period for query rewriting using materialized views  
(was: Timeout for query rewriting using materialized views)

> Freshness period for query rewriting using materialized views
> -
>
> Key: HIVE-14498
> URL: https://issues.apache.org/jira/browse/HIVE-14498
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>
> Once we have query rewriting in place (HIVE-14496), one of the main issues is 
> data freshness in the materialized views.
> Since we will not support view maintenance at first, we could include a 
> HiveConf property to configure a max freshness period (_n timeunits_). If a 
> query comes in, and the materialized view has been populated (by create, 
> refresh, etc.) for a longer period than _n_, then we should not use it for 
> rewriting the query.
> Optionally, we could print a warning for the user indicating that the 
> materialized view was not used because it was not fresh.
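
A hypothetical shape for such a property (both the name and the value syntax are invented for illustration):

{code}
-- Only materialized views created/refreshed within the last 24 hours would be
-- considered for automatic query rewriting.
SET hive.materializedview.rewriting.time.window=24h;
{code}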



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14457) Partitions in encryption zone are still trashed though an exception is returned

2016-08-09 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414018#comment-15414018
 ] 

Yongzhi Chen commented on HIVE-14457:
-

Yes, it should check first before the drop operation. LGTM  +1
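
A rough sketch of that check-first ordering, with hypothetical helper names (the actual fix lives in HiveMetaStore's drop_partition_common path):

{code}
// Illustrative only: verify the encryption-zone/trash constraint *before* any
// data is moved, so a failure leaves the partition data untouched.
private void ensureDroppable(Path partPath, boolean mustPurge) throws MetaException {
  // isInEncryptionZone() stands in for the HDFS encryption-zone lookup
  // (available via the HdfsAdmin APIs).
  if (!mustPurge && isInEncryptionZone(partPath)) {
    throw new MetaException("Unable to drop partition at " + partPath
        + ": it is in an HDFS encryption zone and would be moved to trash."
        + " Re-run the drop with PURGE.");
  }
  // Only after this check passes should the caller delete the partition data.
}
{code}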

> Partitions in encryption zone are still trashed though an exception is 
> returned
> ---
>
> Key: HIVE-14457
> URL: https://issues.apache.org/jira/browse/HIVE-14457
> Project: Hive
>  Issue Type: Bug
>  Components: Encryption, Metastore
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-14457.patch
>
>
> drop_partition_common in HiveMetaStore still drops partitions in an encryption 
> zone without PURGE even though it returns an exception. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-12884) NullPointerException in HiveParser.regularBody()

2016-08-09 Thread Gabor Szadovszky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Szadovszky resolved HIVE-12884.
-
Resolution: Cannot Reproduce

Closed as I was not able to reproduce. Feel free to reopen if reproduced or 
additional information is available.

> NullPointerException in HiveParser.regularBody()
> 
>
> Key: HIVE-12884
> URL: https://issues.apache.org/jira/browse/HIVE-12884
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.1.1
>Reporter: Bohumir Zamecnik
>Assignee: Gabor Szadovszky
> Attachments: HIVE-12884.q
>
>
> When I run a query like the following in the Hive CLI, I get a 
> NullPointerException in HiveParser.regularBody().
> {code}
> create table some_table
> (
> day_timestamp bigint,
> guid_count bigint
> )
> row format delimited fields terminated by ',' stored as textfile;
> SET hive.merge.mapredfiles=true;
> SET mapreduce.input.fileinputformat.split.maxsize=5368709120;
> SET hivevar:tz_offset=8;
> SET hivevar:day_in_millis=8640;
> SET hivevar:year=2015;
> SET hivevar:month=02;
> SET hivevar:next_month=03;
> insert into table some_table
> select
>   day_timestamp
>   count(*) as guid_count
> from (
>   select distinct
> guid,
> floor((`timestamp` / ${day_in_millis}) - ${tz_offset}) * ${day_in_millis} 
> as day_timestamp,
>   from source_table
>   where year = ${year} and ((month = ${month}) or ((month = ${next_month}) 
> and (day = '01')))
> ) guids
> group by day_timestamp;
> {code}
> /tmp/username/hive.log:
> {code}
> 2016-01-18 10:05:40,505 ERROR [main]: ql.Driver 
> (SessionState.java:printError(861)) - FAILED: NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:40975)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:40183)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:40059)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1519)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1057)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {code}
> Hive 1.1.1 compiled from source with checksum 
> c2d70ca009729fb13c073d599b4e5193.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14479) Add some join tests for acid table

2016-08-09 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-14479:
-
Attachment: HIVE-14479.2.patch

I moved the part that tests compaction into a JUnit test

> Add some join tests for acid table
> --
>
> Key: HIVE-14479
> URL: https://issues.apache.org/jira/browse/HIVE-14479
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-14479.1.patch, HIVE-14479.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14453) refactor physical writing of ORC data and metadata to FS from the logical writers

2016-08-09 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14453:

Attachment: HIVE-14453.01.patch

Updated.

> refactor physical writing of ORC data and metadata to FS from the logical 
> writers
> -
>
> Key: HIVE-14453
> URL: https://issues.apache.org/jira/browse/HIVE-14453
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14453.01.patch, HIVE-14453.patch
>
>
> ORC data doesn't have to go directly into an HDFS stream via buffers; it can 
> go somewhere else (e.g. a write-thru cache, or an addressable system that 
> doesn't require the stream blocks to be held in memory before writing them 
> all together).
> To that effect, it would be nice to abstract the creation of the data 
> block/metadata structures from the physical file concerns.
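
As a rough sketch of the separation described above (all names below are 
hypothetical, not taken from the patch): the logical writer produces data and 
metadata blocks, and a pluggable physical sink decides where the bytes go.

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical abstraction: the logical ORC writer calls this, and an
// implementation routes bytes to HDFS, a write-thru cache, an addressable
// store, etc.
interface PhysicalSink {
  void writeBlock(ByteBuffer data) throws IOException;
  void writeMetadata(ByteBuffer metadata) throws IOException;
  void close() throws IOException;
}

// Trivial non-HDFS implementation, e.g. the starting point for a cache.
class InMemorySink implements PhysicalSink {
  private int bytes = 0;
  public void writeBlock(ByteBuffer data) { bytes += data.remaining(); }
  public void writeMetadata(ByteBuffer metadata) { bytes += metadata.remaining(); }
  public void close() { System.out.println("closed after " + bytes + " bytes"); }
}

public class LogicalWriterDemo {
  public static void main(String[] args) throws IOException {
    PhysicalSink sink = new InMemorySink();                // swappable physical layer
    sink.writeBlock(ByteBuffer.wrap(new byte[]{1, 2, 3})); // stripe data
    sink.writeMetadata(ByteBuffer.wrap(new byte[]{4}));    // footer/metadata
    sink.close();
  }
}
{code}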



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14479) Add some join tests for acid table

2016-08-09 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-14479:
-
Status: Patch Available  (was: Open)

> Add some join tests for acid table
> --
>
> Key: HIVE-14479
> URL: https://issues.apache.org/jira/browse/HIVE-14479
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-14479.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-09 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14448:

Status: Patch Available  (was: Open)

Forgot to submit patch... grr

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug will usually be 
> exposed when working with data at scale, because otherwise in most cases only 
> the BI split strategy is chosen. My guess is that this is happening because the 
> correct readerSchema is not being picked up when we try to extract SARG 
> column names.
> The quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>  @Test
>   public void testETLSplitStrategyForACID() throws Exception {
> hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
> hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
> runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
> runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
> runWorker(hiveConf);
> List<String> rs = runStatementOnDriver("select * from " +  Table.ACIDTBL  
> + " where a = 1");
> int[][] resultData = new int[][] {{1,2}};
> Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 
> 'java.lang.RuntimeException(ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119)
>   at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292)
>   at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID(TestTxnCommands2.java:280)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>

[jira] [Updated] (HIVE-14495) Add SHOW MATERIALIZED VIEWS statement

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14495:
---
Description: 
In the spirit of {{SHOW TABLES}}, we should support the following statement:

{code:sql}
SHOW MATERIALIZED VIEWS [IN database_name] ['identifier_with_wildcards'];
{code}

In contrast to {{SHOW TABLES}}, this command would only list the materialized 
views.

  was:
In the spirit of {{SHOW TABLES}}, we should support the following statement:

{code:sql}
SHOW MATERIALIZED VIEWS [IN database_name] ['identifier_with_wildcards'];
{sql}

In contrast to {{SHOW TABLES}}, this command would only list the materialized 
views.


> Add SHOW MATERIALIZED VIEWS statement
> -
>
> Key: HIVE-14495
> URL: https://issues.apache.org/jira/browse/HIVE-14495
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>
> In the spirit of {{SHOW TABLES}}, we should support the following statement:
> {code:sql}
> SHOW MATERIALIZED VIEWS [IN database_name] ['identifier_with_wildcards'];
> {code}
> In contrast to {{SHOW TABLES}}, this command would only list the materialized 
> views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14494) Add support for BUILD DEFERRED

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14494:
---
Description: 
This is an important feature, as it allows declaring materialized views without 
materializing them until they are used for the first time, or until a REBUILD 
statement is executed. The extension for the CREATE MATERIALIZED VIEW syntax 
should be as follows:

{code:sql}
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]materialized_view_name
  [BUILD DEFERRED] -- NEW!
  [COMMENT materialized_view_comment]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
 | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]
  AS select_statement;
{code}

  was:
This is an important feature, as it allows to declare materialized views but do 
not materialize them till they are used for the first use, or a REBUILD 
statement is executed. The extension for the CREATE MATERIALIZED VIEW syntax 
should be as follows:

{code:sql}
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]table_name
  [BUILD DEFERRED] -- NEW!
  [COMMENT table_comment]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
 | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]
  AS select_statement;
{code}


> Add support for BUILD DEFERRED
> --
>
> Key: HIVE-14494
> URL: https://issues.apache.org/jira/browse/HIVE-14494
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>
> This is an important feature, as it allows declaring materialized views 
> without materializing them until they are used for the first time, or until a 
> REBUILD statement is executed. The extension for the CREATE MATERIALIZED VIEW 
> syntax should be as follows:
> {code:sql}
> CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]materialized_view_name
>   [BUILD DEFERRED] -- NEW!
>   [COMMENT materialized_view_comment]
>   [
>[ROW FORMAT row_format] 
>[STORED AS file_format]
>  | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
>   ]
>   [LOCATION hdfs_path]
>   [TBLPROPERTIES (property_name=property_value, ...)]
>   AS select_statement;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14493) Partitioning support for materialized views

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14493:
---
Description: 
We should support defining a partitioning specification for materialized views, 
so that the results of the materialized view evaluation are stored according to 
the partitioning spec. 

The syntax should be extended as follows:

{code:sql}
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]materialized_view_name
  [COMMENT materialized_view_comment]
  [PARTITIONED ON (col_name, ...)] -- NEW!
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
 | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]
  AS select_statement;
{code}

  was:
We should support defining a partitioning specification for materialized views 
and that the results of the materialized view evaluation are stored meeting the 
partitioning spec. 

The syntax should be extended as follows:

{code:sql}
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]table_name
  [COMMENT table_comment]
  [PARTITIONED ON (col_name, ...)] -- NEW!
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
 | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]
  AS select_statement;
{code}


> Partitioning support for materialized views
> ---
>
> Key: HIVE-14493
> URL: https://issues.apache.org/jira/browse/HIVE-14493
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>
> We should support defining a partitioning specification for materialized 
> views, so that the results of the materialized view evaluation are stored 
> according to the partitioning spec. 
> The syntax should be extended as follows:
> {code:sql}
> CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]materialized_view_name
>   [COMMENT materialized_view_comment]
>   [PARTITIONED ON (col_name, ...)] -- NEW!
>   [
>[ROW FORMAT row_format] 
>[STORED AS file_format]
>  | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
>   ]
>   [LOCATION hdfs_path]
>   [TBLPROPERTIES (property_name=property_value, ...)]
>   AS select_statement;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14486) Add CREATE MATERIALIZED VIEW statement

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14486:
---
Description: 
Support for creating materialized views. The statement is the following:

{code:sql}
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]materialized_view_name
  [COMMENT materialized_view_comment]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
 | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]
  AS select_statement;
{code}

Thus, important features such as support for custom StorageHandler and location 
will be initially included.

  was:
Support for creating materialized views. The statement is the following:

{code:sql}
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]materialized_name
  [COMMENT materialized_comment]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
 | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]
  AS select_statement;
{code}

Thus, important features such as support for custom StorageHandler and location 
will be initially included.


> Add CREATE MATERIALIZED VIEW statement
> --
>
> Key: HIVE-14486
> URL: https://issues.apache.org/jira/browse/HIVE-14486
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> Support for creating materialized views. The statement is the following:
> {code:sql}
> CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]materialized_view_name
>   [COMMENT materialized_view_comment]
>   [
>[ROW FORMAT row_format] 
>[STORED AS file_format]
>  | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
>   ]
>   [LOCATION hdfs_path]
>   [TBLPROPERTIES (property_name=property_value, ...)]
>   AS select_statement;
> {code}
> Thus, important features such as support for custom StorageHandler and 
> location will be initially included.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14487) Add REBUILD statement for materialized views

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14487:
---
Description: 
Support for rebuilding existing materialized views. The statement is the 
following:

{code:sql}
ALTER MATERIALIZED VIEW [db_name.]materialized_view_name REBUILD;
{code}

  was:
Support for rebuilding existing materialized views. The statement is the 
following:

{code:sql}
ALTER MATERIALIZED VIEW [db_name.]table_name REBUILD;
{code}


> Add REBUILD statement for materialized views
> 
>
> Key: HIVE-14487
> URL: https://issues.apache.org/jira/browse/HIVE-14487
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Alan Gates
>
> Support for rebuilding existing materialized views. The statement is the 
> following:
> {code:sql}
> ALTER MATERIALIZED VIEW [db_name.]materialized_view_name REBUILD;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14488) Add DROP MATERIALIZED VIEW statement

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14488:
---
Description: 
Support for dropping existing materialized views. The statement is the 
following:

{code:sql}
DROP MATERIALIZED VIEW [db_name.]materialized_view_name;
{code}

  was:
Support for dropping existing materialized views. The statement is the 
following:

{code:sql}
DROP MATERIALIZED VIEW [db_name.]table_name;
{code}


> Add DROP MATERIALIZED VIEW statement
> 
>
> Key: HIVE-14488
> URL: https://issues.apache.org/jira/browse/HIVE-14488
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> Support for dropping existing materialized views. The statement is the 
> following:
> {code:sql}
> DROP MATERIALIZED VIEW [db_name.]materialized_view_name;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14486) Add CREATE MATERIALIZED VIEW statement

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14486:
---
Description: 
Support for creating materialized views. The statement is the following:

{code:sql}
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]materialized_name
  [COMMENT materialized_comment]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
 | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]
  AS select_statement;
{code}

Thus, important features such as support for custom StorageHandler and location 
will be initially included.

  was:
Support for creating materialized views. The statement is the following:

{code:sql}
CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]table_name
  [COMMENT table_comment]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
 | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]
  AS select_statement;
{code}

Thus, important features such as support for custom StorageHandler and location 
will be initially included.


> Add CREATE MATERIALIZED VIEW statement
> --
>
> Key: HIVE-14486
> URL: https://issues.apache.org/jira/browse/HIVE-14486
> Project: Hive
>  Issue Type: Sub-task
>  Components: Materialized views
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> Support for creating materialized views. The statement is the following:
> {code:sql}
> CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]materialized_name
>   [COMMENT materialized_comment]
>   [
>[ROW FORMAT row_format] 
>[STORED AS file_format]
>  | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
>   ]
>   [LOCATION hdfs_path]
>   [TBLPROPERTIES (property_name=property_value, ...)]
>   AS select_statement;
> {code}
> Thus, important features such as support for custom StorageHandler and 
> location will be initially included.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13403) Make Streaming API not create empty buckets (at least as an option)

2016-08-09 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-13403:
--
Assignee: Wei Zheng  (was: Eugene Koifman)

> Make Streaming API not create empty buckets (at least as an option)
> ---
>
> Key: HIVE-13403
> URL: https://issues.apache.org/jira/browse/HIVE-13403
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
>Priority: Critical
>
> as of HIVE-11983, when a TransactionBatch is opened in StreamingAPI, a full 
> complement of bucket files (AbstractRecordWriter.createRecordUpdaters()) is 
> created on disk even though some may end up receiving no data.
> It would be better to create them on demand and not clog the FS.
> Tez can handle missing (empty) buckets, and on MR the bucket join algorithms 
> will check whether all buckets are there and bail out if not.  
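
A minimal sketch of the on-demand approach being suggested (hypothetical types, 
not the streaming API itself): a record updater per bucket is created only when 
the first record for that bucket arrives, so empty buckets never hit the FS.

{code:java}
import java.util.HashMap;
import java.util.Map;

public class LazyBucketWriters {
  interface Updater { void insert(String record); }

  private final Map<Integer, Updater> updaters = new HashMap<>();

  Updater forBucket(int bucket) {
    // Create the bucket file lazily, on first use, instead of eagerly for
    // every bucket when the batch is opened.
    return updaters.computeIfAbsent(bucket, b -> {
      System.out.println("creating bucket file " + b); // would open the FS file here
      return rec -> System.out.println("bucket " + b + " <- " + rec);
    });
  }

  public static void main(String[] args) {
    LazyBucketWriters w = new LazyBucketWriters();
    w.forBucket(3).insert("row1"); // only bucket 3's file gets created
    w.forBucket(3).insert("row2"); // reused, no new file
  }
}
{code}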



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14405) Have tests log to the console along with hive.log

2016-08-09 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413880#comment-15413880
 ] 

Siddharth Seth commented on HIVE-14405:
---

Thanks for taking a look [~ashutoshc]. My concern is that this doubles the 
amount of logging. I'll take a look to see if we can disable DEBUG level 
logging for some of the noisy Hadoop components to cut the overall log size.
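
Something along these lines with the Log4j 2 API (the package list is only an 
example of what might be squelched):

{code:java}
import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.config.Configurator;

public class QuietNoisyLoggers {
  public static void main(String[] args) {
    // Raise the level only for chatty Hadoop components; test loggers keep DEBUG.
    for (String pkg : new String[]{"org.apache.hadoop.ipc", "org.apache.hadoop.hdfs"}) {
      Configurator.setLevel(pkg, Level.INFO);
    }
  }
}
{code}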

> Have tests log to the console along with hive.log
> -
>
> Key: HIVE-14405
> URL: https://issues.apache.org/jira/browse/HIVE-14405
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14405.01.patch
>
>
> When running tests from the IDE (not itests), logs end up going to hive.log - 
> making it difficult to debug tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13936) Add streaming support for row_number

2016-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413856#comment-15413856
 ] 

Hive QA commented on HIVE-13936:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12822810/HIVE-13936.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10441 tests 
executed
*Failed tests:*
{noformat}
TestMsgBusConnection - did not produce a TEST-*.xml file
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/830/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/830/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-830/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12822810 - PreCommit-HIVE-MASTER-Build

> Add streaming support for row_number
> 
>
> Key: HIVE-13936
> URL: https://issues.apache.org/jira/browse/HIVE-13936
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Johndee Burks
>Assignee: Yongzhi Chen
> Attachments: HIVE-13936.1.patch
>
>
> Without this support row_number will cause heap issues in reducers. Example 
> query below against 10 million records will cause failure. 
> {code}
> select a, row_number() over (partition by a order by a desc) as row_num from 
> j100mil;
> {code}
> Same issue different function in JIRA HIVE-7062



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7239) Fix bug in HiveIndexedInputFormat implementation that causes incorrect query result when input backed by Sequence/RC files

2016-08-09 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413777#comment-15413777
 ] 

Ashutosh Chauhan commented on HIVE-7239:


+1

> Fix bug in HiveIndexedInputFormat implementation that causes incorrect query 
> result when input backed by Sequence/RC files
> --
>
> Key: HIVE-7239
> URL: https://issues.apache.org/jira/browse/HIVE-7239
> Project: Hive
>  Issue Type: Bug
>  Components: Indexing
>Affects Versions: 2.1.0
>Reporter: Sumit Kumar
>Assignee: Illya Yalovyy
> Attachments: HIVE-7239.2.patch, HIVE-7239.3.patch, HIVE-7239.4.patch, 
> HIVE-7239.patch
>
>
> In the case of sequence files, it's crucial that splits are calculated around 
> the boundaries enforced by the input sequence file. However, by default Hadoop 
> creates input splits depending on configuration parameters, which may not 
> match the boundaries of the input sequence file. Hive provides 
> HiveIndexedInputFormat, which adds extra logic and recalculates the 
> boundaries for each split depending on the sequence file's boundaries.
> However, we noticed this behavior of "over" reporting from data backed by 
> sequence files. We have sample data on which we experimented and fixed this 
> bug, and we have verified the fix by comparing the query output for input in 
> sequence file format, RC file format and regular format. However, we have not 
> been able to find the right place to include this as a unit test that would 
> execute as part of the Hive tests. We tried writing a "clientpositive" test in 
> the ql module, but the output seems quite verbose and I couldn't interpret it 
> that well. Can someone please review this change and advise on how to write a 
> test that will execute as part of Hive testing?
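
To picture the recalculation: a default split grid cuts at fixed byte offsets, 
while the reader has to snap each cut to the sequence file's next sync/record 
boundary. A toy sketch with invented offsets (not the HiveIndexedInputFormat 
code):

{code:java}
import java.util.Arrays;

public class SplitAlignmentDemo {
  // Next sync point at or after the requested offset (syncs must be sorted).
  static long snap(long offset, long[] syncs) {
    int i = Arrays.binarySearch(syncs, offset);
    return i >= 0 ? syncs[i] : syncs[-i - 1];
  }

  public static void main(String[] args) {
    long[] syncs = {0, 1000, 2600, 4100, 5000};
    // A 2048-byte split grid would cut records at 2048 and 4096; snapping
    // moves each cut to the next record boundary instead.
    System.out.println(snap(2048, syncs)); // 2600
    System.out.println(snap(4096, syncs)); // 4100
  }
}
{code}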



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14413) Extend HivePreFilteringRule to traverse inside elements of DNF/CNF and extract more deterministic pieces out

2016-08-09 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413769#comment-15413769
 ] 

Ashutosh Chauhan commented on HIVE-14413:
-

Sounds fine to me. We can take up reordering of rule-set in a follow-up.

> Extend HivePreFilteringRule to traverse inside elements of DNF/CNF and 
> extract more deterministic pieces out
> 
>
> Key: HIVE-14413
> URL: https://issues.apache.org/jira/browse/HIVE-14413
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14413.01.patch, HIVE-14413.02.patch, 
> HIVE-14413.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14249) Add simple materialized views with manual rebuilds

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14249:
---
Component/s: (was: Views)
 Materialized views

> Add simple materialized views with manual rebuilds
> --
>
> Key: HIVE-14249
> URL: https://issues.apache.org/jira/browse/HIVE-14249
> Project: Hive
>  Issue Type: New Feature
>  Components: Materialized views, Parser
>Reporter: Alan Gates
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-10459.2.patch, HIVE-14249.03.patch
>
>
> This patch is a start at implementing simple views. It doesn't have enough 
> testing yet (e.g. there's no negative testing). And I know it fails in the 
> partitioned case. I suspect things like security and locking don't work 
> properly yet either. But I'm posting it as a starting point.
> In this initial patch I'm just handling simple materialized views with manual 
> rebuilds. In later JIRAs we can add features such as allowing the optimizer 
> to rewrite queries to use materialized views rather than tables named in the 
> queries, giving the optimizer the ability to determine when a materialized 
> view is stale, etc.
> Also, I didn't rebase this patch against trunk after the migration from 
> svn->git so it may not apply cleanly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14249) Add simple materialized views with manual rebuilds

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-14249:
---
Issue Type: New Feature  (was: Sub-task)
Parent: (was: HIVE-10459)

> Add simple materialized views with manual rebuilds
> --
>
> Key: HIVE-14249
> URL: https://issues.apache.org/jira/browse/HIVE-14249
> Project: Hive
>  Issue Type: New Feature
>  Components: Parser, Views
>Reporter: Alan Gates
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-10459.2.patch, HIVE-14249.03.patch
>
>
> This patch is a start at implementing simple views. It doesn't have enough 
> testing yet (e.g. there's no negative testing). And I know it fails in the 
> partitioned case. I suspect things like security and locking don't work 
> properly yet either. But I'm posting it as a starting point.
> In this initial patch I'm just handling simple materialized views with manual 
> rebuilds. In later JIRAs we can add features such as allowing the optimizer 
> to rewrite queries to use materialized views rather than tables named in the 
> queries, giving the optimizer the ability to determine when a materialized 
> view is stale, etc.
> Also, I didn't rebase this patch against trunk after the migration from 
> svn->git so it may not apply cleanly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14436) Hive 1.2.1/Hitting "ql.Driver: FAILED: IllegalArgumentException Error: , expected at the end of 'decimal(9'" after enabling hive.optimize.skewjoin and with MR engine

2016-08-09 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413746#comment-15413746
 ] 

Ashutosh Chauhan commented on HIVE-14436:
-

+1

> Hive 1.2.1/Hitting "ql.Driver: FAILED: IllegalArgumentException Error: , 
> expected at the end of 'decimal(9'" after enabling hive.optimize.skewjoin and 
> with MR engine
> -
>
> Key: HIVE-14436
> URL: https://issues.apache.org/jira/browse/HIVE-14436
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
> Environment: HDP 2.4.2/ Hive 1.2.1
>Reporter: Ratish Maruthiyodan
>Assignee: Daniel Dai
>  Labels: code
> Attachments: HIVE-14436.1.patch, HIVE-14436.2.patch
>
>
> PROBLEM:
> The following Query run with MapReduce engine with "hive.optimize.skewjoin = 
> true" fails with error:
> "FAILED: IllegalArgumentException Error: , expected at the end of 
> 'decimal(9'" 
> > SELECT a.col1 FROM db.tableA a  INNER JOIN  db.tableB b  ON b.key=a.key 
> > limit 5;
> FAILED: IllegalArgumentException Error: , expected at the end of 'decimal(9'
> 16/08/04 12:47:50 [main]: ERROR ql.Driver: FAILED: IllegalArgumentException 
> Error: , expected at the end of 'decimal(9'
> java.lang.IllegalArgumentException: Error: , expected at the end of 
> 'decimal(9'
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:336)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseParams(TypeInfoUtils.java:378)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parsePrimitiveParts(TypeInfoUtils.java:518)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.parsePrimitiveParts(TypeInfoUtils.java:533)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.createPrimitiveTypeInfo(TypeInfoFactory.java:136)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.getPrimitiveTypeInfo(TypeInfoFactory.java:109)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.GenMRSkewJoinProcessor.processSkewJoin(GenMRSkewJoinProcessor.java:214)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.SkewJoinProcFactory$SkewJoinJoinProcessor.process(SkewJoinProcFactory.java:60)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:133)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.SkewJoinResolver$SkewJoinTaskDispatcher.dispatch(SkewJoinResolver.java:100)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:133)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.SkewJoinResolver.resolve(SkewJoinResolver.java:55)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:107)
>   at 
> org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:270)
>   at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:227)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10219)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:211)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:459)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:316)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1189)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1237)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1116)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:216)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:168)
>   at 

[jira] [Commented] (HIVE-14436) Hive 1.2.1/Hitting "ql.Driver: FAILED: IllegalArgumentException Error: , expected at the end of 'decimal(9'" after enabling hive.optimize.skewjoin and with MR engine

2016-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413658#comment-15413658
 ] 

Hive QA commented on HIVE-14436:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12822740/HIVE-14436.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10427 tests 
executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-acid_vectorization_missing_cols.q-load_dyn_part2.q-update_all_partitioned.q-and-12-more
 - did not produce a TEST-*.xml file
TestMsgBusConnection - did not produce a TEST-*.xml file
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/829/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/829/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-829/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12822740 - PreCommit-HIVE-MASTER-Build

> Hive 1.2.1/Hitting "ql.Driver: FAILED: IllegalArgumentException Error: , 
> expected at the end of 'decimal(9'" after enabling hive.optimize.skewjoin and 
> with MR engine
> -
>
> Key: HIVE-14436
> URL: https://issues.apache.org/jira/browse/HIVE-14436
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
> Environment: HDP 2.4.2/ Hive 1.2.1
>Reporter: Ratish Maruthiyodan
>Assignee: Daniel Dai
>  Labels: code
> Attachments: HIVE-14436.1.patch, HIVE-14436.2.patch
>
>
> PROBLEM:
> The following Query run with MapReduce engine with "hive.optimize.skewjoin = 
> true" fails with error:
> "FAILED: IllegalArgumentException Error: , expected at the end of 
> 'decimal(9'" 
> > SELECT a.col1 FROM db.tableA a  INNER JOIN  db.tableB b  ON b.key=a.key 
> > limit 5;
> FAILED: IllegalArgumentException Error: , expected at the end of 'decimal(9'
> 16/08/04 12:47:50 [main]: ERROR ql.Driver: FAILED: IllegalArgumentException 
> Error: , expected at the end of 'decimal(9'
> java.lang.IllegalArgumentException: Error: , expected at the end of 
> 'decimal(9'
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:336)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseParams(TypeInfoUtils.java:378)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parsePrimitiveParts(TypeInfoUtils.java:518)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.parsePrimitiveParts(TypeInfoUtils.java:533)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.createPrimitiveTypeInfo(TypeInfoFactory.java:136)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.getPrimitiveTypeInfo(TypeInfoFactory.java:109)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.GenMRSkewJoinProcessor.processSkewJoin(GenMRSkewJoinProcessor.java:214)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.SkewJoinProcFactory$SkewJoinJoinProcessor.process(SkewJoinProcFactory.java:60)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:133)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.SkewJoinResolver$SkewJoinTaskDispatcher.dispatch(SkewJoinResolver.java:100)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:133)
>   at 
> 

[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-09 Thread Sergey Zadoroshnyak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413659#comment-15413659
 ] 

Sergey Zadoroshnyak commented on HIVE-14483:


[~owen.omalley]

Could you please take a look?

>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Owen O'Malley
>Priority: Critical
> Fix For: 2.2.0
>
>
> Error message:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
> at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
> at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
> at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> How to reproduce?
> Configure a StringTreeReader which contains a StringDirectTreeReader as the 
> TreeReader (DIRECT or DIRECT_V2 column encoding), set batchSize = 1026, and 
> invoke nextVector(ColumnVector previousVector, boolean[] isNull, final int 
> batchSize).
> scratchlcv is a LongColumnVector with a long[] vector of length 1024, which 
> executes BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, scratchlcv, 
> result, batchSize);
> as a result, in the method commonReadByteArrays(stream, lengths, scratchlcv, 
> result, (int) batchSize) we receive an 
> ArrayIndexOutOfBoundsException.
> If we use StringDictionaryTreeReader there is no exception, as we have the 
> verification scratchlcv.ensureSize((int) batchSize, false) before 
> reader.nextVector(scratchlcv, scratchlcv.vector, batchSize);
> These changes were made for Hive 2.1.0 by the corresponding commit 
> https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467
>  for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley.
> How to fix?
> Add only one line:
> scratchlcv.ensureSize((int) batchSize, false);
> in the method 
> org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream
>  stream, IntegerReader lengths, LongColumnVector scratchlcv, 
> BytesColumnVector result, final int batchSize) before the invocation of 
> lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);
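
A self-contained model of the failure and the one-line fix (simplified 
stand-ins, not the ORC classes themselves):

{code:java}
public class EnsureSizeDemo {
  static class LongColumnVector {
    long[] vector = new long[1024];            // default allocation, as in the report
    void ensureSize(int size, boolean preserve) {
      if (size > vector.length) {
        long[] grown = new long[size];
        if (preserve) System.arraycopy(vector, 0, grown, 0, vector.length);
        vector = grown;
      }
    }
  }

  static void readLengths(LongColumnVector scratch, int batchSize) {
    for (int i = 0; i < batchSize; i++) {
      scratch.vector[i] = i;                   // AIOOBE at i == 1024 without ensureSize
    }
  }

  public static void main(String[] args) {
    LongColumnVector scratch = new LongColumnVector();
    scratch.ensureSize(1026, false);           // the proposed one-line fix
    readLengths(scratch, 1026);                // batchSize = 1026, as in the report
    System.out.println("ok");
  }
}
{code}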



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14413) Extend HivePreFilteringRule to traverse inside elements of DNF/CNF and extract more deterministic pieces out

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413654#comment-15413654
 ] 

Jesus Camacho Rodriguez commented on HIVE-14413:


I need to check further: we end up in a loop because I moved 
HivePreFilteringRule to the PPD block to cover more cases, and it interacts 
with other rules.

I've hit similar issues before; currently we might not be able to cover those 
additional cases without running the rules in different blocks (as we are 
currently doing) but executing those blocks multiple times (e.g. 
PreFil-PPD-PreFil-PPD). However, this comes with a planning time overhead. 
Given that we are already trying to minimize optimization time, as it might 
become an issue with LLAP, we need to think carefully about whether this is 
worth it or not.

Thus, I think I will rebase the patch to cover only the DNF/CNF extension in 
this one, still executing HivePreFilteringRule only once before PPD, as we were 
doing before. Then, we might create a follow-up issue to discuss merging 
HivePreFilteringRule into the PPD block.
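
For reference, the extraction in the title pulls deterministic factors that are 
common to every disjunct out in front, e.g. (a AND b) OR (a AND c) becomes 
a AND (b OR c). A toy version, with strings standing in for Calcite RexNodes:

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DnfExtractDemo {
  // Factors present in every disjunct can be pulled out and pushed down.
  static Set<String> extractCommon(List<Set<String>> disjuncts) {
    Set<String> common = new HashSet<>(disjuncts.get(0));
    for (Set<String> d : disjuncts) {
      common.retainAll(d);
    }
    return common;
  }

  public static void main(String[] args) {
    List<Set<String>> dnf = Arrays.asList(
        new HashSet<>(Arrays.asList("a", "b")),
        new HashSet<>(Arrays.asList("a", "c")));
    System.out.println("extracted: " + extractCommon(dnf)); // [a]
  }
}
{code}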

> Extend HivePreFilteringRule to traverse inside elements of DNF/CNF and 
> extract more deterministic pieces out
> 
>
> Key: HIVE-14413
> URL: https://issues.apache.org/jira/browse/HIVE-14413
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14413.01.patch, HIVE-14413.02.patch, 
> HIVE-14413.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13936) Add streaming support for row_number

2016-08-09 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-13936:

Status: Patch Available  (was: Open)

Need code review.

> Add streaming support for row_number
> 
>
> Key: HIVE-13936
> URL: https://issues.apache.org/jira/browse/HIVE-13936
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Johndee Burks
>Assignee: Yongzhi Chen
> Attachments: HIVE-13936.1.patch
>
>
> Without this support row_number will cause heap issues in reducers. Example 
> query below against 10 million records will cause failure. 
> {code}
> select a, row_number() over (partition by a order by a desc) as row_num from 
> j100mil;
> {code}
> Same issue different function in JIRA HIVE-7062



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13936) Add streaming support for row_number

2016-08-09 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-13936:

Attachment: HIVE-13936.1.patch

Implemented streaming for row_number in a similar way to the other windowing 
functions in HIVE-7062.
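
As a toy model of what streaming buys (plain iterators, not Hive's windowing 
interfaces): row_number needs only a running counter per partition, so it can 
emit each row as it arrives instead of buffering the whole partition in the 
reducer heap.

{code:java}
import java.util.Arrays;
import java.util.Iterator;

public class StreamingRowNumberDemo {
  // Wraps the partition's rows; state is a single counter, O(1) memory.
  static Iterator<long[]> rowNumber(final Iterator<Long> partitionRows) {
    return new Iterator<long[]>() {
      private long n = 0;
      public boolean hasNext() { return partitionRows.hasNext(); }
      public long[] next() { return new long[]{partitionRows.next(), ++n}; }
    };
  }

  public static void main(String[] args) {
    Iterator<long[]> out = rowNumber(Arrays.asList(42L, 42L, 42L).iterator());
    while (out.hasNext()) {
      long[] r = out.next();
      System.out.println("value=" + r[0] + " row_num=" + r[1]);
    }
  }
}
{code}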

> Add streaming support for row_number
> 
>
> Key: HIVE-13936
> URL: https://issues.apache.org/jira/browse/HIVE-13936
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Johndee Burks
>Assignee: Yongzhi Chen
> Attachments: HIVE-13936.1.patch
>
>
> Without this support row_number will cause heap issues in reducers. Example 
> query below against 10 million records will cause failure. 
> {code}
> select a, row_number() over (partition by a order by a desc) as row_num from 
> j100mil;
> {code}
> Same issue different function in JIRA HIVE-7062



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12181) Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver

2016-08-09 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413627#comment-15413627
 ] 

Jesus Camacho Rodriguez commented on HIVE-12181:


Changes in latest patch LGTM, +1

> Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver
> ---
>
> Key: HIVE-12181
> URL: https://issues.apache.org/jira/browse/HIVE-12181
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-12181.1.patch, HIVE-12181.10.patch, 
> HIVE-12181.12.patch, HIVE-12181.13.patch, HIVE-12181.2.patch, 
> HIVE-12181.3.patch, HIVE-12181.4.patch, HIVE-12181.7.patch, 
> HIVE-12181.8.patch, HIVE-12181.9.patch, HIVE-12181.patch, HIVE-12181.patch
>
>
> There was a performance concern earlier, but HIVE-7587 has fixed that. We can 
> change the default to true now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12159) Create vectorized readers for the complex types

2016-08-09 Thread Sergey Zadoroshnyak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413607#comment-15413607
 ] 

Sergey Zadoroshnyak commented on HIVE-12159:


This patch causes a java.lang.ArrayIndexOutOfBoundsException in 
org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays.
 Tracking at https://issues.apache.org/jira/browse/HIVE-14483

> Create vectorized readers for the complex types
> ---
>
> Key: HIVE-12159
> URL: https://issues.apache.org/jira/browse/HIVE-12159
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.1.0
>
> Attachments: HIVE-12159.patch, HIVE-12159.patch, HIVE-12159.patch, 
> HIVE-12159.patch, HIVE-12159.patch, HIVE-12159.patch, HIVE-12159.patch, 
> HIVE-12159.patch, HIVE-12159.patch
>
>
> We need vectorized readers for the complex types.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14481) Remove the comments from the query

2016-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413457#comment-15413457
 ] 

Hive QA commented on HIVE-14481:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12822731/HIVE-14481.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10442 tests 
executed
*Failed tests:*
{noformat}
TestMsgBusConnection - did not produce a TEST-*.xml file
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testDelayedLocalityNodeCommErrorImmediateAllocation
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/828/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/828/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-828/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12822731 - PreCommit-HIVE-MASTER-Build

> Remove the comments from the query
> --
>
> Key: HIVE-14481
> URL: https://issues.apache.org/jira/browse/HIVE-14481
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.1.0
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
> Attachments: HIVE-14481.1.patch
>
>
> The ability to strip comments in CliDriver was added in the following 
> tickets: HIVE-1926, HIVE-1953.
> However, the following query still results in an error:
> {code}
> -- set abc=def;
> select -- comments;
>   -- comments;
>   replace('12345', '12', '--') -- comments;
> from
>   www_access
> limit 1;
> {code}
> This patch removes all of the comments in order to cope with this.
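
The subtlety in the example is the "--" inside a string literal; a minimal 
sketch of quote-aware stripping (not CliDriver's actual code):

{code:java}
public class CommentStripDemo {
  // Drop everything from an unquoted "--" to the end of the line, but leave
  // "--" alone when it appears inside a single-quoted literal.
  static String stripLine(String line) {
    boolean inQuote = false;
    for (int i = 0; i < line.length(); i++) {
      char c = line.charAt(i);
      if (c == '\'') {
        inQuote = !inQuote;
      } else if (!inQuote && c == '-' && i + 1 < line.length()
          && line.charAt(i + 1) == '-') {
        return line.substring(0, i);
      }
    }
    return line;
  }

  public static void main(String[] args) {
    System.out.println(stripLine("  replace('12345', '12', '--') -- comments;"));
    // prints:   replace('12345', '12', '--')
  }
}
{code}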



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-13936) Add streaming support for row_number

2016-08-09 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen reassigned HIVE-13936:
---

Assignee: Yongzhi Chen

> Add streaming support for row_number
> 
>
> Key: HIVE-13936
> URL: https://issues.apache.org/jira/browse/HIVE-13936
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Johndee Burks
>Assignee: Yongzhi Chen
>
> Without this support row_number will cause heap issues in reducers. Example 
> query below against 10 million records will cause failure. 
> {code}
> select a, row_number() over (partition by a order by a desc) as row_num from 
> j100mil;
> {code}
> Same issue different function in JIRA HIVE-7062



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14480) ORC ETLSplitStrategy should use thread pool when computing splits

2016-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413336#comment-15413336
 ] 

Hive QA commented on HIVE-14480:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12822716/HIVE-14480.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10441 tests 
executed
*Failed tests:*
{noformat}
TestMsgBusConnection - did not produce a TEST-*.xml file
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testForcedLocalityMultiplePreemptionsSameHost2
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/827/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/827/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-827/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12822716 - PreCommit-HIVE-MASTER-Build

> ORC ETLSplitStrategy should use thread pool when computing splits
> -
>
> Key: HIVE-14480
> URL: https://issues.apache.org/jira/browse/HIVE-14480
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14480.1.patch, HIVE-14480.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413127#comment-15413127
 ] 

Hive QA commented on HIVE-14035:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12822704/HIVE-14035.13.patch

{color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10466 tests 
executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-script_pipe.q-orc_ppd_schema_evol_2a.q-join1.q-and-12-more 
- did not produce a TEST-*.xml file
TestMsgBusConnection - did not produce a TEST-*.xml file
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/826/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/826/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-826/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12822704 - PreCommit-HIVE-MASTER-Build

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.patch
>
>
> In the current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.
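
To make the proposed transformation concrete, here is a hedged sketch of splitting an update event into a delete followed by an insert, as the description outlines. The event representation below is an illustrative assumption, not Hive's actual ACID event format.

{code}
// Hedged sketch: an update(rowId, newRow) becomes delete(rowId) + insert(newRow),
// so delta files hold only inserts and deletes and each can be filtered
// independently by predicate pushdown. Event type is illustrative only.
import java.util.Arrays;
import java.util.List;

public class UpdateSplitSketch {
    record Event(String type, long rowId, String row) {}

    static List<Event> splitUpdate(long rowId, String newRow) {
        return Arrays.asList(
            new Event("delete", rowId, null),     // tombstone for the old version
            new Event("insert", rowId, newRow));  // the new version as a plain insert
    }

    public static void main(String[] args) {
        System.out.println(splitUpdate(42L, "k=1,v=hello"));
    }
}
{code}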





[jira] [Updated] (HIVE-14436) Hive 1.2.1/Hitting "ql.Driver: FAILED: IllegalArgumentException Error: , expected at the end of 'decimal(9'" after enabling hive.optimize.skewjoin and with MR engine

2016-08-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-14436:
--
Attachment: (was: HIVE-14436.2.patch)

> Hive 1.2.1/Hitting "ql.Driver: FAILED: IllegalArgumentException Error: , 
> expected at the end of 'decimal(9'" after enabling hive.optimize.skewjoin and 
> with MR engine
> -
>
> Key: HIVE-14436
> URL: https://issues.apache.org/jira/browse/HIVE-14436
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
> Environment: HDP 2.4.2/ Hive 1.2.1
>Reporter: Ratish Maruthiyodan
>Assignee: Daniel Dai
>  Labels: code
> Attachments: HIVE-14436.1.patch, HIVE-14436.2.patch
>
>
> PROBLEM:
> The following Query run with MapReduce engine with "hive.optimize.skewjoin = 
> true" fails with error:
> "FAILED: IllegalArgumentException Error: , expected at the end of 
> 'decimal(9'" 
> > SELECT a.col1 FROM db.tableA a  INNER JOIN  db.tableB b  ON b.key=a.key 
> > limit 5;
> FAILED: IllegalArgumentException Error: , expected at the end of 'decimal(9'
> 16/08/04 12:47:50 [main]: ERROR ql.Driver: FAILED: IllegalArgumentException 
> Error: , expected at the end of 'decimal(9'
> java.lang.IllegalArgumentException: Error: , expected at the end of 
> 'decimal(9'
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:336)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseParams(TypeInfoUtils.java:378)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parsePrimitiveParts(TypeInfoUtils.java:518)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.parsePrimitiveParts(TypeInfoUtils.java:533)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.createPrimitiveTypeInfo(TypeInfoFactory.java:136)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.getPrimitiveTypeInfo(TypeInfoFactory.java:109)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.GenMRSkewJoinProcessor.processSkewJoin(GenMRSkewJoinProcessor.java:214)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.SkewJoinProcFactory$SkewJoinJoinProcessor.process(SkewJoinProcFactory.java:60)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:133)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.SkewJoinResolver$SkewJoinTaskDispatcher.dispatch(SkewJoinResolver.java:100)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:133)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.SkewJoinResolver.resolve(SkewJoinResolver.java:55)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:107)
>   at 
> org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:270)
>   at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:227)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10219)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:211)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:459)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:316)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1189)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1237)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1116)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:216)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:168)
>   at 
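
The stack trace above shows the TypeInfo parser receiving the truncated token 'decimal(9', which is what a naive comma split produces from 'decimal(9,2)'. The following standalone snippet reproduces that failure mode and shows a parenthesis-aware split that keeps parameterized types intact; the assumption that the skew-join path comma-joins and re-splits the key type list is an inference from the trace, not a quote of the patch.

{code}
// Minimal reproduction of the failure mode: splitting a comma-joined type
// list breaks parameterized types such as decimal(9,2). The depth-aware
// splitter below is a generic illustration, not the actual Hive fix.
import java.util.ArrayList;
import java.util.List;

public class DecimalTypeSplitRepro {
    public static void main(String[] args) {
        String joined = "int,decimal(9,2),string";
        for (String t : joined.split(",")) {
            System.out.println("naive: " + t);   // prints "decimal(9" and "2)"
        }
        for (String t : splitRespectingParens(joined)) {
            System.out.println("aware: " + t);   // keeps "decimal(9,2)" whole
        }
    }

    static List<String> splitRespectingParens(String s) {
        List<String> out = new ArrayList<>();
        int depth = 0, start = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') depth++;
            else if (c == ')') depth--;
            else if (c == ',' && depth == 0) {   // only split at top level
                out.add(s.substring(start, i));
                start = i + 1;
            }
        }
        out.add(s.substring(start));
        return out;
    }
}
{code}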

[jira] [Updated] (HIVE-14436) Hive 1.2.1/Hitting "ql.Driver: FAILED: IllegalArgumentException Error: , expected at the end of 'decimal(9'" after enabling hive.optimize.skewjoin and with MR engine

2016-08-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-14436:
--
Attachment: HIVE-14436.2.patch

> Hive 1.2.1/Hitting "ql.Driver: FAILED: IllegalArgumentException Error: , 
> expected at the end of 'decimal(9'" after enabling hive.optimize.skewjoin and 
> with MR engine
> -
>
> Key: HIVE-14436
> URL: https://issues.apache.org/jira/browse/HIVE-14436
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
> Environment: HDP 2.4.2/ Hive 1.2.1
>Reporter: Ratish Maruthiyodan
>Assignee: Daniel Dai
>  Labels: code
> Attachments: HIVE-14436.1.patch, HIVE-14436.2.patch
>
>
> PROBLEM:
> The following Query run with MapReduce engine with "hive.optimize.skewjoin = 
> true" fails with error:
> "FAILED: IllegalArgumentException Error: , expected at the end of 
> 'decimal(9'" 
> > SELECT a.col1 FROM db.tableA a  INNER JOIN  db.tableB b  ON b.key=a.key 
> > limit 5;
> FAILED: IllegalArgumentException Error: , expected at the end of 'decimal(9'
> 16/08/04 12:47:50 [main]: ERROR ql.Driver: FAILED: IllegalArgumentException 
> Error: , expected at the end of 'decimal(9'
> java.lang.IllegalArgumentException: Error: , expected at the end of 
> 'decimal(9'
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:336)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseParams(TypeInfoUtils.java:378)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parsePrimitiveParts(TypeInfoUtils.java:518)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.parsePrimitiveParts(TypeInfoUtils.java:533)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.createPrimitiveTypeInfo(TypeInfoFactory.java:136)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.getPrimitiveTypeInfo(TypeInfoFactory.java:109)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.GenMRSkewJoinProcessor.processSkewJoin(GenMRSkewJoinProcessor.java:214)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.SkewJoinProcFactory$SkewJoinJoinProcessor.process(SkewJoinProcFactory.java:60)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:133)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.SkewJoinResolver$SkewJoinTaskDispatcher.dispatch(SkewJoinResolver.java:100)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:133)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.SkewJoinResolver.resolve(SkewJoinResolver.java:55)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:107)
>   at 
> org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:270)
>   at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:227)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10219)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:211)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:459)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:316)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1189)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1237)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1116)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:216)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:168)
>   at 

[jira] [Updated] (HIVE-14436) Hive 1.2.1/Hitting "ql.Driver: FAILED: IllegalArgumentException Error: , expected at the end of 'decimal(9'" after enabling hive.optimize.skewjoin and with MR engine

2016-08-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-14436:
--
Attachment: HIVE-14436.2.patch

That looks much better. Attaching a new patch.

> Hive 1.2.1/Hitting "ql.Driver: FAILED: IllegalArgumentException Error: , 
> expected at the end of 'decimal(9'" after enabling hive.optimize.skewjoin and 
> with MR engine
> -
>
> Key: HIVE-14436
> URL: https://issues.apache.org/jira/browse/HIVE-14436
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
> Environment: HDP 2.4.2/ Hive 1.2.1
>Reporter: Ratish Maruthiyodan
>Assignee: Daniel Dai
>  Labels: code
> Attachments: HIVE-14436.1.patch, HIVE-14436.2.patch
>
>
> PROBLEM:
> The following Query run with MapReduce engine with "hive.optimize.skewjoin = 
> true" fails with error:
> "FAILED: IllegalArgumentException Error: , expected at the end of 
> 'decimal(9'" 
> > SELECT a.col1 FROM db.tableA a  INNER JOIN  db.tableB b  ON b.key=a.key 
> > limit 5;
> FAILED: IllegalArgumentException Error: , expected at the end of 'decimal(9'
> 16/08/04 12:47:50 [main]: ERROR ql.Driver: FAILED: IllegalArgumentException 
> Error: , expected at the end of 'decimal(9'
> java.lang.IllegalArgumentException: Error: , expected at the end of 
> 'decimal(9'
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:336)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseParams(TypeInfoUtils.java:378)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parsePrimitiveParts(TypeInfoUtils.java:518)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.parsePrimitiveParts(TypeInfoUtils.java:533)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.createPrimitiveTypeInfo(TypeInfoFactory.java:136)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.getPrimitiveTypeInfo(TypeInfoFactory.java:109)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.GenMRSkewJoinProcessor.processSkewJoin(GenMRSkewJoinProcessor.java:214)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.SkewJoinProcFactory$SkewJoinJoinProcessor.process(SkewJoinProcFactory.java:60)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:133)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.SkewJoinResolver$SkewJoinTaskDispatcher.dispatch(SkewJoinResolver.java:100)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:133)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.SkewJoinResolver.resolve(SkewJoinResolver.java:55)
>   at 
> org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:107)
>   at 
> org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:270)
>   at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:227)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10219)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:211)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:459)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:316)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1189)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1237)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1116)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:216)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:168)
>   

[jira] [Updated] (HIVE-14399) Fix test flakiness of org.apache.hive.hcatalog.listener.TestDbNotificationListener.cleanupNotifs

2016-08-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-14399:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
   Fix Version/s: 2.1.1
  2.2.0
Target Version/s: 2.2.0, 2.1.1  (was: 2.2.0)
  Status: Resolved  (was: Patch Available)

Patch pushed to master and branch-2.1.

> Fix test flakiness of 
> org.apache.hive.hcatalog.listener.TestDbNotificationListener.cleanupNotifs
> 
>
> Key: HIVE-14399
> URL: https://issues.apache.org/jira/browse/HIVE-14399
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14399.1.patch
>
>
> We see intermittent failures of TestDbNotificationListener.cleanupNotifs. 
> This JIRA makes the test stable.
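
One common way to stabilize a test like this, offered only as a hedged illustration of the general de-flaking technique and not as the actual patch, is to replace a fixed sleep with a bounded poll so the assertion waits for the cleanup thread instead of racing it:

{code}
// Hedged sketch of a generic de-flaking pattern; names are illustrative.
import java.util.function.IntSupplier;

public class AwaitCleanupSketch {
    static void awaitNotifsCleaned(IntSupplier notifCount, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (notifCount.getAsInt() == 0) {
                return;               // cleanup observed, test can proceed
            }
            Thread.sleep(100);        // poll instead of a single fixed sleep
        }
        throw new AssertionError("notifications were not cleaned up within "
            + timeoutMs + " ms");
    }
}
{code}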





[jira] [Updated] (HIVE-14482) Drop table partition is not audit logged in HMS

2016-08-09 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin updated HIVE-14482:

Description: 
When running:

{code}
ALTER TABLE test DROP PARTITION (b=140);
{code}

I only see the following in the HMS log:

{code}
2016-08-08 23:12:34,081 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,082 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
[pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=default tbl=test
2016-08-08 23:12:34,082 INFO  
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
ugi=hive ip=xx.xx.xxx.xxx cmd=source:xx.xx.xxx.xxx get_table : db=default 
tbl=test
2016-08-08 23:12:34,094 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,095 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,095 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
[pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_partitions_by_expr : db=default 
tbl=test
2016-08-08 23:12:34,096 INFO  
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
ugi=hive ip=xx.xx.xxx.xxx cmd=source:xx.xx.xxx.xxx get_partitions_by_expr : 
db=default tbl=test
2016-08-08 23:12:34,112 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,172 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,173 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
[pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=default tbl=test
2016-08-08 23:12:34,173 INFO  
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
ugi=hive ip=xx.xx.xxx.xxx cmd=source:xx.xx.xxx.xxx get_table : db=default 
tbl=test
2016-08-08 23:12:34,186 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,186 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,187 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
[pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=default tbl=test
2016-08-08 23:12:34,187 INFO  
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
ugi=hive ip=xx.xx.xxx.xxx cmd=source:xx.xx.xxx.xxx get_table : db=default 
tbl=test
2016-08-08 23:12:34,199 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,203 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,215 INFO  org.apache.hadoop.hive.metastore.ObjectStore: 
[pool-4-thread-2]: JDO filter pushdown cannot be used: Filtering is supported 
only on partition keys of type string
2016-08-08 23:12:34,226 ERROR org.apache.hadoop.hdfs.KeyProviderCache: 
[pool-4-thread-2]: Could not find uri with key 
[dfs.encryption.key.provider.uri] to create a keyProvider !!
2016-08-08 23:12:34,239 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
[pool-4-thread-2]: dropPartition() will move partition-directories to 
trash-directory.
2016-08-08 23:12:34,239 INFO  hive.metastore.hivemetastoressimpl: 
[pool-4-thread-2]: deleting  
hdfs://:8020/user/hive/warehouse/default/test/b=140
2016-08-08 23:12:34,247 INFO  org.apache.hadoop.fs.TrashPolicyDefault: 
[pool-4-thread-2]: Moved: 
'hdfs://:8020/user/hive/warehouse/default/test/b=140' to trash at: 
hdfs://:8020/user/hive/.Trash/Current/user/hive/warehouse/default/test/b=140
2016-08-08 23:12:34,247 INFO  hive.metastore.hivemetastoressimpl: 
[pool-4-thread-2]: Moved to trash: 
hdfs://:8020/user/hive/warehouse/default/test/b=140
2016-08-08 23:12:34,247 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
{code}

There is no entry in the "HiveMetaStore.audit" to show that partition b=140 was 
dropped.

When we add a new partition, we can see the following:

{code}
2016-08-08 23:04:48,534 INFO  
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
ugi=hive ip=xx.xx.xxx.xxx cmd=source:xx.xx.xxx.xxx append_partition : 
db=default tbl=test[130]
{code}

Ideally we should see a similar message when dropping partitions.
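
A hedged sketch of the kind of audit call the drop-partition path could emit, mirroring the append_partition line above. The logger category matches the log excerpts in this report, but the method name and its parameters are illustrative assumptions, not the actual HiveMetaStore internals.

{code}
// Illustrative only: emit an HMS-style audit line for drop_partition.
// The logger category is taken from the log excerpts above; everything
// else (method name, parameters) is a hypothetical stand-in.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DropPartitionAuditSketch {
    private static final Logger AUDIT_LOG =
        LoggerFactory.getLogger("org.apache.hadoop.hive.metastore.HiveMetaStore.audit");

    static void logDropPartitionAudit(String ugi, String ip, String db,
                                      String tbl, String partVals) {
        // Produces e.g.: ugi=hive ip=xx.xx.xxx.xxx cmd=drop_partition : db=default tbl=test[140]
        AUDIT_LOG.info("ugi={}\tip={}\tcmd=drop_partition : db={} tbl={}[{}]",
            ugi, ip, db, tbl, partVals);
    }
}
{code}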

  was:
When running:

{code}
ALTER TABLE test DROP PARTITION (b=140);
{code}

I only see the following in the HMS log:

{code}
2016-08-08 23:12:34,081 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,082 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: 
[pool-4-thread-2]: 2: source:xx.xx.xxx.xxx get_table : db=case_104408 tbl=test
2016-08-08 23:12:34,082 INFO  
org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-4-thread-2]: 
ugi=hive ip=xx.xx.xxx.xxx cmd=source:xx.xx.xxx.xxx get_table : 
db=case_104408 tbl=test
2016-08-08 23:12:34,094 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,095 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: 
[pool-4-thread-2]: 
2016-08-08 23:12:34,095 INFO  
