[jira] [Created] (HIVE-23540) Fix Findbugs Warnings in EncodedColumnBatch

2020-05-22 Thread David Mollitor (Jira)
David Mollitor created HIVE-23540:
-

 Summary: Fix Findbugs Warnings in EncodedColumnBatch
 Key: HIVE-23540
 URL: https://issues.apache.org/jira/browse/HIVE-23540
 Project: Hive
  Issue Type: Improvement
  Components: storage-api
Reporter: David Mollitor
Assignee: David Mollitor
 Attachments: HIVE-23540.1.patch





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Review Request 72433: Extract Create View analyzer from SemanticAnalyzer

2020-05-22 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72433/#review220850
---




common/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
Line 442 (original)


Why is this error code going away? Even if it is not used, it probably makes
sense to keep it around and mark it as deprecated for backward reference.



ql/src/java/org/apache/hadoop/hive/ql/ddl/view/create/CreateMaterializedViewDesc.java
Lines 49 (patched)


It seems more appropriate to update the display name too at this point.



ql/src/java/org/apache/hadoop/hive/ql/ddl/view/create/CreateMaterializedViewDesc.java
Lines 67 (patched)


// only used for materialized views

These comments can go away.



ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
Line 1818 (original), 1818 (patched)


Can we make this protected again? I did not see any usage outside of a
subclass.



ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
Lines 566 (patched)


What is the goal of this block? Why is this needed now when it was not
needed before? This seems like more than a refactoring.

Fwiw there is some logic below executed when we create a view in 
_handleCreateViewDDL_.



ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
Line 1848 (original), 1854 (patched)


Why are we changing this? We should not have to change this logic since 
isView semantics should not change.



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
Line 12614 (original), 12620 (patched)


Not sure why we need this change since information should be in qb.


- Jesús Camacho Rodríguez


On April 25, 2020, 11:23 a.m., Miklos Gergely wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72433/
> ---
> 
> (Updated April 25, 2020, 11:23 a.m.)
> 
> 
> Review request for hive and Jesús Camacho Rodríguez.
> 
> 
> Bugs: HIVE-23244
> https://issues.apache.org/jira/browse/HIVE-23244
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Create View commands are not queries, but commands which have queries as a 
> part of them. Therefore a separate CreateViewAnalyzer is needed which uses 
> SemanticAnalyzer to analyze its query.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 8e643fe844 
>   parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g b03b0989b8 
>   ql/src/java/org/apache/hadoop/hive/ql/ddl/DDLUtils.java b82fc5e91d 
>   ql/src/java/org/apache/hadoop/hive/ql/ddl/view/create/AbstractCreateViewAnalyzer.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/ddl/view/create/AbstractCreateViewDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/ddl/view/create/AlterViewAsAnalyzer.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/ddl/view/create/AlterViewAsDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/ddl/view/create/AlterViewAsOperation.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/ddl/view/create/CreateMaterializedViewDesc.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/ddl/view/create/CreateMaterializedViewOperation.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/ddl/view/create/CreateViewAnalyzer.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/ddl/view/create/CreateViewDesc.java d1f36945fb 
>   ql/src/java/org/apache/hadoop/hive/ql/ddl/view/create/CreateViewOperation.java f7952a5cc1 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java b578d48ce1 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 4f1e23d7a6 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 7b2e201e5a 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java bef02176c2 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/QB.java 9d94b6e2dd 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 0de3730351 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/StorageFormat.java 2350646c36 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java 2f3fc6c50a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java c75829c272 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadFileDesc.java 07bcef8ee3 
>   

[jira] [Created] (HIVE-23539) Optimize data copy during repl load operation for HDFS based staging location

2020-05-22 Thread Pravin Sinha (Jira)
Pravin Sinha created HIVE-23539:
---

 Summary: Optimize data copy during repl load operation for HDFS 
based staging location
 Key: HIVE-23539
 URL: https://issues.apache.org/jira/browse/HIVE-23539
 Project: Hive
  Issue Type: Improvement
Reporter: Pravin Sinha
Assignee: Pravin Sinha








[jira] [Created] (HIVE-23538) Cannot run setBugDatabaseInfo from findbugs during preCommit

2020-05-22 Thread Pravin Sinha (Jira)
Pravin Sinha created HIVE-23538:
---

 Summary: Cannot run setBugDatabaseInfo from findbugs during 
preCommit
 Key: HIVE-23538
 URL: https://issues.apache.org/jira/browse/HIVE-23538
 Project: Hive
  Issue Type: Bug
Reporter: Pravin Sinha
Assignee: David Mollitor


The following is seen during the preCommit of the patch HIVE-23353:
-1  findbugs  1m 5s    patch/common cannot run setBugDatabaseInfo from findbugs
-1  findbugs  10m 27s  patch/ql cannot run setBugDatabaseInfo from findbugs
-1  findbugs  1m 51s   patch/itests/hive-unit cannot run setBugDatabaseInfo from findbugs





[jira] [Created] (HIVE-23537) RecordReader support for row-filtering

2020-05-22 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-23537:
-

 Summary: RecordReader support for row-filtering
 Key: HIVE-23537
 URL: https://issues.apache.org/jira/browse/HIVE-23537
 Project: Hive
  Issue Type: Sub-task
  Components: llap, Reader
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


ORC-577 enables row-level filtering for the ORC format while HIVE-23167 is 
aiming to extend the existing compiler logic and push filters further down the 
pipeline wherever possible.

In this jira we extend the HIVE Record readers to utilize the above filtering 
functionality (similar to what we already do for PPD).





[jira] [Created] (HIVE-23536) Provide an option to skip stats generation for major compaction

2020-05-22 Thread Peter Vary (Jira)
Peter Vary created HIVE-23536:
-

 Summary: Provide an option to skip stats generation for major 
compaction
 Key: HIVE-23536
 URL: https://issues.apache.org/jira/browse/HIVE-23536
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Reporter: Peter Vary
Assignee: Peter Vary


Currently, major MR compaction regenerates stats every time if the column 
stats table contains some data. Some configurations do not use stats, but 
for historical reasons the column stats table can still contain some 
data.
We should provide an option to skip stats generation in these cases.





[jira] [Created] (HIVE-23535) Bump Minimum Required Version of Maven to 3.0.5

2020-05-22 Thread David Mollitor (Jira)
David Mollitor created HIVE-23535:
-

 Summary: Bump Minimum Required Version of Maven to 3.0.5
 Key: HIVE-23535
 URL: https://issues.apache.org/jira/browse/HIVE-23535
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor
 Attachments: HIVE-23535.1.patch

{code:xml|title=pom.xml}
  <prerequisites>
    <maven>2.2.1</maven>
  </prerequisites>
{code}

Time to upgrade to 3.x

https://maven.apache.org/pom.html#prerequisites





Re: HIVE-22066: Upgrade Apache parent POM to version 21

2020-05-22 Thread David Mollitor
Hello Gang,

I just pushed this change to the master.  Please be kind and patient and
let's try to correct any artifacts that may pop up as a result.

Thanks.

On Fri, Jan 17, 2020 at 9:31 AM David Mollitor  wrote:

> Hello Team,
>
> This ticket and patch is still waiting out there.  Any help would be
> greatly appreciated.
>
> Thanks.
>
> On Mon, Dec 9, 2019 at 2:38 PM David Mollitor  wrote:
>
>> Hello Gang,
>>
>> I have taken up the task of upgrading all of the Hive POM files to
>> upgrade the version of the Apache parent POM from which they inherit.
>>
>> I have hit a snag though and I would like to ask your support.
>>
>> The upgraded POM files build locally for me but I cannot get them to pass
>> upstream.  I tracked the problem back to how YETUS is implemented.  YETUS's
>> first phase checks out and builds the 'master' branch of the project and
>> performs some sanity checks.  It builds the root project and all of its
>> modules.  For its second phase, it applies the submitted patch and builds
>> the project again.  However, YETUS does not build from the root POM file on
>> the second phase.  It only builds from the directory of each module.  This is
>> problematic since my patch updates the parent POM file.  It looks to me
>> that YETUS will only build the parent POM file during phase 2 if the POM
>> file was modified to change the version number of the project.  I am not
>> incrementing the version here, I am hoping to get this into Hive 4.0.
>>
>> I would like to propose that we merge the latest patch I have provided
>> and work out any issues that may arise after the fact.  Future builds
>> should succeed since YETUS will build the modified root POM file for future
>> builds.
>>
>> Thanks
>>
>


[jira] [Created] (HIVE-23534) NPE in RetryingMetaStoreClient#invoke when catching MetaException with no message

2020-05-22 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-23534:
--

 Summary: NPE in RetryingMetaStoreClient#invoke when catching 
MetaException with no message
 Key: HIVE-23534
 URL: https://issues.apache.org/jira/browse/HIVE-23534
 Project: Hive
  Issue Type: Bug
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


The RetryingMetaStoreClient#invoke method catches MetaException and attempts to 
classify it by checking the message. However, there are cases (e.g., various 
places in 
[ObjectStore|https://github.com/apache/hive/blob/716f1f9a945a9a11e6702754667660d27e0a5cf4/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L3916])
where the message of the MetaException is null, and this leads to an NPE.
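
The failure mode can be illustrated with a minimal sketch, assuming the classification is a substring check on getMessage(). The class MetaExceptionStandIn, the method names, and the marker string are hypothetical stand-ins for illustration, not Hive's actual code or the fix in the patch.

```java
public class NullMessageClassification {
    // Hypothetical stand-in for MetaException; the real class lives in the metastore API.
    static class MetaExceptionStandIn extends Exception {
        MetaExceptionStandIn(String message) {
            super(message);
        }
    }

    // Hypothetical marker; the strings the real client looks for are not shown in the ticket.
    static final String RECOVERABLE_MARKER = "recoverable";

    // Unsafe: dereferences getMessage() directly, so a null message throws NullPointerException.
    static boolean isRecoverableUnsafe(MetaExceptionStandIn e) {
        return e.getMessage().contains(RECOVERABLE_MARKER);
    }

    // Null-safe: treats a missing message as "not recoverable" instead of failing.
    static boolean isRecoverableSafe(MetaExceptionStandIn e) {
        String message = e.getMessage();
        return message != null && message.contains(RECOVERABLE_MARKER);
    }

    public static void main(String[] args) {
        MetaExceptionStandIn noMessage = new MetaExceptionStandIn(null);
        try {
            isRecoverableUnsafe(noMessage);
        } catch (NullPointerException npe) {
            System.out.println("unsafe check threw NullPointerException");
        }
        System.out.println("safe check: " + isRecoverableSafe(noMessage));
    }
}
```

Whether a null message should count as recoverable is a design decision the sketch does not settle; it only shows that the check itself must not dereference the message blindly.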





[jira] [Created] (HIVE-23533) Remove an FS#exists call from AcidUtils#getLogicalLength

2020-05-22 Thread Karen Coppage (Jira)
Karen Coppage created HIVE-23533:


 Summary: Remove an FS#exists call from AcidUtils#getLogicalLength
 Key: HIVE-23533
 URL: https://issues.apache.org/jira/browse/HIVE-23533
 Project: Hive
  Issue Type: Improvement
Reporter: Karen Coppage
Assignee: Karen Coppage


{code:java}
Path lengths = OrcAcidUtils.getSideFile(file.getPath());
if (!fs.exists(lengths)) {
  ...
  return file.getLen();
}
long len = OrcAcidUtils.getLastFlushLength(fs, file.getPath());
{code}

OrcAcidUtils.getLastFlushLength also performs an exists() check and returns 
Long.MAX_VALUE if the file does not exist, so the first check is redundant.
exists() is expensive on S3.
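
A minimal sketch of the idea, using an in-memory stand-in for the file system so that the number of exists() calls can be counted. CountingFs, the simplified getLastFlushLength signature, and the path strings are assumptions made for illustration; this is not the code from the patch.

```java
import java.util.HashMap;
import java.util.Map;

public class LogicalLengthSketch {
    // Hypothetical in-memory stand-in for a file system that counts exists() calls.
    static class CountingFs {
        final Map<String, Long> sideFiles = new HashMap<>();
        int existsCalls = 0;

        boolean exists(String path) {
            existsCalls++;
            return sideFiles.containsKey(path);
        }
    }

    // Mirrors the behaviour the ticket describes for getLastFlushLength:
    // one exists() check, and Long.MAX_VALUE when the side file is absent.
    static long getLastFlushLength(CountingFs fs, String sideFile) {
        if (!fs.exists(sideFile)) {
            return Long.MAX_VALUE;
        }
        return fs.sideFiles.get(sideFile);
    }

    // Before: the caller checks exists() itself, then the helper checks again.
    static long logicalLengthWithExtraCheck(CountingFs fs, String sideFile, long fileLen) {
        if (!fs.exists(sideFile)) {
            return fileLen;
        }
        return getLastFlushLength(fs, sideFile);
    }

    // After: rely on the Long.MAX_VALUE sentinel and skip the caller-side exists().
    static long logicalLengthSingleCheck(CountingFs fs, String sideFile, long fileLen) {
        long len = getLastFlushLength(fs, sideFile);
        return len == Long.MAX_VALUE ? fileLen : len;
    }

    public static void main(String[] args) {
        CountingFs fs = new CountingFs();
        fs.sideFiles.put("bucket_0_flush_length", 100L);

        logicalLengthWithExtraCheck(fs, "bucket_0_flush_length", 500L);
        System.out.println("exists() calls with extra check: " + fs.existsCalls);

        fs.existsCalls = 0;
        logicalLengthSingleCheck(fs, "bucket_0_flush_length", 500L);
        System.out.println("exists() calls with single check: " + fs.existsCalls);
    }
}
```

When the side file is present, the first variant pays for two exists() calls and the second for one; both return the same length in either case.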





[jira] [Created] (HIVE-23532) NPE when fetching incomplete column statistics from the metastore

2020-05-22 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-23532:
--

 Summary: NPE when fetching incomplete column statistics from the 
metastore
 Key: HIVE-23532
 URL: https://issues.apache.org/jira/browse/HIVE-23532
 Project: Hive
  Issue Type: Bug
Reporter: Stamatis Zampetakis


Certain operations may store incomplete column statistics in the metastore. 
Fetching those statistics back from the metastore leads to a 
{{NullPointerException}}.

For instance, consider a column "name" of type string. If we have statistics 
for this column, then the following info must be available:
* maxColLen
* avgColLen
* numNulls
* numDVs

Executing the following statement on a table with no stats updates a subset of 
the statistics for this column:

{code:sql}
ALTER TABLE example UPDATE STATISTICS for column name SET ('numDVs'='242', 
'numNulls'='5');
{code}

Fetching this kind of statistics leads to an NPE that sometimes pops up in the 
client and at other times is buried in the logs, leading to incomplete column 
stats during optimization and execution of a query.
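
The ticket does not say where the null is dereferenced. As a sketch of one plausible mechanism, assume the unset fields are held as null boxed values and a reader unboxes them. The StringColumnStats holder and both reader methods are hypothetical and do not correspond to Hive's metastore classes.

```java
public class PartialStatsSketch {
    // Hypothetical holder for string-column statistics; null means "never set".
    static class StringColumnStats {
        Long maxColLen;
        Double avgColLen;
        Long numNulls;
        Long numDVs;
    }

    // Unsafe reader: unboxing a null Long throws NullPointerException.
    static long maxColLenUnsafe(StringColumnStats stats) {
        return stats.maxColLen;
    }

    // Defensive reader: falls back to a caller-supplied default when the field is unset.
    static long maxColLenOrDefault(StringColumnStats stats, long defaultValue) {
        return stats.maxColLen != null ? stats.maxColLen : defaultValue;
    }

    public static void main(String[] args) {
        // Only numDVs and numNulls are set, as in the ALTER TABLE statement above.
        StringColumnStats stats = new StringColumnStats();
        stats.numDVs = 242L;
        stats.numNulls = 5L;

        try {
            maxColLenUnsafe(stats);
        } catch (NullPointerException npe) {
            System.out.println("unsafe read threw NullPointerException");
        }
        System.out.println("defensive read: " + maxColLenOrDefault(stats, -1L));
    }
}
```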

Usually the stacktrace is similar to the one below:
{noformat}
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.metadata.Hive.getTableColumnStatistics(Hive.java:5251)
at org.apache.hadoop.hive.ql.ddl.table.info.desc.DescTableOperation.getColumnDataColPathSpecified(DescTableOperation.java:216)
at org.apache.hadoop.hive.ql.ddl.table.info.desc.DescTableOperation.execute(DescTableOperation.java:94)
at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:362)
at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:335)
at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:723)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:492)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:486)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:164)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256)
at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353)
at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:730)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:700)
at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
 

[jira] [Created] (HIVE-23531) Major CRUD QB compaction failing with ClassCastException when vectorization off

2020-05-22 Thread Karen Coppage (Jira)
Karen Coppage created HIVE-23531:


 Summary: Major CRUD QB compaction failing with ClassCastException 
when vectorization off
 Key: HIVE-23531
 URL: https://issues.apache.org/jira/browse/HIVE-23531
 Project: Hive
  Issue Type: Bug
Reporter: Karen Coppage
Assignee: Karen Coppage


Exception:
{code:java}
2020-05-22T01:33:09,944 ERROR [TezChild] tez.MapRecordSource: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:403)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.hadoop.io.IntWritable
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:965)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:174)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
... 20 more
{code}
And some more in Tez.

This happens because, when vectorization is turned on, primitives in the row are wrapped in 
Writables by VectorFileSinkOperator; when it is off, they are not.
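
The mismatch can be reproduced in isolation with a small sketch. IntWritableStandIn is a hypothetical stand-in for org.apache.hadoop.io.IntWritable so that the example has no Hadoop dependency, and the two reader methods are illustrative only; they are not the fix applied in Hive.

```java
public class WritableCastSketch {
    // Hypothetical stand-in for IntWritable, to keep the sketch self-contained.
    static final class IntWritableStandIn {
        private final int value;

        IntWritableStandIn(int value) {
            this.value = value;
        }

        int get() {
            return value;
        }
    }

    // A sink that assumes every field is already wrapped; fails on a raw Integer.
    static int readAssumingWritable(Object field) {
        return ((IntWritableStandIn) field).get();
    }

    // A tolerant sink that accepts both the wrapped and the raw representation.
    static int readEitherForm(Object field) {
        if (field instanceof IntWritableStandIn) {
            return ((IntWritableStandIn) field).get();
        }
        return (Integer) field;
    }

    public static void main(String[] args) {
        Object wrapped = new IntWritableStandIn(7); // what the vectorized path would hand over
        Object raw = Integer.valueOf(7);            // what the non-vectorized path would hand over

        System.out.println("wrapped, strict read: " + readAssumingWritable(wrapped));
        try {
            readAssumingWritable(raw);
        } catch (ClassCastException cce) {
            System.out.println("raw, strict read threw ClassCastException");
        }
        System.out.println("raw, tolerant read: " + readEitherForm(raw));
    }
}
```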


