[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive

2015-12-10 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051359#comment-15051359
 ] 

Jason Dere commented on HIVE-11878:
---

Test failures are not related

> ClassNotFoundException can possibly occur if multiple jars are registered 
> one at a time in Hive
> 
>
> Key: HIVE-11878
> URL: https://issues.apache.org/jira/browse/HIVE-11878
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>  Labels: URLClassLoader
> Attachments: HIVE-11878 ClassLoader Issues when Registering 
> Jars.pptx, HIVE-11878.2.patch, HIVE-11878.3.patch, HIVE-11878.4.patch, 
> HIVE-11878.patch, HIVE-11878_approach3.patch, 
> HIVE-11878_approach3_per_session_clasloader.patch, 
> HIVE-11878_approach3_with_review_comments.patch, 
> HIVE-11878_approach3_with_review_comments1.patch, HIVE-11878_qtest.patch
>
>
> When we register a jar on the Hive console, Hive creates a fresh 
> URLClassLoader which includes the path of the jar being registered and 
> all the jar paths of the parent classloader. The parent classloader is the 
> current ThreadContextClassLoader. Once the URLClassLoader is created, Hive 
> sets it as the current ThreadContextClassLoader.
> So if we register multiple jars in Hive, there will be multiple 
> URLClassLoaders created, each classloader including the jars from its parent 
> and the one extra jar to be registered. The last URLClassLoader created will 
> end up as the current ThreadContextClassLoader. (See details: 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath)
> Now here's an example in which the above strategy can lead to a ClassNotFoundException.
> We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class 
> *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, 
> the URLClassLoader *u1* is created and also set as the 
> ThreadContextClassLoader. We register *j2* next, the new URLClassLoader 
> created will be *u2* with *u1* as parent and *u2* becomes the new 
> ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* 
> whereas *u1* only has paths to *j1* (For details see: 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath).
> Now when we register class *c1* under a temporary function in Hive, we load 
> the class using {code}Class.forName("c1", true, 
> Thread.currentThread().getContextClassLoader()){code}. The current thread 
> context class-loader is *u2*, and it has the path to the class *c1*, but note 
> that class-loaders work by delegating to the parent class-loader first. In 
> this case class *c1* will be found and *defined* by class-loader *u1*.
> Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say 
> initialize) is called in *c1*, which references the class *c2*, *c2* will not 
> be found, since the class-loader used to search for *c2* will be *u1* (the 
> caller's class-loader is used to load a class).
> I've added a qtest to demonstrate the problem. Please see the attached patch.
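> A minimal standalone Java sketch of the delegation behavior described above 
> (jar paths are hypothetical; class and loader names mirror the example):
> {code}
> import java.net.URL;
> import java.net.URLClassLoader;
> 
> public class ClassLoaderDelegationDemo {
>   public static void main(String[] args) throws Exception {
>     URL j1 = new URL("file:/tmp/j1.jar"); // contains c1, which uses c2
>     URL j2 = new URL("file:/tmp/j2.jar"); // contains c2
> 
>     // ADD JAR j1: loader u1 sees only j1
>     URLClassLoader u1 = new URLClassLoader(new URL[] { j1 },
>         Thread.currentThread().getContextClassLoader());
>     Thread.currentThread().setContextClassLoader(u1);
> 
>     // ADD JAR j2: loader u2 sees j1 and j2, with u1 as parent
>     URLClassLoader u2 = new URLClassLoader(new URL[] { j1, j2 }, u1);
>     Thread.currentThread().setContextClassLoader(u2);
> 
>     // Lookup starts at u2, but parent-first delegation means u1 finds and
>     // defines c1. Classes that c1 later references (c2) are then resolved
>     // through u1, which cannot see j2 -- hence the CNFE.
>     Class<?> c1 = Class.forName("c1", true,
>         Thread.currentThread().getContextClassLoader());
>     c1.newInstance(); // any code path in c1 touching c2 fails here
>   }
> }
> {code}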



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive

2015-12-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051361#comment-15051361
 ] 

Ashutosh Chauhan commented on HIVE-11878:
-

Since this has been seen at other sites also, it will be good to land this in 
the 2.0 branch as well.

> ClassNotFoundException can possibly occur if multiple jars are registered 
> one at a time in Hive
> 
>
> Key: HIVE-11878
> URL: https://issues.apache.org/jira/browse/HIVE-11878
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>  Labels: URLClassLoader
> Attachments: HIVE-11878 ClassLoader Issues when Registering 
> Jars.pptx, HIVE-11878.2.patch, HIVE-11878.3.patch, HIVE-11878.4.patch, 
> HIVE-11878.patch, HIVE-11878_approach3.patch, 
> HIVE-11878_approach3_per_session_clasloader.patch, 
> HIVE-11878_approach3_with_review_comments.patch, 
> HIVE-11878_approach3_with_review_comments1.patch, HIVE-11878_qtest.patch
>
>
> When we register a jar on the Hive console, Hive creates a fresh 
> URLClassLoader which includes the path of the jar being registered and 
> all the jar paths of the parent classloader. The parent classloader is the 
> current ThreadContextClassLoader. Once the URLClassLoader is created, Hive 
> sets it as the current ThreadContextClassLoader.
> So if we register multiple jars in Hive, there will be multiple 
> URLClassLoaders created, each classloader including the jars from its parent 
> and the one extra jar to be registered. The last URLClassLoader created will 
> end up as the current ThreadContextClassLoader. (See details: 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath)
> Now here's an example in which the above strategy can lead to a ClassNotFoundException.
> We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class 
> *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, 
> the URLClassLoader *u1* is created and also set as the 
> ThreadContextClassLoader. We register *j2* next, the new URLClassLoader 
> created will be *u2* with *u1* as parent and *u2* becomes the new 
> ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* 
> whereas *u1* only has paths to *j1* (For details see: 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath).
> Now when we register class *c1* under a temporary function in Hive, we load 
> the class using {code}Class.forName("c1", true, 
> Thread.currentThread().getContextClassLoader()){code}. The current thread 
> context class-loader is *u2*, and it has the path to the class *c1*, but note 
> that class-loaders work by delegating to the parent class-loader first. In 
> this case class *c1* will be found and *defined* by class-loader *u1*.
> Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say 
> initialize) is called in *c1*, which references the class *c2*, *c2* will not 
> be found, since the class-loader used to search for *c2* will be *u1* (the 
> caller's class-loader is used to load a class).
> I've added a qtest to demonstrate the problem. Please see the attached patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12531) Implement fast-path for Year/Month UDFs for dates between 1999 and 2038

2015-12-10 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051442#comment-15051442
 ] 

Jason Dere commented on HIVE-12531:
---

Test failures do not appear to be related

> Implement fast-path for Year/Month UDFs for dates between 1999 and 2038
> ---
>
> Key: HIVE-12531
> URL: https://issues.apache.org/jira/browse/HIVE-12531
> Project: Hive
>  Issue Type: Improvement
>Reporter: Gopal V
>Assignee: Jason Dere
> Attachments: HIVE-12531.1.patch, HIVE-12531.2.patch, 
> HIVE-12531.3.patch
>
>
> Current codepath goes into the JDK Calendar implementation, which is very 
> slow for the simple cases in the current decade.
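> For reference, a hedged Java sketch of one purely arithmetic approach (Howard 
> Hinnant's civil-from-days algorithm); this illustrates the idea and is not 
> the code in the attached patches:
> {code}
> // Derive (year, month) from days since 1970-01-01 with integer math,
> // avoiding java.util.Calendar entirely.
> static int[] yearMonthFromEpochDays(int epochDays) {
>   long z = epochDays + 719468L;              // shift epoch to 0000-03-01
>   long era = Math.floorDiv(z, 146097L);      // 400-year eras
>   long doe = z - era * 146097L;              // day of era [0, 146096]
>   long yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
>   long y = yoe + era * 400;                  // March-based year
>   long doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year [0, 365]
>   long mp = (5 * doy + 2) / 153;             // March-based month [0, 11]
>   int month = (int) (mp < 10 ? mp + 3 : mp - 9);
>   int year = (int) (month <= 2 ? y + 1 : y);
>   return new int[] { year, month };
> }
> {code}
> For the restricted 1999-2038 range an even simpler shortcut is possible, 
> since no Gregorian century exception falls inside it (2000 is divisible by 
> 400), so a plain 4-year leap cycle suffices.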



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-12572) select partitioned acid table order by throws java.io.FileNotFoundException

2015-12-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-12572.
-
Resolution: Duplicate

Sorry, didn't realize there was already a JIRA. The other one has a patch.

> select partitioned acid table order by throws java.io.FileNotFoundException
> ---
>
> Key: HIVE-12572
> URL: https://issues.apache.org/jira/browse/HIVE-12572
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Takahiko Saito
>Assignee: Alan Gates
>Priority: Critical
>
> Run the below queries:
> {noformat}
> create table test_acid (a int) partitioned by (b int) clustered by (a) into 2 
> buckets stored as orc tblproperties ('transactional'='true');
> insert into table test_acid partition (b=1) values (1), (2), (3), (4);
> select * from test_acid order by a;
> {noformat}
> The above fails with the following error:
> {noformat}
> 15/12/02 21:12:30 INFO SessionState: Map 1: 0(+0,-4)/1  Reducer 2: 0/1
> Status: Failed
> 15/12/02 21:12:30 ERROR SessionState: Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1449077191499_0023_1_00, 
> diagnostics=[Task failed, taskId=task_1449077191499_0023_1_00_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task: 
> attempt_1449077191499_0023_1_00_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.FileNotFoundException: Path is not a file: 
> /apps/hive/warehouse/test_acid/b=1
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:652)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:195)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:160)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:348)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.io.FileNotFoundException: Path is not a file: 
> /apps/hive/warehouse/test_acid/b=1
>   at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
>   at 
> 

[jira] [Updated] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-10 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11531:
---
Fix Version/s: 2.1.0  (was: 2.0.0)

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Fix For: 2.1.0
>
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first-class support for 
> "skip" to the existing LIMIT, or improve ROW_NUMBER for better performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051527#comment-15051527
 ] 

Sergey Shelukhin commented on HIVE-11531:
-

Makes sense. branch-2.0 doesn't require approval currently. I will send an 
update tomorrow morning-ish on the release.

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Fix For: 2.1.0
>
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first-class support for 
> "skip" to the existing LIMIT, or improve ROW_NUMBER for better performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11866) Add framework to enable testing using LDAPServer using LDAP protocol

2015-12-10 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051471#comment-15051471
 ] 

Naveen Gangam commented on HIVE-11866:
--

[~thejas] I am seeing no response on the LEGAL jira except a push from the 
Apache DS team to adopt ApacheDS for testing. I have sought some internal legal 
advice on this matter. Here is what I got back:
"When you say something "is not compatible with Apache 2" are you referring to 
a compatibility requirement imposed by the Apache Hive project particularly?  
Or are you just saying that the ENTIRE distribution, including all its 
dependencies, can no longer be released under Apache 2?

In either case, the UnboundID license 
(https://docs.ldap.com/ldap-sdk/docs/LICENSE-UnboundID-LDAPSDK.txt) does 
include a clause allowing UnboundID to unilaterally terminate the license at 
any time.

Notwithstanding anything contained in this Agreement to the contrary, UnboundID 
may also, in its sole discretion, terminate or suspend access to the SDK to You 
or any end user at any time. 

Naturally, that makes the SDK's inclusion in a project, particularly as a 
foundational tool, rather risky.  This may be why the jira considers the 
license "not compatible".  But from a cursory review I understand that 
UnboundID alternatively provides the SDK under the GPL or LGPL licenses 
(https://docs.ldap.com/ldap-sdk/docs/index.html) 
(https://docs.ldap.com/ldap-sdk/docs/LICENSE.txt).  If no one is modifying the 
SDK, but simply linking to it at arms-length as an independent file in its 
original form, I'm not sure why anyone would object to including it in Apache 
Hive as an LGPL dependency (Naveen seems to suggest below that he'll be using 
it at arms-length).   If Hive requires that ALL dependencies be Apache 
licensed, then this wouldn't be acceptable.  But taking a copy of the SDK under 
the LGPL license shouldn't prevent you from releasing the rest of Hive under 
the Apache license.  Furthermore, Naveen seems to indicate that the SDK will 
only appear internally on the Apache Hive machines (e.g., it won't be 
redistributed with the package provided to customers).   If that's the case, 
then use of an internal copy of the SDK solely on the Hive machines would be 
particularly benign.  See, e.g., 
(https://docs.ldap.com/ldap-sdk/docs/ldapsdk-faq.html#internal)

There are no restrictions on the use of the LDAP SDK in an application that is 
for internal use only and will not be redistributed outside of your 
organization.

(Still, if developers [inside and outside the Hive group] will be regularly 
downloading and uploading test framework builds that include the SDK from the 
Hive group, you might make the LGPL election explicit as a precaution)

So you may just have Naveen confirm that an LGPL 2.1 license would be 
acceptable to the Hive group and that the SDK copy will remain within Hive 
(e.g., the test framework is not distributed, except when developers push / 
pull the build from a Hive machine).

[I'm not sure which edition (standard, minimal, or commercial) of the SDK 
Naveen is contemplating using, but all three seem to have the same licensing 
options for his purpose (https://www.ldap.com/unboundid-ldap-sdk-for-java).]
"

Given that our usage is "at arms length" and the binaries wouldn't be packaged 
in the end product, do you think we have enough to re-insert this fix? Thanks

> Add framework to enable testing using LDAPServer using LDAP protocol
> 
>
> Key: HIVE-11866
> URL: https://issues.apache.org/jira/browse/HIVE-11866
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.3.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-11866.2.patch, HIVE-11866.patch
>
>
> Currently there is no unit test coverage for HS2's LDAP Atn provider using an 
> LDAP server on the backend. This prevents testing of the LDAPAtnProvider with 
> some realistic use cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-10 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051523#comment-15051523
 ] 

Jesus Camacho Rodriguez commented on HIVE-11531:


[~sershe], I want to backport this patch to 2.0.0 once HIVE-12644 goes in.

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Fix For: 2.1.0
>
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first-class support for 
> "skip" to the existing LIMIT, or improve ROW_NUMBER for better performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9544) Error dropping fully qualified partitioned table - Internal error processing get_partition_names

2015-12-10 Thread Ryan P (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051457#comment-15051457
 ] 

Ryan P commented on HIVE-9544:
--

This looks pretty similar to HIVE-10421. Are the offending tables partitioned?

> Error dropping fully qualified partitioned table - Internal error processing 
> get_partition_names
> 
>
> Key: HIVE-9544
> URL: https://issues.apache.org/jira/browse/HIVE-9544
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
> Environment: HDP 2.2
>Reporter: Hari Sekhon
>Assignee: Chaoyu Tang
>Priority: Minor
>
> When attempting to drop a partitioned table using a fully qualified name I 
> get this error:
> {code}
> hive -e 'drop table myDB.my_table_name;'
> Logging initialized using configuration in 
> file:/etc/hive/conf/hive-log4j.properties
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. 
> org.apache.thrift.TApplicationException: Internal error processing 
> get_partition_names
> {code}
> It succeeds if I instead do:
> {code}hive -e 'use myDB; drop table my_table_name;'{code}
> Regards,
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12645) ConstantPropagateProcCtx.resolve() should verify internal names first instead of alias to match 2 columns from different row schemas

2015-12-10 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-12645:
-
Summary: ConstantPropagateProcCtx.resolve() should verify internal names 
first instead of alias to match 2 columns from different row schemas   (was: 
ConstantPropagateProcCtx.resolve() should use internal names instead of alias 
to match 2 columns from different row schemas )

> ConstantPropagateProcCtx.resolve() should verify internal names first instead 
> of alias to match 2 columns from different row schemas 
> -
>
> Key: HIVE-12645
> URL: https://issues.apache.org/jira/browse/HIVE-12645
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
>
> Currently, it seems that we look to match the ColumnInfo between the parent 
> and the child rowschemas by calling rci = rs.getColumnInfo(tblAlias, alias) 
> which might be a bit aggressive, i.e., we will lose the opportunity to 
> constant-propagate even if the columns are the same but the aliases in the 
> row schemas do not match.
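> A hedged sketch of the matching order the new summary suggests (method and 
> field names are illustrative, not necessarily Hive's actual API):
> {code}
> private ColumnInfo resolve(ColumnInfo parentCol, RowSchema rs,
>                            String tblAlias, String alias) {
>   // First pass: verify internal names, which are stable across operators.
>   for (ColumnInfo rci : rs.getSignature()) {
>     if (rci.getInternalName().equals(parentCol.getInternalName())) {
>       return rci;
>     }
>   }
>   // Fall back to the existing alias-based lookup only if that fails.
>   return rs.getColumnInfo(tblAlias, alias);
> }
> {code}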



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-10 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11531:
---
Fix Version/s: 2.0.0

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Fix For: 2.1.0
>
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first-class support for 
> "skip" to the existing LIMIT, or improve ROW_NUMBER for better performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12640) Allow StatsOptimizer to optimize the query for Constant GroupBy keys

2015-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051560#comment-15051560
 ] 

Hive QA commented on HIVE-12640:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12776655/HIVE-12640.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 9887 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadata_only_queries
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_metadata_only_queries
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6308/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6308/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6308/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12776655 - PreCommit-HIVE-TRUNK-Build

> Allow StatsOptimizer to optimize the query for Constant GroupBy keys 
> -
>
> Key: HIVE-12640
> URL: https://issues.apache.org/jira/browse/HIVE-12640
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-12640.1.patch
>
>
> {code}
> hive> select count('1') from src group by '1';
> {code}
> In the above query, while performing StatsOptimizer optimization we can 
> safely ignore the group by on the constant key '1' since the above query will 
> return the same result as "select count('1') from src".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise

2015-12-10 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050382#comment-15050382
 ] 

Jesus Camacho Rodriguez commented on HIVE-11531:


+1

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> -
>
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Hui Zheng
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, 
> HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, 
> HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, 
> HIVE-11531.patch
>
>
> For any UIs that involve pagination, it is useful to issue queries in the 
> form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be 
> paginated (which can be extremely large by itself). At present, ROW_NUMBER 
> can be used to achieve this effect, but optimizations for LIMIT such as TopN 
> in ReduceSink do not apply to ROW_NUMBER. We can add first-class support for 
> "skip" to the existing LIMIT, or improve ROW_NUMBER for better performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12435) SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and vectorization is enabled.

2015-12-10 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-12435:

Attachment: (was: vector_select_null2.q)

> SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and 
> vectorization is enabled.
> --
>
> Key: HIVE-12435
> URL: https://issues.apache.org/jira/browse/HIVE-12435
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 2.0.0
>Reporter: Takahiko Saito
>Assignee: Matt McCline
>Priority: Critical
>
> Run the following query:
> {noformat}
> create table count_case_groupby (key string, bool boolean) STORED AS orc;
> insert into table count_case_groupby values ('key1', true),('key2', 
> false),('key3', NULL),('key4', false),('key5',NULL);
> {noformat}
> The table contains the following:
> {noformat}
> key1  true
> key2  false
> key3  NULL
> key4  false
> key5  NULL
> {noformat}
> The below query returns:
> {noformat}
> SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) 
> AS cnt_bool0_ok FROM count_case_groupby GROUP BY key;
> key1  1
> key2  1
> key3  1
> key4  1
> key5  1
> {noformat}
> while the expected results are:
> {noformat}
> key1  1
> key2  1
> key3  0
> key4  1
> key5  0
> {noformat}
> The query works with Hive 1.2. It also works when the table is not in ORC 
> format, and even for an ORC table the query works when vectorization is 
> disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12625) Backport to branch-1 HIVE-11981 ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)

2015-12-10 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-12625:

Component/s: (was: Hive)
 ORC

> Backport to branch-1 HIVE-11981 ORC Schema Evolution Issues (Vectorized, 
> ACID, and Non-Vectorized)
> --
>
> Key: HIVE-12625
> URL: https://issues.apache.org/jira/browse/HIVE-12625
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12625.1-branch1.patch, HIVE-12625.2-branch1.patch, 
> HIVE-12625.3-branch1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12435) SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and vectorization is enabled.

2015-12-10 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-12435:

Attachment: HIVE-12435.01.patch

> SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and 
> vectorization is enabled.
> --
>
> Key: HIVE-12435
> URL: https://issues.apache.org/jira/browse/HIVE-12435
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 2.0.0
>Reporter: Takahiko Saito
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12435.01.patch
>
>
> Run the following query:
> {noformat}
> create table count_case_groupby (key string, bool boolean) STORED AS orc;
> insert into table count_case_groupby values ('key1', true),('key2', 
> false),('key3', NULL),('key4', false),('key5',NULL);
> {noformat}
> The table contains the following:
> {noformat}
> key1  true
> key2  false
> key3  NULL
> key4  false
> key5  NULL
> {noformat}
> The below query returns:
> {noformat}
> SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) 
> AS cnt_bool0_ok FROM count_case_groupby GROUP BY key;
> key1  1
> key2  1
> key3  1
> key4  1
> key5  1
> {noformat}
> while the expected results are:
> {noformat}
> key1  1
> key2  1
> key3  0
> key4  1
> key5  0
> {noformat}
> The query works with Hive 1.2. It also works when the table is not in ORC 
> format, and even for an ORC table the query works when vectorization is 
> disabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12632) LLAP: don't use IO elevator for ACID tables

2015-12-10 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-12632:

Component/s: llap

> LLAP: don't use IO elevator for ACID tables 
> 
>
> Key: HIVE-12632
> URL: https://issues.apache.org/jira/browse/HIVE-12632
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Takahiko Saito
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12632.patch
>
>
> Until HIVE-12631 is fixed, we need to avoid ACID tables in the IO elevator. 
> Right now, a FileNotFoundException is thrown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12598) LLAP: disable fileId when not supported

2015-12-10 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-12598:

Component/s: llap

> LLAP: disable fileId when not supported
> ---
>
> Key: HIVE-12598
> URL: https://issues.apache.org/jira/browse/HIVE-12598
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.0.0, 2.1.0
>
> Attachments: HIVE-12598.01.patch, HIVE-12598.02.patch, 
> HIVE-12598.patch
>
>
> There is a TODO somewhere in the code. We might get a synthetic fileId in the 
> absence of the real one in some cases when another FS masquerades as HDFS; we 
> should be able to turn off fileId support explicitly for such cases, as 
> synthetic IDs are not bulletproof.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive

2015-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050350#comment-15050350
 ] 

Hive QA commented on HIVE-11878:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12776512/HIVE-11878.4.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 9874 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testAddPartitions
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testFetchingPartitionsWithDifferentSchemas
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6299/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6299/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6299/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12776512 - PreCommit-HIVE-TRUNK-Build

> ClassNotFoundException can possibly occur if multiple jars are registered 
> one at a time in Hive
> 
>
> Key: HIVE-11878
> URL: https://issues.apache.org/jira/browse/HIVE-11878
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>  Labels: URLClassLoader
> Attachments: HIVE-11878 ClassLoader Issues when Registering 
> Jars.pptx, HIVE-11878.2.patch, HIVE-11878.3.patch, HIVE-11878.4.patch, 
> HIVE-11878.patch, HIVE-11878_approach3.patch, 
> HIVE-11878_approach3_per_session_clasloader.patch, 
> HIVE-11878_approach3_with_review_comments.patch, 
> HIVE-11878_approach3_with_review_comments1.patch, HIVE-11878_qtest.patch
>
>
> When we register a jar on the Hive console, Hive creates a fresh 
> URLClassLoader which includes the path of the jar being registered and 
> all the jar paths of the parent classloader. The parent classloader is the 
> current ThreadContextClassLoader. Once the URLClassLoader is created, Hive 
> sets it as the current ThreadContextClassLoader.
> So if we register multiple jars in Hive, there will be multiple 
> URLClassLoaders created, each classloader including the jars from its parent 
> and the one extra jar to be registered. The last URLClassLoader created will 
> end up as the current ThreadContextClassLoader. (See details: 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath)
> Now here's an example in which the above strategy can lead to a ClassNotFoundException.
> We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class 
> *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, 
> the URLClassLoader *u1* is created and also set as the 
> ThreadContextClassLoader. We register *j2* next, the new URLClassLoader 
> created will be *u2* with *u1* as parent and *u2* becomes the new 
> ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* 
> whereas *u1* only has paths to *j1* (For details see: 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath).
> Now when we register class *c1* under a temporary function in Hive, we load 
> the class using {code}Class.forName("c1", true, 
> Thread.currentThread().getContextClassLoader()){code}. The current thread 
> context class-loader is *u2*, 

[jira] [Updated] (HIVE-12422) LLAP: add security to Web UI endpoint

2015-12-10 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-12422:

Component/s: llap

> LLAP: add security to Web UI endpoint
> -
>
> Key: HIVE-12422
> URL: https://issues.apache.org/jira/browse/HIVE-12422
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12422.01.patch, HIVE-12422.02.patch, 
> HIVE-12422.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11528) incrementally read query results when there's no ORDER BY

2015-12-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051602#comment-15051602
 ] 

Sergey Shelukhin commented on HIVE-11528:
-

Yes, IIRC an ORDER BY query always ends with a single reducer to sort the data.
Without ORDER BY, multiple reducers output the results to multiple files, and 
then we read these. 
I think the pipeline is the same for both cases; the question is whether the 
last stage is a single reducer. 

> incrementally read query results when there's no ORDER BY
> -
>
> Key: HIVE-11528
> URL: https://issues.apache.org/jira/browse/HIVE-11528
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Keisuke Ogiwara
>
> May require HIVE-11527. When there's no ORDER BY and there's more than one 
> reducer on the last stage of the query, it should be possible to return data 
> to the user as it is produced, instead of waiting for all reducers to finish.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12632) LLAP: don't use IO elevator for ACID tables

2015-12-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12632:

Target Version/s: 2.0.0

> LLAP: don't use IO elevator for ACID tables 
> 
>
> Key: HIVE-12632
> URL: https://issues.apache.org/jira/browse/HIVE-12632
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Takahiko Saito
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12632.patch
>
>
> Until HIVE-12631 is fixed, we need to avoid ACID tables in the IO elevator. 
> Right now, a FileNotFoundException is thrown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11107) Support for Performance regression test suite with TPCDS

2015-12-10 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-11107:
-
Attachment: HIVE-11107.7.patch

> Support for Performance regression test suite with TPCDS
> 
>
> Key: HIVE-11107
> URL: https://issues.apache.org/jira/browse/HIVE-11107
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-11107.1.patch, HIVE-11107.2.patch, 
> HIVE-11107.3.patch, HIVE-11107.4.patch, HIVE-11107.5.patch, 
> HIVE-11107.6.patch, HIVE-11107.7.patch
>
>
> Support to add TPCDS queries to the performance regression test suite with 
> Hive CBO turned on.
> This benchmark is intended to make sure that subsequent changes to the 
> optimizer or any Hive code do not yield any unexpected plan changes, i.e., 
> the intention is not to run the entire TPCDS query set, but just "explain 
> plan" for the TPCDS queries.
> As part of this jira, we will manually verify that expected hive 
> optimizations kick in for the queries (for given stats/dataset). If there is 
> a difference in plan within this test suite due to a future commit, it needs 
> to be analyzed and we need to make sure that it is not a regression.
> The test suite can be run in master branch from itests by 
> {code}
> mvn test -Dtest=TestPerfCliDriver 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11107) Support for Performance regression test suite with TPCDS

2015-12-10 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-11107:
-
Attachment: (was: HIVE-11107.6.patch)

> Support for Performance regression test suite with TPCDS
> 
>
> Key: HIVE-11107
> URL: https://issues.apache.org/jira/browse/HIVE-11107
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-11107.1.patch, HIVE-11107.2.patch, 
> HIVE-11107.3.patch, HIVE-11107.4.patch, HIVE-11107.5.patch, 
> HIVE-11107.6.patch, HIVE-11107.7.patch
>
>
> Support to add TPCDS queries to the performance regression test suite with 
> Hive CBO turned on.
> This benchmark is intended to make sure that subsequent changes to the 
> optimizer or any Hive code do not yield any unexpected plan changes, i.e., 
> the intention is not to run the entire TPCDS query set, but just "explain 
> plan" for the TPCDS queries.
> As part of this jira, we will manually verify that expected hive 
> optimizations kick in for the queries (for given stats/dataset). If there is 
> a difference in plan within this test suite due to a future commit, it needs 
> to be analyzed and we need to make sure that it is not a regression.
> The test suite can be run in master branch from itests by 
> {code}
> mvn test -Dtest=TestPerfCliDriver 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12632) LLAP: don't use IO elevator for ACID tables

2015-12-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051738#comment-15051738
 ] 

Sergey Shelukhin commented on HIVE-12632:
-

Looks like LLAP tests are also broken as per the JIRA I just created. Don't 
know why I didn't hit this issue when I was originally running.

> LLAP: don't use IO elevator for ACID tables 
> 
>
> Key: HIVE-12632
> URL: https://issues.apache.org/jira/browse/HIVE-12632
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Takahiko Saito
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12632.patch
>
>
> Until HIVE-12631 is fixed, we need to avoid ACID tables in the IO elevator. 
> Right now, a FileNotFoundException is thrown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12648) LLAP IO was disabled in CliDriver by accident (and tests are broken)

2015-12-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051758#comment-15051758
 ] 

Sergey Shelukhin commented on HIVE-12648:
-

I am trying to bisect now to see what broke the tests, since the failure is 
rather weird (buffer size mismatch when reading metadata).

> LLAP IO was disabled in CliDriver by accident (and tests are broken)
> 
>
> Key: HIVE-12648
> URL: https://issues.apache.org/jira/browse/HIVE-12648
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12648) LLAP IO was disabled in CliDriver by accident (and tests are broken)

2015-12-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12648:

Target Version/s: 2.0.0

> LLAP IO was disabled in CliDriver by accident (and tests are broken)
> 
>
> Key: HIVE-12648
> URL: https://issues.apache.org/jira/browse/HIVE-12648
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12648.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12535) Dynamic Hash Join: Key references are cyclic

2015-12-10 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051592#comment-15051592
 ] 

Laljo John Pullokkaran commented on HIVE-12535:
---

[~pxiong] Could you take a look?

> Dynamic Hash Join: Key references are cyclic
> 
>
> Key: HIVE-12535
> URL: https://issues.apache.org/jira/browse/HIVE-12535
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.0.0
>Reporter: Gopal V
>Assignee: Jason Dere
> Attachments: philz_26.txt
>
>
> MAPJOIN_4227 is inside "Reducer 2", but refers back to "Reducer 2" in its 
> keys. It should say "Map 1" there.
> {code}
> ||<-Reducer 2 [SIMPLE_EDGE] vectorized, llap
> |   Reduce Output Operator [RS_4189]
> |      key expressions: _col0 (type: string), _col1 (type: int)
> |      Map-reduce partition columns: _col0 (type: string), _col1 (type: int)
> |      sort order: ++
> |      Statistics: Num rows: 83 Data size: 9213 Basic stats: COMPLETE Column stats: COMPLETE
> |      value expressions: _col2 (type: double)
> |      Group By Operator [OP_4229]
> |         aggregations: ["sum(_col2)"]
> |         keys: _col0 (type: string), _col1 (type: int)
> |         outputColumnNames: ["_col0","_col1","_col2"]
> |         Statistics: Num rows: 83 Data size: 9213 Basic stats: COMPLETE Column stats: COMPLETE
> |         Select Operator [OP_4228]
> |            outputColumnNames: ["_col0","_col1","_col2"]
> |            Statistics: Num rows: 166 Data size: 26394 Basic stats: COMPLETE Column stats: COMPLETE
> |            Map Join Operator 

[jira] [Updated] (HIVE-12633) LLAP: package included serde jars

2015-12-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12633:

Attachment: HIVE-12633.01.patch

Fixed the typo. I probably deleted the symbol accidentally after building...

> LLAP: package included serde jars
> -
>
> Key: HIVE-12633
> URL: https://issues.apache.org/jira/browse/HIVE-12633
> Project: Hive
>  Issue Type: Bug
>Reporter: Takahiko Saito
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12633.01.patch, HIVE-12633.patch
>
>
> Some SerDes like JSONSerde are not packaged with LLAP. One cannot localize 
> jars on the daemon (due to security considerations if nothing else), so we 
> should package them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12640) Allow StatsOptimizer to optimize the query for Constant GroupBy keys

2015-12-10 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-12640:
-
Description: 
{code}
hive> select count('1') from src group by '1';
{code}

In the above query, while performing StatsOptimizer optimization we can safely 
ignore the group by on the constant key '1' since the above query will return 
the same result as "select count('1') from src".

Exception:
If src is empty, according to the SQL standard,
{code}
 select count('1') from src group by '1'
{code}
and
{code}
 select count('1') from src
{code}

should produce 1 and 0 rows respectively.

  was:
{code}
hive> select count('1') from src group by '1';
{code}

In the above query, while performing StatsOptimizer optimization we can safely 
ignore the group by on the constant key '1' since the above query will return 
the same result as "select count('1') from src".

Exception:
If src is empty, according to the SQL standard, should
 select count('1') from src group by '1'
and
 select count('1') from src


> Allow StatsOptimizer to optimize the query for Constant GroupBy keys 
> -
>
> Key: HIVE-12640
> URL: https://issues.apache.org/jira/browse/HIVE-12640
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-12640.1.patch
>
>
> {code}
> hive> select count('1') from src group by '1';
> {code}
> In the above query, while performing StatsOptimizer optimization we can 
> safely ignore the group by on the constant key '1' since the above query will 
> return the same result as "select count('1') from src".
> Exception:
> If src is empty, according to the SQL standard,
> {code}
>  select count('1') from src group by '1'
> {code}
> and
> {code}
>  select count('1') from src
> {code}
> should produce 1 and 0 rows respectively.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12535) Dynamic Hash Join: Key references are cyclic

2015-12-10 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051651#comment-15051651
 ] 

Pengcheng Xiong commented on HIVE-12535:


[~jdere], thanks for your investigation. Do you have any idea why big table 0 
is not in the input vertices? Thanks.

> Dynamic Hash Join: Key references are cyclic
> 
>
> Key: HIVE-12535
> URL: https://issues.apache.org/jira/browse/HIVE-12535
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.0.0
>Reporter: Gopal V
>Assignee: Jason Dere
> Attachments: philz_26.txt
>
>
> MAPJOIN_4227 is inside "Reducer 2", but refers back to "Reducer 2" in its 
> keys. It should say "Map 1" there.
> {code}
> ||<-Reducer 2 [SIMPLE_EDGE] vectorized, llap
> |   Reduce Output Operator [RS_4189]
> |      key expressions: _col0 (type: string), _col1 (type: int)
> |      Map-reduce partition columns: _col0 (type: string), _col1 (type: int)
> |      sort order: ++
> |      Statistics: Num rows: 83 Data size: 9213 Basic stats: COMPLETE Column stats: COMPLETE
> |      value expressions: _col2 (type: double)
> |      Group By Operator [OP_4229]
> |         aggregations: ["sum(_col2)"]
> |         keys: _col0 (type: string), _col1 (type: int)
> |         outputColumnNames: ["_col0","_col1","_col2"]
> |         Statistics: Num rows: 83 Data size: 9213 Basic stats: COMPLETE Column stats: COMPLETE
> |         Select Operator [OP_4228]
> |            outputColumnNames: ["_col0","_col1","_col2"]
> |            Statistics: Num rows: 166 Data size: 26394 Basic stats: COMPLETE Column stats: COMPLETE

[jira] [Commented] (HIVE-12551) Fix several kryo exceptions in branch-1

2015-12-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051711#comment-15051711
 ] 

Prasanth Jayachandran commented on HIVE-12551:
--

[~Feng Yuan] Can you provide a small reproducible test case for this issue? 
It looks like PartitionDesc is not registered with Kryo. I can fix that, but 
that alone will not guarantee the issue is completely solved; registering 
PartitionDesc by itself won't be sufficient, and another issue could show up 
after it. To fix it completely, if you can provide a small data set and queries 
that reproduce this issue, I will test the patch before updating the JIRA. 
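For what it's worth, a minimal sketch of the kind of fix in question, 
registering the class with the shaded Kryo instance Hive uses for plan 
serialization (an illustration only, not the eventual patch):
{code}
import org.apache.hive.com.esotericsoftware.kryo.Kryo;
import org.apache.hadoop.hive.ql.plan.PartitionDesc;

public class KryoRegistrationSketch {
  static Kryo newKryo() {
    Kryo kryo = new Kryo();
    // Registering up front means Kryo knows how to (de)serialize the class
    // instead of failing on an unregistered type during plan deserialization.
    kryo.register(PartitionDesc.class);
    return kryo;
  }
}
{code}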

> Fix several kryo exceptions in branch-1
> ---
>
> Key: HIVE-12551
> URL: https://issues.apache.org/jira/browse/HIVE-12551
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>  Labels: serialization
> Fix For: 1.3.0
>
> Attachments: HIVE-12551.1.patch
>
>
> HIVE-11519, HIVE-12174 and the following exception are all caused by 
> unregistered classes or serializers. HIVE-12175 should have fixed these 
> issues for master branch.
> {code}
> Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.NullPointerException
> Serialization trace:
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> expr (org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor)
> childExpressions 
> (org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterStringColumnBetween)
> conditionEvaluator 
> (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:367)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:276)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:1087)
>   at 
> 

[jira] [Commented] (HIVE-12548) Hive metastore goes down in Kerberos,sentry enabled CDH5.5 cluster

2015-12-10 Thread Andrew Olson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051722#comment-15051722
 ] 

Andrew Olson commented on HIVE-12548:
-

We're seeing this stack trace from an Oozie Java action that connects to the 
MetaStore in a Kerberos-secured cluster. We've tried to provide the Hive 
credentials as described in 
https://oozie.apache.org/docs/4.2.0/DG_ActionAuthentication.html with no 
success so far. Any advice would be appreciated.
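
For context: "Invalid status -128" from TSaslTransport typically indicates 
that a client opened a plain (non-SASL) transport against a SASL-secured 
server, or vice versa. Below is a hedged sketch of the client-side transport 
setup a Kerberos-secured metastore expects; the host, port, and service names 
are placeholder assumptions:
{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.thrift.transport.TSaslClientTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class SaslMetastoreTransportSketch {
  public static void main(String[] args) throws Exception {
    // Sketch only: wrap the socket in a GSSAPI SASL transport so the
    // first bytes on the wire are a SASL negotiation rather than a raw
    // Thrift message (a raw first message is one way to produce
    // "Invalid status" on the server side).
    TTransport socket = new TSocket("metastore-host", 9083);
    Map<String, String> saslProps = new HashMap<>();
    saslProps.put("javax.security.sasl.qop", "auth");
    TTransport transport = new TSaslClientTransport(
        "GSSAPI",          // SASL mechanism used for Kerberos
        null,              // authorizationId
        "hive",            // Kerberos service name (assumption)
        "metastore-host",  // server FQDN from the service principal
        saslProps, null, socket);
    transport.open();      // runs the Kerberos negotiation
  }
}
{code}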

> Hive metastore goes down in Kerberos,sentry enabled CDH5.5 cluster
> --
>
> Key: HIVE-12548
> URL: https://issues.apache.org/jira/browse/HIVE-12548
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
> Environment: RHEL 6.5 CLOUDERA CDH 5.5
>Reporter: narendra reddy ganesana
>Assignee: Vaibhav Gumashta
>
> [pool-3-thread-10]: Error occurred during processing of message.
> java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: 
> Invalid status -128
>   at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:739)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:736)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:356)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1651)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:736)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:268)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.transport.TTransportException: Invalid status 
> -128
>   at 
> org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232)
>   at 
> org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:184)
>   at 
> org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
>   at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
>   at 
> org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
>   at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
>   ... 10 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12648) LLAP IO was disabled in CliDriver by accident (and tests are broken)

2015-12-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12648:

Attachment: HIVE-12648.patch

This re-enables the tests and removes all but one ctor, so that callers have 
to think about the settings rather than blindly picking up the defaults. The 
orc_llap test still fails; I am bisecting to see what broke it.
[~prasanth_j] fyi
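
As an illustration of the ctor change described above (a hedged sketch with 
hypothetical names, not the actual patch):
{code}
// Hypothetical names; this only illustrates the pattern of forcing
// callers to state a setting instead of inheriting a hidden default.
public class LlapIoHarness {
  private final boolean llapIoEnabled;

  // The single remaining ctor: every caller must pass the setting.
  public LlapIoHarness(boolean llapIoEnabled) {
    this.llapIoEnabled = llapIoEnabled;
  }

  // With no no-arg ctor, "new LlapIoHarness()" no longer compiles, so
  // a default can never be picked up by accident.
  public boolean isLlapIoEnabled() {
    return llapIoEnabled;
  }
}
{code}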

> LLAP IO was disabled in CliDriver by accident (and tests are broken)
> 
>
> Key: HIVE-12648
> URL: https://issues.apache.org/jira/browse/HIVE-12648
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12648.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12640) Allow StatsOptimizer to optimize the query for Constant GroupBy keys

2015-12-10 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051586#comment-15051586
 ] 

Julian Hyde commented on HIVE-12640:


If {{src}} is empty, according to the SQL standard, should {code} select 
count('1') from src group by '1'{code} and {code} select count('1') from 
src{code} return the same result? My understanding is that the first should 
return 0 rows and the second 1 row.

> Allow StatsOptimizer to optimize the query for Constant GroupBy keys 
> -
>
> Key: HIVE-12640
> URL: https://issues.apache.org/jira/browse/HIVE-12640
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-12640.1.patch
>
>
> {code}
> hive> select count('1') from src group by '1';
> {code}
> In the above query, while performing StatsOptimizer optimization we can 
> safely ignore the group by on the constant key '1' since the above query will 
> return the same result as "select count('1') from src".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12590) Repeated UDAFs with literals can produce incorrect result

2015-12-10 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051600#comment-15051600
 ] 

Laljo John Pullokkaran commented on HIVE-12590:
---

The obvious change was to remove the lowercase conversion in Phase 1, but the 
problem seems to run deeper than that.

[~ashutoshc] Could you take a look? I have other things on my plate right now.
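
To illustrate the suspected failure mode (a hedged sketch of the mechanism 
with hypothetical names, not the actual Phase 1 code): if aggregation 
expressions are deduplicated by a lowercased string key, two distinct literals 
collapse into one entry.
{code}
import java.util.LinkedHashMap;
import java.util.Map;

public class LowercaseDedupSketch {
  public static void main(String[] args) {
    // Keying aggregation expressions by a lowercased string merges
    // max('pants') and max('pANTS') into a single entry, so both output
    // columns end up computed from whichever expression was stored last.
    Map<String, String> aggExprByKey = new LinkedHashMap<>();
    for (String expr : new String[] {"max('pants')", "max('pANTS')"}) {
      aggExprByKey.put(expr.toLowerCase(), expr); // lossy: case is erased
    }
    System.out.println(aggExprByKey); // {max('pants')=max('pANTS')}
  }
}
{code}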

> Repeated UDAFs with literals can produce incorrect result
> -
>
> Key: HIVE-12590
> URL: https://issues.apache.org/jira/browse/HIVE-12590
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.0.1, 1.1.1, 1.2.1, 2.0.0
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
>Priority: Critical
>
> Repeated UDAF with literals could produce wrong result.
> This is not a common use case, nevertheless a bug.
> hive> select max('pants'), max('pANTS') from t1 group by key;
>  Total MapReduce CPU Time Spent: 0 msec
> OK
> pANTS pANTS
> pANTS pANTS
> pANTS pANTS
> pANTS pANTS
> pANTS pANTS
> Time taken: 296.252 seconds, Fetched: 5 row(s)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12590) Repeated UDAFs with literals can produce incorrect result

2015-12-10 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-12590:
--
Assignee: Ashutosh Chauhan  (was: Laljo John Pullokkaran)

> Repeated UDAFs with literals can produce incorrect result
> -
>
> Key: HIVE-12590
> URL: https://issues.apache.org/jira/browse/HIVE-12590
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.0.1, 1.1.1, 1.2.1, 2.0.0
>Reporter: Laljo John Pullokkaran
>Assignee: Ashutosh Chauhan
>Priority: Critical
>
> Repeated UDAF with literals could produce wrong result.
> This is not a common use case, nevertheless a bug.
> hive> select max('pants'), max('pANTS') from t1 group by key;
>  Total MapReduce CPU Time Spent: 0 msec
> OK
> pANTS pANTS
> pANTS pANTS
> pANTS pANTS
> pANTS pANTS
> pANTS pANTS
> Time taken: 296.252 seconds, Fetched: 5 row(s)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12640) Allow StatsOptimizer to optimize the query for Constant GroupBy keys

2015-12-10 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-12640:
-
Description: 
{code}
hive> select count('1') from src group by '1';
{code}

In the above query, while performing StatsOptimizer optimization we can safely 
ignore the group by on the constant key '1' since the above query will return 
the same result as "select count('1') from src".

Exception:
If src is empty then, according to the SQL standard,
 select count('1') from src group by '1'
and
 select count('1') from src
should produce 0 and 1 rows, respectively.

  was:
{code}
hive> select count('1') from src group by '1';
{code}

In the above query, while performing StatsOptimizer optimization we can safely 
ignore the group by on the constant key '1' since the above query will return 
the same result as "select count('1') from src".


> Allow StatsOptimizer to optimize the query for Constant GroupBy keys 
> -
>
> Key: HIVE-12640
> URL: https://issues.apache.org/jira/browse/HIVE-12640
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-12640.1.patch
>
>
> {code}
> hive> select count('1') from src group by '1';
> {code}
> In the above query, while performing StatsOptimizer optimization we can 
> safely ignore the group by on the constant key '1' since the above query will 
> return the same result as "select count('1') from src".
> Exception:
> If src is empty then, according to the SQL standard,
>  select count('1') from src group by '1'
> and
>  select count('1') from src
> should produce 0 and 1 rows, respectively.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-12648) LLAP IO was disabled in CliDriver by accident (and tests are broken)

2015-12-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-12648:
---

Assignee: Sergey Shelukhin

> LLAP IO was disabled in CliDriver by accident (and tests are broken)
> 
>
> Key: HIVE-12648
> URL: https://issues.apache.org/jira/browse/HIVE-12648
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12573) some DPP tests are broken

2015-12-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12573:

Target Version/s: 2.0.0

> some DPP tests are broken
> -
>
> Key: HIVE-12573
> URL: https://issues.apache.org/jira/browse/HIVE-12573
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12573.patch
>
>
> -It looks like LLAP out files were not updated in some DPP JIRA because the 
> test was entirely broken in HiveQA at the time- actually it looks like the 
> out files have explain output with a glitch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12640) Allow StatsOptimizer to optimize the query for Constant GroupBy keys

2015-12-10 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051646#comment-15051646
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-12640:
--

[~julianhyde] Thanks, I have noted that as a condition to cover in the jira 
description.

Thanks
Hari

> Allow StatsOptimizer to optimize the query for Constant GroupBy keys 
> -
>
> Key: HIVE-12640
> URL: https://issues.apache.org/jira/browse/HIVE-12640
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-12640.1.patch
>
>
> {code}
> hive> select count('1') from src group by '1';
> {code}
> In the above query, while performing StatsOptimizer optimization we can 
> safely ignore the group by on the constant key '1' since the above query will 
> return the same result as "select count('1') from src".
> Exception:
> If src is empty, according to the SQL standard,
> {code}
>  select count('1') from src group by '1'
> {code}
> and
> {code}
>  select count('1') from src
> {code}
> should produce 0 and 1 rows, respectively.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11107) Support for Performance regression test suite with TPCDS

2015-12-10 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-11107:
-
Attachment: HIVE-11107.6.patch

> Support for Performance regression test suite with TPCDS
> 
>
> Key: HIVE-11107
> URL: https://issues.apache.org/jira/browse/HIVE-11107
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-11107.1.patch, HIVE-11107.2.patch, 
> HIVE-11107.3.patch, HIVE-11107.4.patch, HIVE-11107.5.patch, 
> HIVE-11107.6.patch, HIVE-11107.6.patch
>
>
> Support to add TPCDS queries to the performance regression test suite with 
> Hive CBO turned on.
> This benchmark is intended to make sure that subsequent changes to the 
> optimizer or any hive code do not yield any unexpected plan changes. i.e.  
> the intention is to not run the entire TPCDS query set, but just "explain 
> plan" for the TPCDS queries.
> As part of this jira, we will manually verify that expected hive 
> optimizations kick in for the queries (for given stats/dataset). If there is 
> a difference in plan within this test suite due to a future commit, it needs 
> to be analyzed and we need to make sure that it is not a regression.
> The test suite can be run in master branch from itests by 
> {code}
> mvn test -Dtest=TestPerfCliDriver 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-11538) Add an option to skip init script while running tests

2015-12-10 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051859#comment-15051859
 ] 

Lefty Leverenz edited comment on HIVE-11538 at 12/10/15 11:49 PM:
--

[~ashutoshc], this is now added to the wikidoc in the HiveDeveloperFAQ:
* [HiveDeveloperFAQ - Testing | 
https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-Testing]
** [How do I modify the init script when testing? | 
https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-HowdoImodifytheinitscriptwhentesting?]


was (Author: sladymon):
[~ashutoshc], this is now added to the wikidoc in the HiveDeveloperFAQ:
* [HiveDeveloperFAQ - Testing | 
https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-Testing]

> Add an option to skip init script while running tests
> -
>
> Key: HIVE-11538
> URL: https://issues.apache.org/jira/browse/HIVE-11538
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11538.2.patch, HIVE-11538.3.patch, HIVE-11538.patch
>
>
> {{q_test_init.sql}} has grown over time. Now, it takes substantial amount of 
> time. When debugging a particular query which doesn't need such 
> initialization, this delay is annoyance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12610) Hybrid Grace Hash Join should fail task faster if processing first batch fails, instead of continuing processing the rest

2015-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051910#comment-15051910
 ] 

Hive QA commented on HIVE-12610:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12776660/HIVE-12610.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 9891 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6310/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6310/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6310/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12776660 - PreCommit-HIVE-TRUNK-Build

> Hybrid Grace Hash Join should fail task faster if processing first batch 
> fails, instead of continuing processing the rest
> -
>
> Key: HIVE-12610
> URL: https://issues.apache.org/jira/browse/HIVE-12610
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-12610.1.patch, HIVE-12610.2.patch
>
>
> During processing the spilled partitions, if there's any fatal error, such as 
> Kryo exception, then we should exit early, instead of moving on to process 
> the rest of spilled partitions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12445) Tracking of completed dags is a slow memory leak

2015-12-10 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051968#comment-15051968
 ] 

Siddharth Seth commented on HIVE-12445:
---

+1. Looks good.

> Tracking of completed dags is a slow memory leak
> 
>
> Key: HIVE-12445
> URL: https://issues.apache.org/jira/browse/HIVE-12445
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 2.0.0
>Reporter: Siddharth Seth
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12445.patch
>
>
> LLAP daemons track completed DAGs, but never clean up these structures. This 
> is primarily to disallow out of order executions. Evaluate whether that can 
> be avoided - otherwise this structure needs to be cleaned up with a delay.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12596) Delete timestamp row throws java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]

2015-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051792#comment-15051792
 ] 

Hive QA commented on HIVE-12596:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12776657/HIVE-12596.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 28 failed/errored test(s), 9893 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_join1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge10
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_script_env_var2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_aggregate_9
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_auto_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_char_simple
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_distinct_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_orderby_5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_17
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_casts
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6309/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6309/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6309/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 28 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12776657 - PreCommit-HIVE-TRUNK-Build

> Delete timestamp row throws java.lang.IllegalArgumentException: Timestamp 
> format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
> 
>
> Key: HIVE-12596
> URL: https://issues.apache.org/jira/browse/HIVE-12596
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Takahiko Saito
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12596.1.patch
>
>
> Run the below:
> {noformat}
> create table test_acid( i int, ts timestamp)
>   clustered by (i) into 2 buckets
>   stored as orc
>   tblproperties ('transactional'='true');
> insert into table test_acid values (1, '2014-09-14 12:34:30');
> delete from test_acid where ts = '2014-15-16 17:18:19.20';
> {noformat}
> The below error is thrown:
> {noformat}
> 15/12/04 19:55:49 INFO SessionState: Map 1: -/-   Reducer 2: 0/2
> Status: Failed
> 15/12/04 19:55:49 ERROR SessionState: Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1447960616881_0022_2_00, 
> diagnostics=[Vertex vertex_1447960616881_0022_2_00 [Map 1] killed/failed due 
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: test_acid initializer failed, 
> vertex=vertex_1447960616881_0022_2_00 [Map 1], 
> java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd 
> hh:mm:ss[.fffffffff]

[jira] [Commented] (HIVE-12184) DESCRIBE of fully qualified table fails when db and table name match and non-default database is in use

2015-12-10 Thread Shannon Ladymon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051833#comment-15051833
 ] 

Shannon Ladymon commented on HIVE-12184:


[~ngangam], that helps a lot, thank you.  I've updated the doc with the 
clarification of the two syntax formats and the description of field_name.  
Since we aren't completely sure that describe partition can use all those 
options, I haven't added any of them to the doc at this time.

> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use
> ---
>
> Key: HIVE-12184
> URL: https://issues.apache.org/jira/browse/HIVE-12184
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1
>Reporter: Lenni Kuff
>Assignee: Naveen Gangam
> Fix For: 2.0.0
>
> Attachments: HIVE-12184.10.patch, HIVE-12184.10.patch, 
> HIVE-12184.2.patch, HIVE-12184.3.patch, HIVE-12184.4.patch, 
> HIVE-12184.5.patch, HIVE-12184.6.patch, HIVE-12184.7.patch, 
> HIVE-12184.8.patch, HIVE-12184.9.patch, HIVE-12184.patch
>
>
> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use.
> Repro:
> {code}
> : jdbc:hive2://localhost:1/default> create database foo;
> No rows affected (0.116 seconds)
> 0: jdbc:hive2://localhost:1/default> create table foo.foo(i int);
> 0: jdbc:hive2://localhost:1/default> describe foo.foo;
> +-----------+------------+----------+--+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+--+
> | i         | int        |          |
> +-----------+------------+----------+--+
> 1 row selected (0.049 seconds)
> 0: jdbc:hive2://localhost:1/default> use foo;
> 0: jdbc:hive2://localhost:1/default> describe foo.foo;
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error in getting fields from 
> serde.Invalid Field foo (state=08S01,code=1)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9544) Error dropping fully qualified partitioned table - Internal error processing get_partition_names

2015-12-10 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051891#comment-15051891
 ] 

Chaoyu Tang commented on HIVE-9544:
---

That is possible, since I was not able to reproduce this issue on current 
upstream.

> Error dropping fully qualified partitioned table - Internal error processing 
> get_partition_names
> 
>
> Key: HIVE-9544
> URL: https://issues.apache.org/jira/browse/HIVE-9544
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
> Environment: HDP 2.2
>Reporter: Hari Sekhon
>Assignee: Chaoyu Tang
>Priority: Minor
>
> When attempting to drop a partitioned table using a fully qualified name I 
> get this error:
> {code}
> hive -e 'drop table myDB.my_table_name;'
> Logging initialized using configuration in 
> file:/etc/hive/conf/hive-log4j.properties
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. 
> org.apache.thrift.TApplicationException: Internal error processing 
> get_partition_names
> {code}
> It succeeds if I instead do:
> {code}hive -e 'use myDB; drop table my_table_name;'{code}
> Regards,
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11538) Add an option to skip init script while running tests

2015-12-10 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051902#comment-15051902
 ] 

Lefty Leverenz commented on HIVE-11538:
---

[~ashutoshc] since this is only for Hive 2.0.0+, is -Phadoop-2 really necessary 
in the mvn commands?

{code}
mvn test -Dtest=TestCliDriver -Phadoop-2 -Dqfile=test_to_run.q  -DinitScript=
{code}

and

{code}
mvn test -Dtest=TestCliDriver -Phadoop-2 -Dtest.output.overwrite=true 
-Dqfile=test_to_run.q  -DinitScript=custom_script.sql
{code}

> Add an option to skip init script while running tests
> -
>
> Key: HIVE-11538
> URL: https://issues.apache.org/jira/browse/HIVE-11538
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11538.2.patch, HIVE-11538.3.patch, HIVE-11538.patch
>
>
> {{q_test_init.sql}} has grown over time. Now, it takes substantial amount of 
> time. When debugging a particular query which doesn't need such 
> initialization, this delay is annoyance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12635) Hive should return the latest hbase cell timestamp as the row timestamp value

2015-12-10 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051912#comment-15051912
 ] 

Szehon Ho commented on HIVE-12635:
--

Chatted with Aihua about the patch; it seems simple.

One small comment for consideration: for cleaner code, does it make sense to 
initialize the timestamp variable to 0 and then loop from 0 to 
result.rawCells().length?

And you may know more than me: does this qualify as a backward incompatibility?
+1 other than those.
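
A minimal sketch of that loop suggestion, assuming the standard HBase client 
API (the wrapper names are illustrative, not the actual patch):
{code}
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;

public class LatestCellTimestampSketch {
  // Sketch: scan every cell in the row and keep the newest timestamp,
  // instead of returning the timestamp of the first cell encountered.
  static long latestTimestamp(Result result) {
    long timestamp = 0;
    for (Cell cell : result.rawCells()) {
      timestamp = Math.max(timestamp, cell.getTimestamp());
    }
    return timestamp;
  }
}
{code}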



> Hive should return the latest hbase cell timestamp as the row timestamp value
> -
>
> Key: HIVE-12635
> URL: https://issues.apache.org/jira/browse/HIVE-12635
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-12635.patch
>
>
> When hive talks to hbase and maps the hbase timestamp field to a hive 
> column, hive seems to return the first cell's timestamp, rather than the 
> latest one, as the timestamp value.
> It makes sense to return the latest timestamp, since adding the latest cell 
> can be considered an update to the row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11866) Add framework to enable testing using LDAPServer using LDAP protocol

2015-12-10 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051794#comment-15051794
 ] 

Thejas M Nair commented on HIVE-11866:
--

[~ngangam] Thanks for following up on this. The tests would be very valuable.
Did you get a chance to look at the suggestions in the thread 
http://mail-archives.apache.org/mod_mbox/directory-users/201510.mbox/browser ? 
(referred from LEGAL-227). It looks like people on the directory list are very 
willing to help.

Avoiding a GPL-licensed library keeps things much more straightforward, and 
avoids the chance of someone accidentally assuming it is Apache-compatible and 
starting to use it in the product itself.
Another possibility is bugs in the build scripts; I already see junit in 
Apache hive 1.2.1, though it should not be there.



> Add framework to enable testing using LDAPServer using LDAP protocol
> 
>
> Key: HIVE-11866
> URL: https://issues.apache.org/jira/browse/HIVE-11866
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.3.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-11866.2.patch, HIVE-11866.patch
>
>
> Currently there is no unit test coverage for HS2's LDAP Atn provider using a 
> LDAP Server on the backend. This prevents testing of the LDAPAtnProvider with 
> some realistic usecases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12435) SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and vectorization is enabled.

2015-12-10 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-12435:

Attachment: HIVE-12435.02.patch

Review comments.

> SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and 
> vectorization is enabled.
> --
>
> Key: HIVE-12435
> URL: https://issues.apache.org/jira/browse/HIVE-12435
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 2.0.0
>Reporter: Takahiko Saito
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12435.01.patch, HIVE-12435.02.patch
>
>
> Run the following query:
> {noformat}
> create table count_case_groupby (key string, bool boolean) STORED AS orc;
> insert into table count_case_groupby values ('key1', true),('key2', 
> false),('key3', NULL),('key4', false),('key5',NULL);
> {noformat}
> The table contains the following:
> {noformat}
> key1  true
> key2  false
> key3  NULL
> key4  false
> key5  NULL
> {noformat}
> The below query returns:
> {noformat}
> SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) 
> AS cnt_bool0_ok FROM count_case_groupby GROUP BY key;
> key1  1
> key2  1
> key3  1
> key4  1
> key5  1
> {noformat}
> while it expects the following results:
> {noformat}
> key1  1
> key2  1
> key3  0
> key4  1
> key5  0
> {noformat}
> The query works with hive ver 1.2. Also it works when a table is not orc 
> format.
> Also even if it's an orc table, when vectorization is disabled, the query 
> works.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-2651) The variable hive.exec.mode.local.auto.tasks.max should be changed

2015-12-10 Thread Shannon Ladymon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051947#comment-15051947
 ] 

Shannon Ladymon commented on HIVE-2651:
---

Doc note: The old variable, *hive.exec.mode.local.auto.tasks.max*, is available 
in the wiki here:
* [Configuration Properties - hive.exec.mode.local.auto.tasks.max | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.mode.local.auto.tasks.max]

The new variable, *hive.exec.mode.local.auto.input.files.max*, is available in 
the wiki here:
* [Configuration Properties - hive.exec.mode.local.auto.input.files.max | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.mode.local.auto.input.files.max]

> The variable hive.exec.mode.local.auto.tasks.max should be changed
> --
>
> Key: HIVE-2651
> URL: https://issues.apache.org/jira/browse/HIVE-2651
> Project: Hive
>  Issue Type: Improvement
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.9.0
>
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2651.D861.1.patch
>
>
> It should be called hive.exec.mode.local.auto.input.files.max instead.
> The number of input files are checked currently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12635) Hive should return the latest hbase cell timestamp as the row timestamp value

2015-12-10 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051850#comment-15051850
 ] 

Aihua Xu commented on HIVE-12635:
-

[~szehon] Can you help review the patch?

> Hive should return the latest hbase cell timestamp as the row timestamp value
> -
>
> Key: HIVE-12635
> URL: https://issues.apache.org/jira/browse/HIVE-12635
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-12635.patch
>
>
> When hive talks to hbase and maps the hbase timestamp field to a hive 
> column, hive seems to return the first cell's timestamp, rather than the 
> latest one, as the timestamp value.
> It makes sense to return the latest timestamp, since adding the latest cell 
> can be considered an update to the row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12366) Refactor Heartbeater logic for transaction

2015-12-10 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-12366:
-
Attachment: HIVE-12366.6.patch

Rebased, patch 6

> Refactor Heartbeater logic for transaction
> --
>
> Key: HIVE-12366
> URL: https://issues.apache.org/jira/browse/HIVE-12366
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-12366.1.patch, HIVE-12366.2.patch, 
> HIVE-12366.3.patch, HIVE-12366.4.patch, HIVE-12366.5.patch, HIVE-12366.6.patch
>
>
> Currently there is a gap between the time locks acquisition and the first 
> heartbeat being sent out. Normally the gap is negligible, but when it's big 
> it will cause query fail since the locks are timed out by the time the 
> heartbeat is sent.
> Need to remove this gap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12435) SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and vectorization is enabled.

2015-12-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051908#comment-15051908
 ] 

Prasanth Jayachandran commented on HIVE-12435:
--

+1

> SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and 
> vectorization is enabled.
> --
>
> Key: HIVE-12435
> URL: https://issues.apache.org/jira/browse/HIVE-12435
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 2.0.0
>Reporter: Takahiko Saito
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12435.01.patch, HIVE-12435.02.patch
>
>
> Run the following query:
> {noformat}
> create table count_case_groupby (key string, bool boolean) STORED AS orc;
> insert into table count_case_groupby values ('key1', true),('key2', 
> false),('key3', NULL),('key4', false),('key5',NULL);
> {noformat}
> The table contains the following:
> {noformat}
> key1  true
> key2  false
> key3  NULL
> key4  false
> key5  NULL
> {noformat}
> The below query returns:
> {noformat}
> SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) 
> AS cnt_bool0_ok FROM count_case_groupby GROUP BY key;
> key1  1
> key2  1
> key3  1
> key4  1
> key5  1
> {noformat}
> while it expects the following results:
> {noformat}
> key1  1
> key2  1
> key3  0
> key4  1
> key5  0
> {noformat}
> The query works with hive ver 1.2. Also it works when a table is not orc 
> format.
> Also even if it's an orc table, when vectorization is disabled, the query 
> works.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11538) Add an option to skip init script while running tests

2015-12-10 Thread Shannon Ladymon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051859#comment-15051859
 ] 

Shannon Ladymon commented on HIVE-11538:


[~ashutoshc], this is now added to the wikidoc in the HiveDeveloperFAQ:
* [HiveDeveloperFAQ - Testing | 
https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-Testing]

> Add an option to skip init script while running tests
> -
>
> Key: HIVE-11538
> URL: https://issues.apache.org/jira/browse/HIVE-11538
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11538.2.patch, HIVE-11538.3.patch, HIVE-11538.patch
>
>
> {{q_test_init.sql}} has grown over time. Now, it takes substantial amount of 
> time. When debugging a particular query which doesn't need such 
> initialization, this delay is annoyance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12435) SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and vectorization is enabled.

2015-12-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051877#comment-15051877
 ] 

Prasanth Jayachandran commented on HIVE-12435:
--

This assumption is fragile
{code}
PrimitiveCategory primitiveCategory = ((PrimitiveTypeInfo) 
expr.getTypeInfo()).getPrimitiveCategory();
{code}

Can you instead get the TypeInfo object and check whether it is an instance of 
PrimitiveTypeInfo whose primitive category is VOID?
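
Something like the sketch below (the wrapper class is illustrative; only the 
check itself is the suggestion):
{code}
import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
import org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;

public class VoidTypeCheckSketch {
  // Hedged sketch of the suggested check: guard the downcast rather
  // than assuming every expression is primitive-typed.
  static boolean isVoidTyped(ExprNodeDesc expr) {
    TypeInfo typeInfo = expr.getTypeInfo();
    return typeInfo instanceof PrimitiveTypeInfo
        && ((PrimitiveTypeInfo) typeInfo).getPrimitiveCategory() == PrimitiveCategory.VOID;
  }
}
{code}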

> SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and 
> vectorization is enabled.
> --
>
> Key: HIVE-12435
> URL: https://issues.apache.org/jira/browse/HIVE-12435
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 2.0.0
>Reporter: Takahiko Saito
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-12435.01.patch
>
>
> Run the following query:
> {noformat}
> create table count_case_groupby (key string, bool boolean) STORED AS orc;
> insert into table count_case_groupby values ('key1', true),('key2', 
> false),('key3', NULL),('key4', false),('key5',NULL);
> {noformat}
> The table contains the following:
> {noformat}
> key1  true
> key2  false
> key3  NULL
> key4  false
> key5  NULL
> {noformat}
> The below query returns:
> {noformat}
> SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) 
> AS cnt_bool0_ok FROM count_case_groupby GROUP BY key;
> key1  1
> key2  1
> key3  1
> key4  1
> key5  1
> {noformat}
> while it expects the following results:
> {noformat}
> key1  1
> key2  1
> key3  0
> key4  1
> key5  0
> {noformat}
> The query works with hive ver 1.2. Also it works when a table is not orc 
> format.
> Also even if it's an orc table, when vectorization is disabled, the query 
> works.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-1408) add option to let hive automatically run in local mode based on tunable heuristics

2015-12-10 Thread Shannon Ladymon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051945#comment-15051945
 ] 

Shannon Ladymon commented on HIVE-1408:
---

Doc note: These configuration properties are available in the wiki here:
* [Configuration Properties - hive.exec.mode.local.auto | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.mode.local.auto]
* [Configuration Properties - hive.exec.mode.local.auto.inputbytes.max | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.mode.local.auto.inputbytes.max]
* [Configuration Properties - hive.exec.mode.local.auto.tasks.max | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.mode.local.auto.tasks.max]
* [Configuration Properties - hive.exec.mode.local.auto.input.files.max | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.mode.local.auto.input.files.max]

> add option to let hive automatically run in local mode based on tunable 
> heuristics
> --
>
> Key: HIVE-1408
> URL: https://issues.apache.org/jira/browse/HIVE-1408
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Joydeep Sen Sarma
>Assignee: Joydeep Sen Sarma
> Fix For: 0.7.0
>
> Attachments: 1408.1.patch, 1408.2.patch, 1408.2.q.out.patch, 
> 1408.7.patch, hive-1408.6.patch
>
>
> as a followup to HIVE-543 - we should have a simple option (enabled by 
> default) to let hive run in local mode if possible.
> two levels of options are desirable:
> 1. hive.exec.mode.local.auto=true/false // control whether local mode is 
> automatically chosen
> 2. Options to control different heuristics, some naiive examples:
>  hive.exec.mode.local.auto.input.size.max=1G // don't choose local mode 
> if data > 1G
>  hive.exec.mode.local.auto.script.enable=true/false // choose if local 
> mode is enabled for queries with user scripts
> this can be implemented as a pre/post execution hook. It makes sense to 
> provide this as a standard hook in the hive codebase since it's likely to 
> improve response time for many users (especially for test queries).
> the initial proposal is to choose this at a query level and not at per 
> hive-task (ie. hadoop job) level. per job-level requires more changes to 
> compilation (to not pre-commit to hdfs or local scratch directories at 
> compile time).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin

2015-12-10 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050888#comment-15050888
 ] 

Nemon Lou commented on HIVE-12616:
--

[~xuefuz], thanks for the review.
It is not surprising that you have doubts about the "spark.master" setting in 
HiveConf, so I owe an explanation of the issue described here.

In short, "spark.master" is set on the HiveConf during the creation of the 
HiveSparkClient. Snippet of HiveSparkClientFactory#initiateSparkConf:
{code}
String sparkMaster = hiveConf.get("spark.master");
if (sparkMaster == null) {
  sparkMaster = sparkConf.get("spark.master");
  hiveConf.set("spark.master", sparkMaster);
}
{code}
The creation of the HiveSparkClient happens only once, because it is reused 
(as the SparkSession). However, this HiveConf is operation-level rather than 
session-level (because queries run asynchronously), so only the first 
operation's JobConf carries "spark.master".

Now I have two choices:
1. Set "spark.master" at the session level during HiveSparkClient creation.
2. Set "spark.master" for each operation when it is not already set, but read 
it from the sparkConf in RemoteHiveSparkClient instead of the hiveConf (the 
SparkConf in RemoteHiveSparkClient already sets "spark.master" explicitly); 
see the sketch after this comment.
Which one do you prefer?

Adding a test case for this issue seems difficult (yarn-cluster mode, 
multiple operations in one session); could you provide some guidance?
Thanks.
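
(A hedged sketch of option 2 follows; the helper and variable names are 
hypothetical, not the actual RemoteHiveSparkClient code.)
{code}
import org.apache.hadoop.mapred.JobConf;
import org.apache.spark.SparkConf;

public class SparkMasterPerOperationSketch {
  // Hypothetical helper: before each operation is submitted, copy
  // "spark.master" from the client's SparkConf (which is always set
  // explicitly) into the operation's JobConf if it is missing there.
  static void ensureSparkMaster(JobConf jobConf, SparkConf sparkConf) {
    if (jobConf.get("spark.master") == null) {
      jobConf.set("spark.master", sparkConf.get("spark.master"));
    }
  }
}
{code}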


> NullPointerException when spark session is reused to run a mapjoin
> --
>
> Key: HIVE-12616
> URL: https://issues.apache.org/jira/browse/HIVE-12616
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.3.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-12616.patch
>
>
> The way to reproduce:
> {noformat}
> set hive.execution.engine=spark;
> create table if not exists test(id int);
> create table if not exists test1(id int);
> insert into test values(1);
> insert into test1 values(1);
> select max(a.id) from test a ,test1 b
> where a.id = b.id;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12316) Improved integration test for Hive

2015-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050627#comment-15050627
 ] 

Hive QA commented on HIVE-12316:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12776619/HIVE-12316.5.patch

{color:green}SUCCESS:{color} +1 due to 25 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 9985 tests 
executed
*Failed tests:*
{noformat}
TestConf - did not produce a TEST-*.xml file
TestHWISessionManager - did not produce a TEST-*.xml file
TestManager - did not produce a TEST-*.xml file
TestTable - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_date_1
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6301/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6301/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6301/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12776619 - PreCommit-HIVE-TRUNK-Build

> Improved integration test for Hive
> --
>
> Key: HIVE-12316
> URL: https://issues.apache.org/jira/browse/HIVE-12316
> Project: Hive
>  Issue Type: New Feature
>  Components: Testing Infrastructure
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-12316.2.patch, HIVE-12316.5.patch, HIVE-12316.patch
>
>
> In working with Hive testing I have found there are several issues that are 
> causing problems for developers, testers, and users:
> * Because Hive has many tunable knobs (file format, security, etc.) we end up 
> with tests that cover the same functionality with different permutations of 
> these features.
> * The Hive integration tests (ie qfiles) cannot be run on a cluster.  This 
> means we cannot run any of those tests at scale.  The HBase community by 
> contrast uses the same test suite locally and on a cluster, and has found 
> that this helps them greatly in testing.
> * Golden files are a grievous evil.  Test writers are forced to eyeball 
> results the first time they run a test and decide whether they look 
> reasonable, which is error prone and makes testing at scale impossible.  And 
> changes to one part of Hive often end up changing the plan (and the output of 
> explain) thus breaking many tests that are not related.  This is particularly 
> an issue for people working on the optimizer.  
> * The lack of ability to run on a cluster means that when people test Hive at 
> scale, they are forced to develop custom frameworks which can't then benefit 
> the community.
> * There is no easy mechanism to bring user queries into the test suite.
> I propose we build a new testing capability with the following requirements:
> * One test should be able to run all reasonable permutations (mr/tez/spark, 
> orc/parquet/text/rcfile, secure/non-secure etc.)  This doesn't mean it would 
> run every permutation every time, but that the tester could choose which 
> permutation to run.
> * The same tests should run locally and on a cluster.  The tests should 
> support scaling of input data from Ks to Ts.
> * Expected results should be auto-generated whenever possible, and this 
> should work with the 

[jira] [Commented] (HIVE-12629) hive.auto.convert.join=true makes lateral view join sql failed on spark engine on yarn

2015-12-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050900#comment-15050900
 ] 

吴子美 commented on HIVE-12629:


I will give you a simple dataset to show the bug.

hive> desc logs1;
OK
iik int 
created_at  string  
Time taken: 0.145 seconds, Fetched: 2 row(s)


hive> select * from logs1;
OK
1   js
5   9wj
1   js
5   9wj
56  io
1   js
5   9wj
1   js
5   9wj
Time taken: 0.687 seconds, Fetched: 9 row(s)


hive> select  count(*) from 
> (select  iik from logs1 group by iik) a   join  
    > (select iik  from logs1 LATERAL VIEW json_tuple(created_at,'ss') v1 as ss) b on a.iik=b.iik;
Query ID = root_20151210201756_e56dadee-69c9-4d4e-838d-98173eab25ec
Total jobs = 2
Launching Job 1 out of 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = 031c6e38-d2d3-4b19-baa7-de1553cd7277
Status: Failed
FAILED: Execution Error, return code 3 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask

After I changed hive.auto.convert.join from true to false, the result was OK.
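
For illustration, here is a minimal sketch of the same workaround driven from Java over JDBC, assuming a HiveServer2 at jdbc:hive2://localhost:10000 and the Hive JDBC driver on the classpath (table and column names are taken from the repro above):
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LateralViewWorkaround {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement()) {
      // workaround: disable the map-join conversion that triggers the failure
      stmt.execute("set hive.auto.convert.join=false");
      try (ResultSet rs = stmt.executeQuery(
          "select count(*) from "
              + "(select iik from logs1 group by iik) a join "
              + "(select iik from logs1 lateral view json_tuple(created_at,'ss') v1 as ss) b "
              + "on a.iik = b.iik")) {
        while (rs.next()) {
          System.out.println(rs.getLong(1));  // row count from the join
        }
      }
    }
  }
}
{code}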


> hive.auto.convert.join=true makes lateral view join sql failed on spark 
> engine on yarn
> --
>
> Key: HIVE-12629
> URL: https://issues.apache.org/jira/browse/HIVE-12629
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.2.1
>Reporter: 吴子美
>Assignee: Xuefu Zhang
>
> I am using Hive 1.2 on Spark on YARN. 
> I found 
> select count(1) from 
> (select  user_id from xxx group by user_id ) a join
> (select  user_id from yyy lateral view json_tuple(u, 'h') v1 as h) b
> on a.user_id=b.user_id ;
> failed in Hive on Spark on YARN, but was OK in Hive on MR.
> I tried the following SQL on Spark; it was OK.
> select count(1) from 
> (select  user_id from xxx group by user_id ) a left join
> (select  user_id from yyy lateral view json_tuple(u, 'h') v1 as h) b
> on a.user_id=b.user_id ;
> When I turned hive.auto.convert.join from true to false, everything went OK.
> The error message in hive.log was :
> {code}
> 2015-12-09 21:10:17,190 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO log.PerfLogger: 
> 
> 2015-12-09 21:10:17,190 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO exec.Utilities: 
> Serializing ReduceWork via kryo
> 2015-12-09 21:10:17,214 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO log.PerfLogger: 
>  duration=24 from=org.apache.hadoop.hive.ql.exec.Utilities>
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO client.RemoteDriver: 
> Failed to run job 8fed1ca8-834f-497f-b189-eab343440a9f
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) - java.lang.IllegalStateException: Connection 
> already exists
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) -  at 
> org.apache.hadoop.hive.ql.exec.spark.SparkPlan.connect(SparkPlan.java:142)
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) -  at 
> org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateParentTran(SparkPlanGenerator.java:142)
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) -  at 
> org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:106)
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) -  at 
> org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:252)
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) -  at 
> org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:366)
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) -  at 
> org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:335)
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(569)) -  at 
> java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 2015-12-09 21:10:17,262 INFO  [stderr-redir-1]: 

[jira] [Commented] (HIVE-12531) Implement fast-path for Year/Month UDFs for dates between 1999 and 2038

2015-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051222#comment-15051222
 ] 

Hive QA commented on HIVE-12531:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12776692/HIVE-12531.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 9870 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarDataNucleusUnCaching
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6304/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6304/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6304/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12776692 - PreCommit-HIVE-TRUNK-Build

> Implement fast-path for Year/Month UDFs for dates between 1999 and 2038
> ---
>
> Key: HIVE-12531
> URL: https://issues.apache.org/jira/browse/HIVE-12531
> Project: Hive
>  Issue Type: Improvement
>Reporter: Gopal V
>Assignee: Jason Dere
> Attachments: HIVE-12531.1.patch, HIVE-12531.2.patch, 
> HIVE-12531.3.patch
>
>
> Current codepath goes into the JDK Calendar implementation, which is very 
> slow for the simple cases in the current decade.
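
For illustration only (this is not the attached patch), a fast path for a bounded year range can precompute the first epoch day of each year and binary-search it, falling back to the Calendar path outside the range:
{code}
import java.util.concurrent.TimeUnit;

public class FastYear {
  private static final int MIN_YEAR = 1999, MAX_YEAR = 2038;
  // first epoch day of each year in [MIN_YEAR, MAX_YEAR + 1]
  private static final long[] YEAR_START_DAYS = new long[MAX_YEAR - MIN_YEAR + 2];
  static {
    for (int y = MIN_YEAR; y <= MAX_YEAR + 1; y++) {
      YEAR_START_DAYS[y - MIN_YEAR] = java.time.LocalDate.of(y, 1, 1).toEpochDay();
    }
  }

  /** Returns the year for an epoch-millis timestamp, or -1 outside the fast range. */
  public static int yearOf(long epochMillis) {
    long day = TimeUnit.MILLISECONDS.toDays(epochMillis);
    if (day < YEAR_START_DAYS[0] || day >= YEAR_START_DAYS[YEAR_START_DAYS.length - 1]) {
      return -1;  // caller falls back to the slow Calendar path
    }
    int idx = java.util.Arrays.binarySearch(YEAR_START_DAYS, day);
    // exact hit means Jan 1; otherwise binarySearch returns -(insertionPoint) - 1
    return MIN_YEAR + (idx >= 0 ? idx : -idx - 2);
  }
}
{code}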



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12635) Hive should return the latest hbase cell timestamp as the row timestamp value

2015-12-10 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051409#comment-15051409
 ] 

Aihua Xu commented on HIVE-12635:
-

Those tests are not related.

> Hive should return the latest hbase cell timestamp as the row timestamp value
> -
>
> Key: HIVE-12635
> URL: https://issues.apache.org/jira/browse/HIVE-12635
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-12635.patch
>
>
> When Hive talks to HBase and maps the HBase timestamp field to a Hive column,
> Hive seems to return the first cell timestamp instead of the latest one as the
> timestamp value.
> It makes sense to return the latest timestamp, since adding the latest cell can
> be considered an update to the row.
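
For illustration, a minimal sketch of the proposed behavior, assuming the HBase client API on the classpath: pick the maximum cell timestamp of the row rather than the first cell encountered.
{code}
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;

public class LatestCellTimestamp {
  /** Returns the max timestamp across all cells of the row, or -1 for an empty row. */
  public static long latestTimestamp(Result row) {
    long latest = -1L;
    for (Cell cell : row.rawCells()) {
      latest = Math.max(latest, cell.getTimestamp());  // latest write wins
    }
    return latest;
  }
}
{code}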



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12055) Create row-by-row shims for the write path

2015-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051416#comment-15051416
 ] 

Hive QA commented on HIVE-12055:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12776654/HIVE-12055.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6307/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6307/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6307/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-6307/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 57f39a9 HIVE-12598 : LLAP: disable fileId when not supported 
(Sergey Shelukhin, reviewed by Prasanth Jayachandran, Lefty Leverenz)
+ git clean -f -d
+ git checkout master
Already on 'master'
+ git reset --hard origin/master
HEAD is now at 57f39a9 HIVE-12598 : LLAP: disable fileId when not supported 
(Sergey Shelukhin, reviewed by Prasanth Jayachandran, Lefty Leverenz)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12776654 - PreCommit-HIVE-TRUNK-Build

> Create row-by-row shims for the write path 
> ---
>
> Key: HIVE-12055
> URL: https://issues.apache.org/jira/browse/HIVE-12055
> Project: Hive
>  Issue Type: Sub-task
>  Components: ORC, Shims
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-12055.patch, HIVE-12055.patch, HIVE-12055.patch, 
> HIVE-12055.patch, HIVE-12055.patch
>
>
> As part of removing the row-by-row writer, we'll need to shim out the higher 
> level API (OrcSerde and OrcOutputFormat) so that we maintain backwards 
> compatibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12635) Hive should return the latest hbase cell timestamp as the row timestamp value

2015-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051407#comment-15051407
 ] 

Hive QA commented on HIVE-12635:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12776651/HIVE-12635.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 9887 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testFetchingPartitionsWithDifferentSchemas
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6305/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6305/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6305/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12776651 - PreCommit-HIVE-TRUNK-Build

> Hive should return the latest hbase cell timestamp as the row timestamp value
> -
>
> Key: HIVE-12635
> URL: https://issues.apache.org/jira/browse/HIVE-12635
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 2.1.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-12635.patch
>
>
> When Hive talks to HBase and maps the HBase timestamp field to a Hive column,
> Hive seems to return the first cell timestamp instead of the latest one as the
> timestamp value.
> It makes sense to return the latest timestamp, since adding the latest cell can
> be considered an update to the row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11890) Create ORC module

2015-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051410#comment-15051410
 ] 

Hive QA commented on HIVE-11890:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12776650/HIVE-11890.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6306/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6306/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6306/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-6306/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 57f39a9 HIVE-12598 : LLAP: disable fileId when not supported 
(Sergey Shelukhin, reviewed by Prasanth Jayachandran, Lefty Leverenz)
+ git clean -f -d
+ git checkout master
Already on 'master'
+ git reset --hard origin/master
HEAD is now at 57f39a9 HIVE-12598 : LLAP: disable fileId when not supported 
(Sergey Shelukhin, reviewed by Prasanth Jayachandran, Lefty Leverenz)
+ git merge --ff-only origin/master
Already up-to-date.
+ git gc
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12776650 - PreCommit-HIVE-TRUNK-Build

> Create ORC module
> -
>
> Key: HIVE-11890
> URL: https://issues.apache.org/jira/browse/HIVE-11890
> Project: Hive
>  Issue Type: Sub-task
>  Components: ORC
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-11890.patch, HIVE-11890.patch, HIVE-11890.patch, 
> HIVE-11890.patch, HIVE-11890.patch, HIVE-11890.patch, HIVE-11890.patch
>
>
> Start moving classes over to the ORC module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12375) ensure hive.compactor.check.interval cannot be set too low

2015-12-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-12375:
--
Attachment: HIVE-12375.2.patch

[~alangates] updated patch which makes use of HIVE_IN_TEST

> ensure hive.compactor.check.interval cannot be set too low
> --
>
> Key: HIVE-12375
> URL: https://issues.apache.org/jira/browse/HIVE-12375
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-12375.2.patch, HIVE-12375.patch
>
>
> hive.compactor.check.interval can currently be set to as low as 0, which 
> makes the Initiator spin needlessly, filling up logs, etc.
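
For illustration only (the floor value and names below are hypothetical, not taken from the patch), a guard of this shape would reject too-small intervals outside of test mode:
{code}
public class CompactorIntervalCheck {
  private static final long MIN_INTERVAL_SECONDS = 5;  // hypothetical floor

  /** Rejects intervals below the floor unless HIVE_IN_TEST is set. */
  public static long validate(long requestedSeconds, boolean hiveInTest) {
    if (!hiveInTest && requestedSeconds < MIN_INTERVAL_SECONDS) {
      throw new IllegalArgumentException(
          "hive.compactor.check.interval must be >= " + MIN_INTERVAL_SECONDS + "s");
    }
    return requestedSeconds;
  }
}
{code}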



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin

2015-12-10 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052100#comment-15052100
 ] 

Xuefu Zhang commented on HIVE-12616:


Thanks for the explanation. I guess the problem is that the user didn't set 
spark.master explicitly, so Hive's default, yarn-cluster, is set only in the 
HiveConf of the first operation. 

I think we should set "spark.master" in the session-level HiveConf. It seems we 
just need to add one line doing that in the if block below:
{code}
// load properties from hive configurations, including both spark.* properties,
// properties for remote driver RPC, and yarn properties for Spark on YARN mode.
String sparkMaster = hiveConf.get("spark.master");
if (sparkMaster == null) {
  sparkMaster = sparkConf.get("spark.master");
  hiveConf.set("spark.master", sparkMaster);
}
{code}

> NullPointerException when spark session is reused to run a mapjoin
> --
>
> Key: HIVE-12616
> URL: https://issues.apache.org/jira/browse/HIVE-12616
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.3.0
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-12616.patch
>
>
> The way to reproduce:
> {noformat}
> set hive.execution.engine=spark;
> create table if not exists test(id int);
> create table if not exists test1(id int);
> insert into test values(1);
> insert into test1 values(1);
> select max(a.id) from test a ,test1 b
> where a.id = b.id;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result

2015-12-10 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051993#comment-15051993
 ] 

Xiaowei Wang commented on HIVE-12541:
-


I am talking about the regex in the symbolic path. I found that if a path in the 
symlink file contains a regex, SELECT is supported by default, so I mistakenly 
thought that SymlinkTextInputFormat supports regexes.

Also, I think regexes in the path should be supported.

Thank you for your attention.

> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it will get a wrong result
> --
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch
>
>
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the dir 
> '/user/hive/warehouse/symlink_text_input_format' ,   the content of the link 
> file is 
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> it contains one path ,and the path contains a regex!
> Execute the sql : 
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It will get wrong result :0 
> At the same time ,I add a test case in the patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12367) Lock/unlock database should add current database to inputs and outputs of authz hook

2015-12-10 Thread Dapeng Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dapeng Sun updated HIVE-12367:
--
Attachment: HIVE-12367.004.patch

> Lock/unlock database should add current database to inputs and outputs of 
> authz hook
> 
>
> Key: HIVE-12367
> URL: https://issues.apache.org/jira/browse/HIVE-12367
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.2.1
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
> Attachments: HIVE-12367.001.patch, HIVE-12367.002.patch, 
> HIVE-12367.003.patch, HIVE-12367.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12628) Eliminate flakiness in TestMetrics

2015-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052004#comment-15052004
 ] 

Hive QA commented on HIVE-12628:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12776670/HIVE-12628.patch

{color:green}SUCCESS:{color} +1 due to 7 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 9891 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6311/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6311/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6311/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12776670 - PreCommit-HIVE-TRUNK-Build

> Eliminate flakiness in TestMetrics
> --
>
> Key: HIVE-12628
> URL: https://issues.apache.org/jira/browse/HIVE-12628
> Project: Hive
>  Issue Type: Test
>  Components: Test
>Affects Versions: 2.1.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-12628.patch
>
>
> TestMetrics relies on timing of json file dumps.  Rewrite these tests to 
> eliminate flakiness.
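
For illustration, a minimal sketch of the usual fix for this kind of flakiness: poll for the JSON dump with a deadline instead of sleeping a fixed time. The helper below is hypothetical, not from the patch:
{code}
import java.io.File;
import java.util.concurrent.TimeUnit;

public class AwaitFile {
  /** Waits up to timeoutMs for a non-empty JSON dump to appear, polling every 100 ms. */
  public static boolean awaitDump(File jsonDump, long timeoutMs) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (System.nanoTime() < deadline) {
      if (jsonDump.exists() && jsonDump.length() > 0) {
        return true;  // dump arrived; the test can now assert on its contents
      }
      Thread.sleep(100);
    }
    return false;  // timed out; the test fails with a clear cause
  }
}
{code}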



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12367) Lock/unlock database should add current database to inputs and outputs of authz hook

2015-12-10 Thread Dapeng Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052093#comment-15052093
 ] 

Dapeng Sun commented on HIVE-12367:
---

Update patch with master.

> Lock/unlock database should add current database to inputs and outputs of 
> authz hook
> 
>
> Key: HIVE-12367
> URL: https://issues.apache.org/jira/browse/HIVE-12367
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.2.1
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
> Attachments: HIVE-12367.001.patch, HIVE-12367.002.patch, 
> HIVE-12367.003.patch, HIVE-12367.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result

2015-12-10 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052092#comment-15052092
 ] 

Chaoyu Tang commented on HIVE-12541:


So basically the regex rules used for the symbolic path are the same as those 
documented in FileSystem.globStatus. Could you add more test cases with 
symbolic paths having different regexes, and even at different path levels? Your 
case is ../data/files/T*, which means files starting with 0 or more Ts, 
right? 
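
For illustration, a minimal sketch of how FileSystem.globStatus expands such a pattern, assuming a Hadoop FileSystem on the classpath (the path argument is a placeholder):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GlobCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path pattern = new Path(args[0]);      // e.g. /tmp/symlink*
    FileSystem fs = pattern.getFileSystem(conf);
    FileStatus[] matches = fs.globStatus(pattern);
    if (matches != null) {                 // null when a non-glob path does not exist
      for (FileStatus st : matches) {
        System.out.println(st.getPath());  // each file the glob expands to
      }
    }
  }
}
{code}
Running it with a pattern such as /tmp/symlink* prints each concrete file the glob resolves to, which is the expansion a symlink entry would need before split computation.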

> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it will get a wrong result
> --
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch
>
>
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the dir 
> '/user/hive/warehouse/symlink_text_input_format' ,   the content of the link 
> file is 
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> it contains one path ,and the path contains a regex!
> Execute the sql : 
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It will get wrong result :0 
> At the same time ,I add a test case in the patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12596) Delete timestamp row throws java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]

2015-12-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052107#comment-15052107
 ] 

Prasanth Jayachandran commented on HIVE-12596:
--

MiniTezCliDriver failed due to an initialization issue. I ran these tests on my 
local machine and all of them passed without any issues. All other 11 test 
failures are already happening in trunk. 


> Delete timestamp row throws java.lang.IllegalArgumentException: Timestamp 
> format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
> 
>
> Key: HIVE-12596
> URL: https://issues.apache.org/jira/browse/HIVE-12596
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Takahiko Saito
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-12596.1.patch
>
>
> Run the below:
> {noformat}
> create table test_acid( i int, ts timestamp)
>   clustered by (i) into 2 buckets
>   stored as orc
>   tblproperties ('transactional'='true');
> insert into table test_acid values (1, '2014-09-14 12:34:30');
> delete from test_acid where ts = '2014-15-16 17:18:19.20';
> {noformat}
> The below error is thrown:
> {noformat}
> 15/12/04 19:55:49 INFO SessionState: Map 1: -/-   Reducer 2: 0/2
> Status: Failed
> 15/12/04 19:55:49 ERROR SessionState: Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1447960616881_0022_2_00, 
> diagnostics=[Vertex vertex_1447960616881_0022_2_00 [Map 1] killed/failed due 
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: test_acid initializer failed, 
> vertex=vertex_1447960616881_0022_2_00 [Map 1], 
> java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd 
> hh:mm:ss[.fffffffff]
>   at java.sql.Timestamp.valueOf(Timestamp.java:237)
>   at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.boxLiteral(ConvertAstToSearchArg.java:160)
>   at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.findLiteral(ConvertAstToSearchArg.java:191)
>   at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.createLeaf(ConvertAstToSearchArg.java:268)
>   at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.createLeaf(ConvertAstToSearchArg.java:326)
>   at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.parse(ConvertAstToSearchArg.java:377)
>   at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.<init>(ConvertAstToSearchArg.java:68)
>   at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.create(ConvertAstToSearchArg.java:417)
>   at 
> org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.createFromConf(ConvertAstToSearchArg.java:436)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$Context.<init>(OrcInputFormat.java:484)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1121)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1207)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:369)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:481)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:160)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:246)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:240)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:240)
>   at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:227)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Not sure if this change is intended as the issue is not seen with ver. 1.2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12650) Increase default value of hive.spark.client.server.connect.timeout to exceeds spark.yarn.am.waitTime

2015-12-10 Thread JoneZhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JoneZhang updated HIVE-12650:
-
Affects Version/s: 1.1.1
   1.2.1

> Increase default value of hive.spark.client.server.connect.timeout to exceeds 
> spark.yarn.am.waitTime
> 
>
> Key: HIVE-12650
> URL: https://issues.apache.org/jira/browse/HIVE-12650
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.1
>Reporter: JoneZhang
>Assignee: Xuefu Zhang
>
> I think hive.spark.client.server.connect.timeout should be set greater than 
> spark.yarn.am.waitTime. The default value for 
> spark.yarn.am.waitTime is 100s, and the default value for 
> hive.spark.client.server.connect.timeout is 90s, which is not good. We can 
> increase it to a larger value such as 120s.
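
For illustration, a minimal sketch of the proposed change from the Java side; the property name is the one from this issue, and the millisecond-suffixed value format is an assumption:
{code}
import org.apache.hadoop.hive.conf.HiveConf;

public class TimeoutConfig {
  /** Returns a HiveConf whose Hive-side timeout exceeds spark.yarn.am.waitTime. */
  public static HiveConf withLongerConnectTimeout() {
    HiveConf conf = new HiveConf();
    // default is 90s, while spark.yarn.am.waitTime defaults to 100s, so the
    // client can give up before the AM ever calls back; raise it past 100s
    conf.set("hive.spark.client.server.connect.timeout", "120000ms");
    return conf;
  }
}
{code}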



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result

2015-12-10 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052110#comment-15052110
 ] 

Xiaowei Wang commented on HIVE-12541:
-

../data/files/T* means the files starting with T.

> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it will get a wrong result
> --
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch
>
>
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the dir 
> '/user/hive/warehouse/symlink_text_input_format' ,   the content of the link 
> file is 
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> it contains one path ,and the path contains a regex!
> Execute the sql : 
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It will get wrong result :0 
> At the same time ,I add a test case in the patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12649) Hive on Spark will resubmitted application when not enough resouces to launch yarn application master

2015-12-10 Thread JoneZhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JoneZhang updated HIVE-12649:
-
Description: 
Hive on Spark estimates the reducer number when the query does not set one, 
which causes an application to be submitted. The application will stay pending 
if the YARN queue's resources are insufficient.
So there can be more than one pending application, probably because there is 
more than one estimate call. The failure is soft, so it doesn't prevent 
subsequent processing. We can make that a hard failure.

That code is found in 
at 
org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:112)
at 
org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:115)


  was:
Hive on spark will estimate reducer number when the query is not set reduce 
number,which cause a application submit.The application will pending if the 
yarn queue's resources is insufficient.
So there are more than one pending applications probably because 
there are more than one estimate call.The failure is soft, so it doesn't 
prevent subsequent processings. We can make that a hard failure

That code is found in 
728237 at 
org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:112)
 728238 at 
org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:115)



> Hive on Spark will resubmitted application when not enough resouces to launch 
> yarn application master
> -
>
> Key: HIVE-12649
> URL: https://issues.apache.org/jira/browse/HIVE-12649
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.1
>Reporter: JoneZhang
>Assignee: Xuefu Zhang
>
> Hive on Spark estimates the reducer number when the query does not set one, 
> which causes an application to be submitted. The application will stay pending 
> if the YARN queue's resources are insufficient.
> So there can be more than one pending application, probably because there is 
> more than one estimate call. The failure is soft, so it doesn't prevent 
> subsequent processing. We can make that a hard failure.
> That code is found in 
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:112)
> at 
> org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:115)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12355) Keep Obj Inspectors in Sync with RowSchema

2015-12-10 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052301#comment-15052301
 ] 

Laljo John Pullokkaran commented on HIVE-12355:
---

TS keeps the pruned col list:
  private List<Integer> neededColumnIDs;
  private List<String> neededColumns;

This is what TS really outputs.
There are two parts to this:
1. RS should match neededColumns (I have a patch for this as part of the union fix)
2. Output Obj Inspectors should match neededColumns

This bug is for #2.
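
For illustration, a minimal sketch of keeping the output OI in sync with the pruned column list; the helper shape is hypothetical, but the factory call is Hive's standard ObjectInspectorFactory API:
{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructField;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;

public class PrunedOutputOi {
  /** Builds an output OI restricted to neededColumns, keeping OI and RowSchema in sync. */
  public static StructObjectInspector prune(StructObjectInspector rowOi,
                                            List<String> neededColumns) {
    List<String> names = new ArrayList<>();
    List<ObjectInspector> ois = new ArrayList<>();
    for (String col : neededColumns) {
      StructField f = rowOi.getStructFieldRef(col);  // only the pruned columns survive
      names.add(f.getFieldName());
      ois.add(f.getFieldObjectInspector());
    }
    return ObjectInspectorFactory.getStandardStructObjectInspector(names, ois);
  }
}
{code}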

> Keep Obj Inspectors in Sync with RowSchema
> --
>
> Key: HIVE-12355
> URL: https://issues.apache.org/jira/browse/HIVE-12355
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.0.0, 1.1.0, 1.2.1
>Reporter: Laljo John Pullokkaran
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-12355.1.patch
>
>
> Currently Not all operators match their Output Obj inspectors to Row schema.
> Many times OutputObjectInspectors may be more than needed.
> This causes problems especially with union.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11603) IndexOutOfBoundsException thrown when accessing a union all subquery and filtering on a column which does not exist in all underlying tables

2015-12-10 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052306#comment-15052306
 ] 

Laljo John Pullokkaran commented on HIVE-11603:
---

There is an issue on master too, but it kind of doesn't get exposed due to some 
optimizations.

> IndexOutOfBoundsException thrown when accessing a union all subquery and 
> filtering on a column which does not exist in all underlying tables
> 
>
> Key: HIVE-11603
> URL: https://issues.apache.org/jira/browse/HIVE-11603
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.3.0, 1.2.1
> Environment: Hadoop 2.6
>Reporter: Nicholas Brenwald
>Assignee: Laljo John Pullokkaran
>Priority: Minor
> Attachments: HIVE-11603.1.patch, HIVE-11603.2.patch
>
>
> Create two empty tables t1 and t2
> {code}
> CREATE TABLE t1(c1 STRING);
> CREATE TABLE t2(c1 STRING, c2 INT);
> {code}
> Create a view on these two tables
> {code}
> CREATE VIEW v1 AS 
> SELECT c1, c2 
> FROM (
> SELECT c1, CAST(NULL AS INT) AS c2 FROM t1
> UNION ALL
> SELECT c1, c2 FROM t2
> ) x;
> {code}
> Then run
> {code}
> SELECT COUNT(*) from v1 
> WHERE c2 = 0;
> {code}
> We expect to get a result of zero, but instead the query fails with stack 
> trace:
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>   at java.util.ArrayList.get(ArrayList.java:411)
>   at 
> org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:119)
>   ... 22 more
> {code}
> Workarounds include disabling ppd,
> {code}
> set hive.optimize.ppd=false;
> {code}
> Or changing the view so that column c2 is null cast to double:
> {code}
> CREATE VIEW v1_workaround AS 
> SELECT c1, c2 
> FROM (
> SELECT c1, CAST(NULL AS DOUBLE) AS c2 FROM t1
> UNION ALL
> SELECT c1, c2 FROM t2
> ) x;
> {code}
> The problem seems to occur in branch-1.1, branch-1.2, branch-1 but seems to 
> be resolved in master (2.0.0)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12615) Do not start spark session when only explain

2015-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052319#comment-15052319
 ] 

Hive QA commented on HIVE-12615:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12776700/HIVE-12615.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 293 failed/errored test(s), 9894 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap_auto
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_parallel_orderby
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18_multi_distinct
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join20
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join21
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join23
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join26
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join27
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join28
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join29
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join30
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join31
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join32
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join_stats
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join_without_localtask
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_tez1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_tez2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_column_access_stats
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_count
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cross_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cross_product_check_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ctas
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_dynamic_rdd_cache
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_escape_clusterby1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_escape_distributeby1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_escape_sortby1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby1_map
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby1_map_nomap
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby1_map_skew
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby1_noskew

[jira] [Updated] (HIVE-12652) SymbolicTextInputFormat should supports the path with regex ,especially using CombineHiveInputFormat .Add test sql .

2015-12-10 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12652:

Attachment: HIVE-12652.0.patch

> SymbolicTextInputFormat should supports the  path with regex  ,especially 
> using CombineHiveInputFormat .Add test sql .
> --
>
> Key: HIVE-12652
> URL: https://issues.apache.org/jira/browse/HIVE-12652
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12652.0.patch
>
>
> 1. In fact, SymbolicTextInputFormat supports paths with a regex; I added some 
> test SQL.
> 2. But when using CombineHiveInputFormat to merge small files, it cannot 
> resolve a path with a regex, so it will get a wrong result.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result

2015-12-10 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052327#comment-15052327
 ] 

Xiaowei Wang commented on HIVE-12541:
-

I added some test cases in another JIRA: 
https://issues.apache.org/jira/browse/HIVE-12652

> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it will get a wrong result
> --
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch
>
>
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the dir 
> '/user/hive/warehouse/symlink_text_input_format' ,   the content of the link 
> file is 
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> it contains one path ,and the path contains a regex!
> Execute the sql : 
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It will get wrong result :0 
> At the same time ,I add a test case in the patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result

2015-12-10 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052326#comment-15052326
 ] 

Xiaowei Wang commented on HIVE-12541:
-

[~aihuaxu] [~ctang.ma] [~ychena] 

Maybe it is better to discuss this in another JIRA: 
https://issues.apache.org/jira/browse/HIVE-12652



> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it will get a wrong result
> --
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch
>
>
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the dir 
> '/user/hive/warehouse/symlink_text_input_format' ,   the content of the link 
> file is 
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> it contains one path ,and the path contains a regex!
> Execute the sql : 
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It will get wrong result :0 
> At the same time ,I add a test case in the patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12632) LLAP: don't use IO elevator for ACID tables

2015-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052332#comment-15052332
 ] 

Hive QA commented on HIVE-12632:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12776706/HIVE-12632.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6315/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6315/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6315/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
udf-classloader-udf2 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-github-source-source/itests/custom-udfs/udf-classloader-udf2/src/test/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ udf-classloader-udf2 
---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/itests/custom-udfs/udf-classloader-udf2/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/itests/custom-udfs/udf-classloader-udf2/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-github-source-source/itests/custom-udfs/udf-classloader-udf2/target/tmp/conf
 [copy] Copying 14 files to 
/data/hive-ptest/working/apache-github-source-source/itests/custom-udfs/udf-classloader-udf2/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
udf-classloader-udf2 ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ 
udf-classloader-udf2 ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ udf-classloader-udf2 ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-github-source-source/itests/custom-udfs/udf-classloader-udf2/target/udf-classloader-udf2-2.1.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
udf-classloader-udf2 ---
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ 
udf-classloader-udf2 ---
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/itests/custom-udfs/udf-classloader-udf2/target/udf-classloader-udf2-2.1.0-SNAPSHOT.jar
 to 
/data/hive-ptest/working/maven/org/apache/hive/hive-it-custom-udfs/udf-classloader-udf2/2.1.0-SNAPSHOT/udf-classloader-udf2-2.1.0-SNAPSHOT.jar
[INFO] Installing 
/data/hive-ptest/working/apache-github-source-source/itests/custom-udfs/udf-classloader-udf2/pom.xml
 to 
/data/hive-ptest/working/maven/org/apache/hive/hive-it-custom-udfs/udf-classloader-udf2/2.1.0-SNAPSHOT/udf-classloader-udf2-2.1.0-SNAPSHOT.pom
[INFO] 
[INFO] 
[INFO] Building Hive Integration - HCatalog Unit Tests 2.1.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-hcatalog-it-unit 
---
[INFO] Deleting 
/data/hive-ptest/working/apache-github-source-source/itests/hcatalog-unit/target
[INFO] Deleting 
/data/hive-ptest/working/apache-github-source-source/itests/hcatalog-unit 
(includes = [datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ 
hive-hcatalog-it-unit ---
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (download-spark) @ hive-hcatalog-it-unit 
---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ 
hive-hcatalog-it-unit ---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ 
hive-hcatalog-it-unit ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-github-source-source/itests/hcatalog-unit/src/main/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ 
hive-hcatalog-it-unit ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ 
hive-hcatalog-it-unit ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
hive-hcatalog-it-unit ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
{noformat}

[jira] [Updated] (HIVE-12448) Change to tracking of dag status via dagIdentifier instead of dag name

2015-12-10 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-12448:
--
Attachment: HIVE-12448.3.txt

Updated patch with rb comments addressed.
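
For readers following along, a hypothetical sketch of what tracking by 
dagIdentifier (an integer unique per DAG submission) rather than by dag name 
could look like; none of these names are from the actual patch:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DagStatusTracker {
  enum State { RUNNING, SUCCEEDED, FAILED }

  // Keyed by the unique integer identifier; dag names may repeat across
  // submissions, so they make a poor key.
  private final Map<Integer, State> statusByDagId = new ConcurrentHashMap<>();

  void update(int dagIdentifier, State state) {
    statusByDagId.put(dagIdentifier, state);
  }

  State get(int dagIdentifier) {
    return statusByDagId.get(dagIdentifier);
  }
}
{code}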

> Change to tracking of dag status via dagIdentifier instead of dag name
> --
>
> Key: HIVE-12448
> URL: https://issues.apache.org/jira/browse/HIVE-12448
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Affects Versions: 2.0.0
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-12448.1.txt, HIVE-12448.2.txt, HIVE-12448.3.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11890) Create ORC module

2015-12-10 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-11890:
-
Attachment: HIVE-11890.patch

Rebased to trunk.

> Create ORC module
> -
>
> Key: HIVE-11890
> URL: https://issues.apache.org/jira/browse/HIVE-11890
> Project: Hive
>  Issue Type: Sub-task
>  Components: ORC
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-11890.patch, HIVE-11890.patch, HIVE-11890.patch, 
> HIVE-11890.patch, HIVE-11890.patch, HIVE-11890.patch, HIVE-11890.patch, 
> HIVE-11890.patch
>
>
> Start moving classes over to the ORC module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11110) Reorder applyPreJoinOrderingTransforms, add NotNULL/FilterMerge rules, improve Filter selectivity estimation

2015-12-10 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-11110:
--
Attachment: HIVE-11110.34.patch

> Reorder applyPreJoinOrderingTransforms, add NotNULL/FilterMerge rules, 
> improve Filter selectivity estimation
> 
>
> Key: HIVE-11110
> URL: https://issues.apache.org/jira/browse/HIVE-11110
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-11110-10.patch, HIVE-11110-11.patch, 
> HIVE-11110-12.patch, HIVE-11110-branch-1.2.patch, HIVE-11110.1.patch, 
> HIVE-11110.13.patch, HIVE-11110.14.patch, HIVE-11110.15.patch, 
> HIVE-11110.16.patch, HIVE-11110.17.patch, HIVE-11110.18.patch, 
> HIVE-11110.19.patch, HIVE-11110.2.patch, HIVE-11110.20.patch, 
> HIVE-11110.21.patch, HIVE-11110.22.patch, HIVE-11110.23.patch, 
> HIVE-11110.24.patch, HIVE-11110.25.patch, HIVE-11110.26.patch, HIVE-11110.27, 
> HIVE-11110.27.patch, HIVE-11110.28.patch, HIVE-11110.29.patch, 
> HIVE-11110.30.patch, HIVE-11110.31.patch, HIVE-11110.32.patch, 
> HIVE-11110.33.patch, HIVE-11110.34.patch, HIVE-11110.4.patch, 
> HIVE-11110.5.patch, HIVE-11110.6.patch, HIVE-11110.7.patch, 
> HIVE-11110.8.patch, HIVE-11110.9.patch, HIVE-11110.91.patch, 
> HIVE-11110.92.patch, HIVE-11110.patch
>
>
> Query
> {code}
> select  count(*)
>  from store_sales
>  ,store_returns
>  ,date_dim d1
>  ,date_dim d2
>  where d1.d_quarter_name = '2000Q1'
>and d1.d_date_sk = ss_sold_date_sk
>and ss_customer_sk = sr_customer_sk
>and ss_item_sk = sr_item_sk
>and ss_ticket_number = sr_ticket_number
>and sr_returned_date_sk = d2.d_date_sk
>and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3');
> {code}
> The store_sales table is partitioned on ss_sold_date_sk, which is also used 
> in a join clause. The join clause should add a filter "filterExpr: 
> ss_sold_date_sk is not null", which should get pushed to the MetaStore when 
> fetching the stats. Currently this is not done in CBO planning, which results 
> in the stats from __HIVE_DEFAULT_PARTITION__ being fetched and considered in 
> the optimization phase. In particular, this inflates the NDV for the join 
> columns and may result in wrong planning.
> Including HiveJoinAddNotNullRule in the optimization phase solves this issue.
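
To make the effect concrete, here is a hypothetical sketch of the pruning this 
enables; the helper and class names are illustrative, not Hive's actual stats 
code:

{code}
import java.util.ArrayList;
import java.util.List;

public class PartitionStatsFilter {
  static final String DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__";

  // Once the planner knows the join key is NOT NULL, the NULL-keyed default
  // partition cannot contribute matching rows, so its stats can be skipped.
  static List<String> partitionsForStats(List<String> partitionValues,
                                         boolean joinKeyNotNull) {
    List<String> kept = new ArrayList<>();
    for (String value : partitionValues) {
      if (joinKeyNotNull && DEFAULT_PARTITION.equals(value)) {
        continue;
      }
      kept.add(value);
    }
    return kept;
  }
}
{code}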



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12422) LLAP: add security to Web UI endpoint

2015-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052178#comment-15052178
 ] 

Hive QA commented on HIVE-12422:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12776677/HIVE-12422.02.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 9878 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6312/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6312/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6312/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12776677 - PreCommit-HIVE-TRUNK-Build

> LLAP: add security to Web UI endpoint
> -
>
> Key: HIVE-12422
> URL: https://issues.apache.org/jira/browse/HIVE-12422
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12422.01.patch, HIVE-12422.02.patch, 
> HIVE-12422.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8376) Umbrella Jira for HiveServer2 dynamic service discovery

2015-12-10 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052221#comment-15052221
 ] 

Amareshwari Sriramadasu commented on HIVE-8376:
---

[~vgumashta], I see all sub-tasks of this are resolved. Is anything pending for 
this jira? Can this be marked resolved? If so, can you resolve it with the 
proper fix version?

> Umbrella Jira for HiveServer2 dynamic service discovery
> ---
>
> Key: HIVE-8376
> URL: https://issues.apache.org/jira/browse/HIVE-8376
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2, JDBC
>Affects Versions: 0.14.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>  Labels: TODOC14
> Attachments: HiveServer2DynamicServiceDiscovery.pdf
>
>
> Creating an ☂ Jira for documentation purposes. I'll add a detailed doc for 
> the implementation & usage here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12648) LLAP IO was disabled in CliDriver by accident (and tests are broken)

2015-12-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12648:

Attachment: HIVE-12648.01.patch

There was a really stupid (and hard to find) error in the recent fileId patch. 
This is why having tests is important... Patch 01 both re-enables LLAP IO and 
fixes the error. [~prasanth_j] can you take a look?

> LLAP IO was disabled in CliDriver by accident (and tests are broken)
> 
>
> Key: HIVE-12648
> URL: https://issues.apache.org/jira/browse/HIVE-12648
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12648.01.patch, HIVE-12648.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12648) LLAP IO was disabled in CliDriver by accident (and tests are broken)

2015-12-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12648:

Priority: Blocker  (was: Major)

> LLAP IO was disabled in CliDriver by accident (and tests are broken)
> 
>
> Key: HIVE-12648
> URL: https://issues.apache.org/jira/browse/HIVE-12648
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Blocker
> Attachments: HIVE-12648.01.patch, HIVE-12648.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12648) LLAP IO was disabled in CliDriver by accident (and tests are broken)

2015-12-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052150#comment-15052150
 ] 

Prasanth Jayachandran commented on HIVE-12648:
--

Why is the version string "0.20" when hadoop-1 is removed? Are there unit 
tests for the cache? If so, are there unit tests for the fileId, non-fileId, 
and synthetic fileId cases?

> LLAP IO was disabled in CliDriver by accident (and tests are broken)
> 
>
> Key: HIVE-12648
> URL: https://issues.apache.org/jira/browse/HIVE-12648
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Blocker
> Attachments: HIVE-12648.01.patch, HIVE-12648.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12648) LLAP IO was disabled in CliDriver by accident (and tests are broken)

2015-12-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052151#comment-15052151
 ] 

Prasanth Jayachandran commented on HIVE-12648:
--

The other changes look good to me.

> LLAP IO was disabled in CliDriver by accident (and tests are broken)
> 
>
> Key: HIVE-12648
> URL: https://issues.apache.org/jira/browse/HIVE-12648
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Blocker
> Attachments: HIVE-12648.01.patch, HIVE-12648.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type

2015-12-10 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-12502:
---
Attachment: HIVE-12502.1.patch

> to_date UDF cannot accept NULLs of VOID type
> 
>
> Key: HIVE-12502
> URL: https://issues.apache.org/jira/browse/HIVE-12502
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.0.0
>Reporter: Aaron Tokhy
>Assignee: Jason Dere
>Priority: Trivial
> Attachments: HIVE-12502.1.patch
>
>
> The to_date method behaves differently based on the data type of the NULL 
> passed in.
> hive> select to_date(null);   
> FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'TOK_NULL': 
> TO_DATE() only takes STRING/TIMESTAMP/DATEWRITABLE types, got VOID
> hive> select to_date(cast(null as timestamp));
> OK
> NULL
> Time taken: 0.031 seconds, Fetched: 1 row(s)
> This appears to be a regression introduced in HIVE-5731.  The previous 
> version of to_date would not check the type:
> https://github.com/apache/hive/commit/09b6553214d6db5ec7049b88bbe8ff640a7fef72#diff-204f5588c0767cf372a5ca7e3fb964afL56
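
A minimal sketch of the kind of type check a fix could apply in the UDF's 
argument validation, assuming Hive's ObjectInspector API (the wrapper class and 
method are illustrative, not the actual patch):

{code}
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;

public class VoidArgCheck {
  // Accept VOID (a bare NULL literal) alongside the usual input types; the
  // UDF can then simply return NULL at evaluation time.
  static boolean isAcceptable(ObjectInspector arg) {
    if (arg.getCategory() != ObjectInspector.Category.PRIMITIVE) {
      return false;
    }
    PrimitiveCategory pc = ((PrimitiveObjectInspector) arg).getPrimitiveCategory();
    switch (pc) {
      case STRING:
      case TIMESTAMP:
      case DATE:
      case VOID: // previously rejected, causing the SemanticException above
        return true;
      default:
        return false;
    }
  }
}
{code}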



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type

2015-12-10 Thread Aaron Tokhy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052191#comment-15052191
 ] 

Aaron Tokhy commented on HIVE-12502:


I've submitted a patch with an associated unit test.

> to_date UDF cannot accept NULLs of VOID type
> 
>
> Key: HIVE-12502
> URL: https://issues.apache.org/jira/browse/HIVE-12502
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.0.0
>Reporter: Aaron Tokhy
>Assignee: Jason Dere
>Priority: Trivial
> Attachments: HIVE-12502.1.patch
>
>
> The to_date method behaves differently based on the data type of the NULL 
> passed in.
> hive> select to_date(null);   
> FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'TOK_NULL': 
> TO_DATE() only takes STRING/TIMESTAMP/DATEWRITABLE types, got VOID
> hive> select to_date(cast(null as timestamp));
> OK
> NULL
> Time taken: 0.031 seconds, Fetched: 1 row(s)
> This appears to be a regression introduced in HIVE-5731.  The previous 
> version of to_date would not check the type:
> https://github.com/apache/hive/commit/09b6553214d6db5ec7049b88bbe8ff640a7fef72#diff-204f5588c0767cf372a5ca7e3fb964afL56



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12528) don't start HS2 Tez sessions in a single thread

2015-12-10 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052192#comment-15052192
 ] 

Siddharth Seth commented on HIVE-12528:
---

{code}
+if (threadCount <= 1) {
+  for (int i = 0; i < blockingQueueLength; i++) {
+    // The queue is FIFO, so if we cycle thru length items, we'd start each session once.
+    startNextSessionFromQueue();
+  }
{code}
Replace with threadCount == 1, and add a precondition that threadCount is not 
below 1?

The threads end up throwing a RuntimeException in case of an Error. This would 
otherwise have been caught by the HiveServer2 static initialization block. Is 
it now relying on the default uncaught exception handler? It would be better to 
propagate the exception upwards as is done today (and maybe use a 
CompletionService / ListeningExecutor with callbacks).

{code}
/*
 * With this, the ordering of sessions in the queue will be (with 2 sessions, 
 * 3 queues) s1q1, s1q2, s1q3, s2q1, s2q2, s2q3, thereby ensuring uniform 
 * distribution of the sessions across queues, at least to begin with. Then as 
 * sessions get freed up, the list may change this ordering.
 */
{code}
This statement no longer stands. Given that it's only for the first set of 
jobs anyway, I don't think this is a problem. cc [~vikram.dixit]


I'm not sure how thread safety, specifically the visibility aspects, is 
handled (both before and after the patch). SessionState and TezClient 
instances, etc., are created in a single thread (now multiple threads) and 
then used in completely different threads. What guarantees correct visibility?
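
One standard way to address both the error propagation and the visibility 
concerns is to hand the startup work to an executor and join on the futures; a 
sketch under those assumptions (the class and parameter names are not from the 
patch):

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSessionStart {
  static void startSessions(List<Callable<Void>> sessionStarts, int threadCount)
      throws InterruptedException, ExecutionException {
    if (threadCount < 1) {
      throw new IllegalArgumentException("threadCount must be >= 1");
    }
    ExecutorService pool = Executors.newFixedThreadPool(threadCount);
    try {
      List<Future<Void>> futures = new ArrayList<>();
      for (Callable<Void> start : sessionStarts) {
        futures.add(pool.submit(start));
      }
      // Future.get rethrows any startup failure on the caller instead of
      // relying on the default uncaught exception handler; the executor's
      // happens-before guarantees also make state built in worker threads
      // visible to whoever calls get().
      for (Future<Void> f : futures) {
        f.get();
      }
    } finally {
      pool.shutdown();
    }
  }
}
{code}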


> don't start HS2 Tez sessions in a single thread
> ---
>
> Key: HIVE-12528
> URL: https://issues.apache.org/jira/browse/HIVE-12528
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12528.patch
>
>
> Starting sessions in parallel would improve the startup time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type

2015-12-10 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-12502:
---
Attachment: HIVE-12502-branch-1.patch
HIVE-12502.patch

> to_date UDF cannot accept NULLs of VOID type
> 
>
> Key: HIVE-12502
> URL: https://issues.apache.org/jira/browse/HIVE-12502
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.0.0
>Reporter: Aaron Tokhy
>Assignee: Jason Dere
>Priority: Trivial
> Attachments: HIVE-12502-branch-1.patch, HIVE-12502.patch
>
>
> The to_date method behaves differently based on the data type of the NULL 
> passed in.
> hive> select to_date(null);   
> FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'TOK_NULL': 
> TO_DATE() only takes STRING/TIMESTAMP/DATEWRITABLE types, got VOID
> hive> select to_date(cast(null as timestamp));
> OK
> NULL
> Time taken: 0.031 seconds, Fetched: 1 row(s)
> This appears to be a regression introduced in HIVE-5731.  The previous 
> version of to_date would not check the type:
> https://github.com/apache/hive/commit/09b6553214d6db5ec7049b88bbe8ff640a7fef72#diff-204f5588c0767cf372a5ca7e3fb964afL56



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type

2015-12-10 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-12502:
---
Attachment: (was: HIVE-12502.1.patch)

> to_date UDF cannot accept NULLs of VOID type
> 
>
> Key: HIVE-12502
> URL: https://issues.apache.org/jira/browse/HIVE-12502
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.0.0
>Reporter: Aaron Tokhy
>Assignee: Jason Dere
>Priority: Trivial
>
> The to_date method behaves differently based on the data type of the NULL 
> passed in.
> hive> select to_date(null);   
> FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'TOK_NULL': 
> TO_DATE() only takes STRING/TIMESTAMP/DATEWRITABLE types, got VOID
> hive> select to_date(cast(null as timestamp));
> OK
> NULL
> Time taken: 0.031 seconds, Fetched: 1 row(s)
> This appears to be a regression introduced in HIVE-5731.  The previous 
> version of to_date would not check the type:
> https://github.com/apache/hive/commit/09b6553214d6db5ec7049b88bbe8ff640a7fef72#diff-204f5588c0767cf372a5ca7e3fb964afL56



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12643) For self describing InputFormat don't replicate schema information in partitions

2015-12-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052200#comment-15052200
 ] 

Hive QA commented on HIVE-12643:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12776691/HIVE-12643.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6313/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6313/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6313/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: 
org.apache.hive.ptest.execution.ssh.SSHExecutionException: RSyncResult 
[localFile=/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-6313/succeeded/TestCliDriver-cp_mj_rc.q-udf_stddev_pop.q-mapreduce2.q-and-12-more,
 remoteFile=/home/hiveptest/50.16.94.163-hiveptest-2/logs/, getExitCode()=12, 
getException()=null, getUser()=hiveptest, getHost()=50.16.94.163, 
getInstance()=2]: 'ssh_exchange_identification: Connection closed by remote host
rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) 
[receiver=3.0.6]
ssh_exchange_identification: Connection closed by remote host
rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) 
[receiver=3.0.6]
ssh_exchange_identification: Connection closed by remote host
rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) 
[receiver=3.0.6]
ssh_exchange_identification: Connection closed by remote host
rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) 
[receiver=3.0.6]
ssh_exchange_identification: Connection closed by remote host
rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) 
[receiver=3.0.6]
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12776691 - PreCommit-HIVE-TRUNK-Build

> For self describing InputFormat don't replicate schema information in 
> partitions
> 
>
> Key: HIVE-12643
> URL: https://issues.apache.org/jira/browse/HIVE-12643
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-12643.patch
>
>
> Since self-describing input formats don't use individual partition schemas 
> for schema resolution, there is no need to send that info to tasks.
> Doing this should cut down the plan size.
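
A hypothetical sketch of the idea ("columns" and "columns.types" are Hive's 
standard serde property keys; the helper itself is illustrative, not the 
attached patch):

{code}
import java.util.Properties;

public class PartitionDescSketch {
  // For self-describing formats (e.g. ORC), per-partition column metadata is
  // recovered from the file footer at read time, so it can be dropped from
  // the serialized task plan.
  static Properties propsForPlan(Properties partitionProps, boolean selfDescribing) {
    if (!selfDescribing) {
      return partitionProps; // text formats still need the per-partition schema
    }
    Properties trimmed = new Properties();
    trimmed.putAll(partitionProps);
    trimmed.remove("columns");
    trimmed.remove("columns.types");
    return trimmed;
  }
}
{code}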



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

