[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051359#comment-15051359 ] Jason Dere commented on HIVE-11878: --- Test failures are not related

> ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
> Key: HIVE-11878
> URL: https://issues.apache.org/jira/browse/HIVE-11878
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.1
> Reporter: Ratandeep Ratti
> Assignee: Ratandeep Ratti
> Labels: URLClassLoader
> Attachments: HIVE-11878 ClassLoader Issues when Registering Jars.pptx, HIVE-11878.2.patch, HIVE-11878.3.patch, HIVE-11878.4.patch, HIVE-11878.patch, HIVE-11878_approach3.patch, HIVE-11878_approach3_per_session_clasloader.patch, HIVE-11878_approach3_with_review_comments.patch, HIVE-11878_approach3_with_review_comments1.patch, HIVE-11878_qtest.patch
>
> When we register a jar on the Hive console, Hive creates a fresh URLClassLoader which includes the path of the jar being registered plus all the jar paths of the parent classloader. The parent classloader is the current ThreadContextClassLoader. Once the URLClassLoader is created, Hive sets it as the current ThreadContextClassLoader.
> So if we register multiple jars in Hive, multiple URLClassLoaders are created, each classloader including the jars from its parent plus the one extra jar being registered. The last URLClassLoader created ends up as the current ThreadContextClassLoader. (See details: org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath)
> Here's an example in which the above strategy can lead to a CNF exception. We register 2 jars *j1* and *j2* in the Hive console. *j1* contains the UDF class *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first; the URLClassLoader *u1* is created and also set as the ThreadContextClassLoader. We register *j2* next; the new URLClassLoader created will be *u2* with *u1* as parent, and *u2* becomes the new ThreadContextClassLoader. Note that *u2* includes paths to both jars *j1* and *j2*, whereas *u1* only has the path to *j1* (for details see: org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath).
> Now when we register class *c1* as a temporary function in Hive, we load the class using {code}Class.forName("c1", true, Thread.currentThread().getContextClassLoader()){code}. The current thread-context classloader is *u2*, and it has the path to class *c1*, but note that classloaders work by delegating to the parent classloader first. In this case class *c1* will be found and *defined* by classloader *u1*.
> Now *c1* from jar *j1* has *u1* as its classloader. If a method (say initialize) is called in *c1* which references the class *c2*, *c2* will not be found, since the classloader used to search for *c2* will be *u1* (the caller's classloader is used to load a class).
> I've added a qtest to explain the problem. Please see the attached patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
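The parent-first delegation described in the report can be reproduced outside Hive. The sketch below is standalone illustration, not Hive code: the loader and class names (JarLikeLoader, Payload, u1, u2) are stand-ins for the URLClassLoaders and the UDF class c1. Both loaders are able to define the same class, yet asking the child (the thread-context loader, as Hive does) still yields a class defined by the parent:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class DelegationDemo {
    /** Stands in for a URLClassLoader whose classpath contains Payload's jar. */
    static class JarLikeLoader extends ClassLoader {
        private final String label;
        JarLikeLoader(String label, ClassLoader parent) { super(parent); this.label = label; }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            // Read the bytecode from the application classpath and define it
            // ourselves, mimicking a loader that has the jar on its URL list.
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = ClassLoader.getSystemResourceAsStream(path)) {
                if (in == null) throw new ClassNotFoundException(name);
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                in.transferTo(out);
                byte[] bytes = out.toByteArray();
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }

        @Override public String toString() { return label; }
    }

    /** Stands in for the UDF class c1. */
    public static class Payload {}

    public static void main(String[] args) throws Exception {
        // Root loader that knows nothing, so lookups cannot bypass u1/u2.
        ClassLoader root = new ClassLoader(null) {};
        JarLikeLoader u1 = new JarLikeLoader("u1", root);   // after registering j1
        JarLikeLoader u2 = new JarLikeLoader("u2", u1);     // after registering j2

        // Ask u2 (the thread-context loader), exactly the way Hive loads the UDF.
        Class<?> c1 = Class.forName(DelegationDemo.class.getName() + "$Payload", true, u2);

        // Parent-first delegation means u1, not u2, defined the class.
        System.out.println("defined by: " + c1.getClassLoader());  // prints "defined by: u1"
    }
}
```

Since c1's defining loader is u1, any class resolved from inside c1 is searched via u1 only, which is exactly why c2 (visible only to u2) triggers the ClassNotFoundException.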
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051361#comment-15051361 ] Ashutosh Chauhan commented on HIVE-11878: - Since this has been seen at other sites also, it will be good to land this in the 2.0 branch as well.
[jira] [Commented] (HIVE-12531) Implement fast-path for Year/Month UDFs for dates between 1999 and 2038
[ https://issues.apache.org/jira/browse/HIVE-12531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051442#comment-15051442 ] Jason Dere commented on HIVE-12531: --- Test failures do not appear to be related > Implement fast-path for Year/Month UDFs for dates between 1999 and 2038 > --- > > Key: HIVE-12531 > URL: https://issues.apache.org/jira/browse/HIVE-12531 > Project: Hive > Issue Type: Improvement >Reporter: Gopal V >Assignee: Jason Dere > Attachments: HIVE-12531.1.patch, HIVE-12531.2.patch, > HIVE-12531.3.patch > > > Current codepath goes into the JDK Calendar implementation, which is very > slow for the simple cases in the current decade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
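For intuition on why a fast path exists in this window, here is a standalone sketch (not the actual HIVE-12531 patch; the constants and method names below are mine): between 1999 and 2038 every fourth year is a leap year (2000 is divisible by 400 and no other century year falls in the range), so the year of a date can be recovered with integer arithmetic on its epoch-day count instead of going through java.util.Calendar:

```java
import java.time.LocalDate;

public class FastYear {
    // Days from 1970-01-01 to 1999-01-01 (29 years, 7 of them leap years).
    static final int EPOCH_DAY_1999 = 10592;
    // A 4-year cycle starting in 1999 (1999, 2000, 2001, 2002) is 3*365 + 366 days.
    static final int DAYS_PER_CYCLE = 1461;

    /** Year of a date given as days since the Unix epoch; valid for 1999-2038 only. */
    static int yearFromEpochDay(int epochDay) {
        int d = epochDay - EPOCH_DAY_1999;      // days since 1999-01-01
        int cycle = d / DAYS_PER_CYCLE;         // completed 4-year cycles
        int rem = d % DAYS_PER_CYCLE;           // day offset inside the current cycle
        int yearInCycle = rem < 365 ? 0         // e.g. 1999 (365 days)
                        : rem < 731 ? 1         // e.g. 2000 (366 days, leap)
                        : rem < 1096 ? 2        // e.g. 2001
                        : 3;                    // e.g. 2002
        return 1999 + 4 * cycle + yearInCycle;
    }

    public static void main(String[] args) {
        // Cross-check the arithmetic against java.time for a few dates.
        for (String s : new String[]{"1999-01-01", "2000-02-29", "2015-12-10", "2038-12-31"}) {
            LocalDate date = LocalDate.parse(s);
            System.out.println(s + " -> " + yearFromEpochDay((int) date.toEpochDay()));
        }
    }
}
```

Month extraction can use the same cycle decomposition with a small month-length table; outside the 1999-2038 range the code would have to fall back to the general calendar path.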
[jira] [Resolved] (HIVE-12572) select partitioned acid table order by throws java.io.FileNotFoundException
[ https://issues.apache.org/jira/browse/HIVE-12572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-12572. --- Resolution: Duplicate Sorry, didn't realize there was already a JIRA. The other one has a patch.

> select partitioned acid table order by throws java.io.FileNotFoundException
> Key: HIVE-12572
> URL: https://issues.apache.org/jira/browse/HIVE-12572
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 2.0.0
> Reporter: Takahiko Saito
> Assignee: Alan Gates
> Priority: Critical
>
> Run the below queries:
> {noformat}
> create table test_acid (a int) partitioned by (b int) clustered by (a) into 2 buckets stored as orc tblproperties ('transactional'='true');
> insert into table test_acid partition (b=1) values (1), (2), (3), (4);
> select * from test_acid order by a;
> {noformat}
> The above fails with the following error:
> {noformat}
> 15/12/02 21:12:30 INFO SessionState: Map 1: 0(+0,-4)/1 Reducer 2: 0/1
> Status: Failed
> 15/12/02 21:12:30 ERROR SessionState: Status: Failed
> Vertex failed, vertexName=Map 1, vertexId=vertex_1449077191499_0023_1_00, diagnostics=[Task failed, taskId=task_1449077191499_0023_1_00_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task: attempt_1449077191499_0023_1_00_00_0:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.FileNotFoundException: Path is not a file: /apps/hive/warehouse/test_acid/b=1
>   at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
>   at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:652)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:195)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:160)
>   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:348)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.FileNotFoundException: Path is not a file: /apps/hive/warehouse/test_acid/b=1
>   at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
>   at
[jira] [Updated] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11531: --- Fix Version/s: (was: 2.0.0) 2.1.0

> Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
> Key: HIVE-11531
> URL: https://issues.apache.org/jira/browse/HIVE-11531
> Project: Hive
> Issue Type: Improvement
> Components: CBO
> Reporter: Sergey Shelukhin
> Assignee: Hui Zheng
> Fix For: 2.1.0
> Attachments: HIVE-11531.02.patch, HIVE-11531.03.patch, HIVE-11531.04.patch, HIVE-11531.05.patch, HIVE-11531.06.patch, HIVE-11531.07.patch, HIVE-11531.WIP.1.patch, HIVE-11531.WIP.2.patch, HIVE-11531.patch
>
> For any UIs that involve pagination, it is useful to issue queries in the form SELECT ... LIMIT X,Y where X,Y are coordinates inside the result to be paginated (which can be extremely large by itself). At present, ROW_NUMBER can be used to achieve this effect, but optimizations for LIMIT such as TopN in ReduceSink do not apply to ROW_NUMBER. We can add first-class support for "skip" on the existing LIMIT, or improve ROW_NUMBER for better performance.
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051527#comment-15051527 ] Sergey Shelukhin commented on HIVE-11531: - Makes sense. branch-2.0 doesn't require approval currently. I will send an update tomorrow morning-ish on the release.
[jira] [Commented] (HIVE-11866) Add framework to enable testing using LDAPServer using LDAP protocol
[ https://issues.apache.org/jira/browse/HIVE-11866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051471#comment-15051471 ] Naveen Gangam commented on HIVE-11866: -- [~thejas] I am seeing no response on the LEGAL jira except a push from the Apache DS team to adopt ApacheDS for testing. I have sought some internal legal advice on this matter. Here is what I got back:

"When you say something "is not compatible with Apache 2", are you referring to a compatibility requirement imposed by the Apache Hive project particularly? Or are you just saying that the ENTIRE distribution, including all its dependencies, can no longer be released under Apache 2?

In either case, the UnboundID license (https://docs.ldap.com/ldap-sdk/docs/LICENSE-UnboundID-LDAPSDK.txt) does include a clause allowing UnboundID to unilaterally terminate the license at any time: 'Notwithstanding anything contained in this Agreement to the contrary, UnboundID may also, in its sole discretion, terminate or suspend access to the SDK to You or any end user at any time.' Naturally, that makes the SDK's inclusion in a project, particularly as a foundational tool, rather risky. This may be why the jira considers the license "not compatible".

But from a cursory review I understand that UnboundID alternatively provides the SDK under the GPL or LGPL licenses (https://docs.ldap.com/ldap-sdk/docs/index.html) (https://docs.ldap.com/ldap-sdk/docs/LICENSE.txt). If no one is modifying the SDK, but simply linking to it at arms-length as an independent file in its original form, I'm not sure why anyone would object to including it in Apache Hive as an LGPL dependency (Naveen seems to suggest below that he'll be using it at arms-length). If Hive requires that ALL dependencies be Apache licensed, then this wouldn't be acceptable. But taking a copy of the SDK under the LGPL license shouldn't prevent you from releasing the rest of Hive under the Apache license.

Furthermore, Naveen seems to indicate that the SDK will only appear internally on the Apache Hive machines (e.g., it won't be redistributed with the package provided to customers). If that's the case, then use of an internal copy of the SDK solely on the Hive machines would be particularly benign. See, e.g., (https://docs.ldap.com/ldap-sdk/docs/ldapsdk-faq.html#internal): 'There are no restrictions on the use of the LDAP SDK in an application that is for internal use only and will not be redistributed outside of your organization.' (Still, if developers [inside and outside the Hive group] will be regularly downloading and uploading test framework builds that include the SDK from the Hive group, you might make the LGPL election explicit as a precaution.)

So you may just have Naveen confirm that an LGPL 2.1 license would be acceptable to the Hive group and that the SDK copy will remain within Hive (e.g., the test framework is not distributed, except when developers push / pull the build from a Hive machine). [I'm not sure which edition (standard, minimal, or commercial) of the SDK Naveen is contemplating using, but all three seem to have the same licensing options for his purpose (https://www.ldap.com/unboundid-ldap-sdk-for-java).]"

Given that our usage is "at arms length" and the binaries wouldn't be packaged in the end product, do you think we have enough to re-insert this fix? Thanks

> Add framework to enable testing using LDAPServer using LDAP protocol
> Key: HIVE-11866
> URL: https://issues.apache.org/jira/browse/HIVE-11866
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Affects Versions: 1.3.0
> Reporter: Naveen Gangam
> Assignee: Naveen Gangam
> Attachments: HIVE-11866.2.patch, HIVE-11866.patch
>
> Currently there is no unit test coverage for HS2's LDAP Atn provider using an LDAP Server on the backend. This prevents testing of the LDAPAtnProvider with some realistic usecases.
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051523#comment-15051523 ] Jesus Camacho Rodriguez commented on HIVE-11531: [~sershe], I want to backport this patch to 2.0.0 once HIVE-12644 goes in.
[jira] [Commented] (HIVE-9544) Error dropping fully qualified partitioned table - Internal error processing get_partition_names
[ https://issues.apache.org/jira/browse/HIVE-9544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051457#comment-15051457 ] Ryan P commented on HIVE-9544: -- This looks pretty similar to HIVE-10421. Are the offending tables partitioned?

> Error dropping fully qualified partitioned table - Internal error processing get_partition_names
> Key: HIVE-9544
> URL: https://issues.apache.org/jira/browse/HIVE-9544
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.14.0
> Environment: HDP 2.2
> Reporter: Hari Sekhon
> Assignee: Chaoyu Tang
> Priority: Minor
>
> When attempting to drop a partitioned table using a fully qualified name I get this error:
> {code}
> hive -e 'drop table myDB.my_table_name;'
> Logging initialized using configuration in file:/etc/hive/conf/hive-log4j.properties
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.thrift.TApplicationException: Internal error processing get_partition_names
> {code}
> It succeeds if I instead do:
> {code}hive -e 'use myDB; drop table my_table_name;'{code}
> Regards,
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon
[jira] [Updated] (HIVE-12645) ConstantPropagateProcCtx.resolve() should verify internal names first instead of alias to match 2 columns from different row schemas
[ https://issues.apache.org/jira/browse/HIVE-12645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-12645: --- Summary: ConstantPropagateProcCtx.resolve() should verify internal names first instead of alias to match 2 columns from different row schemas (was: ConstantPropagateProcCtx.resolve() should use internal names instead of alias to match 2 columns from different row schemas)

> ConstantPropagateProcCtx.resolve() should verify internal names first instead of alias to match 2 columns from different row schemas
> Key: HIVE-12645
> URL: https://issues.apache.org/jira/browse/HIVE-12645
> Project: Hive
> Issue Type: Bug
> Reporter: Hari Sankar Sivarama Subramaniyan
> Assignee: Hari Sankar Sivarama Subramaniyan
>
> Currently, it seems that we look to match the ColumnInfo between the parent and the child row schemas by calling rci = rs.getColumnInfo(tblAlias, alias), which might be a bit aggressive, i.e. we will lose the opportunity to constant-propagate even if the columns are the same but the aliases in the row schemas do not match.
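A minimal sketch of the matching order the new summary describes: compare internal column names first and fall back to the (tableAlias, alias) pair only when internal names do not match. This is illustrative only; the ColumnInfo record and resolve signature below are simplified stand-ins for Hive's actual classes (requires Java 16+ for records):

```java
import java.util.List;

public class ResolveSketch {
    // Simplified stand-in for Hive's ColumnInfo: an internal name plus the
    // (tableAlias, alias) pair under which it appears in a row schema.
    record ColumnInfo(String internalName, String tblAlias, String alias) {}

    // Match a parent column against a child schema: prefer the internal name,
    // and only fall back to the alias pair when no internal name matches.
    static ColumnInfo resolve(ColumnInfo parent, List<ColumnInfo> childSchema) {
        for (ColumnInfo c : childSchema) {
            if (parent.internalName() != null
                    && parent.internalName().equals(c.internalName())) {
                return c;                     // strongest signal: same internal name
            }
        }
        for (ColumnInfo c : childSchema) {    // weaker fallback: alias pair match
            if (parent.tblAlias() != null && parent.tblAlias().equals(c.tblAlias())
                    && parent.alias() != null && parent.alias().equals(c.alias())) {
                return c;
            }
        }
        return null;                          // no safe match: skip constant propagation
    }

    public static void main(String[] args) {
        // Same column, but the alias was renamed between parent and child schemas:
        ColumnInfo parent = new ColumnInfo("_col0", "t1", "key");
        List<ColumnInfo> child = List.of(new ColumnInfo("_col0", "t2", "k"));
        System.out.println(resolve(parent, child) != null);  // true: internal name wins
    }
}
```

Under the old alias-only strategy the example above would find no match and give up on constant propagation, even though both entries describe the same underlying column.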
[jira] [Updated] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11531: --- Fix Version/s: 2.0.0
[jira] [Commented] (HIVE-12640) Allow StatsOptimizer to optimize the query for Constant GroupBy keys
[ https://issues.apache.org/jira/browse/HIVE-12640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051560#comment-15051560 ] Hive QA commented on HIVE-12640:

Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12776655/HIVE-12640.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 9887 tests executed

*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadata_only_queries
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_metadata_only_queries
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6308/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6308/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6308/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}
This message is automatically generated. ATTACHMENT ID: 12776655 - PreCommit-HIVE-TRUNK-Build

> Allow StatsOptimizer to optimize the query for Constant GroupBy keys
> Key: HIVE-12640
> URL: https://issues.apache.org/jira/browse/HIVE-12640
> Project: Hive
> Issue Type: Bug
> Reporter: Hari Sankar Sivarama Subramaniyan
> Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-12640.1.patch
>
> {code}
> hive> select count('1') from src group by '1';
> {code}
> In the above query, while performing the StatsOptimizer optimization we can safely ignore the group by on the constant key '1', since the above query will return the same result as "select count('1') from src".
[jira] [Commented] (HIVE-11531) Add mysql-style LIMIT support to Hive, or improve ROW_NUMBER performance-wise
[ https://issues.apache.org/jira/browse/HIVE-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050382#comment-15050382 ] Jesus Camacho Rodriguez commented on HIVE-11531: +1
[jira] [Updated] (HIVE-12435) SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and vectorization is enabled.
[ https://issues.apache.org/jira/browse/HIVE-12435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-12435: --- Attachment: (was: vector_select_null2.q)

> SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and vectorization is enabled.
> Key: HIVE-12435
> URL: https://issues.apache.org/jira/browse/HIVE-12435
> Project: Hive
> Issue Type: Bug
> Components: Vectorization
> Affects Versions: 2.0.0
> Reporter: Takahiko Saito
> Assignee: Matt McCline
> Priority: Critical
>
> Run the following query:
> {noformat}
> create table count_case_groupby (key string, bool boolean) STORED AS orc;
> insert into table count_case_groupby values ('key1', true),('key2', false),('key3', NULL),('key4', false),('key5',NULL);
> {noformat}
> The table contains the following:
> {noformat}
> key1 true
> key2 false
> key3 NULL
> key4 false
> key5 NULL
> {noformat}
> The below query returns:
> {noformat}
> SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) AS cnt_bool0_ok FROM count_case_groupby GROUP BY key;
> key1 1
> key2 1
> key3 1
> key4 1
> key5 1
> {noformat}
> while it expects the following results:
> {noformat}
> key1 1
> key2 1
> key3 0
> key4 1
> key5 0
> {noformat}
> The query works with Hive 1.2. It also works when the table is not in ORC format, and even for an ORC table it works when vectorization is disabled.
[jira] [Updated] (HIVE-12625) Backport to branch-1 HIVE-11981 ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)
[ https://issues.apache.org/jira/browse/HIVE-12625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-12625: Component/s: (was: Hive) ORC > Backport to branch-1 HIVE-11981 ORC Schema Evolution Issues (Vectorized, > ACID, and Non-Vectorized) > -- > > Key: HIVE-12625 > URL: https://issues.apache.org/jira/browse/HIVE-12625 > Project: Hive > Issue Type: Bug > Components: ORC >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12625.1-branch1.patch, HIVE-12625.2-branch1.patch, > HIVE-12625.3-branch1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12435) SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and vectorization is enabled.
[ https://issues.apache.org/jira/browse/HIVE-12435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-12435: --- Attachment: HIVE-12435.01.patch
[jira] [Updated] (HIVE-12632) LLAP: don't use IO elevator for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-12632: Component/s: llap > LLAP: don't use IO elevator for ACID tables > > > Key: HIVE-12632 > URL: https://issues.apache.org/jira/browse/HIVE-12632 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Takahiko Saito >Assignee: Sergey Shelukhin > Attachments: HIVE-12632.patch > > > Until HIVE-12631 is fixed, we need to avoid ACID tables in IO elevator. Right > now, a FileNotFound error is thrown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12598) LLAP: disable fileId when not supported
[ https://issues.apache.org/jira/browse/HIVE-12598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-12598: Component/s: llap > LLAP: disable fileId when not supported > --- > > Key: HIVE-12598 > URL: https://issues.apache.org/jira/browse/HIVE-12598 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 2.0.0, 2.1.0 > > Attachments: HIVE-12598.01.patch, HIVE-12598.02.patch, > HIVE-12598.patch > > > There is a TODO somewhere in code. We might get a synthetic fileId in absence > of the real one in some cases when another FS masquerades as HDFS, we should > be able to turn off fileID support explicitly for such cases as they are not > bulletproof. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11878) ClassNotFoundException can possibly occur if multiple jars are registered one at a time in Hive
[ https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050350#comment-15050350 ] Hive QA commented on HIVE-11878: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12776512/HIVE-11878.4.patch {color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 9874 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testAddPartitions org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testFetchingPartitionsWithDifferentSchemas org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping org.apache.hive.jdbc.TestSSL.testSSLVersion org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6299/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6299/console Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6299/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12776512 - PreCommit-HIVE-TRUNK-Build > ClassNotFoundException can possibly occur if multiple jars are registered > one at a time in Hive > > > Key: HIVE-11878 > URL: https://issues.apache.org/jira/browse/HIVE-11878 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Ratandeep Ratti >Assignee: Ratandeep Ratti > Labels: URLClassLoader > Attachments: HIVE-11878 ClassLoader Issues when Registering > Jars.pptx, HIVE-11878.2.patch, HIVE-11878.3.patch, HIVE-11878.4.patch, > HIVE-11878.patch, HIVE-11878_approach3.patch, > HIVE-11878_approach3_per_session_clasloader.patch, > HIVE-11878_approach3_with_review_comments.patch, > HIVE-11878_approach3_with_review_comments1.patch, HIVE-11878_qtest.patch > > > When we register a jar on the Hive console, Hive creates a fresh > URLClassLoader which includes the path of the current jar to be registered and > all the jar paths of the parent classloader. The parent classloader is the > current ThreadContextClassLoader. Once the URLClassLoader is created, Hive > sets it as the current ThreadContextClassLoader. > So if we register multiple jars in Hive, there will be multiple > URLClassLoaders created, each classloader including the jars from its parent > and the one extra jar to be registered. The last URLClassLoader created will > end up as the current ThreadContextClassLoader. (See details: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath) > Now here's an example in which the above strategy can lead to a CNF exception. 
> We register 2 jars *j1* and *j2* in the Hive console. *j1* contains the UDF class > *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first; the URLClassLoader *u1* is created and also set as the > ThreadContextClassLoader. We register *j2* next; the new URLClassLoader > created is *u2*, with *u1* as parent, and *u2* becomes the new > ThreadContextClassLoader. Note that *u2* includes paths to both jars *j1* and *j2*, > whereas *u1* only has the path to *j1* (for details see: > org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath). > Now when we register class *c1* as a temporary function in Hive, we load > the class using {code} Class.forName("c1", true, > Thread.currentThread().getContextClassLoader()) {code}. The > current ThreadContextClassLoader is *u2*,
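The delegation chain described in this report can be sketched with a toy parent-first resolver. This is a conceptual sketch, not Hive code: {{ToyLoader}} and its {{resolve}} method are illustrative stand-ins for URLClassLoader's parent-first {{loadClass}} delegation.

```java
import java.util.Set;

// Toy model of parent-first classloader delegation (not Hive code).
class ToyLoader {
    final ToyLoader parent;
    final Set<String> classes; // class names visible from this loader's own jar paths

    ToyLoader(ToyLoader parent, Set<String> classes) {
        this.parent = parent;
        this.classes = classes;
    }

    // Parent-first delegation: ask the parent first, then look at our own
    // paths; returns the *defining* loader, or null if the class is unknown.
    ToyLoader resolve(String name) {
        if (parent != null) {
            ToyLoader definer = parent.resolve(name);
            if (definer != null) return definer;
        }
        return classes.contains(name) ? this : null;
    }
}

public class CnfDemo {
    public static void main(String[] args) {
        ToyLoader u1 = new ToyLoader(null, Set.of("c1"));     // sees j1 only
        ToyLoader u2 = new ToyLoader(u1, Set.of("c1", "c2")); // sees j1 + j2

        // Class.forName("c1", true, u2) delegates to the parent, so u1
        // becomes c1's defining loader:
        ToyLoader definer = u2.resolve("c1");
        System.out.println(definer == u1); // prints "true"

        // c1's internal reference to c2 is resolved by c1's defining
        // loader (u1), whose paths do not include j2:
        System.out.println(definer.resolve("c2") == null); // prints "true" -> CNFE
    }
}
```

Because *u1* defines *c1*, the JVM resolves *c1*'s reference to *c2* against *u1*'s paths, which do not include *j2*; hence the ClassNotFoundException even though *u2* can see both jars.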
[jira] [Updated] (HIVE-12422) LLAP: add security to Web UI endpoint
[ https://issues.apache.org/jira/browse/HIVE-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-12422: Component/s: llap > LLAP: add security to Web UI endpoint > - > > Key: HIVE-12422 > URL: https://issues.apache.org/jira/browse/HIVE-12422 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-12422.01.patch, HIVE-12422.02.patch, > HIVE-12422.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11528) incrementally read query results when there's no ORDER BY
[ https://issues.apache.org/jira/browse/HIVE-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051602#comment-15051602 ] Sergey Shelukhin commented on HIVE-11528: - Yes, IIRC order by query always ends with a single reducer to sort the data. Without order by, multiple reducers output the results to multiple files, and then we read these. I think the pipeline is the same for both cases, the question is whether the last stage is a single reducer. > incrementally read query results when there's no ORDER BY > - > > Key: HIVE-11528 > URL: https://issues.apache.org/jira/browse/HIVE-11528 > Project: Hive > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Keisuke Ogiwara > > May require HIVE-11527. When there's no ORDER BY and there's more than one > reducer on the last stage of the query, it should be possible to return data > to the user as it is produced, instead of waiting for all reducers to finish. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12632) LLAP: don't use IO elevator for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12632: Target Version/s: 2.0.0 > LLAP: don't use IO elevator for ACID tables > > > Key: HIVE-12632 > URL: https://issues.apache.org/jira/browse/HIVE-12632 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Takahiko Saito >Assignee: Sergey Shelukhin > Attachments: HIVE-12632.patch > > > Until HIVE-12631 is fixed, we need to avoid ACID tables in IO elevator. Right > now, a FileNotFound error is thrown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11107) Support for Performance regression test suite with TPCDS
[ https://issues.apache.org/jira/browse/HIVE-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11107: - Attachment: HIVE-11107.7.patch > Support for Performance regression test suite with TPCDS > > > Key: HIVE-11107 > URL: https://issues.apache.org/jira/browse/HIVE-11107 > Project: Hive > Issue Type: Sub-task >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-11107.1.patch, HIVE-11107.2.patch, > HIVE-11107.3.patch, HIVE-11107.4.patch, HIVE-11107.5.patch, > HIVE-11107.6.patch, HIVE-11107.7.patch > > > Support to add TPCDS queries to the performance regression test suite with > Hive CBO turned on. > This benchmark is intended to make sure that subsequent changes to the > optimizer or any hive code do not yield any unexpected plan changes. i.e. > the intention is to not run the entire TPCDS query set, but just "explain > plan" for the TPCDS queries. > As part of this jira, we will manually verify that expected hive > optimizations kick in for the queries (for given stats/dataset). If there is > a difference in plan within this test suite due to a future commit, it needs > to be analyzed and we need to make sure that it is not a regression. > The test suite can be run in master branch from itests by > {code} > mvn test -Dtest=TestPerfCliDriver > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11107) Support for Performance regression test suite with TPCDS
[ https://issues.apache.org/jira/browse/HIVE-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11107: - Attachment: (was: HIVE-11107.6.patch) > Support for Performance regression test suite with TPCDS > > > Key: HIVE-11107 > URL: https://issues.apache.org/jira/browse/HIVE-11107 > Project: Hive > Issue Type: Sub-task >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-11107.1.patch, HIVE-11107.2.patch, > HIVE-11107.3.patch, HIVE-11107.4.patch, HIVE-11107.5.patch, > HIVE-11107.6.patch, HIVE-11107.7.patch > > > Support to add TPCDS queries to the performance regression test suite with > Hive CBO turned on. > This benchmark is intended to make sure that subsequent changes to the > optimizer or any hive code do not yield any unexpected plan changes. i.e. > the intention is to not run the entire TPCDS query set, but just "explain > plan" for the TPCDS queries. > As part of this jira, we will manually verify that expected hive > optimizations kick in for the queries (for given stats/dataset). If there is > a difference in plan within this test suite due to a future commit, it needs > to be analyzed and we need to make sure that it is not a regression. > The test suite can be run in master branch from itests by > {code} > mvn test -Dtest=TestPerfCliDriver > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12632) LLAP: don't use IO elevator for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051738#comment-15051738 ] Sergey Shelukhin commented on HIVE-12632: - Looks like LLAP tests are also broken as per the JIRA I just created. Don't know why I didn't hit this issue when I was originally running. > LLAP: don't use IO elevator for ACID tables > > > Key: HIVE-12632 > URL: https://issues.apache.org/jira/browse/HIVE-12632 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Takahiko Saito >Assignee: Sergey Shelukhin > Attachments: HIVE-12632.patch > > > Until HIVE-12631 is fixed, we need to avoid ACID tables in IO elevator. Right > now, a FileNotFound error is thrown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12648) LLAP IO was disabled in CliDriver by accident (and tests are broken)
[ https://issues.apache.org/jira/browse/HIVE-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051758#comment-15051758 ] Sergey Shelukhin commented on HIVE-12648: - I am trying to bisect now to see what broke the tests, since the failure is rather weird (buffer size mismatch when reading metadata) > LLAP IO was disabled in CliDriver by accident (and tests are broken) > > > Key: HIVE-12648 > URL: https://issues.apache.org/jira/browse/HIVE-12648 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12648) LLAP IO was disabled in CliDriver by accident (and tests are broken)
[ https://issues.apache.org/jira/browse/HIVE-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12648: Target Version/s: 2.0.0 > LLAP IO was disabled in CliDriver by accident (and tests are broken) > > > Key: HIVE-12648 > URL: https://issues.apache.org/jira/browse/HIVE-12648 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-12648.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12535) Dynamic Hash Join: Key references are cyclic
[ https://issues.apache.org/jira/browse/HIVE-12535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051592#comment-15051592 ] Laljo John Pullokkaran commented on HIVE-12535: --- [~pxiong] Could you take a look? > Dynamic Hash Join: Key references are cyclic > > > Key: HIVE-12535 > URL: https://issues.apache.org/jira/browse/HIVE-12535 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Gopal V >Assignee: Jason Dere > Attachments: philz_26.txt > > > MAPJOIN_4227 is inside "Reducer 2", but refers back to "Reducer 2" in its > keys. It should say "Map 1" there. > {code} > ||<-Reducer 2 [SIMPLE_EDGE] vectorized, llap > > > | > | Reduce Output Operator [RS_4189] > > > | > | key expressions:_col0 (type: string), _col1 (type: > int) > > | > | Map-reduce partition columns:_col0 (type: string), > _col1 (type: int) > > | > | sort order:++ > > > | > | Statistics:Num rows: 83 Data size: 9213 Basic stats: > COMPLETE Column stats: COMPLETE > > | > | value expressions:_col2 (type: double) > > > | > | Group By Operator [OP_4229] > > > | > | aggregations:["sum(_col2)"] > > > | > | keys:_col0 (type: string), _col1 (type: int) > > > | > | outputColumnNames:["_col0","_col1","_col2"] > > > | > | Statistics:Num rows: 83 Data size: 9213 Basic > stats: COMPLETE Column stats: COMPLETE > > | > | Select Operator [OP_4228] > > > | > |outputColumnNames:["_col0","_col1","_col2"] > > > | > |Statistics:Num rows: 166 Data size: 26394 Basic > stats: COMPLETE Column stats: COMPLETE > >| > |Map Join Operator
[jira] [Updated] (HIVE-12633) LLAP: package included serde jars
[ https://issues.apache.org/jira/browse/HIVE-12633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12633: Attachment: HIVE-12633.01.patch Fixed the typo. I probably deleted the symbol accidentally after building... > LLAP: package included serde jars > - > > Key: HIVE-12633 > URL: https://issues.apache.org/jira/browse/HIVE-12633 > Project: Hive > Issue Type: Bug >Reporter: Takahiko Saito >Assignee: Sergey Shelukhin > Attachments: HIVE-12633.01.patch, HIVE-12633.patch > > > Some SerDes like JSONSerde are not packaged with LLAP. One cannot localize > jars on the daemon (due to security consideration if nothing else), so we > should package them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12640) Allow StatsOptimizer to optimize the query for Constant GroupBy keys
[ https://issues.apache.org/jira/browse/HIVE-12640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-12640: - Description: {code} hive> select count('1') from src group by '1'; {code} In the above query, while performing StatsOptimizer optimization we can safely ignore the group by on the constant key '1' since the above query will return the same result as "select count('1') from src". Exception: If src is empty, according to the SQL standard, {code} select count('1') from src group by '1' {code} and {code} select count('1') from src {code} should produce 1 and 0 rows respectively. was: {code} hive> select count('1') from src group by '1'; {code} In the above query, while performing StatsOptimizer optimization we can safely ignore the group by on the constant key '1' since the above query will return the same result as "select count('1') from src". Exception: If src is empty, according to the SQL standard, should select count('1') from src group by '1' and select count('1') from src > Allow StatsOptimizer to optimize the query for Constant GroupBy keys > - > > Key: HIVE-12640 > URL: https://issues.apache.org/jira/browse/HIVE-12640 > Project: Hive > Issue Type: Bug >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-12640.1.patch > > > {code} > hive> select count('1') from src group by '1'; > {code} > In the above query, while performing StatsOptimizer optimization we can > safely ignore the group by on the constant key '1' since the above query will > return the same result as "select count('1') from src". > Exception: > If src is empty, according to the SQL standard, > {code} > select count('1') from src group by '1' > {code} > and > {code} > select count('1') from src > {code} > should produce 1 and 0 rows respectively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12535) Dynamic Hash Join: Key references are cyclic
[ https://issues.apache.org/jira/browse/HIVE-12535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051651#comment-15051651 ] Pengcheng Xiong commented on HIVE-12535: [~jdere], thanks for your investigation. Do you have any idea why big table 0 is not in the input vertices? Thanks. > Dynamic Hash Join: Key references are cyclic > > > Key: HIVE-12535 > URL: https://issues.apache.org/jira/browse/HIVE-12535 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Gopal V >Assignee: Jason Dere > Attachments: philz_26.txt > > > MAPJOIN_4227 is inside "Reducer 2", but refers back to "Reducer 2" in its > keys. It should say "Map 1" there. > {code} > ||<-Reducer 2 [SIMPLE_EDGE] vectorized, llap > > > | > | Reduce Output Operator [RS_4189] > > > | > | key expressions:_col0 (type: string), _col1 (type: > int) > > | > | Map-reduce partition columns:_col0 (type: string), > _col1 (type: int) > > | > | sort order:++ > > > | > | Statistics:Num rows: 83 Data size: 9213 Basic stats: > COMPLETE Column stats: COMPLETE > > | > | value expressions:_col2 (type: double) > > > | > | Group By Operator [OP_4229] > > > | > | aggregations:["sum(_col2)"] > > > | > | keys:_col0 (type: string), _col1 (type: int) > > > | > | outputColumnNames:["_col0","_col1","_col2"] > > > | > | Statistics:Num rows: 83 Data size: 9213 Basic > stats: COMPLETE Column stats: COMPLETE > > | > | Select Operator [OP_4228] > > > | > |outputColumnNames:["_col0","_col1","_col2"] > > > | > |Statistics:Num rows: 166 Data size: 26394 Basic > stats: COMPLETE Column stats: COMPLETE > >
[jira] [Commented] (HIVE-12551) Fix several kryo exceptions in branch-1
[ https://issues.apache.org/jira/browse/HIVE-12551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051711#comment-15051711 ] Prasanth Jayachandran commented on HIVE-12551: -- [~Feng Yuan] Can you provide a small reproducible test case for this issue? This issue looks like PartitionDesc is not registered with kryo. I can fix that, but it will not guarantee that the issue is completely solved; adding PartitionDesc alone won't be sufficient, and some other issue could show up after this. To fix it completely, if you can provide a small data set and queries that reproduce this issue, I will test the patch before updating the JIRA. > Fix several kryo exceptions in branch-1 > --- > > Key: HIVE-12551 > URL: https://issues.apache.org/jira/browse/HIVE-12551 > Project: Hive > Issue Type: Bug >Affects Versions: 1.3.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Labels: serialization > Fix For: 1.3.0 > > Attachments: HIVE-12551.1.patch > > > HIVE-11519, HIVE-12174 and the following exception are all caused by > unregistered classes or serializers. HIVE-12175 should have fixed these > issues for the master branch. 
> {code} > Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: > java.lang.NullPointerException > Serialization trace: > chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc) > expr (org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor) > childExpressions > (org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FilterStringColumnBetween) > conditionEvaluator > (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator) > childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) > aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:367) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:276) > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) > at > 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18) > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17) > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672) > at > org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:1087) > at >
[jira] [Commented] (HIVE-12548) Hive metastore goes down in Kerberos,sentry enabled CDH5.5 cluster
[ https://issues.apache.org/jira/browse/HIVE-12548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051722#comment-15051722 ] Andrew Olson commented on HIVE-12548: - We're seeing this stack trace, from an Oozie Java action that connects to the MetaStore in a Kerberos-secured cluster. We've tried to provide the Hive credentials as described in https://oozie.apache.org/docs/4.2.0/DG_ActionAuthentication.html with no success so far. Any advice would be appreciated. > Hive metastore goes down in Kerberos,sentry enabled CDH5.5 cluster > -- > > Key: HIVE-12548 > URL: https://issues.apache.org/jira/browse/HIVE-12548 > Project: Hive > Issue Type: Bug > Components: Hive, HiveServer2 > Environment: RHEL 6.5 CLOUDERA CDH 5.5 >Reporter: narendra reddy ganesana >Assignee: Vaibhav Gumashta > > [pool-3-thread-10]: Error occurred during processing of message. > java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: > Invalid status -128 > at > org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:739) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:736) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1651) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:736) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:268) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at 
java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.thrift.transport.TTransportException: Invalid status > -128 > at > org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232) > at > org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:184) > at > org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125) > at > org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) > at > org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) > at > org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216) > ... 10 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12648) LLAP IO was disabled in CliDriver by accident (and tests are broken)
[ https://issues.apache.org/jira/browse/HIVE-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12648: Attachment: HIVE-12648.patch This re-enables the tests and removes all but one ctors to require that the callers think about the settings rather than getting the defaults blindly. The orc_llap test fails, I am bisecting to see what broke it. [~prasanth_j] fyi > LLAP IO was disabled in CliDriver by accident (and tests are broken) > > > Key: HIVE-12648 > URL: https://issues.apache.org/jira/browse/HIVE-12648 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-12648.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12640) Allow StatsOptimizer to optimize the query for Constant GroupBy keys
[ https://issues.apache.org/jira/browse/HIVE-12640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051586#comment-15051586 ] Julian Hyde commented on HIVE-12640: If {{src}} is empty, according to the SQL standard, should {code} select count('1') from src group by '1'{code} and {code} select count('1') from src{code} return the same result? My understanding is that the first should return 1 row, the second 0 rows. > Allow StatsOptimizer to optimize the query for Constant GroupBy keys > - > > Key: HIVE-12640 > URL: https://issues.apache.org/jira/browse/HIVE-12640 > Project: Hive > Issue Type: Bug >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-12640.1.patch > > > {code} > hive> select count('1') from src group by '1'; > {code} > In the above query, while performing StatsOptimizer optimization we can > safely ignore the group by on the constant key '1' since the above query will > return the same result as "select count('1') from src". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
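Julian Hyde's point — that a grouped aggregate over an empty table yields zero rows while a scalar aggregate yields one — can be illustrated with a small Java sketch (not Hive code; the two stream pipelines just model the two query shapes over an empty `src`):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class EmptyGroupBy {
    public static void main(String[] args) {
        List<String> src = List.of(); // an empty table

        // select count('1') from src group by '1':
        // one output row per group; no input rows means no groups, so 0 rows.
        Map<String, Long> grouped = src.stream()
            .collect(Collectors.groupingBy(r -> "1", Collectors.counting()));
        System.out.println(grouped.size() + " rows"); // prints "0 rows"

        // select count('1') from src:
        // a scalar aggregate always produces exactly 1 row, here containing 0.
        long scalar = src.stream().count();
        System.out.println("1 row, value " + scalar); // prints "1 row, value 0"
    }
}
```

This is why the rewrite dropping the constant GROUP BY key is only safe when `src` is known to be non-empty.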
[jira] [Commented] (HIVE-12590) Repeated UDAFs with literals can produce incorrect result
[ https://issues.apache.org/jira/browse/HIVE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051600#comment-15051600 ] Laljo John Pullokkaran commented on HIVE-12590: --- Obvious change was to remove lowercase conversion in Phase1. However it seems deeper than that. [~ashutoshc] Could you take a look? I have other stuff in my plate right now. > Repeated UDAFs with literals can produce incorrect result > - > > Key: HIVE-12590 > URL: https://issues.apache.org/jira/browse/HIVE-12590 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 1.0.1, 1.1.1, 1.2.1, 2.0.0 >Reporter: Laljo John Pullokkaran >Assignee: Laljo John Pullokkaran >Priority: Critical > > Repeated UDAF with literals could produce wrong result. > This is not a common use case, nevertheless a bug. > hive> select max('pants'), max('pANTS') from t1 group by key; > Total MapReduce CPU Time Spent: 0 msec > OK > pANTS pANTS > pANTS pANTS > pANTS pANTS > pANTS pANTS > pANTS pANTS > Time taken: 296.252 seconds, Fetched: 5 row(s) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12590) Repeated UDAFs with literals can produce incorrect result
[ https://issues.apache.org/jira/browse/HIVE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-12590: -- Assignee: Ashutosh Chauhan (was: Laljo John Pullokkaran) > Repeated UDAFs with literals can produce incorrect result > - > > Key: HIVE-12590 > URL: https://issues.apache.org/jira/browse/HIVE-12590 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 1.0.1, 1.1.1, 1.2.1, 2.0.0 >Reporter: Laljo John Pullokkaran >Assignee: Ashutosh Chauhan >Priority: Critical > > Repeated UDAF with literals could produce wrong result. > This is not a common use case, nevertheless a bug. > hive> select max('pants'), max('pANTS') from t1 group by key; > Total MapReduce CPU Time Spent: 0 msec > OK > pANTS pANTS > pANTS pANTS > pANTS pANTS > pANTS pANTS > pANTS pANTS > Time taken: 296.252 seconds, Fetched: 5 row(s) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12640) Allow StatsOptimizer to optimize the query for Constant GroupBy keys
[ https://issues.apache.org/jira/browse/HIVE-12640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-12640: - Description: {code} hive> select count('1') from src group by '1'; {code} In the above query, while performing StatsOptimizer optimization we can safely ignore the group by on the constant key '1' since the above query will return the same result as "select count('1') from src". Exception: If src is empty, according to the SQL standard, should select count('1') from src group by '1' and select count('1') from src was: {code} hive> select count('1') from src group by '1'; {code} In the above query, while performing StatsOptimizer optimization we can safely ignore the group by on the constant key '1' since the above query will return the same result as "select count('1') from src". > Allow StatsOptimizer to optimize the query for Constant GroupBy keys > - > > Key: HIVE-12640 > URL: https://issues.apache.org/jira/browse/HIVE-12640 > Project: Hive > Issue Type: Bug >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-12640.1.patch > > > {code} > hive> select count('1') from src group by '1'; > {code} > In the above query, while performing StatsOptimizer optimization we can > safely ignore the group by on the constant key '1' since the above query will > return the same result as "select count('1') from src". > Exception: > If src is empty, according to the SQL standard, should > select count('1') from src group by '1' > and > select count('1') from src -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-12648) LLAP IO was disabled in CliDriver by accident (and tests are broken)
[ https://issues.apache.org/jira/browse/HIVE-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-12648: --- Assignee: Sergey Shelukhin > LLAP IO was disabled in CliDriver by accident (and tests are broken) > > > Key: HIVE-12648 > URL: https://issues.apache.org/jira/browse/HIVE-12648 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12573) some DPP tests are broken
[ https://issues.apache.org/jira/browse/HIVE-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12573: Target Version/s: 2.0.0 > some DPP tests are broken > - > > Key: HIVE-12573 > URL: https://issues.apache.org/jira/browse/HIVE-12573 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-12573.patch > > > -It looks like LLAP out files were not updated in some DPP JIRA because the > test was entirely broken in HiveQA at the time- actually looks like out files > have explain output with a glitch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12640) Allow StatsOptimizer to optimize the query for Constant GroupBy keys
[ https://issues.apache.org/jira/browse/HIVE-12640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051646#comment-15051646 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-12640: -- [~julianhyde] Thanks , I have noted that as a condition to cover in the jira description. Thanks Hari > Allow StatsOptimizer to optimize the query for Constant GroupBy keys > - > > Key: HIVE-12640 > URL: https://issues.apache.org/jira/browse/HIVE-12640 > Project: Hive > Issue Type: Bug >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-12640.1.patch > > > {code} > hive> select count('1') from src group by '1'; > {code} > In the above query, while performing StatsOptimizer optimization we can > safely ignore the group by on the constant key '1' since the above query will > return the same result as "select count('1') from src". > Exception: > If src is empty, according to the SQL standard, > {code} > select count('1') from src group by '1' > {code} > and > {code} > select count('1') from src > {code} > should produce 1 and 0 rows respectively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
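The empty-table exception noted in HIVE-12640 follows from standard SQL aggregate semantics: an ungrouped aggregate always returns exactly one row, while a grouped aggregate returns one row per group, hence zero rows on empty input. A minimal sketch over plain Python lists (helper names are illustrative only, not StatsOptimizer code):

```python
# Semantics the StatsOptimizer shortcut must preserve when rewriting
# "count('1') ... group by '1'" as a plain "count('1')".
def count_ungrouped(rows):
    return [len(rows)]                  # always exactly one result row

def count_grouped_by_constant(rows):
    return [len(rows)] if rows else []  # empty input -> no groups -> no rows

src = []  # models an empty src table
```

So the rewrite is only safe when the table is known to be non-empty; on empty input the two queries return a different number of rows.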
[jira] [Updated] (HIVE-11107) Support for Performance regression test suite with TPCDS
[ https://issues.apache.org/jira/browse/HIVE-11107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11107: - Attachment: HIVE-11107.6.patch > Support for Performance regression test suite with TPCDS > > > Key: HIVE-11107 > URL: https://issues.apache.org/jira/browse/HIVE-11107 > Project: Hive > Issue Type: Sub-task >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-11107.1.patch, HIVE-11107.2.patch, > HIVE-11107.3.patch, HIVE-11107.4.patch, HIVE-11107.5.patch, > HIVE-11107.6.patch, HIVE-11107.6.patch > > > Support to add TPCDS queries to the performance regression test suite with > Hive CBO turned on. > This benchmark is intended to make sure that subsequent changes to the > optimizer or any hive code do not yield any unexpected plan changes. i.e. > the intention is to not run the entire TPCDS query set, but just "explain > plan" for the TPCDS queries. > As part of this jira, we will manually verify that expected hive > optimizations kick in for the queries (for given stats/dataset). If there is > a difference in plan within this test suite due to a future commit, it needs > to be analyzed and we need to make sure that it is not a regression. > The test suite can be run in master branch from itests by > {code} > mvn test -Dtest=TestPerfCliDriver > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-11538) Add an option to skip init script while running tests
[ https://issues.apache.org/jira/browse/HIVE-11538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051859#comment-15051859 ] Lefty Leverenz edited comment on HIVE-11538 at 12/10/15 11:49 PM: -- [~ashutoshc], this is now added to the wikidoc in the HiveDeveloperFAQ: * [HiveDeveloperFAQ - Testing | https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-Testing] ** [How do I modify the init script when testing? | https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-HowdoImodifytheinitscriptwhentesting?] was (Author: sladymon): [~ashutoshc], this is now added to the wikidoc in the HiveDeveloperFAQ: * [HiveDeveloperFAQ - Testing | https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-Testing] > Add an option to skip init script while running tests > - > > Key: HIVE-11538 > URL: https://issues.apache.org/jira/browse/HIVE-11538 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Labels: TODOC2.0 > Fix For: 2.0.0 > > Attachments: HIVE-11538.2.patch, HIVE-11538.3.patch, HIVE-11538.patch > > > {{q_test_init.sql}} has grown over time. Now, it takes substantial amount of > time. When debugging a particular query which doesn't need such > initialization, this delay is annoyance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12610) Hybrid Grace Hash Join should fail task faster if processing first batch fails, instead of continuing processing the rest
[ https://issues.apache.org/jira/browse/HIVE-12610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051910#comment-15051910 ] Hive QA commented on HIVE-12610: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12776660/HIVE-12610.2.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 9891 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping org.apache.hive.jdbc.TestSSL.testSSLVersion org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6310/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6310/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6310/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing 
org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12776660 - PreCommit-HIVE-TRUNK-Build > Hybrid Grace Hash Join should fail task faster if processing first batch > fails, instead of continuing processing the rest > - > > Key: HIVE-12610 > URL: https://issues.apache.org/jira/browse/HIVE-12610 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.1 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-12610.1.patch, HIVE-12610.2.patch > > > During processing the spilled partitions, if there's any fatal error, such as > Kryo exception, then we should exit early, instead of moving on to process > the rest of spilled partitions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12445) Tracking of completed dags is a slow memory leak
[ https://issues.apache.org/jira/browse/HIVE-12445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051968#comment-15051968 ] Siddharth Seth commented on HIVE-12445: --- +1. Looks good. > Tracking of completed dags is a slow memory leak > > > Key: HIVE-12445 > URL: https://issues.apache.org/jira/browse/HIVE-12445 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.0.0 >Reporter: Siddharth Seth >Assignee: Sergey Shelukhin > Attachments: HIVE-12445.patch > > > LLAP daemons track completed DAGs, but never clean up these structures. This > is primarily to disallow out of order executions. Evaluate whether that can > be avoided - otherwise this structure needs to be cleaned up with a delay. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12596) Delete timestamp row throws java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
[ https://issues.apache.org/jira/browse/HIVE-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051792#comment-15051792 ] Hive QA commented on HIVE-12596: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12776657/HIVE-12596.1.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 28 failed/errored test(s), 9893 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_bucket_map_join_tez1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_join1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge10 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_script_env_var2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_aggregate_9 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_auto_smb_mapjoin_14 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_char_simple 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_distinct_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_orderby_5 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_outer_join3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_17 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_casts org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping org.apache.hive.jdbc.TestSSL.testSSLVersion org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6309/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6309/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6309/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 28 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12776657 - PreCommit-HIVE-TRUNK-Build > Delete timestamp row throws java.lang.IllegalArgumentException: Timestamp > format must be yyyy-mm-dd hh:mm:ss[.fffffffff] > > > Key: HIVE-12596 > URL: https://issues.apache.org/jira/browse/HIVE-12596 > Project: Hive > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Takahiko Saito >Assignee: Prasanth Jayachandran > Attachments: HIVE-12596.1.patch > > > Run the below: > {noformat} > create table test_acid( i int, ts timestamp) > clustered by (i) into 2 buckets > stored as orc > tblproperties ('transactional'='true'); > insert into table test_acid values (1, '2014-09-14 12:34:30'); > delete from test_acid where ts = '2014-15-16 17:18:19.20'; > {noformat} > The below error is thrown: > {noformat} > 15/12/04 19:55:49 INFO SessionState: Map 1: -/- Reducer 2: 0/2 > Status: Failed > 15/12/04 19:55:49 ERROR SessionState: Status: Failed > Vertex failed, vertexName=Map 1, vertexId=vertex_1447960616881_0022_2_00, > diagnostics=[Vertex vertex_1447960616881_0022_2_00 [Map 1] killed/failed due > to:ROOT_INPUT_INIT_FAILURE, Vertex Input: test_acid initializer failed, > vertex=vertex_1447960616881_0022_2_00 [Map 1], > java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd >
[jira] [Commented] (HIVE-12184) DESCRIBE of fully qualified table fails when db and table name match and non-default database is in use
[ https://issues.apache.org/jira/browse/HIVE-12184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051833#comment-15051833 ] Shannon Ladymon commented on HIVE-12184: [~ngangam], that helps a lot, thank you. I've updated the doc with the clarification of the two syntax formats and the description of field_name. Since we aren't completely sure that describe partition can use all those options, I haven't added any of them to the doc at this time. > DESCRIBE of fully qualified table fails when db and table name match and > non-default database is in use > --- > > Key: HIVE-12184 > URL: https://issues.apache.org/jira/browse/HIVE-12184 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.1 >Reporter: Lenni Kuff >Assignee: Naveen Gangam > Fix For: 2.0.0 > > Attachments: HIVE-12184.10.patch, HIVE-12184.10.patch, > HIVE-12184.2.patch, HIVE-12184.3.patch, HIVE-12184.4.patch, > HIVE-12184.5.patch, HIVE-12184.6.patch, HIVE-12184.7.patch, > HIVE-12184.8.patch, HIVE-12184.9.patch, HIVE-12184.patch > > > DESCRIBE of fully qualified table fails when db and table name match and > non-default database is in use. > Repro: > {code} > : jdbc:hive2://localhost:1/default> create database foo; > No rows affected (0.116 seconds) > 0: jdbc:hive2://localhost:1/default> create table foo.foo(i int); > 0: jdbc:hive2://localhost:1/default> describe foo.foo; > +------------+------------+----------+--+ > | col_name | data_type | comment | > +------------+------------+----------+--+ > | i | int | | > +------------+------------+----------+--+ > 1 row selected (0.049 seconds) > 0: jdbc:hive2://localhost:1/default> use foo; > 0: jdbc:hive2://localhost:1/default> describe foo.foo; > Error: Error while processing statement: FAILED: Execution Error, return code > 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error in getting fields from > serde.Invalid Field foo (state=08S01,code=1) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9544) Error dropping fully qualified partitioned table - Internal error processing get_partition_names
[ https://issues.apache.org/jira/browse/HIVE-9544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051891#comment-15051891 ] Chaoyu Tang commented on HIVE-9544: --- It is possible since I was not able to reproduce this issue in current upstream. > Error dropping fully qualified partitioned table - Internal error processing > get_partition_names > > > Key: HIVE-9544 > URL: https://issues.apache.org/jira/browse/HIVE-9544 > Project: Hive > Issue Type: Bug >Affects Versions: 0.14.0 > Environment: HDP 2.2 >Reporter: Hari Sekhon >Assignee: Chaoyu Tang >Priority: Minor > > When attempting to drop a partitioned table using a fully qualified name I > get this error: > {code} > hive -e 'drop table myDB.my_table_name;' > Logging initialized using configuration in > file:/etc/hive/conf/hive-log4j.properties > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.DDLTask. > org.apache.thrift.TApplicationException: Internal error processing > get_partition_names > {code} > It succeeds if I instead do: > {code}hive -e 'use myDB; drop table my_table_name;'{code} > Regards, > Hari Sekhon > http://www.linkedin.com/in/harisekhon -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11538) Add an option to skip init script while running tests
[ https://issues.apache.org/jira/browse/HIVE-11538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051902#comment-15051902 ] Lefty Leverenz commented on HIVE-11538: --- [~ashutoshc] since this is only for Hive 2.0.0+, is -Phadoop-2 really necessary in the mvn commands? {code} mvn test -Dtest=TestCliDriver -Phadoop-2 -Dqfile=test_to_run.q -DinitScript= {code} and {code} mvn test -Dtest=TestCliDriver -Phadoop-2 -Dtest.output.overwrite=true -Dqfile=test_to_run.q -DinitScript=custom_script.sql {code} > Add an option to skip init script while running tests > - > > Key: HIVE-11538 > URL: https://issues.apache.org/jira/browse/HIVE-11538 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Labels: TODOC2.0 > Fix For: 2.0.0 > > Attachments: HIVE-11538.2.patch, HIVE-11538.3.patch, HIVE-11538.patch > > > {{q_test_init.sql}} has grown over time. Now, it takes substantial amount of > time. When debugging a particular query which doesn't need such > initialization, this delay is annoyance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12635) Hive should return the latest hbase cell timestamp as the row timestamp value
[ https://issues.apache.org/jira/browse/HIVE-12635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051912#comment-15051912 ] Szehon Ho commented on HIVE-12635: -- Chatted with Aihua on patch, seems simple. Small comment for consideration, does it make sense to init timestamp variable to 0, and then do the loop from 0 to result.rawCells().length? For cleaner code. And you may know more than me, does this qualify for backward-incompatibility? +1 other than those. > Hive should return the latest hbase cell timestamp as the row timestamp value > - > > Key: HIVE-12635 > URL: https://issues.apache.org/jira/browse/HIVE-12635 > Project: Hive > Issue Type: Bug > Components: HBase Handler >Affects Versions: 2.1.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-12635.patch > > > When hive talks to hbase and maps hbase timestamp field to one hive column, > seems hive returns the first cell timestamp instead of the latest one as the > timestamp value. > Makes sense to return the latest timestamp since adding the latest cell can > be considered an update to the row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
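The fix direction discussed in HIVE-12635 (and the review suggestion above to initialize the timestamp to 0 and loop over all cells) can be sketched as follows. This is an illustrative model, not the actual HBase Handler code: `cells` stands in for the cells returned by HBase's Result.rawCells(), and the dict layout is an assumption.

```python
# Return the latest cell timestamp as the row timestamp, rather than the
# first cell's timestamp: init to 0, then scan every cell and keep the max.
def row_timestamp(cells):
    ts = 0
    for cell in cells:
        ts = max(ts, cell["timestamp"])
    return ts

cells = [{"timestamp": 100}, {"timestamp": 250}, {"timestamp": 175}]
```

Taking the maximum reflects the view that adding the latest cell is an update to the row, so the row's timestamp should track its most recent write.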
[jira] [Commented] (HIVE-11866) Add framework to enable testing using LDAPServer using LDAP protocol
[ https://issues.apache.org/jira/browse/HIVE-11866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051794#comment-15051794 ] Thejas M Nair commented on HIVE-11866: -- [~ngangam] Thanks for following up on this. The tests would be very valuable. Did you get a chance to look at suggestions in the thread http://mail-archives.apache.org/mod_mbox/directory-users/201510.mbox/browser ? (referred from LEGAL-227). Looks like people in the directory list are very willing to help. Avoiding a GPL licensed library keeps things much more straightforward, and avoids the chances of someone accidentally assuming it's Apache compatible and start using in the product itself. Another possibility is bugs in the build scripts, I already see junit in Apache hive 1.2.1, though it should not be there. > Add framework to enable testing using LDAPServer using LDAP protocol > > > Key: HIVE-11866 > URL: https://issues.apache.org/jira/browse/HIVE-11866 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 1.3.0 >Reporter: Naveen Gangam >Assignee: Naveen Gangam > Attachments: HIVE-11866.2.patch, HIVE-11866.patch > > > Currently there is no unit test coverage for HS2's LDAP Atn provider using a > LDAP Server on the backend. This prevents testing of the LDAPAtnProvider with > some realistic usecases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12435) SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and vectorization is enabled.
[ https://issues.apache.org/jira/browse/HIVE-12435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-12435: Attachment: HIVE-12435.02.patch Review comments. > SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and > vectorization is enabled. > -- > > Key: HIVE-12435 > URL: https://issues.apache.org/jira/browse/HIVE-12435 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 2.0.0 >Reporter: Takahiko Saito >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12435.01.patch, HIVE-12435.02.patch > > > Run the following query: > {noformat} > create table count_case_groupby (key string, bool boolean) STORED AS orc; > insert into table count_case_groupby values ('key1', true),('key2', > false),('key3', NULL),('key4', false),('key5',NULL); > {noformat} > The table contains the following: > {noformat} > key1 true > key2 false > key3 NULL > key4 false > key5 NULL > {noformat} > The below query returns: > {noformat} > SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) > AS cnt_bool0_ok FROM count_case_groupby GROUP BY key; > key1 1 > key2 1 > key3 1 > key4 1 > key5 1 > {noformat} > while it expects the following results: > {noformat} > key1 1 > key2 1 > key3 0 > key4 1 > key5 0 > {noformat} > The query works with hive ver 1.2. Also it works when a table is not orc > format. > Also even if it's an orc table, when vectorization is disabled, the query > works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
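The expected results in the HIVE-12435 description follow from COUNT(expr) counting only non-NULL values, so the CASE branch that yields NULL must not contribute. A toy evaluation of the query's semantics over the five rows (None models SQL NULL; this is a reference model, not the vectorized code path):

```python
# CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END
def case_expr(b):
    if b is True:
        return 1
    if b is False:
        return 0
    return None  # ELSE NULL

rows = [("key1", True), ("key2", False), ("key3", None),
        ("key4", False), ("key5", None)]

# COUNT(expr) per group: count rows where the expression is non-NULL.
expected = {k: (1 if case_expr(b) is not None else 0) for k, b in rows}
```

The buggy vectorized path effectively counted the NULL results too, yielding 1 for every key.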
[jira] [Commented] (HIVE-2651) The variable hive.exec.mode.local.auto.tasks.max should be changed
[ https://issues.apache.org/jira/browse/HIVE-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051947#comment-15051947 ] Shannon Ladymon commented on HIVE-2651: --- Doc note: The old variable, *hive.exec.mode.local.auto.tasks.max*, is available in the wiki here: * [Configuration Properties - hive.exec.mode.local.auto.tasks.max | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.mode.local.auto.tasks.max] The new variable, *hive.exec.mode.local.auto.input.files.max*, is available in the wiki here: * [Configuration Properties - hive.exec.mode.local.auto.input.files.max | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.mode.local.auto.input.files.max] > The variable hive.exec.mode.local.auto.tasks.max should be changed > -- > > Key: HIVE-2651 > URL: https://issues.apache.org/jira/browse/HIVE-2651 > Project: Hive > Issue Type: Improvement >Reporter: Namit Jain >Assignee: Namit Jain > Fix For: 0.9.0 > > Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2651.D861.1.patch > > > It should be called hive.exec.mode.local.auto.input.files.max instead. > The number of input files are checked currently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12635) Hive should return the latest hbase cell timestamp as the row timestamp value
[ https://issues.apache.org/jira/browse/HIVE-12635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051850#comment-15051850 ] Aihua Xu commented on HIVE-12635: - [~szehon] Can you help review the patch? > Hive should return the latest hbase cell timestamp as the row timestamp value > - > > Key: HIVE-12635 > URL: https://issues.apache.org/jira/browse/HIVE-12635 > Project: Hive > Issue Type: Bug > Components: HBase Handler >Affects Versions: 2.1.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-12635.patch > > > When hive talks to hbase and maps hbase timestamp field to one hive column, > seems hive returns the first cell timestamp instead of the latest one as the > timestamp value. > Makes sense to return the latest timestamp since adding the latest cell can > be considered an update to the row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12366) Refactor Heartbeater logic for transaction
[ https://issues.apache.org/jira/browse/HIVE-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-12366: - Attachment: HIVE-12366.6.patch Rebased, patch 6 > Refactor Heartbeater logic for transaction > -- > > Key: HIVE-12366 > URL: https://issues.apache.org/jira/browse/HIVE-12366 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-12366.1.patch, HIVE-12366.2.patch, > HIVE-12366.3.patch, HIVE-12366.4.patch, HIVE-12366.5.patch, HIVE-12366.6.patch > > > Currently there is a gap between the time locks acquisition and the first > heartbeat being sent out. Normally the gap is negligible, but when it's big > it will cause query fail since the locks are timed out by the time the > heartbeat is sent. > Need to remove this gap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12435) SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and vectorization is enabled.
[ https://issues.apache.org/jira/browse/HIVE-12435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051908#comment-15051908 ] Prasanth Jayachandran commented on HIVE-12435: -- +1 > SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and > vectorization is enabled. > -- > > Key: HIVE-12435 > URL: https://issues.apache.org/jira/browse/HIVE-12435 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 2.0.0 >Reporter: Takahiko Saito >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12435.01.patch, HIVE-12435.02.patch > > > Run the following query: > {noformat} > create table count_case_groupby (key string, bool boolean) STORED AS orc; > insert into table count_case_groupby values ('key1', true),('key2', > false),('key3', NULL),('key4', false),('key5',NULL); > {noformat} > The table contains the following: > {noformat} > key1 true > key2 false > key3 NULL > key4 false > key5 NULL > {noformat} > The below query returns: > {noformat} > SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) > AS cnt_bool0_ok FROM count_case_groupby GROUP BY key; > key1 1 > key2 1 > key3 1 > key4 1 > key5 1 > {noformat} > while it expects the following results: > {noformat} > key1 1 > key2 1 > key3 0 > key4 1 > key5 0 > {noformat} > The query works with hive ver 1.2. Also it works when a table is not orc > format. > Also even if it's an orc table, when vectorization is disabled, the query > works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11538) Add an option to skip init script while running tests
[ https://issues.apache.org/jira/browse/HIVE-11538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051859#comment-15051859 ] Shannon Ladymon commented on HIVE-11538: [~ashutoshc], this is now added to the wikidoc in the HiveDeveloperFAQ: * [HiveDeveloperFAQ - Testing | https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-Testing] > Add an option to skip init script while running tests > - > > Key: HIVE-11538 > URL: https://issues.apache.org/jira/browse/HIVE-11538 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Labels: TODOC2.0 > Fix For: 2.0.0 > > Attachments: HIVE-11538.2.patch, HIVE-11538.3.patch, HIVE-11538.patch > > > {{q_test_init.sql}} has grown over time. Now, it takes substantial amount of > time. When debugging a particular query which doesn't need such > initialization, this delay is annoyance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12435) SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and vectorization is enabled.
[ https://issues.apache.org/jira/browse/HIVE-12435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051877#comment-15051877 ] Prasanth Jayachandran commented on HIVE-12435: -- This assumption is fragile {code} PrimitiveCategory primitiveCategory = ((PrimitiveTypeInfo) expr.getTypeInfo()).getPrimitiveCategory(); {code} Can you instead get TypeInfo object and see if it is an instance of PrimitiveTypeInfo and of type VOID? > SELECT COUNT(CASE WHEN...) GROUPBY returns 1 for 'NULL' in a case of ORC and > vectorization is enabled. > -- > > Key: HIVE-12435 > URL: https://issues.apache.org/jira/browse/HIVE-12435 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 2.0.0 >Reporter: Takahiko Saito >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12435.01.patch > > > Run the following query: > {noformat} > create table count_case_groupby (key string, bool boolean) STORED AS orc; > insert into table count_case_groupby values ('key1', true),('key2', > false),('key3', NULL),('key4', false),('key5',NULL); > {noformat} > The table contains the following: > {noformat} > key1 true > key2 false > key3 NULL > key4 false > key5 NULL > {noformat} > The below query returns: > {noformat} > SELECT key, COUNT(CASE WHEN bool THEN 1 WHEN NOT bool THEN 0 ELSE NULL END) > AS cnt_bool0_ok FROM count_case_groupby GROUP BY key; > key1 1 > key2 1 > key3 1 > key4 1 > key5 1 > {noformat} > while it expects the following results: > {noformat} > key1 1 > key2 1 > key3 0 > key4 1 > key5 0 > {noformat} > The query works with hive ver 1.2. Also it works when a table is not orc > format. > Also even if it's an orc table, when vectorization is disabled, the query > works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-1408) add option to let hive automatically run in local mode based on tunable heuristics
[ https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051945#comment-15051945 ] Shannon Ladymon commented on HIVE-1408: --- Doc note: These configuration properties are available in the wiki here: * [Configuration Properties - hive.exec.mode.local.auto | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.mode.local.auto] * [Configuration Properties - hive.exec.mode.local.auto.inputbytes.max | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.mode.local.auto.inputbytes.max] * [Configuration Properties - hive.exec.mode.local.auto.tasks.max | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.mode.local.auto.tasks.max] * [Configuration Properties - hive.exec.mode.local.auto.input.files.max | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.mode.local.auto.input.files.max] > add option to let hive automatically run in local mode based on tunable > heuristics > -- > > Key: HIVE-1408 > URL: https://issues.apache.org/jira/browse/HIVE-1408 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Joydeep Sen Sarma >Assignee: Joydeep Sen Sarma > Fix For: 0.7.0 > > Attachments: 1408.1.patch, 1408.2.patch, 1408.2.q.out.patch, > 1408.7.patch, hive-1408.6.patch > > > as a followup to HIVE-543 - we should have a simple option (enabled by > default) to let hive run in local mode if possible. > two levels of options are desirable: > 1. hive.exec.mode.local.auto=true/false // control whether local mode is > automatically chosen > 2. 
Options to control different heuristics, some naive examples: > hive.exec.mode.local.auto.input.size.max=1G // don't choose local mode > if data > 1G > hive.exec.mode.local.auto.script.enable=true/false // choose if local > mode is enabled for queries with user scripts > this can be implemented as a pre/post execution hook. It makes sense to > provide this as a standard hook in the hive codebase since it's likely to > improve response time for many users (especially for test queries). > the initial proposal is to choose this at a query level and not at per > hive-task (ie. hadoop job) level. per job-level requires more changes to > compilation (to not pre-commit to hdfs or local scratch directories at > compile time). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
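For reference, the configuration properties documented in the wiki links above can be set per session like any other Hive property. The values below are purely illustrative defaults, not recommendations:

```sql
-- Let Hive choose local mode automatically when the job is small enough:
SET hive.exec.mode.local.auto=true;
-- Stay distributed when total input exceeds ~128 MB...
SET hive.exec.mode.local.auto.inputbytes.max=134217728;
-- ...or when the job reads more than 4 input files:
SET hive.exec.mode.local.auto.input.files.max=4;
```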
[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin
[ https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050888#comment-15050888 ] Nemon Lou commented on HIVE-12616: -- [~xuefuz], thanks for the review. It is not surprising that you have doubts about the "spark.master" setting in HiveConf; let me explain the issue described here. In short, "spark.master" is set on the HiveConf during the creation of the HiveSparkClient. Snippet of HiveSparkClientFactory#initiateSparkConf: {code} String sparkMaster = hiveConf.get("spark.master"); if (sparkMaster == null) { sparkMaster = sparkConf.get("spark.master"); hiveConf.set("spark.master", sparkMaster); } {code} The creation of the HiveSparkClient only happens once because it is reused (as a SparkSession). However, this HiveConf is operation level instead of session level (due to asynchronous query execution), so only the first operation's JobConf has "spark.master" set. Now I have two choices: 1. Set "spark.master" at the session level during HiveSparkClient creation. 2. Set "spark.master" for each operation when it is not already set, but using the sparkConf instead of the hiveConf from RemoteHiveSparkClient. (The SparkConf in RemoteHiveSparkClient already sets "spark.master" explicitly.) Which one do you prefer? Adding a test case for this issue seems difficult (yarn-cluster mode, multiple operations in one session); could you provide some guidance? Thanks.
> NullPointerException when spark session is reused to run a mapjoin > -- > > Key: HIVE-12616 > URL: https://issues.apache.org/jira/browse/HIVE-12616 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 1.3.0 >Reporter: Nemon Lou >Assignee: Nemon Lou > Attachments: HIVE-12616.patch > > > The way to reproduce: > {noformat} > set hive.execution.engine=spark; > create table if not exists test(id int); > create table if not exists test1(id int); > insert into test values(1); > insert into test1 values(1); > select max(a.id) from test a ,test1 b > where a.id = b.id; > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
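The scoping problem Nemon describes can be reduced to a plain-map analogy. This is illustrative only, not Hive's HiveConf code: a key written into operation 1's clone of the session conf is invisible to operation 2, which clones the untouched session conf:

```java
import java.util.HashMap;
import java.util.Map;

public class ConfScopeSketch {
    // Simulates HiveConf scoping with plain maps (illustrative, not Hive code):
    // each operation clones the session conf, so a default applied to one
    // operation's copy never reaches the next operation.
    static String sparkMasterSeenBySecondOperation() {
        Map<String, String> sessionConf = new HashMap<>(); // user never set spark.master

        Map<String, String> op1Conf = new HashMap<>(sessionConf); // operation 1 clones
        if (op1Conf.get("spark.master") == null) {
            op1Conf.put("spark.master", "yarn-cluster"); // default applied to the copy only
        }

        Map<String, String> op2Conf = new HashMap<>(sessionConf); // operation 2 clones afresh
        return op2Conf.get("spark.master"); // null: the default was never propagated
    }

    public static void main(String[] args) {
        System.out.println(sparkMasterSeenBySecondOperation()); // null
    }
}
```

Setting the key on the session-level object (Nemon's option 1) would make the write visible to every later clone.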
[jira] [Commented] (HIVE-12316) Improved integration test for Hive
[ https://issues.apache.org/jira/browse/HIVE-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050627#comment-15050627 ] Hive QA commented on HIVE-12316: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12776619/HIVE-12316.5.patch {color:green}SUCCESS:{color} +1 due to 25 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 9985 tests executed *Failed tests:* {noformat} TestConf - did not produce a TEST-*.xml file TestHWISessionManager - did not produce a TEST-*.xml file TestManager - did not produce a TEST-*.xml file TestTable - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_date_1 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler org.apache.hive.jdbc.TestSSL.testSSLVersion org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics {noformat} Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6301/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6301/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6301/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 18 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12776619 - PreCommit-HIVE-TRUNK-Build > Improved integration test for Hive > -- > > Key: HIVE-12316 > URL: https://issues.apache.org/jira/browse/HIVE-12316 > Project: Hive > Issue Type: New Feature > Components: Testing Infrastructure >Affects Versions: 2.0.0 >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-12316.2.patch, HIVE-12316.5.patch, HIVE-12316.patch > > > In working with Hive testing I have found there are several issues that are > causing problems for developers, testers, and users: > * Because Hive has many tunable knobs (file format, security, etc.) we end up > with tests that cover the same functionality with different permutations of > these features. > * The Hive integration tests (ie qfiles) cannot be run on a cluster. This > means we cannot run any of those tests at scale. The HBase community by > contrast uses the same test suite locally and on a cluster, and has found > that this helps them greatly in testing. > * Golden files are a grievous evil. Test writers are forced to eyeball > results the first time they run a test and decide whether they look > reasonable, which is error prone and makes testing at scale impossible. 
And > changes to one part of Hive often end up changing the plan (and the output of > explain) thus breaking many tests that are not related. This is particularly > an issue for people working on the optimizer. > * The lack of ability to run on a cluster means that when people test Hive at > scale, they are forced to develop custom frameworks which can't then benefit > the community. > * There is no easy mechanism to bring user queries into the test suite. > I propose we build a new testing capability with the following requirements: > * One test should be able to run all reasonable permutations (mr/tez/spark, > orc/parquet/text/rcfile, secure/non-secure etc.) This doesn't mean it would > run every permutation every time, but that the tester could choose which > permutation to run. > * The same tests should run locally and on a cluster. The tests should > support scaling of input data from Ks to Ts. > * Expected results should be auto-generated whenever possible, and this > should work with the
[jira] [Commented] (HIVE-12629) hive.auto.convert.join=true makes lateral view join sql failed on spark engine on yarn
[ https://issues.apache.org/jira/browse/HIVE-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050900#comment-15050900 ] 吴子美 commented on HIVE-12629: I will give you a simple dataset to show the bug. hive> desc logs1; OK iik int created_at string Time taken: 0.145 seconds, Fetched: 2 row(s) hive> select * from logs1; OK 1 js 5 9wj 1 js 5 9wj 56 io 1 js 5 9wj 1 js 5 9wj Time taken: 0.687 seconds, Fetched: 9 row(s) hive> select count(*) from > (select iik from logs1 group by iik) a join > (select iik from logs1 LATERAL VIEW json_tuple(created_at,'ss') v1 as ss) b on a.iik=b.iik; Query ID = root_20151210201756_e56dadee-69c9-4d4e-838d-98173eab25ec Total jobs = 2 Launching Job 1 out of 2 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapreduce.job.reduces= Starting Spark Job = 031c6e38-d2d3-4b19-baa7-de1553cd7277 Status: Failed FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask After I changed hive.auto.convert.join from true to false, the result was OK. > hive.auto.convert.join=true makes lateral view join sql failed on spark > engine on yarn > -- > > Key: HIVE-12629 > URL: https://issues.apache.org/jira/browse/HIVE-12629 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 1.2.1 >Reporter: 吴子美 >Assignee: Xuefu Zhang > > I am using hive1.2 on spark on yarn. > I found > select count(1) from > (select user_id from xxx group by user_id ) a join > (select user_id from yyy lateral view json_tuple(u, 'h') v1 as h) b > on a.user_id=b.user_id ; > failed in hive on spark on yarn, but OK in hive on MR. > I tried the following sql on spark. It was OK. 
> select count(1) from > (select user_id from xxx group by user_id ) a left join > (select user_id from yyy lateral view json_tuple(u, 'h') v1 as h) b > on a.user_id=b.user_id ; > When I turn hive.auto.convert.join from true to false. Everything goes OK. > The error message in hive.log was : > {code} > 2015-12-09 21:10:17,190 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO log.PerfLogger: > > 2015-12-09 21:10:17,190 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO exec.Utilities: > Serializing ReduceWork via kryo > 2015-12-09 21:10:17,214 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO log.PerfLogger: > duration=24 from=org.apache.hadoop.hive.ql.exec.Utilities> > 2015-12-09 21:10:17,261 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO client.RemoteDriver: > Failed to run job 8fed1ca8-834f-497f-b189-eab343440a9f > 2015-12-09 21:10:17,261 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(569)) - java.lang.IllegalStateException: Connection > already exists > 2015-12-09 21:10:17,261 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(569)) - at > org.apache.hadoop.hive.ql.exec.spark.SparkPlan.connect(SparkPlan.java:142) > 2015-12-09 21:10:17,261 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(569)) - at > org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateParentTran(SparkPlanGenerator.java:142) > 2015-12-09 21:10:17,261 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(569)) - at > org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:106) > 2015-12-09 21:10:17,261 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(569)) - at > 
org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:252) > 2015-12-09 21:10:17,261 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(569)) - at > org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:366) > 2015-12-09 21:10:17,261 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(569)) - at > org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:335) > 2015-12-09 21:10:17,261 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(569)) - at > java.util.concurrent.FutureTask.run(FutureTask.java:262) > 2015-12-09 21:10:17,262 INFO [stderr-redir-1]:
[jira] [Commented] (HIVE-12531) Implement fast-path for Year/Month UDFs for dates between 1999 and 2038
[ https://issues.apache.org/jira/browse/HIVE-12531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051222#comment-15051222 ] Hive QA commented on HIVE-12531: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12776692/HIVE-12531.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 9870 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarDataNucleusUnCaching org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6304/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6304/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6304/ Messages: {noformat} Executing 
org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12776692 - PreCommit-HIVE-TRUNK-Build > Implement fast-path for Year/Month UDFs for dates between 1999 and 2038 > --- > > Key: HIVE-12531 > URL: https://issues.apache.org/jira/browse/HIVE-12531 > Project: Hive > Issue Type: Improvement >Reporter: Gopal V >Assignee: Jason Dere > Attachments: HIVE-12531.1.patch, HIVE-12531.2.patch, > HIVE-12531.3.patch > > > Current codepath goes into the JDK Calendar implementation, which is very > slow for the simple cases in the current decade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12635) Hive should return the latest hbase cell timestamp as the row timestamp value
[ https://issues.apache.org/jira/browse/HIVE-12635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051409#comment-15051409 ] Aihua Xu commented on HIVE-12635: - Those tests are not related. > Hive should return the latest hbase cell timestamp as the row timestamp value > - > > Key: HIVE-12635 > URL: https://issues.apache.org/jira/browse/HIVE-12635 > Project: Hive > Issue Type: Bug > Components: HBase Handler >Affects Versions: 2.1.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-12635.patch > > > When Hive talks to HBase and maps the HBase timestamp field to a Hive column, > it seems Hive returns the first cell timestamp instead of the latest one as the > timestamp value. > It makes sense to return the latest timestamp, since adding the latest cell can > be considered an update to the row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
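The intended semantics in HIVE-12635 reduce to taking the maximum over a row's cell timestamps. A standalone sketch with illustrative values, not the HBase handler's actual code:

```java
import java.util.Arrays;

public class RowTimestampSketch {
    // A row's timestamp should reflect its most recent write: the maximum
    // cell timestamp, not whichever cell happens to be read first.
    static long rowTimestamp(long[] cellTimestamps) {
        return Arrays.stream(cellTimestamps).max().getAsLong();
    }

    public static void main(String[] args) {
        // Three cells written at different times (epoch millis, illustrative).
        long[] cells = {1449700000000L, 1449800000000L, 1449750000000L};
        System.out.println(rowTimestamp(cells)); // 1449800000000
    }
}
```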
[jira] [Commented] (HIVE-12055) Create row-by-row shims for the write path
[ https://issues.apache.org/jira/browse/HIVE-12055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051416#comment-15051416 ] Hive QA commented on HIVE-12055: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12776654/HIVE-12055.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6307/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6307/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6307/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-6307/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p 
maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! -d apache-github-source-source ]] + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at 57f39a9 HIVE-12598 : LLAP: disable fileId when not supported (Sergey Shelukhin, reviewed by Prasanth Jayachandran, Lefty Leverenz) + git clean -f -d + git checkout master Already on 'master' + git reset --hard origin/master HEAD is now at 57f39a9 HIVE-12598 : LLAP: disable fileId when not supported (Sergey Shelukhin, reviewed by Prasanth Jayachandran, Lefty Leverenz) + git merge --ff-only origin/master Already up-to-date. + git gc + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12776654 - PreCommit-HIVE-TRUNK-Build > Create row-by-row shims for the write path > --- > > Key: HIVE-12055 > URL: https://issues.apache.org/jira/browse/HIVE-12055 > Project: Hive > Issue Type: Sub-task > Components: ORC, Shims >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: HIVE-12055.patch, HIVE-12055.patch, HIVE-12055.patch, > HIVE-12055.patch, HIVE-12055.patch > > > As part of removing the row-by-row writer, we'll need to shim out the higher > level API (OrcSerde and OrcOutputFormat) so that we maintain backwards > compatibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12635) Hive should return the latest hbase cell timestamp as the row timestamp value
[ https://issues.apache.org/jira/browse/HIVE-12635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051407#comment-15051407 ] Hive QA commented on HIVE-12635: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12776651/HIVE-12635.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 9887 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testFetchingPartitionsWithDifferentSchemas org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping org.apache.hive.jdbc.TestSSL.testSSLVersion org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6305/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6305/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6305/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing 
org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12776651 - PreCommit-HIVE-TRUNK-Build > Hive should return the latest hbase cell timestamp as the row timestamp value > - > > Key: HIVE-12635 > URL: https://issues.apache.org/jira/browse/HIVE-12635 > Project: Hive > Issue Type: Bug > Components: HBase Handler >Affects Versions: 2.1.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-12635.patch > > > When hive talks to hbase and maps hbase timestamp field to one hive column, > seems hive returns the first cell timestamp instead of the latest one as the > timestamp value. > Makes sense to return the latest timestamp since adding the latest cell can > be considered an update to the row. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11890) Create ORC module
[ https://issues.apache.org/jira/browse/HIVE-11890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051410#comment-15051410 ] Hive QA commented on HIVE-11890: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12776650/HIVE-11890.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6306/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6306/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6306/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-6306/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p 
maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! -d apache-github-source-source ]] + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at 57f39a9 HIVE-12598 : LLAP: disable fileId when not supported (Sergey Shelukhin, reviewed by Prasanth Jayachandran, Lefty Leverenz) + git clean -f -d + git checkout master Already on 'master' + git reset --hard origin/master HEAD is now at 57f39a9 HIVE-12598 : LLAP: disable fileId when not supported (Sergey Shelukhin, reviewed by Prasanth Jayachandran, Lefty Leverenz) + git merge --ff-only origin/master Already up-to-date. + git gc + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12776650 - PreCommit-HIVE-TRUNK-Build > Create ORC module > - > > Key: HIVE-11890 > URL: https://issues.apache.org/jira/browse/HIVE-11890 > Project: Hive > Issue Type: Sub-task > Components: ORC >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: HIVE-11890.patch, HIVE-11890.patch, HIVE-11890.patch, > HIVE-11890.patch, HIVE-11890.patch, HIVE-11890.patch, HIVE-11890.patch > > > Start moving classes over to the ORC module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12375) ensure hive.compactor.check.interval cannot be set too low
[ https://issues.apache.org/jira/browse/HIVE-12375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-12375: -- Attachment: HIVE-12375.2.patch [~alangates] updated patch which makes use of HIVE_IN_TEST > ensure hive.compactor.check.interval cannot be set too low > -- > > Key: HIVE-12375 > URL: https://issues.apache.org/jira/browse/HIVE-12375 > Project: Hive > Issue Type: Bug > Components: Metastore, Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-12375.2.patch, HIVE-12375.patch > > > hive.compactor.check.interval can currently be set to as low as 0, which > makes the Initiator spin needlessly, filling up logs, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12616) NullPointerException when spark session is reused to run a mapjoin
[ https://issues.apache.org/jira/browse/HIVE-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052100#comment-15052100 ] Xuefu Zhang commented on HIVE-12616: Thanks for the explanation. I guess the problem is that when the user doesn't set spark.master explicitly, Hive's default, yarn-cluster, is set only on the HiveConf of the first operation. I think we should set "spark.master" in the session-level HiveConf. It seems we just need to add one line doing that in the if block below: {code} // load properties from hive configurations, including both spark.* properties, // properties for remote driver RPC, and yarn properties for Spark on YARN mode. String sparkMaster = hiveConf.get("spark.master"); if (sparkMaster == null) { sparkMaster = sparkConf.get("spark.master"); hiveConf.set("spark.master", sparkMaster); } {code} > NullPointerException when spark session is reused to run a mapjoin > -- > > Key: HIVE-12616 > URL: https://issues.apache.org/jira/browse/HIVE-12616 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 1.3.0 >Reporter: Nemon Lou >Assignee: Nemon Lou > Attachments: HIVE-12616.patch > > > The way to reproduce: > {noformat} > set hive.execution.engine=spark; > create table if not exists test(id int); > create table if not exists test1(id int); > insert into test values(1); > insert into test1 values(1); > select max(a.id) from test a ,test1 b > where a.id = b.id; > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result
[ https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051993#comment-15051993 ] Xiaowei Wang commented on HIVE-12541: - I am talking about the regex in the symbolic path. I found that if a path in the symlink file contains a regex, SELECT is supported by default. So I mistakenly thought that SymlinkTextInputFormat supports regexes, and I think a regex in the path should be supported. Thank you for your attention. > Using CombineHiveInputFormat with the origin inputformat > SymbolicTextInputFormat ,it will get a wrong result > -- > > Key: HIVE-12541 > URL: https://issues.apache.org/jira/browse/HIVE-12541 > Project: Hive > Issue Type: Bug >Affects Versions: 0.14.0, 1.2.0, 1.2.1 >Reporter: Xiaowei Wang >Assignee: Xiaowei Wang > Fix For: 1.2.1 > > Attachments: HIVE-12541.1.patch > > > Table desc : > {noformat} > CREATE External TABLE `symlink_text_input_format`( > `key` string, > `value` string) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' > LOCATION > 'viewfs://nsX/user/hive/warehouse/symlink_text_input_format' > {noformat} > There is a link file in the dir > '/user/hive/warehouse/symlink_text_input_format', and the content of the link > file is > {noformat} > viewfs://nsx/tmp/symlink* > {noformat} > It contains one path, and the path contains a regex! > Execute the sql: > {noformat} > set hive.rework.mapredwork = true ; > set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; > set mapred.min.split.size.per.rack= 0 ; > set mapred.min.split.size.per.node= 0 ; > set mapred.max.split.size= 0 ; > select count(*) from symlink_text_input_format ; > {noformat} > It will get a wrong result: 0 > At the same time, I added a test case in the patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12367) Lock/unlock database should add current database to inputs and outputs of authz hook
[ https://issues.apache.org/jira/browse/HIVE-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dapeng Sun updated HIVE-12367: -- Attachment: HIVE-12367.004.patch > Lock/unlock database should add current database to inputs and outputs of > authz hook > > > Key: HIVE-12367 > URL: https://issues.apache.org/jira/browse/HIVE-12367 > Project: Hive > Issue Type: Bug > Components: Authorization >Affects Versions: 1.2.1 >Reporter: Dapeng Sun >Assignee: Dapeng Sun > Attachments: HIVE-12367.001.patch, HIVE-12367.002.patch, > HIVE-12367.003.patch, HIVE-12367.004.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12628) Eliminate flakiness in TestMetrics
[ https://issues.apache.org/jira/browse/HIVE-12628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052004#comment-15052004 ] Hive QA commented on HIVE-12628: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12776670/HIVE-12628.patch {color:green}SUCCESS:{color} +1 due to 7 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 9891 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6311/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6311/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6311/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing 
org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12776670 - PreCommit-HIVE-TRUNK-Build > Eliminate flakiness in TestMetrics > -- > > Key: HIVE-12628 > URL: https://issues.apache.org/jira/browse/HIVE-12628 > Project: Hive > Issue Type: Test > Components: Test >Affects Versions: 2.1.0 >Reporter: Szehon Ho >Assignee: Szehon Ho > Attachments: HIVE-12628.patch > > > TestMetrics relies on timing of json file dumps. Rewrite these tests to > eliminate flakiness. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12367) Lock/unlock database should add current database to inputs and outputs of authz hook
[ https://issues.apache.org/jira/browse/HIVE-12367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052093#comment-15052093 ] Dapeng Sun commented on HIVE-12367: --- Update patch with master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result
[ https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052092#comment-15052092 ] Chaoyu Tang commented on HIVE-12541: So basically the regex rules used for the symbolic path are the same as those documented in FileSystem.globStatus. Could you add more test cases with symbolic paths having different regexes, even at different path levels? Your case is ../data/files/T*, which means the files starting with 0 or more Ts, right? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
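For context on the semantics being discussed: Hadoop's FileSystem.globStatus follows shell-style glob rules, not java.util.regex, so `T*` means "names starting with T" rather than "zero or more T characters". A minimal JDK-only sketch of those rules, using java.nio's glob PathMatcher as a stand-in (Hadoop's matcher is its own implementation, but the `*` semantics illustrated here are the same):

```java
import java.nio.file.FileSystems;
import java.nio.file.Paths;

public class GlobSemantics {
    // True when the glob pattern matches the relative path, using java.nio's
    // built-in glob matcher to illustrate FileSystem.globStatus-style rules.
    static boolean matches(String glob, String path) {
        return FileSystems.getDefault()
                .getPathMatcher("glob:" + glob)
                .matches(Paths.get(path));
    }

    public static void main(String[] args) {
        // In glob syntax "T*" means "starts with T", unlike regex where it
        // would mean "zero or more T characters".
        System.out.println(matches("T*", "T1.txt"));    // true
        System.out.println(matches("T*", "data.txt"));  // false
        // Globs can also sit at intermediate path levels; "*" never crosses
        // a path-component boundary.
        System.out.println(matches("tmp/sym*/part-*", "tmp/symlink1/part-00000")); // true
    }
}
```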
[jira] [Commented] (HIVE-12596) Delete timestamp row throws java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
[ https://issues.apache.org/jira/browse/HIVE-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052107#comment-15052107 ] Prasanth Jayachandran commented on HIVE-12596: -- MiniTezCliDriver failed due to an initialization issue. I ran these tests on my local machine and all of them ran without any issues. All other 11 test failures are happening in trunk already. > Delete timestamp row throws java.lang.IllegalArgumentException: Timestamp > format must be yyyy-mm-dd hh:mm:ss[.fffffffff] > > > Key: HIVE-12596 > URL: https://issues.apache.org/jira/browse/HIVE-12596 > Project: Hive > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Takahiko Saito >Assignee: Prasanth Jayachandran > Attachments: HIVE-12596.1.patch > > > Run the below: > {noformat} > create table test_acid( i int, ts timestamp) > clustered by (i) into 2 buckets > stored as orc > tblproperties ('transactional'='true'); > insert into table test_acid values (1, '2014-09-14 12:34:30'); > delete from test_acid where ts = '2014-15-16 17:18:19.20'; > {noformat} > The below error is thrown: > {noformat} > 15/12/04 19:55:49 INFO SessionState: Map 1: -/- Reducer 2: 0/2 > Status: Failed > 15/12/04 19:55:49 ERROR SessionState: Status: Failed > Vertex failed, vertexName=Map 1, vertexId=vertex_1447960616881_0022_2_00, > diagnostics=[Vertex vertex_1447960616881_0022_2_00 [Map 1] killed/failed due > to:ROOT_INPUT_INIT_FAILURE, Vertex Input: test_acid initializer failed, > vertex=vertex_1447960616881_0022_2_00 [Map 1], > java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd > hh:mm:ss[.fffffffff] > at java.sql.Timestamp.valueOf(Timestamp.java:237) > at > org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.boxLiteral(ConvertAstToSearchArg.java:160) > at > org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.findLiteral(ConvertAstToSearchArg.java:191) > at > org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.createLeaf(ConvertAstToSearchArg.java:268) > at >
org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.createLeaf(ConvertAstToSearchArg.java:326) > at > org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.parse(ConvertAstToSearchArg.java:377) > at > org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.<init>(ConvertAstToSearchArg.java:68) > at > org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.create(ConvertAstToSearchArg.java:417) > at > org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.createFromConf(ConvertAstToSearchArg.java:436) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$Context.<init>(OrcInputFormat.java:484) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1121) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1207) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:369) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:481) > at > org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:160) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:246) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:240) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:240) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:227) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at >
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Not sure if this change is intended as the issue is not seen with ver. 1.2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
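The failure above originates in java.sql.Timestamp.valueOf, which accepts only yyyy-mm-dd hh:mm:ss[.fffffffff] literals and throws IllegalArgumentException otherwise. A JDK-only sketch of that contract (what exact literal Hive's ConvertAstToSearchArg actually handed it is the bug under investigation here, not something this sketch reproduces):

```java
import java.sql.Timestamp;

public class TimestampContract {
    // Returns the parsed value, or null when valueOf rejects the literal with
    // "Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]".
    static Timestamp tryParse(String literal) {
        try {
            return Timestamp.valueOf(literal);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // A well-formed literal parses fine.
        System.out.println(tryParse("2014-09-14 12:34:30"));  // 2014-09-14 12:34:30.0
        // Wrong separators do not match the required format and are rejected.
        System.out.println(tryParse("2014/09/14 12:34:30"));  // null
    }
}
```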
[jira] [Updated] (HIVE-12650) Increase default value of hive.spark.client.server.connect.timeout to exceed spark.yarn.am.waitTime
[ https://issues.apache.org/jira/browse/HIVE-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JoneZhang updated HIVE-12650: - Affects Version/s: 1.1.1 1.2.1 > Increase default value of hive.spark.client.server.connect.timeout to exceed > spark.yarn.am.waitTime > > > Key: HIVE-12650 > URL: https://issues.apache.org/jira/browse/HIVE-12650 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.1, 1.2.1 >Reporter: JoneZhang >Assignee: Xuefu Zhang > > I think hive.spark.client.server.connect.timeout should be set greater than > spark.yarn.am.waitTime. The default value for > spark.yarn.am.waitTime is 100s, and the default value for > hive.spark.client.server.connect.timeout is 90s, which is not good. We can > increase it to a larger value such as 120s. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
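The proposal boils down to a simple ordering invariant between the two settings. A sketch using the values cited in the report (100s and 90s are the defaults quoted there; 120s is the suggested new value, not a shipped default):

```java
public class TimeoutInvariant {
    // hive.spark.client.server.connect.timeout should outlast spark.yarn.am.waitTime;
    // otherwise the Hive client can give up while the YARN AM is still
    // legitimately waiting for cluster resources.
    static boolean clientOutlastsAm(long hiveConnectTimeoutMs, long amWaitTimeMs) {
        return hiveConnectTimeoutMs > amWaitTimeMs;
    }

    public static void main(String[] args) {
        long amWaitTimeMs = 100_000;  // spark.yarn.am.waitTime default: 100s
        System.out.println(clientOutlastsAm(90_000, amWaitTimeMs));   // current 90s default: false
        System.out.println(clientOutlastsAm(120_000, amWaitTimeMs));  // proposed 120s: true
    }
}
```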
[jira] [Commented] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result
[ https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052110#comment-15052110 ] Xiaowei Wang commented on HIVE-12541: - ../data/files/T* means the files starting with T. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12649) Hive on Spark will resubmit the application when there are not enough resources to launch the yarn application master
[ https://issues.apache.org/jira/browse/HIVE-12649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JoneZhang updated HIVE-12649: - Description: Hive on Spark will estimate the reducer number when the query does not set a reduce number, which causes an application submit. The application will be pending if the yarn queue's resources are insufficient. So there can be more than one pending application, probably because there is more than one estimate call. The failure is soft, so it doesn't prevent subsequent processing. We can make that a hard failure. That code is found at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:112) at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:115) was: Hive on Spark will estimate the reducer number when the query does not set a reduce number, which causes an application submit. The application will be pending if the yarn queue's resources are insufficient. So there can be more than one pending application, probably because there is more than one estimate call. The failure is soft, so it doesn't prevent subsequent processing. We can make that a hard failure. That code is found in 728237 at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:112) 728238 at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:115) > Hive on Spark will resubmit the application when there are not enough resources to launch > the yarn application master > - > > Key: HIVE-12649 > URL: https://issues.apache.org/jira/browse/HIVE-12649 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.1, 1.2.1 >Reporter: JoneZhang >Assignee: Xuefu Zhang > > Hive on Spark will estimate the reducer number when the query does not set a reduce > number, which causes an application submit. The application will be pending if the > yarn queue's resources are insufficient.
> So there can be more than one pending application, probably because > there is more than one estimate call. The failure is soft, so it doesn't > prevent subsequent processing. We can make that a hard failure. > That code is found at > org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:112) > at > org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:115) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12355) Keep Obj Inspectors in Sync with RowSchema
[ https://issues.apache.org/jira/browse/HIVE-12355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052301#comment-15052301 ] Laljo John Pullokkaran commented on HIVE-12355: --- TS keeps the pruned column list: private List<Integer> neededColumnIDs; private List<String> neededColumns; This is what TS really outputs. There are two parts to this: 1. RS should match neededColumns (I have a patch for this as part of the union fix) 2. Output Obj Inspectors should match neededColumns This bug is for #2. > Keep Obj Inspectors in Sync with RowSchema > -- > > Key: HIVE-12355 > URL: https://issues.apache.org/jira/browse/HIVE-12355 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 1.0.0, 1.1.0, 1.2.1 >Reporter: Laljo John Pullokkaran >Assignee: Ashutosh Chauhan > Attachments: HIVE-12355.1.patch > > > Currently not all operators match their output Obj inspectors to RowSchema. > Many times OutputObjectInspectors may be more than needed. > This causes problems especially with union. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
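The invariant in point 2 above can be sketched generically: whatever field list backs the operator's output object inspector should be projected down to exactly the needed columns, in order. A hypothetical, simplified illustration (plain strings and indices here, not Hive's actual ObjectInspector API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ProjectNeededColumns {
    // Keep only the fields selected by neededColumnIDs, preserving their order,
    // mirroring how a TableScan's output inspectors should track neededColumns.
    static List<String> project(List<String> allFields, List<Integer> neededColumnIDs) {
        return neededColumnIDs.stream()
                .map(allFields::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> rowSchema = Arrays.asList("key", "value", "ds", "hr");
        // Only columns 0 and 2 are needed downstream, so the output
        // inspectors should expose exactly [key, ds], nothing more.
        System.out.println(project(rowSchema, Arrays.asList(0, 2))); // [key, ds]
    }
}
```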
[jira] [Commented] (HIVE-11603) IndexOutOfBoundsException thrown when accessing a union all subquery and filtering on a column which does not exist in all underlying tables
[ https://issues.apache.org/jira/browse/HIVE-11603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052306#comment-15052306 ] Laljo John Pullokkaran commented on HIVE-11603: --- There is an issue on master too; but it kind of doesn't get exposed due to some optimizations. > IndexOutOfBoundsException thrown when accessing a union all subquery and > filtering on a column which does not exist in all underlying tables > > > Key: HIVE-11603 > URL: https://issues.apache.org/jira/browse/HIVE-11603 > Project: Hive > Issue Type: Bug >Affects Versions: 0.14.0, 1.3.0, 1.2.1 > Environment: Hadoop 2.6 >Reporter: Nicholas Brenwald >Assignee: Laljo John Pullokkaran >Priority: Minor > Attachments: HIVE-11603.1.patch, HIVE-11603.2.patch > > > Create two empty tables t1 and t2 > {code} > CREATE TABLE t1(c1 STRING); > CREATE TABLE t2(c1 STRING, c2 INT); > {code} > Create a view on these two tables > {code} > CREATE VIEW v1 AS > SELECT c1, c2 > FROM ( > SELECT c1, CAST(NULL AS INT) AS c2 FROM t1 > UNION ALL > SELECT c1, c2 FROM t2 > ) x; > {code} > Then run > {code} > SELECT COUNT(*) from v1 > WHERE c2 = 0; > {code} > We expect to get a result of zero, but instead the query fails with stack > trace: > {code} > Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:635) > at java.util.ArrayList.get(ArrayList.java:411) > at > org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86) > at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362) > at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) > at > org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) > at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) > at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481) > at > org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438) > at
org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) > at > org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:442) > at > org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:119) > ... 22 more > {code} > Workarounds include disabling ppd, > {code} > set hive.optimize.ppd=false; > {code} > Or changing the view so that column c2 is null cast to double: > {code} > CREATE VIEW v1_workaround AS > SELECT c1, c2 > FROM ( > SELECT c1, CAST(NULL AS DOUBLE) AS c2 FROM t1 > UNION ALL > SELECT c1, c2 FROM t2 > ) x; > {code} > The problem seems to occur in branch-1.1, branch-1.2, branch-1 but seems to > be resolved in master (2.0.0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12615) Do not start spark session when only explain
[ https://issues.apache.org/jira/browse/HIVE-12615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052319#comment-15052319 ] Hive QA commented on HIVE-12615: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12776700/HIVE-12615.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 293 failed/errored test(s), 9894 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap_auto org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_parallel_orderby org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join0 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join15 
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18_multi_distinct org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join20 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join21 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join23 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join26 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join27 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join28 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join29 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join30 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join31 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join32 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join6 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join7 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join_stats org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join_without_localtask org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_smb_mapjoin_14 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_10 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_6 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_9 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_tez1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_tez2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_column_access_stats org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_count org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cross_join org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cross_product_check_1 
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_cross_product_check_2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ctas org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_dynamic_rdd_cache org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_escape_clusterby1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_escape_distributeby1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_escape_sortby1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby10 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby11 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby1_map org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby1_map_nomap org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby1_map_skew org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby1_noskew
[jira] [Updated] (HIVE-12652) SymbolicTextInputFormat should support the path with regex, especially using CombineHiveInputFormat. Add test sql.
[ https://issues.apache.org/jira/browse/HIVE-12652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaowei Wang updated HIVE-12652: Attachment: HIVE-12652.0.patch > SymbolicTextInputFormat should support the path with regex, especially > using CombineHiveInputFormat. Add test sql. > -- > > Key: HIVE-12652 > URL: https://issues.apache.org/jira/browse/HIVE-12652 > Project: Hive > Issue Type: Bug >Affects Versions: 0.14.0 >Reporter: Xiaowei Wang >Assignee: Xiaowei Wang > Fix For: 1.2.1 > > Attachments: HIVE-12652.0.patch > > > 1. In fact, SymbolicTextInputFormat supports the path with regex. I add some > test sql. > 2. But, when using CombineHiveInputFormat to merge small files, it cannot > resolve the path with regex, so it will get a wrong result. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result
[ https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052327#comment-15052327 ] Xiaowei Wang commented on HIVE-12541: - I added some test cases in another jira: https://issues.apache.org/jira/browse/HIVE-12652 . -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result
[ https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052326#comment-15052326 ] Xiaowei Wang commented on HIVE-12541: - [~aihuaxu] [~ctang.ma] [~ychena] Maybe it is better to discuss this in another jira: https://issues.apache.org/jira/browse/HIVE-12652 . -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12632) LLAP: don't use IO elevator for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-12632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052332#comment-15052332 ] Hive QA commented on HIVE-12632: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12776706/HIVE-12632.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6315/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6315/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6315/ Messages: {noformat} This message was trimmed, see log for full details [INFO] [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ udf-classloader-udf2 --- [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-github-source-source/itests/custom-udfs/udf-classloader-udf2/src/test/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ udf-classloader-udf2 --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/itests/custom-udfs/udf-classloader-udf2/target/tmp [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/itests/custom-udfs/udf-classloader-udf2/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-github-source-source/itests/custom-udfs/udf-classloader-udf2/target/tmp/conf [copy] Copying 14 files to /data/hive-ptest/working/apache-github-source-source/itests/custom-udfs/udf-classloader-udf2/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ udf-classloader-udf2 --- [INFO] No sources to compile [INFO] [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ 
udf-classloader-udf2 --- [INFO] Tests are skipped. [INFO] [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ udf-classloader-udf2 --- [INFO] Building jar: /data/hive-ptest/working/apache-github-source-source/itests/custom-udfs/udf-classloader-udf2/target/udf-classloader-udf2-2.1.0-SNAPSHOT.jar [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ udf-classloader-udf2 --- [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ udf-classloader-udf2 --- [INFO] Installing /data/hive-ptest/working/apache-github-source-source/itests/custom-udfs/udf-classloader-udf2/target/udf-classloader-udf2-2.1.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-it-custom-udfs/udf-classloader-udf2/2.1.0-SNAPSHOT/udf-classloader-udf2-2.1.0-SNAPSHOT.jar [INFO] Installing /data/hive-ptest/working/apache-github-source-source/itests/custom-udfs/udf-classloader-udf2/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-it-custom-udfs/udf-classloader-udf2/2.1.0-SNAPSHOT/udf-classloader-udf2-2.1.0-SNAPSHOT.pom [INFO] [INFO] [INFO] Building Hive Integration - HCatalog Unit Tests 2.1.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-hcatalog-it-unit --- [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/itests/hcatalog-unit/target [INFO] Deleting /data/hive-ptest/working/apache-github-source-source/itests/hcatalog-unit (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ hive-hcatalog-it-unit --- [INFO] [INFO] --- maven-antrun-plugin:1.7:run (download-spark) @ hive-hcatalog-it-unit --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-hcatalog-it-unit --- [INFO] [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hive-hcatalog-it-unit --- [INFO] Using 'UTF-8' encoding to copy filtered resources. 
[INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-github-source-source/itests/hcatalog-unit/src/main/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-hcatalog-it-unit --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-hcatalog-it-unit --- [INFO] No sources to compile [INFO] [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ hive-hcatalog-it-unit --- [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] skip non existing resourceDirectory
[jira] [Updated] (HIVE-12448) Change to tracking of dag status via dagIdentifier instead of dag name
[ https://issues.apache.org/jira/browse/HIVE-12448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-12448: -- Attachment: HIVE-12448.3.txt Updated patch with rb comments addressed. > Change to tracking of dag status via dagIdentifier instead of dag name > -- > > Key: HIVE-12448 > URL: https://issues.apache.org/jira/browse/HIVE-12448 > Project: Hive > Issue Type: Sub-task > Components: llap >Affects Versions: 2.0.0 >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-12448.1.txt, HIVE-12448.2.txt, HIVE-12448.3.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11890) Create ORC module
[ https://issues.apache.org/jira/browse/HIVE-11890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-11890: - Attachment: HIVE-11890.patch Rebased to trunk. > Create ORC module > - > > Key: HIVE-11890 > URL: https://issues.apache.org/jira/browse/HIVE-11890 > Project: Hive > Issue Type: Sub-task > Components: ORC >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: HIVE-11890.patch, HIVE-11890.patch, HIVE-11890.patch, > HIVE-11890.patch, HIVE-11890.patch, HIVE-11890.patch, HIVE-11890.patch, > HIVE-11890.patch > > > Start moving classes over to the ORC module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11110) Reorder applyPreJoinOrderingTransforms, add NotNULL/FilterMerge rules, improve Filter selectivity estimation
[ https://issues.apache.org/jira/browse/HIVE-11110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-11110: -- Attachment: HIVE-11110.34.patch > Reorder applyPreJoinOrderingTransforms, add NotNULL/FilterMerge rules, > improve Filter selectivity estimation > > > Key: HIVE-11110 > URL: https://issues.apache.org/jira/browse/HIVE-11110 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Jesus Camacho Rodriguez >Assignee: Laljo John Pullokkaran > Attachments: HIVE-11110-10.patch, HIVE-11110-11.patch, > HIVE-11110-12.patch, HIVE-11110-branch-1.2.patch, HIVE-11110.1.patch, > HIVE-11110.13.patch, HIVE-11110.14.patch, HIVE-11110.15.patch, > HIVE-11110.16.patch, HIVE-11110.17.patch, HIVE-11110.18.patch, > HIVE-11110.19.patch, HIVE-11110.2.patch, HIVE-11110.20.patch, > HIVE-11110.21.patch, HIVE-11110.22.patch, HIVE-11110.23.patch, > HIVE-11110.24.patch, HIVE-11110.25.patch, HIVE-11110.26.patch, HIVE-11110.27, > HIVE-11110.27.patch, HIVE-11110.28.patch, HIVE-11110.29.patch, > HIVE-11110.30.patch, HIVE-11110.31.patch, HIVE-11110.32.patch, > HIVE-11110.33.patch, HIVE-11110.34.patch, HIVE-11110.4.patch, > HIVE-11110.5.patch, HIVE-11110.6.patch, HIVE-11110.7.patch, > HIVE-11110.8.patch, HIVE-11110.9.patch, HIVE-11110.91.patch, > HIVE-11110.92.patch, HIVE-11110.patch > > > Query > {code} > select count(*) > from store_sales > ,store_returns > ,date_dim d1 > ,date_dim d2 > where d1.d_quarter_name = '2000Q1' >and d1.d_date_sk = ss_sold_date_sk >and ss_customer_sk = sr_customer_sk >and ss_item_sk = sr_item_sk >and ss_ticket_number = sr_ticket_number >and sr_returned_date_sk = d2.d_date_sk >and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3'); > {code} > The store_sales table is partitioned on ss_sold_date_sk, which is also used > in a join clause. The join clause should add a filter "filterExpr: > ss_sold_date_sk is not null", which should get pushed to the MetaStore when > fetching the stats. Currently this is not done in CBO planning, which results > in the stats from __HIVE_DEFAULT_PARTITION__ being fetched and considered in > the optimization phase. 
In particular, this increases the NDV for the join > columns and may result in wrong planning. > Including HiveJoinAddNotNullRule in the optimization phase solves this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
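The effect described above — deriving an `IS NOT NULL` predicate for each equi-join key so that partition stats exclude `__HIVE_DEFAULT_PARTITION__` — can be illustrated with a small sketch. This is not the actual `HiveJoinAddNotNullRule` code; the class and method names below are hypothetical, showing only the filter expression such a rule would effectively produce.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of what a join-key not-null rule achieves: for each
 * equi-join key, an "is not null" predicate is derived so the metastore can
 * skip the null-key default partition when fetching column statistics.
 */
public class NotNullFilterSketch {
    static String addNotNullFilters(List<String> joinKeys) {
        // Build the conjunctive filterExpr for all join keys.
        return joinKeys.stream()
                .map(k -> k + " is not null")
                .collect(Collectors.joining(" and "));
    }

    public static void main(String[] args) {
        System.out.println(addNotNullFilters(
                List.of("ss_sold_date_sk", "ss_customer_sk")));
        // prints: ss_sold_date_sk is not null and ss_customer_sk is not null
    }
}
```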
[jira] [Commented] (HIVE-12422) LLAP: add security to Web UI endpoint
[ https://issues.apache.org/jira/browse/HIVE-12422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052178#comment-15052178 ] Hive QA commented on HIVE-12422: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12776677/HIVE-12422.02.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 9878 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_udf_max org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_order2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9 org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_vectorized_dynamic_partition_pruning org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_mergejoin org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping org.apache.hive.jdbc.TestSSL.testSSLVersion org.apache.hive.jdbc.miniHS2.TestHs2Metrics.testMetrics {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6312/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6312/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6312/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing 
org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12776677 - PreCommit-HIVE-TRUNK-Build > LLAP: add security to Web UI endpoint > - > > Key: HIVE-12422 > URL: https://issues.apache.org/jira/browse/HIVE-12422 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-12422.01.patch, HIVE-12422.02.patch, > HIVE-12422.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8376) Umbrella Jira for HiveServer2 dynamic service discovery
[ https://issues.apache.org/jira/browse/HIVE-8376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052221#comment-15052221 ] Amareshwari Sriramadasu commented on HIVE-8376: --- [~vgumashta], I see all sub tasks of this are resolved. Is anything pending for this jira? Can this be marked resolved? If so, can you resolve it with proper fix version? > Umbrella Jira for HiveServer2 dynamic service discovery > --- > > Key: HIVE-8376 > URL: https://issues.apache.org/jira/browse/HIVE-8376 > Project: Hive > Issue Type: New Feature > Components: HiveServer2, JDBC >Affects Versions: 0.14.0 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > Labels: TODOC14 > Attachments: HiveServer2DynamicServiceDiscovery.pdf > > > Creating an ☂ Jira for documentation purpose. I'll add a detailed doc for the > implementation & usage here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12648) LLAP IO was disabled in CliDriver by accident (and tests are broken)
[ https://issues.apache.org/jira/browse/HIVE-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12648: Attachment: HIVE-12648.01.patch There was a really stupid (and hard to find) error in recent fileId patch. This is why having tests is important... 01 both enables and fixes it. [~prasanth_j] can you take a look? > LLAP IO was disabled in CliDriver by accident (and tests are broken) > > > Key: HIVE-12648 > URL: https://issues.apache.org/jira/browse/HIVE-12648 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-12648.01.patch, HIVE-12648.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12648) LLAP IO was disabled in CliDriver by accident (and tests are broken)
[ https://issues.apache.org/jira/browse/HIVE-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12648: Priority: Blocker (was: Major) > LLAP IO was disabled in CliDriver by accident (and tests are broken) > > > Key: HIVE-12648 > URL: https://issues.apache.org/jira/browse/HIVE-12648 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Blocker > Attachments: HIVE-12648.01.patch, HIVE-12648.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12648) LLAP IO was disabled in CliDriver by accident (and tests are broken)
[ https://issues.apache.org/jira/browse/HIVE-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052150#comment-15052150 ] Prasanth Jayachandran commented on HIVE-12648: -- Why is the version string "0.20" while hadoop-1 is removed? Are there unit tests for testing cache? If so are there unit tests for fileId, non-fileId and synthetic fileId cases? > LLAP IO was disabled in CliDriver by accident (and tests are broken) > > > Key: HIVE-12648 > URL: https://issues.apache.org/jira/browse/HIVE-12648 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Blocker > Attachments: HIVE-12648.01.patch, HIVE-12648.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12648) LLAP IO was disabled in CliDriver by accident (and tests are broken)
[ https://issues.apache.org/jira/browse/HIVE-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052151#comment-15052151 ] Prasanth Jayachandran commented on HIVE-12648: -- Other changes, looks good to me. > LLAP IO was disabled in CliDriver by accident (and tests are broken) > > > Key: HIVE-12648 > URL: https://issues.apache.org/jira/browse/HIVE-12648 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Blocker > Attachments: HIVE-12648.01.patch, HIVE-12648.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type
[ https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy updated HIVE-12502: --- Attachment: HIVE-12502.1.patch > to_date UDF cannot accept NULLs of VOID type > > > Key: HIVE-12502 > URL: https://issues.apache.org/jira/browse/HIVE-12502 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 1.0.0 >Reporter: Aaron Tokhy >Assignee: Jason Dere >Priority: Trivial > Attachments: HIVE-12502.1.patch > > > The to_date method behaves differently based off the 'data type' of null > passed in. > hive> select to_date(null); > FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'TOK_NULL': > TO_DATE() only takes STRING/TIMESTAMP/DATEWRITABLE types, got VOID > hive> select to_date(cast(null as timestamp)); > OK > NULL > Time taken: 0.031 seconds, Fetched: 1 row(s) > This appears to be a regression introduced in HIVE-5731. The previous > version of to_date would not check the type: > https://github.com/apache/hive/commit/09b6553214d6db5ec7049b88bbe8ff640a7fef72#diff-204f5588c0767cf372a5ca7e3fb964afL56 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type
[ https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052191#comment-15052191 ] Aaron Tokhy commented on HIVE-12502: I've submitted a patch with an associated unit test. > to_date UDF cannot accept NULLs of VOID type > > > Key: HIVE-12502 > URL: https://issues.apache.org/jira/browse/HIVE-12502 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 1.0.0 >Reporter: Aaron Tokhy >Assignee: Jason Dere >Priority: Trivial > Attachments: HIVE-12502.1.patch > > > The to_date method behaves differently based off the 'data type' of null > passed in. > hive> select to_date(null); > FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'TOK_NULL': > TO_DATE() only takes STRING/TIMESTAMP/DATEWRITABLE types, got VOID > hive> select to_date(cast(null as timestamp)); > OK > NULL > Time taken: 0.031 seconds, Fetched: 1 row(s) > This appears to be a regression introduced in HIVE-5731. The previous > version of to_date would not check the type: > https://github.com/apache/hive/commit/09b6553214d6db5ec7049b88bbe8ff640a7fef72#diff-204f5588c0767cf372a5ca7e3fb964afL56 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
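The behavior difference described in HIVE-12502 comes down to how the compile-time type check treats a bare NULL literal, which has VOID type. The sketch below is not the actual `GenericUDFToDate` code; the enum and method are hypothetical, illustrating only the direction of the fix — accept VOID alongside the string/timestamp/date categories and let evaluation return NULL.

```java
/**
 * Hypothetical sketch of the HIVE-12502 fix direction: a to_date-style
 * argument type check that treats VOID (an untyped NULL literal) as valid
 * input which simply evaluates to NULL, instead of failing semantic analysis.
 */
public class ToDateTypeCheckSketch {
    enum Category { STRING, TIMESTAMP, DATE, VOID, OTHER }

    static boolean isAccepted(Category c) {
        switch (c) {
            case STRING: case TIMESTAMP: case DATE:
                return true;
            case VOID:
                // A bare NULL has VOID type; accepting it makes
                // to_date(null) behave like to_date(cast(null as timestamp)).
                return true;
            default:
                return false;
        }
    }
}
```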
[jira] [Commented] (HIVE-12528) don't start HS2 Tez sessions in a single thread
[ https://issues.apache.org/jira/browse/HIVE-12528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052192#comment-15052192 ] Siddharth Seth commented on HIVE-12528: --- {code} +if (threadCount <= 1) { + for (int i = 0; i < blockingQueueLength; i++) { +// The queue is FIFO, so if we cycle thru length items, we'd start each session once. +startNextSessionFromQueue(); + } {code} Replace with threadCount == 1 ?, and a precondition for the threadCount to not be below 1 ? The threads end up throwing a RuntimeException in case of an Error. This would otherwise have been caught by the HiveServer2 static initialization block. Is it now relying upon the default uncaught exception handler ? Would be better to propagate the exception upwards as it's done today (and maybe a CompletionService / ListeningExecutor with Callbacks) {code} /* * with this the ordering of sessions in the queue will be (with 2 sessions 3 queues) * s1q1, s1q2, s1q3, s2q1, s2q2, s2q3 there by ensuring uniform distribution of * the sessions across queues at least to begin with. Then as sessions get freed up, the list * may change this ordering. */ {code} This statement no longer stands. Given that it's only for the first set of jobs anyway - I don't think this is a problem. cc [~vikram.dixit] I'm not sure how the thread safety - specifically visibility aspects are handled (both before and after the patch). SessionStates, TezClient instances etc are created in a single thread (now multiple threads), and then used in completely different threads. What is guaranteeing correct visibility ? > don't start HS2 Tez sessions in a single thread > --- > > Key: HIVE-12528 > URL: https://issues.apache.org/jira/browse/HIVE-12528 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-12528.patch > > > Starting sessions in parallel would improve the startup time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
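The review suggestion above — use `threadCount == 1` rather than `<= 1` for the serial path, and guard against a thread count below 1 up front — can be sketched as follows. This is not the HIVE-12528 patch itself; the class and method names are hypothetical.

```java
/**
 * Hypothetical sketch of the review suggestion on HIVE-12528: validate the
 * session-pool thread count eagerly, and select the serial FIFO startup path
 * only when exactly one thread is configured.
 */
public class SessionPoolStartupSketch {
    static boolean useSerialStartup(int threadCount) {
        // Precondition: a thread count below 1 is a configuration error,
        // not a signal to fall back to serial startup.
        if (threadCount < 1) {
            throw new IllegalArgumentException(
                    "threadCount must be >= 1, got " + threadCount);
        }
        // threadCount == 1 (not <= 1) means: start sessions one at a time.
        return threadCount == 1;
    }
}
```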
[jira] [Updated] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type
[ https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy updated HIVE-12502: --- Attachment: HIVE-12502-branch-1.patch HIVE-12502.patch > to_date UDF cannot accept NULLs of VOID type > > > Key: HIVE-12502 > URL: https://issues.apache.org/jira/browse/HIVE-12502 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 1.0.0 >Reporter: Aaron Tokhy >Assignee: Jason Dere >Priority: Trivial > Attachments: HIVE-12502-branch-1.patch, HIVE-12502.patch > > > The to_date method behaves differently based off the 'data type' of null > passed in. > hive> select to_date(null); > FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'TOK_NULL': > TO_DATE() only takes STRING/TIMESTAMP/DATEWRITABLE types, got VOID > hive> select to_date(cast(null as timestamp)); > OK > NULL > Time taken: 0.031 seconds, Fetched: 1 row(s) > This appears to be a regression introduced in HIVE-5731. The previous > version of to_date would not check the type: > https://github.com/apache/hive/commit/09b6553214d6db5ec7049b88bbe8ff640a7fef72#diff-204f5588c0767cf372a5ca7e3fb964afL56 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12502) to_date UDF cannot accept NULLs of VOID type
[ https://issues.apache.org/jira/browse/HIVE-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Tokhy updated HIVE-12502: --- Attachment: (was: HIVE-12502.1.patch) > to_date UDF cannot accept NULLs of VOID type > > > Key: HIVE-12502 > URL: https://issues.apache.org/jira/browse/HIVE-12502 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 1.0.0 >Reporter: Aaron Tokhy >Assignee: Jason Dere >Priority: Trivial > > The to_date method behaves differently based off the 'data type' of null > passed in. > hive> select to_date(null); > FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments 'TOK_NULL': > TO_DATE() only takes STRING/TIMESTAMP/DATEWRITABLE types, got VOID > hive> select to_date(cast(null as timestamp)); > OK > NULL > Time taken: 0.031 seconds, Fetched: 1 row(s) > This appears to be a regression introduced in HIVE-5731. The previous > version of to_date would not check the type: > https://github.com/apache/hive/commit/09b6553214d6db5ec7049b88bbe8ff640a7fef72#diff-204f5588c0767cf372a5ca7e3fb964afL56 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12643) For self describing InputFormat don't replicate schema information in partitions
[ https://issues.apache.org/jira/browse/HIVE-12643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052200#comment-15052200 ] Hive QA commented on HIVE-12643: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12776691/HIVE-12643.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6313/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6313/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6313/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: org.apache.hive.ptest.execution.ssh.SSHExecutionException: RSyncResult [localFile=/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-6313/succeeded/TestCliDriver-cp_mj_rc.q-udf_stddev_pop.q-mapreduce2.q-and-12-more, remoteFile=/home/hiveptest/50.16.94.163-hiveptest-2/logs/, getExitCode()=12, getException()=null, getUser()=hiveptest, getHost()=50.16.94.163, getInstance()=2]: 'ssh_exchange_identification: Connection closed by remote host rsync: connection unexpectedly closed (0 bytes received so far) [receiver] rsync error: error in rsync protocol data stream (code 12) at io.c(600) [receiver=3.0.6] ssh_exchange_identification: Connection closed by remote host rsync: connection unexpectedly closed (0 bytes received so far) [receiver] rsync error: error in rsync protocol data stream (code 12) at io.c(600) [receiver=3.0.6] ssh_exchange_identification: Connection closed by remote host rsync: connection unexpectedly closed (0 bytes received so far) [receiver] rsync error: error in rsync protocol data stream (code 
12) at io.c(600) [receiver=3.0.6] ssh_exchange_identification: Connection closed by remote host rsync: connection unexpectedly closed (0 bytes received so far) [receiver] rsync error: error in rsync protocol data stream (code 12) at io.c(600) [receiver=3.0.6] ssh_exchange_identification: Connection closed by remote host rsync: connection unexpectedly closed (0 bytes received so far) [receiver] rsync error: error in rsync protocol data stream (code 12) at io.c(600) [receiver=3.0.6] ' {noformat} This message is automatically generated. ATTACHMENT ID: 12776691 - PreCommit-HIVE-TRUNK-Build > For self describing InputFormat don't replicate schema information in > partitions > > > Key: HIVE-12643 > URL: https://issues.apache.org/jira/browse/HIVE-12643 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-12643.patch > > > Since self describing Input Formats don't use individual partition schemas > for schema resolution, there is no need to send that info to tasks. > Doing this should cut down plan size. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
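The HIVE-12643 idea — for self-describing input formats, the reader discovers column names and types from the file itself, so per-partition schema properties need not be serialized into the plan — can be sketched like this. This is not the attached patch; the class and method are hypothetical, though `columns` and `columns.types` are the standard Hive serde property keys.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the HIVE-12643 idea: when the input format is
 * self-describing (e.g. ORC or Parquet read their schema from the file
 * footer), drop per-partition schema properties from the serialized plan.
 */
public class PlanSchemaPrunerSketch {
    static Map<String, String> pruneForPlan(Map<String, String> partitionProps,
                                            boolean selfDescribingInputFormat) {
        if (!selfDescribingInputFormat) {
            // Tasks need the schema to deserialize rows; keep everything.
            return partitionProps;
        }
        Map<String, String> pruned = new HashMap<>(partitionProps);
        // The reader recovers these from the file itself.
        pruned.remove("columns");
        pruned.remove("columns.types");
        return pruned;
    }
}
```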