[jira] [Created] (HIVE-22053) Function name is not normalized when creating function
Rui Li created HIVE-22053:
--------------------------

Summary: Function name is not normalized when creating function
Key: HIVE-22053
URL: https://issues.apache.org/jira/browse/HIVE-22053
Project: Hive
Issue Type: Bug
Components: Standalone Metastore
Reporter: Rui Li
Assignee: Rui Li

If a function is created with a name containing upper-case characters, we get a NoSuchObjectException when trying to get that function.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
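A minimal sketch of the lookup miss described above (not Hive's actual code; the registry class and method names are hypothetical): if the name is normalized on lookup but not on creation (or vice versa), a function created as "MyUDF" is never found again.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class FunctionRegistrySketch {
    private final Map<String, String> functions = new HashMap<>();

    // Normalize on both the write and the read path; skipping it on either
    // side reproduces the NoSuchObjectException-style lookup miss.
    static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    public void create(String name, String className) {
        functions.put(normalize(name), className);
    }

    public String lookup(String name) {
        return functions.get(normalize(name));
    }
}
```

With consistent normalization, "MyUDF", "myudf", and "MYUDF" all resolve to the same entry.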
[jira] [Created] (HIVE-19895) The unique ID in SparkPartitionPruningSinkOperator is no longer needed
Rui Li created HIVE-19895:
--------------------------

Summary: The unique ID in SparkPartitionPruningSinkOperator is no longer needed
Key: HIVE-19895
URL: https://issues.apache.org/jira/browse/HIVE-19895
Project: Hive
Issue Type: Bug
Components: Spark
Reporter: Rui Li
[jira] [Created] (HIVE-19671) Distribute by rand() can lead to data inconsistency
Rui Li created HIVE-19671:
--------------------------

Summary: Distribute by rand() can lead to data inconsistency
Key: HIVE-19671
URL: https://issues.apache.org/jira/browse/HIVE-19671
Project: Hive
Issue Type: Bug
Reporter: Rui Li

Noticed the following queries can give different results:
{code}
select count(*) from tbl;
select count(*) from (select * from tbl distribute by rand()) t;
{code}
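A sketch of why the counts can differ (an illustration, not Hive's implementation): `distribute by rand()` routes each row by a freshly drawn random number, so a retried task re-draws different numbers and the same row can land in a different reducer across attempts, which is how rows get lost or duplicated. Partitioning by a hash of the row content is stable across retries.

```java
import java.util.Random;

public class NonDeterministicPartitioning {
    // Non-deterministic routing: a retried task with a different RNG state
    // may send the same row to a different reducer than the first attempt.
    static int randomPartition(Random rng, int numReducers) {
        return (int) (rng.nextDouble() * numReducers);
    }

    // Deterministic alternative: the same row always maps to the same
    // reducer, no matter how many times the task is re-run.
    static int hashPartition(String row, int numReducers) {
        return Math.floorMod(row.hashCode(), numReducers);
    }
}
```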
[jira] [Created] (HIVE-19439) MapWork shouldn't be reused when Spark task fails during initialization
Rui Li created HIVE-19439:
--------------------------

Summary: MapWork shouldn't be reused when Spark task fails during initialization
Key: HIVE-19439
URL: https://issues.apache.org/jira/browse/HIVE-19439
Project: Hive
Issue Type: Bug
Components: Spark
Reporter: Rui Li

Issue identified in HIVE-19388. When a Spark task fails while initializing the map operator, the task is retried with the same MapWork retrieved from cache. This can be problematic because the MapWork may be partially initialized, e.g. some operators are already in INIT state.
[jira] [Created] (HIVE-19316) StatsTask fails due to ClassCastException
Rui Li created HIVE-19316:
--------------------------

Summary: StatsTask fails due to ClassCastException
Key: HIVE-19316
URL: https://issues.apache.org/jira/browse/HIVE-19316
Project: Hive
Issue Type: Bug
Components: Statistics
Reporter: Rui Li

The stack trace:
{noformat}
2018-04-26T20:17:37,674 ERROR [pool-7-thread-11] metastore.RetryingHMSHandler: java.lang.ClassCastException: org.apache.hadoop.hive.metastore.api.LongColumnStatsData cannot be cast to org.apache.hadoop.hive.metastore.columnstats.cache.LongColumnStatsDataInspector
	at org.apache.hadoop.hive.metastore.columnstats.merge.LongColumnStatsMerger.merge(LongColumnStatsMerger.java:30)
	at org.apache.hadoop.hive.metastore.utils.MetaStoreUtils.mergeColStats(MetaStoreUtils.java:1052)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:7202)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
	at com.sun.proxy.$Proxy26.set_aggr_stats_for(Unknown Source)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:16795)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:16779)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111)
	at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
	at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{noformat}
[jira] [Created] (HIVE-18955) HoS: Unable to create Channel from class NioServerSocketChannel
Rui Li created HIVE-18955:
--------------------------

Summary: HoS: Unable to create Channel from class NioServerSocketChannel
Key: HIVE-18955
URL: https://issues.apache.org/jira/browse/HIVE-18955
Project: Hive
Issue Type: Bug
Components: Spark
Reporter: Rui Li

Hit the issue when trying to launch a Spark job. Stack trace:
{noformat}
Caused by: java.lang.NoSuchMethodError: io.netty.channel.DefaultChannelId.newInstance()Lio/netty/channel/DefaultChannelId;
	at io.netty.channel.AbstractChannel.newId(AbstractChannel.java:111) ~[netty-all-4.1.17.Final.jar:4.1.17.Final]
	at io.netty.channel.AbstractChannel.<init>(AbstractChannel.java:83) ~[netty-all-4.1.17.Final.jar:4.1.17.Final]
	at io.netty.channel.nio.AbstractNioChannel.<init>(AbstractNioChannel.java:84) ~[netty-all-4.1.17.Final.jar:4.1.17.Final]
	at io.netty.channel.nio.AbstractNioMessageChannel.<init>(AbstractNioMessageChannel.java:42) ~[netty-all-4.1.17.Final.jar:4.1.17.Final]
	at io.netty.channel.socket.nio.NioServerSocketChannel.<init>(NioServerSocketChannel.java:86) ~[netty-all-4.1.17.Final.jar:4.1.17.Final]
	at io.netty.channel.socket.nio.NioServerSocketChannel.<init>(NioServerSocketChannel.java:72) ~[netty-all-4.1.17.Final.jar:4.1.17.Final]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_151]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_151]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_151]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_151]
	at io.netty.channel.ReflectiveChannelFactory.newChannel(ReflectiveChannelFactory.java:38) ~[netty-all-4.1.17.Final.jar:4.1.17.Final]
	... 32 more
{noformat}
It seems we have conflicting versions of class {{io.netty.channel.DefaultChannelId}} from async-http-client.jar and netty-all.jar.
[jira] [Created] (HIVE-18647) Cannot create table: Unknown column 'CREATION_METADATA_MV_CREATION_METADATA_ID_OID'
Rui Li created HIVE-18647:
--------------------------

Summary: Cannot create table: Unknown column 'CREATION_METADATA_MV_CREATION_METADATA_ID_OID'
Key: HIVE-18647
URL: https://issues.apache.org/jira/browse/HIVE-18647
Project: Hive
Issue Type: Bug
Reporter: Rui Li
[jira] [Created] (HIVE-18442) HoS: No FileSystem for scheme: nullscan
Rui Li created HIVE-18442:
--------------------------

Summary: HoS: No FileSystem for scheme: nullscan
Key: HIVE-18442
URL: https://issues.apache.org/jira/browse/HIVE-18442
Project: Hive
Issue Type: Bug
Components: Spark
Reporter: Rui Li
Assignee: Rui Li
[jira] [Created] (HIVE-18282) Spark tar is downloaded every time for itest
Rui Li created HIVE-18282:
--------------------------

Summary: Spark tar is downloaded every time for itest
Key: HIVE-18282
URL: https://issues.apache.org/jira/browse/HIVE-18282
Project: Hive
Issue Type: Test
Reporter: Rui Li

Seems we missed the md5 file for spark-2.2.0? cc [~kellyzly], [~stakiar]
[jira] [Created] (HIVE-18242) VectorizedRowBatch cast exception when analyzing partitioned table
Rui Li created HIVE-18242:
--------------------------

Summary: VectorizedRowBatch cast exception when analyzing partitioned table
Key: HIVE-18242
URL: https://issues.apache.org/jira/browse/HIVE-18242
Project: Hive
Issue Type: Bug
Reporter: Rui Li

Happens when I run the following (vectorization enabled):
{code}
ANALYZE TABLE srcpart PARTITION(ds, hr) COMPUTE STATISTICS;
{code}
The stack trace is:
{noformat}
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch cannot be cast to org.apache.hadoop.io.Text
	at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.copyObject(WritableStringObjectInspector.java:36)
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:425)
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.partialCopyToStandardObject(ObjectInspectorUtils.java:314)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.gatherStats(TableScanOperator.java:191)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:138)
	at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.setupPartitionContextVars(VectorMapOperator.java:682)
	at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.cleanUpInputFileChangedOp(VectorMapOperator.java:607)
	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1187)
	at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:784)
{noformat}
[jira] [Created] (HIVE-18148) NPE in SparkDynamicPartitionPruningResolver
Rui Li created HIVE-18148:
--------------------------

Summary: NPE in SparkDynamicPartitionPruningResolver
Key: HIVE-18148
URL: https://issues.apache.org/jira/browse/HIVE-18148
Project: Hive
Issue Type: Bug
Components: Spark
Reporter: Rui Li
Assignee: Rui Li

The stack trace is:
{noformat}
2017-11-27T10:32:38,752 ERROR [e6c8aab5-ddd2-461d-b185-a7597c3e7519 main] ql.Driver: FAILED: NullPointerException null
java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.optimizer.physical.SparkDynamicPartitionPruningResolver$SparkDynamicPartitionPruningDispatcher.dispatch(SparkDynamicPartitionPruningResolver.java:100)
	at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
	at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180)
	at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125)
	at org.apache.hadoop.hive.ql.optimizer.physical.SparkDynamicPartitionPruningResolver.resolve(SparkDynamicPartitionPruningResolver.java:74)
	at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeTaskPlan(SparkCompiler.java:568)
{noformat}
At this stage, there shouldn't be a DPP sink whose target map work is null. The root cause seems to be a malformed operator tree generated by SplitOpTreeForDPP.
[jira] [Created] (HIVE-18129) The ConditionalResolverMergeFiles doesn't merge empty files
Rui Li created HIVE-18129:
--------------------------

Summary: The ConditionalResolverMergeFiles doesn't merge empty files
Key: HIVE-18129
URL: https://issues.apache.org/jira/browse/HIVE-18129
Project: Hive
Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li

If a query produces lots of empty files, these files won't be merged by the merge-small-file feature.
[jira] [Created] (HIVE-18111) Fix temp path for Spark DPP sink
Rui Li created HIVE-18111:
--------------------------

Summary: Fix temp path for Spark DPP sink
Key: HIVE-18111
URL: https://issues.apache.org/jira/browse/HIVE-18111
Project: Hive
Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
[jira] [Created] (HIVE-18041) Add SORT_QUERY_RESULTS to subquery_multi
Rui Li created HIVE-18041:
--------------------------

Summary: Add SORT_QUERY_RESULTS to subquery_multi
Key: HIVE-18041
URL: https://issues.apache.org/jira/browse/HIVE-18041
Project: Hive
Issue Type: Test
Reporter: Rui Li
Priority: Trivial
[jira] [Created] (HIVE-17976) HoS: don't set output collector if there's no data to process
Rui Li created HIVE-17976:
--------------------------

Summary: HoS: don't set output collector if there's no data to process
Key: HIVE-17976
URL: https://issues.apache.org/jira/browse/HIVE-17976
Project: Hive
Issue Type: Bug
Components: Spark
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor

MR doesn't set an output collector if no row is processed, i.e. {{ExecMapper::map}} is never called. Let's investigate whether Spark should do the same.
[jira] [Created] (HIVE-17964) HoS: some spark configs doesn't require re-creating a session
Rui Li created HIVE-17964:
--------------------------

Summary: HoS: some spark configs doesn't require re-creating a session
Key: HIVE-17964
URL: https://issues.apache.org/jira/browse/HIVE-17964
Project: Hive
Issue Type: Improvement
Reporter: Rui Li
Priority: Minor

I guess the {{hive.spark.}} configs were initially intended for the RSC. Therefore, when they're changed, we re-create the session for them to take effect. There are some configs not related to the RSC that also start with {{hive.spark.}}. We'd better rename them so that we don't unnecessarily re-create sessions, which is usually time-consuming.
[jira] [Created] (HIVE-17877) HoS: combine equivalent DPP sink works
Rui Li created HIVE-17877:
--------------------------

Summary: HoS: combine equivalent DPP sink works
Key: HIVE-17877
URL: https://issues.apache.org/jira/browse/HIVE-17877
Project: Hive
Issue Type: Improvement
Reporter: Rui Li
Assignee: Rui Li
[jira] [Created] (HIVE-17383) ArrayIndexOutOfBoundsException in VectorGroupByOperator
Rui Li created HIVE-17383:
--------------------------

Summary: ArrayIndexOutOfBoundsException in VectorGroupByOperator
Key: HIVE-17383
URL: https://issues.apache.org/jira/browse/HIVE-17383
Project: Hive
Issue Type: Bug
Reporter: Rui Li

Query to reproduce:
{noformat}
set hive.cbo.enable=false;
select count(*) from (select key from src group by key) s where s.key='98';
{noformat}
The stack trace is:
{noformat}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
	at org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:107)
	at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeReduceMergePartial.doProcessBatch(VectorGroupByOperator.java:831)
	at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processBatch(VectorGroupByOperator.java:174)
	at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.process(VectorGroupByOperator.java:1046)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:462)
	... 18 more
{noformat}
More details can be found in HIVE-16823.
[jira] [Created] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified
Rui Li created HIVE-17321:
--------------------------

Summary: HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified
Key: HIVE-17321
URL: https://issues.apache.org/jira/browse/HIVE-17321
Project: Hive
Issue Type: Bug
Components: Spark
Reporter: Rui Li
Assignee: Rui Li
[jira] [Created] (HIVE-17193) HoS: don't combine map works that are targets of different DPPs
Rui Li created HIVE-17193:
--------------------------

Summary: HoS: don't combine map works that are targets of different DPPs
Key: HIVE-17193
URL: https://issues.apache.org/jira/browse/HIVE-17193
Project: Hive
Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
[jira] [Created] (HIVE-17133) NoSuchMethodError in Hadoop FileStatus.compareTo
Rui Li created HIVE-17133:
--------------------------

Summary: NoSuchMethodError in Hadoop FileStatus.compareTo
Key: HIVE-17133
URL: https://issues.apache.org/jira/browse/HIVE-17133
Project: Hive
Issue Type: Bug
Reporter: Rui Li

The stack trace is:
{noformat}
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.fs.FileStatus.compareTo(Lorg/apache/hadoop/fs/FileStatus;)I
	at org.apache.hadoop.hive.ql.io.AcidUtils.lambda$getAcidState$0(AcidUtils.java:931)
	at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
	at java.util.TimSort.sort(TimSort.java:234)
	at java.util.Arrays.sort(Arrays.java:1512)
	at java.util.ArrayList.sort(ArrayList.java:1454)
	at java.util.Collections.sort(Collections.java:175)
	at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:929)
{noformat}
I'm on Hive master and using Hadoop 2.7.2.

The method signature in Hadoop 2.7.2 is:
https://github.com/apache/hadoop/blob/release-2.7.2-RC2/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileStatus.java#L336

In Hadoop 2.8.0 it becomes:
https://github.com/apache/hadoop/blob/release-2.8.0-RC3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileStatus.java#L332

I think that breaks binary compatibility.
[jira] [Created] (HIVE-17114) HoS: Possible skew in shuffling when data is not really skewed
Rui Li created HIVE-17114:
--------------------------

Summary: HoS: Possible skew in shuffling when data is not really skewed
Key: HIVE-17114
URL: https://issues.apache.org/jira/browse/HIVE-17114
Project: Hive
Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor

Observed in HoS and may apply to other engines as well. When we join 2 tables on a single int key, we use the key itself as the hash code in {{ObjectInspectorUtils.hashCode}}:
{code}
case INT:
  return ((IntObjectInspector) poi).get(o);
{code}
Suppose the keys are different but are all multiples of 10. If we choose 10 as the number of reducers, the shuffle will be skewed.
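The skew can be sketched in a few lines (a simplified illustration; `reducerFor` is a hypothetical helper standing in for the hash-partitioning step): with the identity hash, every key that is a multiple of 10 lands on reducer 0 when there are 10 reducers, even though the keys themselves are all distinct.

```java
public class IntKeyShuffleSkew {
    // Mirrors the INT case of ObjectInspectorUtils.hashCode: the key
    // itself is the hash code.
    static int intKeyHashCode(int key) {
        return key;
    }

    // Hypothetical partitioning step: reducer = hash mod #reducers.
    static int reducerFor(int key, int numReducers) {
        return Math.floorMod(intKeyHashCode(key), numReducers);
    }
}
```

Keys 0, 10, 20, ..., 90 are all distinct, yet with 10 reducers they all hash to reducer 0, so one reducer does all the work while nine sit idle.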
[jira] [Created] (HIVE-17034) The spark tar for itests is downloaded every time if md5sum is not installed
Rui Li created HIVE-17034:
--------------------------

Summary: The spark tar for itests is downloaded every time if md5sum is not installed
Key: HIVE-17034
URL: https://issues.apache.org/jira/browse/HIVE-17034
Project: Hive
Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li

I think we should either skip verifying md5, or fail the build to let developers know md5sum is required.
[jira] [Created] (HIVE-17020) Aggressive RS dedup can incorrectly remove OP tree branch
Rui Li created HIVE-17020:
--------------------------

Summary: Aggressive RS dedup can incorrectly remove OP tree branch
Key: HIVE-17020
URL: https://issues.apache.org/jira/browse/HIVE-17020
Project: Hive
Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li

Suppose we have an OP tree like this:
{noformat}
      ...
       |
     RS[1]
       |
     SEL[2]
      /  \
 SEL[3]  SEL[4]
    |       |
  RS[5]   FS[6]
    |
   ...
{noformat}
When doing aggressive RS dedup, we'll remove all the operators between RS5 and RS1, and thus the branch containing FS6 is lost.
[jira] [Created] (HIVE-16876) RpcServer should be re-created when Rpc configs change
Rui Li created HIVE-16876:
--------------------------

Summary: RpcServer should be re-created when Rpc configs change
Key: HIVE-16876
URL: https://issues.apache.org/jira/browse/HIVE-16876
Project: Hive
Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
[jira] [Created] (HIVE-16767) Update people website with recent changes
Rui Li created HIVE-16767:
--------------------------

Summary: Update people website with recent changes
Key: HIVE-16767
URL: https://issues.apache.org/jira/browse/HIVE-16767
Project: Hive
Issue Type: Task
Components: Documentation
Reporter: Rui Li
Assignee: Rui Li
[jira] [Created] (HIVE-16739) HoS DPP generates malformed plan when hive.tez.dynamic.semijoin.reduction is on
Rui Li created HIVE-16739:
--------------------------

Summary: HoS DPP generates malformed plan when hive.tez.dynamic.semijoin.reduction is on
Key: HIVE-16739
URL: https://issues.apache.org/jira/browse/HIVE-16739
Project: Hive
Issue Type: Bug
Components: Spark
Reporter: Rui Li
Assignee: Rui Li

HoS DPP currently can't handle dynamic semi-join and will result in {{ClassCastException: org.apache.hadoop.hive.ql.plan.ReduceWork cannot be cast to org.apache.hadoop.hive.ql.plan.MapWork}}. We should either disable it or implement it for HoS.
[jira] [Created] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle
Rui Li created HIVE-16659:
--------------------------

Summary: Query plan should reflect hive.spark.use.groupby.shuffle
Key: HIVE-16659
URL: https://issues.apache.org/jira/browse/HIVE-16659
Project: Hive
Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li

It's useful to show the shuffle type used in the query plan. Currently it shows "GROUP" no matter what we set for hive.spark.use.groupby.shuffle.
[jira] [Created] (HIVE-16613) SaslClientHandler.sendHello is eating exceptions
Rui Li created HIVE-16613:
--------------------------

Summary: SaslClientHandler.sendHello is eating exceptions
Key: HIVE-16613
URL: https://issues.apache.org/jira/browse/HIVE-16613
Project: Hive
Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
[jira] [Created] (HIVE-16593) SparkClientFactory.stop may prevent JVM from exiting
Rui Li created HIVE-16593:
--------------------------

Summary: SparkClientFactory.stop may prevent JVM from exiting
Key: HIVE-16593
URL: https://issues.apache.org/jira/browse/HIVE-16593
Project: Hive
Issue Type: Bug
Components: Spark
Reporter: Rui Li
Assignee: Rui Li
[jira] [Created] (HIVE-16573) In-place update for HoS can't be disabled
Rui Li created HIVE-16573:
--------------------------

Summary: In-place update for HoS can't be disabled
Key: HIVE-16573
URL: https://issues.apache.org/jira/browse/HIVE-16573
Project: Hive
Issue Type: Bug
Components: Spark
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor

{{hive.spark.exec.inplace.progress}} has no effect.
[jira] [Created] (HIVE-16459) Cancel outstanding RPCs when channel closes
Rui Li created HIVE-16459:
--------------------------

Summary: Cancel outstanding RPCs when channel closes
Key: HIVE-16459
URL: https://issues.apache.org/jira/browse/HIVE-16459
Project: Hive
Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
[jira] [Created] (HIVE-16418) Allow HiveKey to skip some bytes for comparison
Rui Li created HIVE-16418:
--------------------------

Summary: Allow HiveKey to skip some bytes for comparison
Key: HIVE-16418
URL: https://issues.apache.org/jira/browse/HIVE-16418
Project: Hive
Issue Type: New Feature
Reporter: Rui Li
Assignee: Rui Li

The feature is required when we have to serialize some fields and prevent them from being used in comparison, e.g. HIVE-14412.
[jira] [Created] (HIVE-16315) Describe table doesn't show num of partitions
Rui Li created HIVE-16315:
--------------------------

Summary: Describe table doesn't show num of partitions
Key: HIVE-16315
URL: https://issues.apache.org/jira/browse/HIVE-16315
Project: Hive
Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li

This doesn't comply with our wiki:
https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-Examples
[jira] [Created] (HIVE-16155) No need for ConditionalTask if no conditional map join is created
Rui Li created HIVE-16155:
--------------------------

Summary: No need for ConditionalTask if no conditional map join is created
Key: HIVE-16155
URL: https://issues.apache.org/jira/browse/HIVE-16155
Project: Hive
Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor
[jira] [Created] (HIVE-16047) Shouldn't try to get KeyProvider unless encryption is enabled
Rui Li created HIVE-16047:
--------------------------

Summary: Shouldn't try to get KeyProvider unless encryption is enabled
Key: HIVE-16047
URL: https://issues.apache.org/jira/browse/HIVE-16047
Project: Hive
Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor

Found lots of the following errors in the HS2 log:
{noformat}
hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
{noformat}
Similar to HDFS-7931.
[jira] [Created] (HIVE-15860) RemoteSparkJobMonitor may hang when RemoteDriver exits abnormally
Rui Li created HIVE-15860:
--------------------------

Summary: RemoteSparkJobMonitor may hang when RemoteDriver exits abnormally
Key: HIVE-15860
URL: https://issues.apache.org/jira/browse/HIVE-15860
Project: Hive
Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
[jira] [Created] (HIVE-15526) Some tests need SORT_QUERY_RESULTS
Rui Li created HIVE-15526:
--------------------------

Summary: Some tests need SORT_QUERY_RESULTS
Key: HIVE-15526
URL: https://issues.apache.org/jira/browse/HIVE-15526
Project: Hive
Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor
[jira] [Created] (HIVE-15428) HoS DPP doesn't remove cyclic dependency
Rui Li created HIVE-15428:
--------------------------

Summary: HoS DPP doesn't remove cyclic dependency
Key: HIVE-15428
URL: https://issues.apache.org/jira/browse/HIVE-15428
Project: Hive
Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
[jira] [Created] (HIVE-15357) Fix and re-enable the spark-only tests
Rui Li created HIVE-15357:
--------------------------

Summary: Fix and re-enable the spark-only tests
Key: HIVE-15357
URL: https://issues.apache.org/jira/browse/HIVE-15357
Project: Hive
Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li
[jira] [Created] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive
Rui Li created HIVE-15302:
--------------------------

Summary: Relax the requirement that HoS needs Spark built w/o Hive
Key: HIVE-15302
URL: https://issues.apache.org/jira/browse/HIVE-15302
Project: Hive
Issue Type: Improvement
Reporter: Rui Li
Assignee: Rui Li

This requirement becomes more and more unacceptable as SparkSQL becomes widely adopted. Let's use this JIRA to find out how we can relax the limitation.
[jira] [Created] (HIVE-15299) Yarn-cluster and yarn-client deprecated in Spark 2.0
Rui Li created HIVE-15299:
--------------------------

Summary: Yarn-cluster and yarn-client deprecated in Spark 2.0
Key: HIVE-15299
URL: https://issues.apache.org/jira/browse/HIVE-15299
Project: Hive
Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor

Need to use master "yarn" with a specified deploy mode instead.
[jira] [Created] (HIVE-15202) Concurrent compactions for the same partition may generate malformed folder structure
Rui Li created HIVE-15202:
--------------------------

Summary: Concurrent compactions for the same partition may generate malformed folder structure
Key: HIVE-15202
URL: https://issues.apache.org/jira/browse/HIVE-15202
Project: Hive
Issue Type: Bug
Reporter: Rui Li

If two compactions run concurrently on a single partition, it may generate a folder structure like this (nested base dir):
{noformat}
drwxr-xr-x   - root supergroup          0 2016-11-14 22:23 /user/hive/warehouse/test/z=1/base_007/base_007
-rw-r--r--   3 root supergroup        201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_0
-rw-r--r--   3 root supergroup        611 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_1
-rw-r--r--   3 root supergroup        614 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_2
-rw-r--r--   3 root supergroup        621 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_3
-rw-r--r--   3 root supergroup        621 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_4
-rw-r--r--   3 root supergroup        201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_5
-rw-r--r--   3 root supergroup        201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_6
-rw-r--r--   3 root supergroup        201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_7
-rw-r--r--   3 root supergroup        201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_8
-rw-r--r--   3 root supergroup        201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_9
{noformat}
[jira] [Created] (HIVE-15139) HoS local mode fails with NumberFormatException
Rui Li created HIVE-15139:
--------------------------

Summary: HoS local mode fails with NumberFormatException
Key: HIVE-15139
URL: https://issues.apache.org/jira/browse/HIVE-15139
Project: Hive
Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li

It's because we store {{stageId_attemptNum}} in JobMetricsListener but expect only {{stageId}} in LocalSparkJobStatus.
{noformat}
java.lang.NumberFormatException: For input string: "0_0"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:580)
	at java.lang.Integer.parseInt(Integer.java:615)
	at org.apache.hadoop.hive.ql.exec.spark.status.impl.LocalSparkJobStatus.getSparkStatistics(LocalSparkJobStatus.java:146)
	at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:104)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
{noformat}
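The mismatch is easy to reproduce in isolation: {{Integer.parseInt("0_0")}} throws NumberFormatException. A minimal sketch of one possible fix (hypothetical helper, not the actual patch) is to strip the attempt suffix before parsing:

```java
public class StageIdParsing {
    // Keys arrive as "stageId_attemptNum" (e.g. "0_0"), but the consumer
    // expects a bare stage ID. Strip everything from the first '_' before
    // parsing; a key without an underscore is parsed as-is.
    static int parseStageId(String key) {
        int idx = key.indexOf('_');
        return Integer.parseInt(idx < 0 ? key : key.substring(0, idx));
    }
}
```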
[jira] [Created] (HIVE-15081) RetryingMetaStoreClient.getProxy(HiveConf, Boolean) doesn't match constructor of HiveMetaStoreClient
Rui Li created HIVE-15081:
--------------------------

Summary: RetryingMetaStoreClient.getProxy(HiveConf, Boolean) doesn't match constructor of HiveMetaStoreClient
Key: HIVE-15081
URL: https://issues.apache.org/jira/browse/HIVE-15081
Project: Hive
Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li

Calling RetryingMetaStoreClient.getProxy(HiveConf, Boolean) will result in an error:
{noformat}
Exception in thread "main" java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1661)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:81)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:131)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:87)
Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(org.apache.hadoop.hive.conf.HiveConf, java.lang.Boolean)
{noformat}
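The failure mode can be sketched generically (the `Client` class below is a simplified stand-in, not the real HiveMetaStoreClient): {{Class.getConstructor}} requires an exact parameter-type match, so asking for a constructor signature the class never declared throws NoSuchMethodException, which `newInstance`-style helpers then surface as a RuntimeException.

```java
public class ReflectiveCtorSketch {
    // Hypothetical stand-in: declares (String) and
    // (String, Runnable, Boolean) constructors, but not (String, Boolean).
    public static class Client {
        public Client(String conf) {}
        public Client(String conf, Runnable hook, Boolean allowEmbedded) {}
    }

    // getConstructor matches the declared parameter list exactly; a lookup
    // for an undeclared signature throws NoSuchMethodException.
    static boolean hasConstructor(Class<?>... paramTypes) {
        try {
            Client.class.getConstructor(paramTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```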
[jira] [Created] (HIVE-15039) A better job monitor console output for HoS
Rui Li created HIVE-15039:
--------------------------

Summary: A better job monitor console output for HoS
Key: HIVE-15039
URL: https://issues.apache.org/jira/browse/HIVE-15039
Project: Hive
Issue Type: Improvement
Reporter: Rui Li
Assignee: Rui Li

When there are many stages, it's very difficult to read the job-progress console output of HoS. The attached screenshot is an example. We may learn from HoT, as it does a much better job than HoS.
[jira] [Created] (HIVE-14728) Redundant orig files
Rui Li created HIVE-14728:
--------------------------

Summary: Redundant orig files
Key: HIVE-14728
URL: https://issues.apache.org/jira/browse/HIVE-14728
Project: Hive
Issue Type: Bug
Reporter: Rui Li
Priority: Minor

I found some .orig files in master, e.g. SemanticAnalyzer.java.orig. Wondering if they were added by mistake?
[jira] [Created] (HIVE-14719) ASTNode rootNode is not maintained properly when changing child/parent relation
Rui Li created HIVE-14719:
--------------------------

Summary: ASTNode rootNode is not maintained properly when changing child/parent relation
Key: HIVE-14719
URL: https://issues.apache.org/jira/browse/HIVE-14719
Project: Hive
Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li

When I run some query like:
{code}
set hive.cbo.enable=false;
select * from A where exists (select * from B where B.k1=A.k1 and B.k2=A.k2);
{code}
It gets an error like:
{noformat}
FAILED: SemanticException Line 0:-1 Invalid table alias or column reference 'sq_1': (possible column names are: _table_or_col b) k2) sq_corr_1)) (tok, (. (tok_table_or_col sq_1) sq_corr_1))
{noformat}
[jira] [Created] (HIVE-14595) TimestampWritable::setTimestamp gives wrong result when 2nd VInt exists
Rui Li created HIVE-14595: - Summary: TimestampWritable::setTimestamp gives wrong result when 2nd VInt exists Key: HIVE-14595 URL: https://issues.apache.org/jira/browse/HIVE-14595 Project: Hive Issue Type: Bug Reporter: Rui Li Assignee: Rui Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14412) Add a timezone-aware timestamp
Rui Li created HIVE-14412: - Summary: Add a timezone-aware timestamp Key: HIVE-14412 URL: https://issues.apache.org/jira/browse/HIVE-14412 Project: Hive Issue Type: Sub-task Reporter: Rui Li Assignee: Rui Li Java's Timestamp stores the time elapsed since the epoch. While it's unambiguous by itself, ambiguity arises when we parse a string into a timestamp, or convert a timestamp to a string, causing problems like HIVE-14305. To solve the issue, I think we should make timestamp aware of timezone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
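The ambiguity described above is easy to demonstrate with java.time: the same timestamp string maps to different instants depending on which zone is used to interpret it. The zones below are illustrative choices:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

// Sketch of the string<->instant ambiguity motivating HIVE-14412: a bare
// datetime string only becomes an epoch instant once a timezone is assumed.
public class TimestampAmbiguityDemo {
    public static long epochSeconds(String text, String zone) {
        return LocalDateTime.parse(text).atZone(ZoneId.of(zone)).toEpochSecond();
    }

    public static void main(String[] args) {
        long utc = epochSeconds("2016-08-01T12:00:00", "UTC");
        long shanghai = epochSeconds("2016-08-01T12:00:00", "Asia/Shanghai");
        // Same string, instants 8 hours apart (Shanghai is UTC+8, no DST).
        System.out.println(utc - shanghai); // 28800
    }
}
```

A timezone-aware timestamp type carries the zone (or offset) with the value, so the conversion no longer depends on an implicit session or JVM default.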
[jira] [Created] (HIVE-14305) To/From UTC timestamp may return incorrect result because of DST
Rui Li created HIVE-14305: - Summary: To/From UTC timestamp may return incorrect result because of DST Key: HIVE-14305 URL: https://issues.apache.org/jira/browse/HIVE-14305 Project: Hive Issue Type: Bug Reporter: Rui Li Assignee: Rui Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14238) Ownership shouldn't be checked if external table location doesn't exist
Rui Li created HIVE-14238: - Summary: Ownership shouldn't be checked if external table location doesn't exist Key: HIVE-14238 URL: https://issues.apache.org/jira/browse/HIVE-14238 Project: Hive Issue Type: Bug Reporter: Rui Li Assignee: Rui Li When creating an external table with SQL authorization, we require RWX permission plus ownership of the table location. If the location doesn't exist, we check the parent dir (recursively), which means the user must own everything under the parent dir. I think this is unnecessary: we don't have to check ownership of the parent dir, or we could just check non-recursively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14139) NPE dropping permanent function
Rui Li created HIVE-14139: - Summary: NPE dropping permanent function Key: HIVE-14139 URL: https://issues.apache.org/jira/browse/HIVE-14139 Project: Hive Issue Type: Bug Reporter: Rui Li Assignee: Rui Li To reproduce: 1. Start a CLI session and create a permanent function. 2. Exit current CLI session. 3. Start a new CLI session and drop the function. Stack trace: {noformat} FAILED: error during drop function: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.Registry.removePersistentFunctionUnderLock(Registry.java:513) at org.apache.hadoop.hive.ql.exec.Registry.unregisterFunction(Registry.java:501) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.unregisterPermanentFunction(FunctionRegistry.java:1532) at org.apache.hadoop.hive.ql.exec.FunctionTask.dropPermanentFunction(FunctionTask.java:228) at org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:95) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1860) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1564) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1316) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1085) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1073) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13997) Insert overwrite directory doesn't overwrite existing files
Rui Li created HIVE-13997: - Summary: Insert overwrite directory doesn't overwrite existing files Key: HIVE-13997 URL: https://issues.apache.org/jira/browse/HIVE-13997 Project: Hive Issue Type: Bug Reporter: Rui Li Assignee: Rui Li Can be easily reproduced by running {{INSERT OVERWRITE DIRECTORY}} to the same dir twice. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13921) Fix spark on yarn tests for HoS
Rui Li created HIVE-13921: - Summary: Fix spark on yarn tests for HoS Key: HIVE-13921 URL: https://issues.apache.org/jira/browse/HIVE-13921 Project: Hive Issue Type: Test Reporter: Rui Li Assignee: Rui Li {{index_bitmap3}} and {{constprog_partitioner}} have been failing. Let's fix them here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13895) HoS start-up overhead in yarn-client mode
Rui Li created HIVE-13895: - Summary: HoS start-up overhead in yarn-client mode Key: HIVE-13895 URL: https://issues.apache.org/jira/browse/HIVE-13895 Project: Hive Issue Type: Bug Reporter: Rui Li Assignee: Rui Li To avoid the too-verbose app state report, HIVE-13376 increased the state check interval to a default of 60s. However, a bigger interval adds considerable start-up wait time in yarn-client mode. Since the state report only exists in yarn-cluster mode, we can disable it using {{spark.yarn.submit.waitAppCompletion}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13843) Re-enable the HoS tests disabled in HIVE-13402
Rui Li created HIVE-13843: - Summary: Re-enable the HoS tests disabled in HIVE-13402 Key: HIVE-13843 URL: https://issues.apache.org/jira/browse/HIVE-13843 Project: Hive Issue Type: Test Reporter: Rui Li Assignee: Rui Li With HIVE-13525, we can now fix and re-enable the tests for Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13789) Repeatedly checking configuration in TextRecordWriter/Reader hurts performance
Rui Li created HIVE-13789: - Summary: Repeatedly checking configuration in TextRecordWriter/Reader hurts performance Key: HIVE-13789 URL: https://issues.apache.org/jira/browse/HIVE-13789 Project: Hive Issue Type: Improvement Reporter: Rui Li Assignee: Rui Li Priority: Minor We check the configuration to decide whether to escape certain characters each time we write/read a record for custom scripts. In our benchmark this becomes a hot spot method, and fixing it improves the execution of the custom script by 7% (3TB TPCx-BB dataset). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
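The fix direction is to resolve the configuration value once instead of on every record. A minimal sketch, assuming a plain map as the configuration and an illustrative property key (not the actual TextRecordWriter API):

```java
import java.util.Map;

// Sketch of the HIVE-13789 optimization pattern: hoist the configuration lookup
// out of the per-record hot path into the constructor. The conf key and escaping
// rule here are illustrative assumptions, not Hive's real implementation.
public class CachedConfWriter {
    private final boolean escapeEnabled; // resolved once at construction

    public CachedConfWriter(Map<String, String> conf) {
        this.escapeEnabled =
            Boolean.parseBoolean(conf.getOrDefault("hive.transform.escape.input", "false"));
    }

    // Per-record hot path: no configuration object is consulted here.
    public String write(String record) {
        return escapeEnabled ? record.replace("\t", "\\t") : record;
    }

    public static void main(String[] args) {
        CachedConfWriter w = new CachedConfWriter(Map.of("hive.transform.escape.input", "true"));
        System.out.println(w.write("a\tb")); // tab replaced by the two characters \t
    }
}
```

Since the setting cannot change mid-stream for a given writer, caching it in a final field is safe and removes the repeated lookup the benchmark flagged.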
[jira] [Created] (HIVE-13662) Set file permission and ACL in file sink operator
Rui Li created HIVE-13662: - Summary: Set file permission and ACL in file sink operator Key: HIVE-13662 URL: https://issues.apache.org/jira/browse/HIVE-13662 Project: Hive Issue Type: Bug Reporter: Rui Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13572) Redundant setting full file status in Hive::copyFiles
Rui Li created HIVE-13572: - Summary: Redundant setting full file status in Hive::copyFiles Key: HIVE-13572 URL: https://issues.apache.org/jira/browse/HIVE-13572 Project: Hive Issue Type: Bug Reporter: Rui Li Assignee: Rui Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13525) HoS hangs when job is empty
Rui Li created HIVE-13525: - Summary: HoS hangs when job is empty Key: HIVE-13525 URL: https://issues.apache.org/jira/browse/HIVE-13525 Project: Hive Issue Type: Bug Reporter: Rui Li Assignee: Rui Li Observed in local tests. This should be the cause of HIVE-13402. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-13066) Hive on Spark gives incorrect results when speculation is on
Rui Li created HIVE-13066: - Summary: Hive on Spark gives incorrect results when speculation is on Key: HIVE-13066 URL: https://issues.apache.org/jira/browse/HIVE-13066 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li Assignee: Rui Li The issue was reported by users. One possible reason is that we always append 0 as the attempt ID for each task, so Hive can't distinguish between speculative tasks and the original ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12940) Merge master into spark [Spark Branch]
Rui Li created HIVE-12940: - Summary: Merge master into spark [Spark Branch] Key: HIVE-12940 URL: https://issues.apache.org/jira/browse/HIVE-12940 Project: Hive Issue Type: Task Reporter: Rui Li Assignee: Rui Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12493) HIVE-11180 didn't merge cleanly to branch-1
Rui Li created HIVE-12493: - Summary: HIVE-11180 didn't merge cleanly to branch-1 Key: HIVE-12493 URL: https://issues.apache.org/jira/browse/HIVE-12493 Project: Hive Issue Type: Bug Reporter: Rui Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12466) SparkCounter not initialized error
Rui Li created HIVE-12466: - Summary: SparkCounter not initialized error Key: HIVE-12466 URL: https://issues.apache.org/jira/browse/HIVE-12466 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li Assignee: Xuefu Zhang During a query, many of the following errors appear in the executor's log: {noformat} 03:47:28.759 [Executor task launch worker-0] ERROR org.apache.hive.spark.counter.SparkCounters - counter[HIVE, RECORDS_OUT_0] has not initialized before. 03:47:28.762 [Executor task launch worker-1] ERROR org.apache.hive.spark.counter.SparkCounters - counter[HIVE, RECORDS_OUT_0] has not initialized before. 03:47:30.707 [Executor task launch worker-1] ERROR org.apache.hive.spark.counter.SparkCounters - counter[HIVE, RECORDS_OUT_1_default.tmp_tmp] has not initialized before. 03:47:33.385 [Executor task launch worker-1] ERROR org.apache.hive.spark.counter.SparkCounters - counter[HIVE, RECORDS_OUT_1_default.test_table] has not initialized before. 03:47:33.388 [Executor task launch worker-0] ERROR org.apache.hive.spark.counter.SparkCounters - counter[HIVE, RECORDS_OUT_1_default.test_table] has not initialized before. 03:47:33.495 [Executor task launch worker-0] ERROR org.apache.hive.spark.counter.SparkCounters - counter[HIVE, RECORDS_OUT_1_default.test_table] has not initialized before. 03:47:35.141 [Executor task launch worker-1] ERROR org.apache.hive.spark.counter.SparkCounters - counter[HIVE, RECORDS_OUT_1_default.test_table] has not initialized before. ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-12283) Fix test failures after HIVE-11844 [Spark Branch]
Rui Li created HIVE-12283: - Summary: Fix test failures after HIVE-11844 [Spark Branch] Key: HIVE-12283 URL: https://issues.apache.org/jira/browse/HIVE-12283 Project: Hive Issue Type: Sub-task Reporter: Rui Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11183) Enable optimized hash tables for spark [Spark Branch]
Rui Li created HIVE-11183: - Summary: Enable optimized hash tables for spark [Spark Branch] Key: HIVE-11183 URL: https://issues.apache.org/jira/browse/HIVE-11183 Project: Hive Issue Type: Improvement Reporter: Rui Li Assignee: Rui Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11182) Enable optimized hash tables for spark [Spark Branch]
Rui Li created HIVE-11182: - Summary: Enable optimized hash tables for spark [Spark Branch] Key: HIVE-11182 URL: https://issues.apache.org/jira/browse/HIVE-11182 Project: Hive Issue Type: Improvement Reporter: Rui Li Assignee: Rui Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11180) Enable native vectorized map join for spark [Spark Branch]
Rui Li created HIVE-11180: - Summary: Enable native vectorized map join for spark [Spark Branch] Key: HIVE-11180 URL: https://issues.apache.org/jira/browse/HIVE-11180 Project: Hive Issue Type: Sub-task Reporter: Rui Li Assignee: Rui Li The improvement was introduced in HIVE-9824. Let's use this task to track how we can enable that for spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11138) Query fails when there isn't a comparator for an operator [Spark Branch]
Rui Li created HIVE-11138: - Summary: Query fails when there isn't a comparator for an operator [Spark Branch] Key: HIVE-11138 URL: https://issues.apache.org/jira/browse/HIVE-11138 Project: Hive Issue Type: Sub-task Reporter: Rui Li Assignee: Rui Li In such a case, OperatorComparatorFactory should default to false instead of throwing exceptions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11109) Replication factor is not properly set in SparkHashTableSinkOperator [Spark Branch]
Rui Li created HIVE-11109: - Summary: Replication factor is not properly set in SparkHashTableSinkOperator [Spark Branch] Key: HIVE-11109 URL: https://issues.apache.org/jira/browse/HIVE-11109 Project: Hive Issue Type: Bug Reporter: Rui Li Assignee: Rui Li Priority: Trivial The replication factor only gets set in some abnormal cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11108) HashTableSinkOperator doesn't support vectorization [Spark Branch]
Rui Li created HIVE-11108: - Summary: HashTableSinkOperator doesn't support vectorization [Spark Branch] Key: HIVE-11108 URL: https://issues.apache.org/jira/browse/HIVE-11108 Project: Hive Issue Type: Sub-task Reporter: Rui Li This prevents any BaseWork containing HTS from being vectorized. It's basically specific to Spark, because Tez doesn't use HTS and MR runs HTS in local tasks. We should verify whether it makes sense to make HTS support vectorization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11032) Enable more tests for grouping by skewed data [Spark Branch]
Rui Li created HIVE-11032: - Summary: Enable more tests for grouping by skewed data [Spark Branch] Key: HIVE-11032 URL: https://issues.apache.org/jira/browse/HIVE-11032 Project: Hive Issue Type: Sub-task Reporter: Rui Li Priority: Minor Not all such tests are enabled, e.g. {{groupby1_map_skew.q}}. We can use this JIRA to track whether we need more of them. Basically, we need to look at all tests with {{set hive.groupby.skewindata=true;}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10989) Spark can't control number of map tasks for runtime skew join [Spark Branch]
Rui Li created HIVE-10989: - Summary: Spark can't control number of map tasks for runtime skew join [Spark Branch] Key: HIVE-10989 URL: https://issues.apache.org/jira/browse/HIVE-10989 Project: Hive Issue Type: Sub-task Reporter: Rui Li Assignee: Rui Li Flags {{hive.skewjoin.mapjoin.map.tasks}} and {{hive.skewjoin.mapjoin.min.split}} are used to control the number of map tasks for the map join of runtime skew join. They work well for MR but have no effect for Spark. This makes runtime skew join less useful, as we just end up with slow mappers instead of slow reducers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10903) Add hive.in.test for HoS tests [Spark Branch]
Rui Li created HIVE-10903: - Summary: Add hive.in.test for HoS tests [Spark Branch] Key: HIVE-10903 URL: https://issues.apache.org/jira/browse/HIVE-10903 Project: Hive Issue Type: Test Reporter: Rui Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10816) NPE in ExecDriver::handleSampling when submitted via child JVM
Rui Li created HIVE-10816: - Summary: NPE in ExecDriver::handleSampling when submitted via child JVM Key: HIVE-10816 URL: https://issues.apache.org/jira/browse/HIVE-10816 Project: Hive Issue Type: Bug Reporter: Rui Li When {{hive.exec.submitviachild = true}}, parallel order by fails with NPE and falls back to single-reducer mode. Stack trace: {noformat} 2015-05-25 08:41:04,446 ERROR [main]: mr.ExecDriver (ExecDriver.java:execute(386)) - Sampling error java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.handleSampling(ExecDriver.java:513) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:379) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:750) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10527) NPE in SparkUtilities::isDedicatedCluster
Rui Li created HIVE-10527: - Summary: NPE in SparkUtilities::isDedicatedCluster Key: HIVE-10527 URL: https://issues.apache.org/jira/browse/HIVE-10527 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li We should add {{spark.master}} to HiveConf when it doesn't exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10458) Enable parallel order by for spark [Spark Branch]
Rui Li created HIVE-10458: - Summary: Enable parallel order by for spark [Spark Branch] Key: HIVE-10458 URL: https://issues.apache.org/jira/browse/HIVE-10458 Project: Hive Issue Type: Sub-task Reporter: Rui Li Assignee: Rui Li We don't have to force the reducer count to 1, as Spark supports parallel sorting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10261) Data size can be underestimated when computed with partial column stats
Rui Li created HIVE-10261: - Summary: Data size can be underestimated when computed with partial column stats Key: HIVE-10261 URL: https://issues.apache.org/jira/browse/HIVE-10261 Project: Hive Issue Type: Bug Reporter: Rui Li With {{hive.stats.fetch.column.stats=true}}, we'll estimate data size with column stats when annotating operators with statistics. However, when column stats are partial, we're likely to underestimate the data size, which may hurt performance, e.g. picking an inappropriate table as the small table for map join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9969) Avoid Utilities.getMapRedWork for spark [Spark Branch]
Rui Li created HIVE-9969: Summary: Avoid Utilities.getMapRedWork for spark [Spark Branch] Key: HIVE-9969 URL: https://issues.apache.org/jira/browse/HIVE-9969 Project: Hive Issue Type: Sub-task Reporter: Rui Li Priority: Minor The method shouldn't be used in Spark mode. Specifically, map work and reduce work have different plan paths in Spark. Calling this method leaves lots of errors in the executor's log: {noformat} 15/03/16 02:57:23 INFO Utilities: Open file to read in plan: hdfs://node13-1:8020/tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml 15/03/16 02:57:23 INFO Utilities: File not found: File does not exist: /tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) at 
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9924) Add SORT_QUERY_RESULT to union12.q
Rui Li created HIVE-9924: Summary: Add SORT_QUERY_RESULT to union12.q Key: HIVE-9924 URL: https://issues.apache.org/jira/browse/HIVE-9924 Project: Hive Issue Type: Test Reporter: Rui Li Assignee: Rui Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9927) MR doesn't produce correct result for runtime_skewjoin_mapjoin_spark
Rui Li created HIVE-9927: Summary: MR doesn't produce correct result for runtime_skewjoin_mapjoin_spark Key: HIVE-9927 URL: https://issues.apache.org/jira/browse/HIVE-9927 Project: Hive Issue Type: Bug Reporter: Rui Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9869) Trunk doesn't build with hadoop-1
Rui Li created HIVE-9869: Summary: Trunk doesn't build with hadoop-1 Key: HIVE-9869 URL: https://issues.apache.org/jira/browse/HIVE-9869 Project: Hive Issue Type: Bug Reporter: Rui Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9855) Runtime skew join doesn't work when skewed data only exists in big table
Rui Li created HIVE-9855: Summary: Runtime skew join doesn't work when skewed data only exists in big table Key: HIVE-9855 URL: https://issues.apache.org/jira/browse/HIVE-9855 Project: Hive Issue Type: Bug Reporter: Rui Li Assignee: Rui Li To reproduce, enable runtime skew join and then join two tables where skewed data exists in only one of them. The task fails with the following exception: {noformat} Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: java.io.IOException: Unable to rename output to: hdfs://.. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326921#comment-14326921 ] Rui Li commented on HIVE-9561: -- Thank you Xuefu! SHUFFLE_SORT should only be used for order by query [Spark Branch] -- Key: HIVE-9561 URL: https://issues.apache.org/jira/browse/HIVE-9561 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Fix For: spark-branch Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.5-spark.patch, HIVE-9561.6-spark.patch The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance and are difficult to control. So we should limit the use of {{sortByKey}} to order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14325120#comment-14325120 ] Rui Li commented on HIVE-9561: -- Hi [~xuefuz], thanks very much for taking care of this. I can't really work on it due to limited network access. Sorry for the inconvenience. SHUFFLE_SORT should only be used for order by query [Spark Branch] -- Key: HIVE-9561 URL: https://issues.apache.org/jira/browse/HIVE-9561 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.5-spark.patch, HIVE-9561.6-spark.patch The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance and are difficult to control. So we should limit the use of {{sortByKey}} to order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9561: - Attachment: (was: HIVE-9561.4-spark.patch) SHUFFLE_SORT should only be used for order by query [Spark Branch] -- Key: HIVE-9561 URL: https://issues.apache.org/jira/browse/HIVE-9561 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance and are difficult to control. So we should limit the use of {{sortByKey}} to order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9561: - Attachment: HIVE-9561.4-spark.patch Try again SHUFFLE_SORT should only be used for order by query [Spark Branch] -- Key: HIVE-9561 URL: https://issues.apache.org/jira/browse/HIVE-9561 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.4-spark.patch The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance and are difficult to control. So we should limit the use of {{sortByKey}} to order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9696) Address RB comments for HIVE-9425 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323565#comment-14323565 ] Rui Li commented on HIVE-9696: -- The failure of union3 was introduced when I merged HIVE-9666 into spark. It should be fixed in HIVE-9561. Address RB comments for HIVE-9425 [Spark Branch] Key: HIVE-9696 URL: https://issues.apache.org/jira/browse/HIVE-9696 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Priority: Trivial Attachments: HIVE-9696.1-spark.patch, HIVE-9696.1-spark.patch, HIVE-9696.1-spark.patch A followup task of HIVE-9425. The pending RB comment can be found [here|https://reviews.apache.org/r/30984/#comment118482]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9561: - Attachment: (was: HIVE-9561.4-spark.patch) SHUFFLE_SORT should only be used for order by query [Spark Branch] -- Key: HIVE-9561 URL: https://issues.apache.org/jira/browse/HIVE-9561 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.4-spark.patch The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance and are difficult to control. So we should limit the use of {{sortByKey}} to order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9666) Improve some qtests
[ https://issues.apache.org/jira/browse/HIVE-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9666: - Fix Version/s: 1.2.0 Improve some qtests --- Key: HIVE-9666 URL: https://issues.apache.org/jira/browse/HIVE-9666 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Minor Fix For: 1.2.0 Attachments: HIVE-9666.1.patch, HIVE-9666.2.patch {code} groupby7_noskew_multi_single_reducer.q groupby_multi_single_reducer3.q parallel_join0.q union3.q union4.q {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9696) Address RB comments for HIVE-9425 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9696: - Description: A followup task of HIVE-9425. The pending RB comment can be found [here|https://reviews.apache.org/r/30984/#comment118482]. was: A followup task of HIVE-9425. Then pending RB comment can be found [here|https://reviews.apache.org/r/30984/#comment118482]. Address RB comments for HIVE-9425 [Spark Branch] Key: HIVE-9696 URL: https://issues.apache.org/jira/browse/HIVE-9696 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Priority: Trivial Attachments: HIVE-9696.1-spark.patch A followup task of HIVE-9425. The pending RB comment can be found [here|https://reviews.apache.org/r/30984/#comment118482]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9696) Address RB comments for HIVE-9425 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9696: - Summary: Address RB comments for HIVE-9425 [Spark Branch] (was: Address RB comments for HIVE-9425) Address RB comments for HIVE-9425 [Spark Branch] Key: HIVE-9696 URL: https://issues.apache.org/jira/browse/HIVE-9696 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Priority: Trivial A followup task of HIVE-9425. Then pending RB comment can be found [here|https://reviews.apache.org/r/30984/#comment118482]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9696) Address RB comments for HIVE-9425
Rui Li created HIVE-9696: Summary: Address RB comments for HIVE-9425 Key: HIVE-9696 URL: https://issues.apache.org/jira/browse/HIVE-9696 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Priority: Trivial A followup task of HIVE-9425. Then pending RB comment can be found [here|https://reviews.apache.org/r/30984/#comment118482]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9666) Improve some qtests
[ https://issues.apache.org/jira/browse/HIVE-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14321795#comment-14321795 ] Rui Li commented on HIVE-9666: -- Committed to trunk and merged into spark. Hope I didn't screw up anything. Thanks [~xuefuz] for the review. Improve some qtests --- Key: HIVE-9666 URL: https://issues.apache.org/jira/browse/HIVE-9666 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Minor Attachments: HIVE-9666.1.patch, HIVE-9666.2.patch {code} groupby7_noskew_multi_single_reducer.q groupby_multi_single_reducer3.q parallel_join0.q union3.q union4.q {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9696) Address RB comments for HIVE-9425 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9696: - Status: Patch Available (was: Open) Address RB comments for HIVE-9425 [Spark Branch] Key: HIVE-9696 URL: https://issues.apache.org/jira/browse/HIVE-9696 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Priority: Trivial Attachments: HIVE-9696.1-spark.patch A followup task of HIVE-9425. Then pending RB comment can be found [here|https://reviews.apache.org/r/30984/#comment118482]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321958#comment-14321958 ] Rui Li commented on HIVE-9659: -- Hi [~jxiang], could you elaborate on how you reproduced this? 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch] --- Key: HIVE-9659 URL: https://issues.apache.org/jira/browse/HIVE-9659 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao We found that 'Error while trying to create table container' occurs during Big-Bench Q12 case execution when hive.optimize.skewjoin is set to 'true'. If hive.optimize.skewjoin is set to 'false', the case passes. How to reproduce: 1. set hive.optimize.skewjoin=true; 2. Run BigBench case Q12 and it will fail. Check the executor log (e.g. /usr/lib/spark/work/app-/2/stderr) and you will find the error 'Error while trying to create table container' in the log and also a NullPointerException near the end of the log. 
(a) Detail error message for 'Error while trying to create table container': {noformat} 15/02/12 01:29:49 ERROR SparkMapRecordHandler: Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:118) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:193) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:219) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:158) at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:115) ... 21 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error, not a directory: hdfs://bhx1:8020/tmp/hive/root/d22ef465-bff5-4edb-a822-0a9f1c25b66c/hive_2015-02-12_01-28-10_008_6897031694580088767-1/-mr-10009/HashTable-Stage-6/MapJoin-mapfile01--.hashtable at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:106) ... 22 more 15/02/12 01:29:49 INFO SparkRecordHandler: maximum memory = 40939028480 15/02/12 01:29:49 INFO PerfLogger: PERFLOG method=SparkInitializeOperators from=org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler {noformat} (b) Detail error message for NullPointerException: {noformat} 5/02/12 01:29:50 ERROR MapJoinOperator:
[jira] [Updated] (HIVE-9666) Improve some qtests
[ https://issues.apache.org/jira/browse/HIVE-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9666: - Resolution: Fixed Status: Resolved (was: Patch Available) Improve some qtests --- Key: HIVE-9666 URL: https://issues.apache.org/jira/browse/HIVE-9666 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Minor Fix For: 1.2.0 Attachments: HIVE-9666.1.patch, HIVE-9666.2.patch {code} groupby7_noskew_multi_single_reducer.q groupby_multi_single_reducer3.q parallel_join0.q union3.q union4.q {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9561: - Attachment: HIVE-9561.4-spark.patch The failures seem strange. Try again. SHUFFLE_SORT should only be used for order by query [Spark Branch] -- Key: HIVE-9561 URL: https://issues.apache.org/jira/browse/HIVE-9561 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.4-spark.patch The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance and are difficult to control. So we should limit the use of {{sortByKey}} to order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9561: - Attachment: HIVE-9561.4-spark.patch Rebase my patch. SHUFFLE_SORT should only be used for order by query [Spark Branch] -- Key: HIVE-9561 URL: https://issues.apache.org/jira/browse/HIVE-9561 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance and are difficult to control. So we should limit the use of {{sortByKey}} to order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
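For context on the "probe jobs" mentioned in HIVE-9561: Spark's {{sortByKey}} uses a range partitioner, which must first sample the keys in a preliminary pass to compute partition boundaries before the real shuffle runs. The sketch below imitates that two-phase idea in plain Python; the function names are illustrative only and are not Hive or Spark APIs.

```python
import bisect
import random

def compute_range_bounds(keys, num_partitions, sample_size=20):
    """The 'probe' pass: sample the keys and derive range boundaries.

    Returns num_partitions - 1 cut points taken from a sorted sample.
    """
    sample = sorted(random.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_partitions
    return [sample[int(step * i)] for i in range(1, num_partitions)]

def range_partition(keys, bounds):
    """The shuffle pass: assign each key to a partition by binary
    search over the precomputed bounds."""
    parts = [[] for _ in range(len(bounds) + 1)]
    for k in keys:
        parts[bisect.bisect_left(bounds, k)].append(k)
    return parts

# Sorting each partition locally then concatenating yields a total
# order, which is why order-by queries need this scheme -- and why
# other queries can skip the extra sampling pass.
keys = list(range(100))
bounds = compute_range_bounds(keys, 4)
parts = range_partition(keys, bounds)
```

Because the sampling pass is a real job scheduled before the main one, it adds latency and is hard to control from Hive, which is the motivation for restricting {{SHUFFLE_SORT}} to order-by queries.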