[jira] [Created] (HIVE-22053) Function name is not normalized when creating function

2019-07-26 Thread Rui Li (JIRA)
Rui Li created HIVE-22053:
-

 Summary: Function name is not normalized when creating function
 Key: HIVE-22053
 URL: https://issues.apache.org/jira/browse/HIVE-22053
 Project: Hive
  Issue Type: Bug
  Components: Standalone Metastore
Reporter: Rui Li
Assignee: Rui Li


If a function is created with a name containing upper-case characters, we get 
a NoSuchObjectException when trying to get that function.
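
A minimal sketch of the suspected failure mode, not the metastore's actual code. 
{{FunctionRegistrySketch}} and its methods are hypothetical; the sketch only assumes 
that the get path lower-cases the name while the create path stores it verbatim.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory stand-in for a function store; not Hive metastore code.
final class FunctionRegistrySketch {
    private final Map<String, String> functions = new HashMap<>();
    private final boolean normalizeOnCreate;

    FunctionRegistrySketch(boolean normalizeOnCreate) {
        this.normalizeOnCreate = normalizeOnCreate;
    }

    private static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    // Stores the function under the raw name unless normalization is enabled.
    void create(String name, String className) {
        functions.put(normalizeOnCreate ? normalize(name) : name, className);
    }

    // Lookups always normalize (assumed behavior of the get path).
    String get(String name) {
        return functions.get(normalize(name));
    }

    public static void main(String[] args) {
        FunctionRegistrySketch buggy = new FunctionRegistrySketch(false);
        buggy.create("MyUpper", "com.example.MyUpper");
        System.out.println(buggy.get("MyUpper")); // null: the stored key kept its upper case

        FunctionRegistrySketch fixed = new FunctionRegistrySketch(true);
        fixed.create("MyUpper", "com.example.MyUpper");
        System.out.println(fixed.get("MyUpper")); // com.example.MyUpper
    }
}
```

Normalizing on both paths means a mixed-case name given at creation time can still 
be found later.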



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (HIVE-19895) The unique ID in SparkPartitionPruningSinkOperator is no longer needed

2018-06-14 Thread Rui Li (JIRA)
Rui Li created HIVE-19895:
-

 Summary: The unique ID in SparkPartitionPruningSinkOperator is no 
longer needed
 Key: HIVE-19895
 URL: https://issues.apache.org/jira/browse/HIVE-19895
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li








[jira] [Created] (HIVE-19671) Distribute by rand() can lead to data inconsistency

2018-05-23 Thread Rui Li (JIRA)
Rui Li created HIVE-19671:
-

 Summary: Distribute by rand() can lead to data inconsistency
 Key: HIVE-19671
 URL: https://issues.apache.org/jira/browse/HIVE-19671
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li


Noticed the following queries can give different results:
{code}
select count(*) from tbl;
select count(*) from (select * from tbl distribute by rand());
{code}





[jira] [Created] (HIVE-19439) MapWork shouldn't be reused when Spark task fails during initialization

2018-05-07 Thread Rui Li (JIRA)
Rui Li created HIVE-19439:
-

 Summary: MapWork shouldn't be reused when Spark task fails during 
initialization
 Key: HIVE-19439
 URL: https://issues.apache.org/jira/browse/HIVE-19439
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li


Issue identified in HIVE-19388. When a Spark task fails while initializing the 
map operator, the task is retried with the same MapWork retrieved from the cache. 
This can be problematic because the MapWork may be partially initialized, e.g. 
some operators are already in INIT state.





[jira] [Created] (HIVE-19316) StatsTask fails due to ClassCastException

2018-04-26 Thread Rui Li (JIRA)
Rui Li created HIVE-19316:
-

 Summary: StatsTask fails due to ClassCastException
 Key: HIVE-19316
 URL: https://issues.apache.org/jira/browse/HIVE-19316
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Rui Li


The stack trace:
{noformat}
2018-04-26T20:17:37,674 ERROR [pool-7-thread-11] metastore.RetryingHMSHandler: 
java.lang.ClassCastException: 
org.apache.hadoop.hive.metastore.api.LongColumnStatsData cannot be cast to 
org.apache.hadoop.hive.metastore.columnstats.cache.LongColumnStatsDataInspector
at 
org.apache.hadoop.hive.metastore.columnstats.merge.LongColumnStatsMerger.merge(LongColumnStatsMerger.java:30)
at 
org.apache.hadoop.hive.metastore.utils.MetaStoreUtils.mergeColStats(MetaStoreUtils.java:1052)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:7202)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
at com.sun.proxy.$Proxy26.set_aggr_stats_for(Unknown Source)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:16795)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$set_aggr_stats_for.getResult(ThriftHiveMetastore.java:16779)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{noformat}





[jira] [Created] (HIVE-18955) HoS: Unable to create Channel from class NioServerSocketChannel

2018-03-14 Thread Rui Li (JIRA)
Rui Li created HIVE-18955:
-

 Summary: HoS: Unable to create Channel from class 
NioServerSocketChannel
 Key: HIVE-18955
 URL: https://issues.apache.org/jira/browse/HIVE-18955
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li


Hit the issue when trying to launch a Spark job. Stack trace:
{noformat}
Caused by: java.lang.NoSuchMethodError: 
io.netty.channel.DefaultChannelId.newInstance()Lio/netty/channel/DefaultChannelId;
at io.netty.channel.AbstractChannel.newId(AbstractChannel.java:111) 
~[netty-all-4.1.17.Final.jar:4.1.17.Final]
at io.netty.channel.AbstractChannel.<init>(AbstractChannel.java:83) 
~[netty-all-4.1.17.Final.jar:4.1.17.Final]
at 
io.netty.channel.nio.AbstractNioChannel.<init>(AbstractNioChannel.java:84) 
~[netty-all-4.1.17.Final.jar:4.1.17.Final]
at 
io.netty.channel.nio.AbstractNioMessageChannel.<init>(AbstractNioMessageChannel.java:42)
 ~[netty-all-4.1.17.Final.jar:4.1.17.Final]
at 
io.netty.channel.socket.nio.NioServerSocketChannel.<init>(NioServerSocketChannel.java:86)
 ~[netty-all-4.1.17.Final.jar:4.1.17.Final]
at 
io.netty.channel.socket.nio.NioServerSocketChannel.<init>(NioServerSocketChannel.java:72)
 ~[netty-all-4.1.17.Final.jar:4.1.17.Final]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method) ~[?:1.8.0_151]
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 ~[?:1.8.0_151]
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 ~[?:1.8.0_151]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
~[?:1.8.0_151]
at 
io.netty.channel.ReflectiveChannelFactory.newChannel(ReflectiveChannelFactory.java:38)
 ~[netty-all-4.1.17.Final.jar:4.1.17.Final]
... 32 more
{noformat}

It seems we have conflicting versions of class 
{{io.netty.channel.DefaultChannelId}} from async-http-client.jar and 
netty-all.jar.





[jira] [Created] (HIVE-18647) Cannot create table: Unknown column 'CREATION_METADATA_MV_CREATION_METADATA_ID_OID'

2018-02-07 Thread Rui Li (JIRA)
Rui Li created HIVE-18647:
-

 Summary: Cannot create table: Unknown column 
'CREATION_METADATA_MV_CREATION_METADATA_ID_OID'
 Key: HIVE-18647
 URL: https://issues.apache.org/jira/browse/HIVE-18647
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li








[jira] [Created] (HIVE-18442) HoS: No FileSystem for scheme: nullscan

2018-01-11 Thread Rui Li (JIRA)
Rui Li created HIVE-18442:
-

 Summary: HoS: No FileSystem for scheme: nullscan
 Key: HIVE-18442
 URL: https://issues.apache.org/jira/browse/HIVE-18442
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li








[jira] [Created] (HIVE-18282) Spark tar is downloaded every time for itest

2017-12-14 Thread Rui Li (JIRA)
Rui Li created HIVE-18282:
-

 Summary: Spark tar is downloaded every time for itest
 Key: HIVE-18282
 URL: https://issues.apache.org/jira/browse/HIVE-18282
 Project: Hive
  Issue Type: Test
Reporter: Rui Li


Seems we missed the md5 file for spark-2.2.0?
cc [~kellyzly], [~stakiar]





[jira] [Created] (HIVE-18242) VectorizedRowBatch cast exception when analyzing partitioned table

2017-12-06 Thread Rui Li (JIRA)
Rui Li created HIVE-18242:
-

 Summary: VectorizedRowBatch cast exception when analyzing 
partitioned table
 Key: HIVE-18242
 URL: https://issues.apache.org/jira/browse/HIVE-18242
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li


Happens when I run the following (vectorization enabled):
{code}
ANALYZE TABLE srcpart PARTITION(ds, hr) COMPUTE STATISTICS;
{code}
The stack trace is:
{noformat}
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch cannot be cast to 
org.apache.hadoop.io.Text
at 
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.copyObject(WritableStringObjectInspector.java:36)
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:425)
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.partialCopyToStandardObject(ObjectInspectorUtils.java:314)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.gatherStats(TableScanOperator.java:191)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:138)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.setupPartitionContextVars(VectorMapOperator.java:682)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.cleanUpInputFileChangedOp(VectorMapOperator.java:607)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1187)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:784)
{noformat}





[jira] [Created] (HIVE-18148) NPE in SparkDynamicPartitionPruningResolver

2017-11-26 Thread Rui Li (JIRA)
Rui Li created HIVE-18148:
-

 Summary: NPE in SparkDynamicPartitionPruningResolver
 Key: HIVE-18148
 URL: https://issues.apache.org/jira/browse/HIVE-18148
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li


The stack trace is:
{noformat}
2017-11-27T10:32:38,752 ERROR [e6c8aab5-ddd2-461d-b185-a7597c3e7519 main] 
ql.Driver: FAILED: NullPointerException null
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.optimizer.physical.SparkDynamicPartitionPruningResolver$SparkDynamicPartitionPruningDispatcher.dispatch(SparkDynamicPartitionPruningResolver.java:100)
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180)
at 
org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125)
at 
org.apache.hadoop.hive.ql.optimizer.physical.SparkDynamicPartitionPruningResolver.resolve(SparkDynamicPartitionPruningResolver.java:74)
at 
org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeTaskPlan(SparkCompiler.java:568)
{noformat}
At this stage, there shouldn't be a DPP sink whose target map work is null. The 
root cause seems to be a malformed operator tree generated by SplitOpTreeForDPP.





[jira] [Created] (HIVE-18129) The ConditionalResolverMergeFiles doesn't merge empty files

2017-11-21 Thread Rui Li (JIRA)
Rui Li created HIVE-18129:
-

 Summary: The ConditionalResolverMergeFiles doesn't merge empty 
files
 Key: HIVE-18129
 URL: https://issues.apache.org/jira/browse/HIVE-18129
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li


If a query produces lots of empty files, these files won't be merged by the 
merge-small-file feature.





[jira] [Created] (HIVE-18111) Fix temp path for Spark DPP sink

2017-11-20 Thread Rui Li (JIRA)
Rui Li created HIVE-18111:
-

 Summary: Fix temp path for Spark DPP sink
 Key: HIVE-18111
 URL: https://issues.apache.org/jira/browse/HIVE-18111
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li








[jira] [Created] (HIVE-18041) Add SORT_QUERY_RESULTS to subquery_multi

2017-11-09 Thread Rui Li (JIRA)
Rui Li created HIVE-18041:
-

 Summary: Add SORT_QUERY_RESULTS to subquery_multi
 Key: HIVE-18041
 URL: https://issues.apache.org/jira/browse/HIVE-18041
 Project: Hive
  Issue Type: Test
Reporter: Rui Li
Priority: Trivial








[jira] [Created] (HIVE-17976) HoS: don't set output collector if there's no data to process

2017-11-03 Thread Rui Li (JIRA)
Rui Li created HIVE-17976:
-

 Summary: HoS: don't set output collector if there's no data to 
process
 Key: HIVE-17976
 URL: https://issues.apache.org/jira/browse/HIVE-17976
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor


MR doesn't set an output collector if no row is processed, i.e. 
{{ExecMapper::map}} is never called. Let's investigate whether Spark should do 
the same.





[jira] [Created] (HIVE-17964) HoS: some spark configs don't require re-creating a session

2017-11-02 Thread Rui Li (JIRA)
Rui Li created HIVE-17964:
-

 Summary: HoS: some spark configs don't require re-creating a 
session
 Key: HIVE-17964
 URL: https://issues.apache.org/jira/browse/HIVE-17964
 Project: Hive
  Issue Type: Improvement
Reporter: Rui Li
Priority: Minor


I guess the {{hive.spark.}} configs were initially intended for the RSC. 
Therefore, when they're changed, we re-create the session for them to take 
effect. There are some configs not related to the RSC that also start with 
{{hive.spark.}}. We'd better rename them so that we don't unnecessarily 
re-create sessions, which is usually time-consuming.
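
A rough sketch of the prefix rule described above, assuming the decision is a plain 
prefix match. {{SessionRestartSketch}} and the renamed key are hypothetical and only 
show why a rename would avoid the restart.

```java
// Hypothetical helper, not Hive code: models "re-create the session if any hive.spark.* key changed".
final class SessionRestartSketch {
    static final String RSC_PREFIX = "hive.spark.";

    // Returns true when a change to this key would force a new session under the prefix rule.
    static boolean requiresNewSession(String changedKey) {
        return changedKey.startsWith(RSC_PREFIX);
    }

    public static void main(String[] args) {
        // Used only as an example of a key under the prefix; it forces a new session.
        System.out.println(requiresNewSession("hive.spark.exec.inplace.progress")); // true
        // Hypothetical renamed key outside the prefix; changing it leaves the session alone.
        System.out.println(requiresNewSession("hive.exec.spark.inplace.progress")); // false
    }
}
```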





[jira] [Created] (HIVE-17877) HoS: combine equivalent DPP sink works

2017-10-23 Thread Rui Li (JIRA)
Rui Li created HIVE-17877:
-

 Summary: HoS: combine equivalent DPP sink works
 Key: HIVE-17877
 URL: https://issues.apache.org/jira/browse/HIVE-17877
 Project: Hive
  Issue Type: Improvement
Reporter: Rui Li
Assignee: Rui Li








[jira] [Created] (HIVE-17383) ArrayIndexOutOfBoundsException in VectorGroupByOperator

2017-08-24 Thread Rui Li (JIRA)
Rui Li created HIVE-17383:
-

 Summary: ArrayIndexOutOfBoundsException in VectorGroupByOperator
 Key: HIVE-17383
 URL: https://issues.apache.org/jira/browse/HIVE-17383
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li


Query to reproduce:
{noformat}
set hive.cbo.enable=false;
select count(*) from (select key from src group by key) s where s.key='98';
{noformat}
The stack trace is:
{noformat}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at 
org.apache.hadoop.hive.ql.exec.vector.VectorGroupKeyHelper.copyGroupKey(VectorGroupKeyHelper.java:107)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeReduceMergePartial.doProcessBatch(VectorGroupByOperator.java:831)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processBatch(VectorGroupByOperator.java:174)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.process(VectorGroupByOperator.java:1046)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:462)
... 18 more
{noformat}
More details can be found in HIVE-16823





[jira] [Created] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified

2017-08-15 Thread Rui Li (JIRA)
Rui Li created HIVE-17321:
-

 Summary: HoS: analyze ORC table doesn't compute raw data size when 
noscan/partialscan is not specified
 Key: HIVE-17321
 URL: https://issues.apache.org/jira/browse/HIVE-17321
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li








[jira] [Created] (HIVE-17193) HoS: don't combine map works that are targets of different DPPs

2017-07-27 Thread Rui Li (JIRA)
Rui Li created HIVE-17193:
-

 Summary: HoS: don't combine map works that are targets of 
different DPPs
 Key: HIVE-17193
 URL: https://issues.apache.org/jira/browse/HIVE-17193
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li








[jira] [Created] (HIVE-17133) NoSuchMethodError in Hadoop FileStatus.compareTo

2017-07-20 Thread Rui Li (JIRA)
Rui Li created HIVE-17133:
-

 Summary: NoSuchMethodError in Hadoop FileStatus.compareTo
 Key: HIVE-17133
 URL: https://issues.apache.org/jira/browse/HIVE-17133
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li


The stack trace is:
{noformat}
Caused by: java.lang.NoSuchMethodError: 
org.apache.hadoop.fs.FileStatus.compareTo(Lorg/apache/hadoop/fs/FileStatus;)I
at 
org.apache.hadoop.hive.ql.io.AcidUtils.lambda$getAcidState$0(AcidUtils.java:931)
at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
at java.util.TimSort.sort(TimSort.java:234)
at java.util.Arrays.sort(Arrays.java:1512)
at java.util.ArrayList.sort(ArrayList.java:1454)
at java.util.Collections.sort(Collections.java:175)
at 
org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:929)
{noformat}

I'm on Hive master and using Hadoop 2.7.2. The method signature in Hadoop 2.7.2 
is:
https://github.com/apache/hadoop/blob/release-2.7.2-RC2/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileStatus.java#L336
In Hadoop 2.8.0 it becomes:
https://github.com/apache/hadoop/blob/release-2.8.0-RC3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileStatus.java#L332
I think that breaks binary compatibility.
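
A small sketch of why such a signature change can break callers at link time. It 
assumes the older class only declares {{compareTo(Object)}} and the newer one declares 
{{compareTo}} with its own type. {{OldStatus}} and {{NewStatus}} are hypothetical 
stand-ins, not Hadoop classes, and reflection is used here only to mimic descriptor 
lookup.

```java
// Hypothetical stand-ins for two versions of a library class; not Hadoop code.
final class BinaryCompatSketch {
    // "Old" shape: raw Comparable, so only compareTo(Object) exists in the class file.
    @SuppressWarnings("rawtypes")
    static final class OldStatus implements Comparable {
        final long length;
        OldStatus(long length) { this.length = length; }
        @Override public int compareTo(Object o) {
            return Long.compare(length, ((OldStatus) o).length);
        }
    }

    // "New" shape: generic Comparable, so compareTo(NewStatus) exists, plus a synthetic bridge.
    static final class NewStatus implements Comparable<NewStatus> {
        final long length;
        NewStatus(long length) { this.length = length; }
        @Override public int compareTo(NewStatus o) {
            return Long.compare(length, o.length);
        }
    }

    // True if the class exposes a public compareTo with exactly this parameter type,
    // which is what a caller's compiled method reference has to resolve to.
    static boolean hasCompareTo(Class<?> owner, Class<?> paramType) {
        try {
            owner.getMethod("compareTo", paramType);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasCompareTo(OldStatus.class, OldStatus.class)); // false
        System.out.println(hasCompareTo(OldStatus.class, Object.class));    // true
        System.out.println(hasCompareTo(NewStatus.class, NewStatus.class)); // true
        System.out.println(hasCompareTo(NewStatus.class, Object.class));    // true (bridge method)
    }
}
```

Under these assumptions, a caller compiled against the new shape refers to the typed 
overload, which the old class does not have. A caller compiled against the old shape 
still links against the new one through the bridge method.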





[jira] [Created] (HIVE-17114) HoS: Possible skew in shuffling when data is not really skewed

2017-07-17 Thread Rui Li (JIRA)
Rui Li created HIVE-17114:
-

 Summary: HoS: Possible skew in shuffling when data is not really 
skewed
 Key: HIVE-17114
 URL: https://issues.apache.org/jira/browse/HIVE-17114
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor


Observed in HoS and may apply to other engines as well.
When we join 2 tables on a single int key, we use the key itself as the hash code 
in {{ObjectInspectorUtils.hashCode}}:
{code}
  case INT:
    return ((IntObjectInspector) poi).get(o);
{code}
Suppose the keys are distinct but are all multiples of 10. If we then choose 10 
as the number of reducers, the shuffle will be skewed, because every key has 
the same remainder modulo 10 and is sent to the same reducer.
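
A small simulation of the effect, assuming the reducer is picked as the non-negative 
hash modulo the number of reducers. {{ShuffleSkewSketch}} is hypothetical and not 
Hive's partitioner.

```java
import java.util.Arrays;

// Hypothetical simulation; the partitioning rule below is an assumption for illustration.
final class ShuffleSkewSketch {
    // Assumed rule: non-negative hash modulo the number of reducers.
    static int reducerFor(int hashCode, int numReducers) {
        return (hashCode & Integer.MAX_VALUE) % numReducers;
    }

    // Counts how many keys land on each reducer when an int key is its own hash code.
    static int[] distribute(int[] keys, int numReducers) {
        int[] counts = new int[numReducers];
        for (int key : keys) {
            counts[reducerFor(key, numReducers)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] keys = new int[100];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = i * 10; // distinct keys, all multiples of 10
        }
        // With 10 reducers, reducer 0 receives all 100 keys.
        System.out.println(Arrays.toString(distribute(keys, 10)));
        // With 7 reducers, the same keys are spread across all reducers.
        System.out.println(Arrays.toString(distribute(keys, 7)));
    }
}
```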





[jira] [Created] (HIVE-17034) The spark tar for itests is downloaded every time if md5sum is not installed

2017-07-05 Thread Rui Li (JIRA)
Rui Li created HIVE-17034:
-

 Summary: The spark tar for itests is downloaded every time if 
md5sum is not installed
 Key: HIVE-17034
 URL: https://issues.apache.org/jira/browse/HIVE-17034
 Project: Hive
  Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li


I think we should either skip verifying the md5, or fail the build to let the 
developer know md5sum is required.





[jira] [Created] (HIVE-17020) Aggressive RS dedup can incorrectly remove OP tree branch

2017-07-04 Thread Rui Li (JIRA)
Rui Li created HIVE-17020:
-

 Summary: Aggressive RS dedup can incorrectly remove OP tree branch
 Key: HIVE-17020
 URL: https://issues.apache.org/jira/browse/HIVE-17020
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li


Suppose we have an OP tree like this:
{noformat}
        ...
         |
       RS[1]
         |
       SEL[2]
       /    \
  SEL[3]    SEL[4]
    |          |
  RS[5]      FS[6]
    |
   ...
{noformat}
When doing aggressive RS dedup, we'll remove all the operators between RS5 and 
RS1, and thus the branch containing FS6 is lost.
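
One possible guard, sketched under the assumption that removal is only safe when every 
operator between the two RS operators has a single child. {{RsDedupSketch}} and 
{{OpNode}} are hypothetical and not Hive's operator classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a pre-check before removing the operators between two RS nodes.
final class RsDedupSketch {
    // Minimal operator node with a parent pointer and a child list.
    static final class OpNode {
        final String name;
        OpNode parent;
        final List<OpNode> children = new ArrayList<>();

        OpNode(String name) { this.name = name; }

        OpNode addChild(OpNode child) {
            child.parent = this;
            children.add(child);
            return child;
        }
    }

    // True only if parentRs is an ancestor of childRs and every operator strictly between
    // them has exactly one child, so removing the chain cannot drop a sibling branch.
    static boolean safeToRemoveBetween(OpNode childRs, OpNode parentRs) {
        OpNode op = childRs.parent;
        while (op != null && op != parentRs) {
            if (op.children.size() != 1) {
                return false;
            }
            op = op.parent;
        }
        return op == parentRs;
    }

    public static void main(String[] args) {
        OpNode rs1 = new OpNode("RS[1]");
        OpNode sel2 = rs1.addChild(new OpNode("SEL[2]"));
        OpNode rs5 = sel2.addChild(new OpNode("SEL[3]")).addChild(new OpNode("RS[5]"));
        sel2.addChild(new OpNode("SEL[4]")).addChild(new OpNode("FS[6]"));
        System.out.println(safeToRemoveBetween(rs5, rs1)); // false: SEL[2] also feeds FS[6]
    }
}
```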





[jira] [Created] (HIVE-16876) RpcServer should be re-created when Rpc configs change

2017-06-10 Thread Rui Li (JIRA)
Rui Li created HIVE-16876:
-

 Summary: RpcServer should be re-created when Rpc configs change
 Key: HIVE-16876
 URL: https://issues.apache.org/jira/browse/HIVE-16876
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li








[jira] [Created] (HIVE-16767) Update people website with recent changes

2017-05-26 Thread Rui Li (JIRA)
Rui Li created HIVE-16767:
-

 Summary: Update people website with recent changes
 Key: HIVE-16767
 URL: https://issues.apache.org/jira/browse/HIVE-16767
 Project: Hive
  Issue Type: Task
  Components: Documentation
Reporter: Rui Li
Assignee: Rui Li








[jira] [Created] (HIVE-16739) HoS DPP generates malformed plan when hive.tez.dynamic.semijoin.reduction is on

2017-05-23 Thread Rui Li (JIRA)
Rui Li created HIVE-16739:
-

 Summary: HoS DPP generates malformed plan when 
hive.tez.dynamic.semijoin.reduction is on
 Key: HIVE-16739
 URL: https://issues.apache.org/jira/browse/HIVE-16739
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li


HoS DPP currently can't handle dynamic semi join and will result in 
{{ClassCastException org.apache.hadoop.hive.ql.plan.ReduceWork cannot be cast 
to org.apache.hadoop.hive.ql.plan.MapWork}}.
We should either disable or implement it for HoS.





[jira] [Created] (HIVE-16659) Query plan should reflect hive.spark.use.groupby.shuffle

2017-05-11 Thread Rui Li (JIRA)
Rui Li created HIVE-16659:
-

 Summary: Query plan should reflect hive.spark.use.groupby.shuffle
 Key: HIVE-16659
 URL: https://issues.apache.org/jira/browse/HIVE-16659
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li


It's useful to show the shuffle type used in the query plan. Currently it shows 
"GROUP" no matter what we set for hive.spark.use.groupby.shuffle.





[jira] [Created] (HIVE-16613) SaslClientHandler.sendHello is eating exceptions

2017-05-09 Thread Rui Li (JIRA)
Rui Li created HIVE-16613:
-

 Summary: SaslClientHandler.sendHello is eating exceptions
 Key: HIVE-16613
 URL: https://issues.apache.org/jira/browse/HIVE-16613
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li








[jira] [Created] (HIVE-16593) SparkClientFactory.stop may prevent JVM from exiting

2017-05-05 Thread Rui Li (JIRA)
Rui Li created HIVE-16593:
-

 Summary: SparkClientFactory.stop may prevent JVM from exiting
 Key: HIVE-16593
 URL: https://issues.apache.org/jira/browse/HIVE-16593
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li








[jira] [Created] (HIVE-16573) In-place update for HoS can't be disabled

2017-05-03 Thread Rui Li (JIRA)
Rui Li created HIVE-16573:
-

 Summary: In-place update for HoS can't be disabled
 Key: HIVE-16573
 URL: https://issues.apache.org/jira/browse/HIVE-16573
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor


{{hive.spark.exec.inplace.progress}} has no effect





[jira] [Created] (HIVE-16459) Cancel outstanding RPCs when channel closes

2017-04-17 Thread Rui Li (JIRA)
Rui Li created HIVE-16459:
-

 Summary: Cancel outstanding RPCs when channel closes
 Key: HIVE-16459
 URL: https://issues.apache.org/jira/browse/HIVE-16459
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li








[jira] [Created] (HIVE-16418) Allow HiveKey to skip some bytes for comparison

2017-04-11 Thread Rui Li (JIRA)
Rui Li created HIVE-16418:
-

 Summary: Allow HiveKey to skip some bytes for comparison
 Key: HIVE-16418
 URL: https://issues.apache.org/jira/browse/HIVE-16418
 Project: Hive
  Issue Type: New Feature
Reporter: Rui Li
Assignee: Rui Li


The feature is required when we have to serialize some fields and prevent them 
from being used in comparison, e.g. HIVE-14412.
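
A minimal sketch of the idea, assuming the bytes to skip sit at the end of the 
serialized key. {{SkipBytesCompareSketch}} is hypothetical and does not reflect how 
HiveKey is actually laid out.

```java
// Hypothetical comparator sketch; the "trailing bytes are skipped" layout is an assumption.
final class SkipBytesCompareSketch {
    // Lexicographic unsigned comparison that ignores the last skipBytes bytes of each key.
    static int compare(byte[] a, byte[] b, int skipBytes) {
        int lenA = Math.max(0, a.length - skipBytes);
        int lenB = Math.max(0, b.length - skipBytes);
        int n = Math.min(lenA, lenB);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return lenA - lenB;
    }

    public static void main(String[] args) {
        byte[] k1 = {1, 2, 9};
        byte[] k2 = {1, 2, 7};
        System.out.println(compare(k1, k2, 1)); // 0: the last byte is carried along but not compared
        System.out.println(compare(k1, k2, 0)); // 2: all bytes take part in the comparison
    }
}
```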





[jira] [Created] (HIVE-16315) Describe table doesn't show num of partitions

2017-03-28 Thread Rui Li (JIRA)
Rui Li created HIVE-16315:
-

 Summary: Describe table doesn't show num of partitions
 Key: HIVE-16315
 URL: https://issues.apache.org/jira/browse/HIVE-16315
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li


This doesn't comply with our wiki: 
https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-Examples





[jira] [Created] (HIVE-16155) No need for ConditionalTask if no conditional map join is created

2017-03-09 Thread Rui Li (JIRA)
Rui Li created HIVE-16155:
-

 Summary: No need for ConditionalTask if no conditional map join is 
created
 Key: HIVE-16155
 URL: https://issues.apache.org/jira/browse/HIVE-16155
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor








[jira] [Created] (HIVE-16047) Shouldn't try to get KeyProvider unless encryption is enabled

2017-02-27 Thread Rui Li (JIRA)
Rui Li created HIVE-16047:
-

 Summary: Shouldn't try to get KeyProvider unless encryption is 
enabled
 Key: HIVE-16047
 URL: https://issues.apache.org/jira/browse/HIVE-16047
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor


Found lots of the following errors in the HS2 log:
{noformat}
hdfs.KeyProviderCache: Could not find uri with key 
[dfs.encryption.key.provider.uri] to create a keyProvider !!
{noformat}

Similar to HDFS-7931





[jira] [Created] (HIVE-15860) RemoteSparkJobMonitor may hang when RemoteDriver exits abnormally

2017-02-09 Thread Rui Li (JIRA)
Rui Li created HIVE-15860:
-

 Summary: RemoteSparkJobMonitor may hang when RemoteDriver exits 
abnormally
 Key: HIVE-15860
 URL: https://issues.apache.org/jira/browse/HIVE-15860
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li








[jira] [Created] (HIVE-15526) Some tests need SORT_QUERY_RESULTS

2016-12-29 Thread Rui Li (JIRA)
Rui Li created HIVE-15526:
-

 Summary: Some tests need SORT_QUERY_RESULTS
 Key: HIVE-15526
 URL: https://issues.apache.org/jira/browse/HIVE-15526
 Project: Hive
  Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor








[jira] [Created] (HIVE-15428) HoS DPP doesn't remove cyclic dependency

2016-12-13 Thread Rui Li (JIRA)
Rui Li created HIVE-15428:
-

 Summary: HoS DPP doesn't remove cyclic dependency
 Key: HIVE-15428
 URL: https://issues.apache.org/jira/browse/HIVE-15428
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15357) Fix and re-enable the spark-only tests

2016-12-05 Thread Rui Li (JIRA)
Rui Li created HIVE-15357:
-

 Summary: Fix and re-enable the spark-only tests
 Key: HIVE-15357
 URL: https://issues.apache.org/jira/browse/HIVE-15357
 Project: Hive
  Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li








[jira] [Created] (HIVE-15302) Relax the requirement that HoS needs Spark built w/o Hive

2016-11-28 Thread Rui Li (JIRA)
Rui Li created HIVE-15302:
-

 Summary: Relax the requirement that HoS needs Spark built w/o Hive
 Key: HIVE-15302
 URL: https://issues.apache.org/jira/browse/HIVE-15302
 Project: Hive
  Issue Type: Improvement
Reporter: Rui Li
Assignee: Rui Li


This requirement becomes more and more unacceptable as SparkSQL becomes widely 
adopted. Let's use this JIRA to find out how we can relax the limitation.





[jira] [Created] (HIVE-15299) Yarn-cluster and yarn-client deprecated in Spark 2.0

2016-11-28 Thread Rui Li (JIRA)
Rui Li created HIVE-15299:
-

 Summary: Yarn-cluster and yarn-client deprecated in Spark 2.0
 Key: HIVE-15299
 URL: https://issues.apache.org/jira/browse/HIVE-15299
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor


Need to use master "yarn" with specified deploy mode instead.
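
A sketch of the translation this implies, using Spark's {{spark.master}} and 
{{spark.submit.deployMode}} properties. {{SparkMasterSketch}} is a hypothetical 
helper, not the change made in Hive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that rewrites the deprecated master values into master + deploy mode.
final class SparkMasterSketch {
    static Map<String, String> normalize(String master) {
        Map<String, String> conf = new LinkedHashMap<>();
        if ("yarn-cluster".equals(master)) {
            conf.put("spark.master", "yarn");
            conf.put("spark.submit.deployMode", "cluster");
        } else if ("yarn-client".equals(master)) {
            conf.put("spark.master", "yarn");
            conf.put("spark.submit.deployMode", "client");
        } else {
            // Any other master value is passed through unchanged.
            conf.put("spark.master", master);
        }
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(normalize("yarn-cluster")); // {spark.master=yarn, spark.submit.deployMode=cluster}
        System.out.println(normalize("local[2]"));     // {spark.master=local[2]}
    }
}
```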





[jira] [Created] (HIVE-15202) Concurrent compactions for the same partition may generate malformed folder structure

2016-11-14 Thread Rui Li (JIRA)
Rui Li created HIVE-15202:
-

 Summary: Concurrent compactions for the same partition may 
generate malformed folder structure
 Key: HIVE-15202
 URL: https://issues.apache.org/jira/browse/HIVE-15202
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li


If two compactions run concurrently on a single partition, they may generate a 
folder structure like this (note the nested base dir):
{noformat}
drwxr-xr-x   - root supergroup          0 2016-11-14 22:23 /user/hive/warehouse/test/z=1/base_007/base_007
-rw-r--r--   3 root supergroup        201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_0
-rw-r--r--   3 root supergroup        611 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_1
-rw-r--r--   3 root supergroup        614 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_2
-rw-r--r--   3 root supergroup        621 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_3
-rw-r--r--   3 root supergroup        621 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_4
-rw-r--r--   3 root supergroup        201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_5
-rw-r--r--   3 root supergroup        201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_6
-rw-r--r--   3 root supergroup        201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_7
-rw-r--r--   3 root supergroup        201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_8
-rw-r--r--   3 root supergroup        201 2016-11-14 21:46 /user/hive/warehouse/test/z=1/base_007/bucket_9
{noformat}





[jira] [Created] (HIVE-15139) HoS local mode fails with NumberFormatException

2016-11-07 Thread Rui Li (JIRA)
Rui Li created HIVE-15139:
-

 Summary: HoS local mode fails with NumberFormatException
 Key: HIVE-15139
 URL: https://issues.apache.org/jira/browse/HIVE-15139
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li


It's because we store {{stageId_attemptNum}} in JobMetricsListener but expect 
only {{stageId}} in LocalSparkJobStatus.
{noformat}
java.lang.NumberFormatException: For input string: "0_0"
at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at 
org.apache.hadoop.hive.ql.exec.spark.status.impl.LocalSparkJobStatus.getSparkStatistics(LocalSparkJobStatus.java:146)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:104)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)

{noformat}
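
A minimal sketch of one way to tolerate both key formats when reading the stage id. 
{{StageIdSketch}} is hypothetical and not the code in LocalSparkJobStatus.

```java
// Hypothetical helper; accepts either "stageId" or "stageId_attemptNum".
final class StageIdSketch {
    // Returns the stage id, dropping the attempt number when one is present.
    static int parseStageId(String stageKey) {
        int sep = stageKey.indexOf('_');
        String idPart = sep < 0 ? stageKey : stageKey.substring(0, sep);
        return Integer.parseInt(idPart);
    }

    public static void main(String[] args) {
        System.out.println(parseStageId("0_0"));  // 0
        System.out.println(parseStageId("12_3")); // 12
        System.out.println(parseStageId("7"));    // 7
    }
}
```

Calling {{Integer.parseInt}} directly on {{"0_0"}} throws the NumberFormatException 
shown in the trace above.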





[jira] [Created] (HIVE-15081) RetryingMetaStoreClient.getProxy(HiveConf, Boolean) doesn't match constructor of HiveMetaStoreClient

2016-10-26 Thread Rui Li (JIRA)
Rui Li created HIVE-15081:
-

 Summary: RetryingMetaStoreClient.getProxy(HiveConf, Boolean) 
doesn't match constructor of HiveMetaStoreClient
 Key: HIVE-15081
 URL: https://issues.apache.org/jira/browse/HIVE-15081
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li


Calling RetryingMetaStoreClient.getProxy(HiveConf, Boolean) results in an error:
{noformat}
Exception in thread "main" java.lang.RuntimeException: Unable to instantiate 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1661)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:81)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:131)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:87)
Caused by: java.lang.NoSuchMethodException: 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(org.apache.hadoop.hive.conf.HiveConf,
 java.lang.Boolean)
{noformat}
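
For reference, a minimal sketch of the kind of reflective lookup that produces this NoSuchMethodException. {{FakeConf}} and {{FakeClient}} are hypothetical stand-ins, and the three-argument constructor is an assumption made for illustration, not a statement about the real HiveMetaStoreClient signatures.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in classes; these are not the real HiveConf or HiveMetaStoreClient.
final class ConstructorLookupSketch {

    static final class FakeConf { }

    static final class FakeClient {
        // Only a three-argument constructor is declared (an assumption for illustration).
        public FakeClient(FakeConf conf, Object hookLoader, Boolean allowEmbedded) { }
    }

    // Returns true if FakeClient declares a public constructor with exactly these parameter types.
    static boolean hasConstructor(Class<?>... parameterTypes) {
        try {
            Constructor<FakeClient> ctor = FakeClient.class.getConstructor(parameterTypes);
            return ctor != null;
        } catch (NoSuchMethodException e) {
            // Reflection matches the exact parameter list; there is no fallback.
            return false;
        }
    }
}
```

In this sketch a lookup with (FakeConf, Boolean) fails while (FakeConf, Object, Boolean) succeeds, so the argument types passed by the caller have to line up with a constructor that actually exists.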



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15039) A better job monitor console output for HoS

2016-10-23 Thread Rui Li (JIRA)
Rui Li created HIVE-15039:
-

 Summary: A better job monitor console output for HoS
 Key: HIVE-15039
 URL: https://issues.apache.org/jira/browse/HIVE-15039
 Project: Hive
  Issue Type: Improvement
Reporter: Rui Li
Assignee: Rui Li


When there are many stages, it's very difficult to read the console output of 
job progress for HoS. The attached screenshot is an example.
We may learn from HoT, as it does a much better job than HoS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14728) Redundant orig files

2016-09-09 Thread Rui Li (JIRA)
Rui Li created HIVE-14728:
-

 Summary: Redundant orig files
 Key: HIVE-14728
 URL: https://issues.apache.org/jira/browse/HIVE-14728
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Priority: Minor


I found some .orig files in master, e.g. SemanticAnalyzer.java.orig. Were 
they added by mistake?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14719) ASTNode rootNode is not maintained properly when changing child/parent relation

2016-09-08 Thread Rui Li (JIRA)
Rui Li created HIVE-14719:
-

 Summary: ASTNode rootNode is not maintained properly when changing 
child/parent relation
 Key: HIVE-14719
 URL: https://issues.apache.org/jira/browse/HIVE-14719
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li


When I run a query like:
{code}
set hive.cbo.enable=false;
select * from A where exists (select * from B where B.k1=A.k1 and B.k2=A.k2);
{code}
It fails with an error like:
{noformat}
FAILED: SemanticException Line 0:-1 Invalid table alias or column reference 
'sq_1': (possible column names are: _table_or_col b) k2) sq_corr_1)) (tok, (. 
(tok_table_or_col sq_1) sq_corr_1))
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14595) TimestampWritable::setTimestamp gives wrong result when 2nd VInt exists

2016-08-21 Thread Rui Li (JIRA)
Rui Li created HIVE-14595:
-

 Summary: TimestampWritable::setTimestamp gives wrong result when 
2nd VInt exists
 Key: HIVE-14595
 URL: https://issues.apache.org/jira/browse/HIVE-14595
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14412) Add a timezone-aware timestamp

2016-08-03 Thread Rui Li (JIRA)
Rui Li created HIVE-14412:
-

 Summary: Add a timezone-aware timestamp
 Key: HIVE-14412
 URL: https://issues.apache.org/jira/browse/HIVE-14412
 Project: Hive
  Issue Type: Sub-task
Reporter: Rui Li
Assignee: Rui Li


Java's Timestamp stores the time elapsed since the epoch. While it's by itself 
unambiguous, ambiguity arises when we parse a string into a timestamp, or convert 
a timestamp to a string, causing problems like HIVE-14305.
To solve the issue, I think we should make timestamp aware of timezone.
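
To illustrate the ambiguity, here is a small sketch using only {{java.time}}. It is not Hive's timestamp implementation; the class name, the helper names, and the chosen zones and dates are assumptions picked for the example.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Illustration with java.time only; not Hive's timestamp implementation.
final class TimestampZoneSketch {

    // Interprets a zone-less date-time string in the given zone.
    static ZonedDateTime interpret(String text, String zoneId) {
        return ZonedDateTime.of(LocalDateTime.parse(text), ZoneId.of(zoneId));
    }

    // Seconds between the instants that the same text denotes in two zones.
    static long secondsApart(String text, String zoneA, String zoneB) {
        Instant a = interpret(text, zoneA).toInstant();
        Instant b = interpret(text, zoneB).toInstant();
        return a.getEpochSecond() - b.getEpochSecond();
    }
}
```

The same text "2016-07-21T00:00:00" denotes instants eight hours apart in UTC and Asia/Shanghai, and "2016-03-13T02:30:00" falls in a DST gap in America/Los_Angeles, where {{ZonedDateTime.of}} shifts it to 03:30. A timezone-aware type would carry the zone along instead of relying on the reader's default.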



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14305) To/From UTC timestamp may return incorrect result because of DST

2016-07-21 Thread Rui Li (JIRA)
Rui Li created HIVE-14305:
-

 Summary: To/From UTC timestamp may return incorrect result because 
of DST
 Key: HIVE-14305
 URL: https://issues.apache.org/jira/browse/HIVE-14305
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14238) Ownership shouldn't be checked if external table location doesn't exist

2016-07-14 Thread Rui Li (JIRA)
Rui Li created HIVE-14238:
-

 Summary: Ownership shouldn't be checked if external table location 
doesn't exist
 Key: HIVE-14238
 URL: https://issues.apache.org/jira/browse/HIVE-14238
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li


When creating an external table with SQL authorization, we require RWX permission 
+ ownership of the table location. If the location doesn't exist, we check the 
parent dir (recursively), which means we require that the user own everything under 
the parent dir. I think this is not necessary: either we don't check ownership of 
the parent dir at all, or we check it non-recursively.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14139) NPE dropping permanent function

2016-06-30 Thread Rui Li (JIRA)
Rui Li created HIVE-14139:
-

 Summary: NPE dropping permanent function
 Key: HIVE-14139
 URL: https://issues.apache.org/jira/browse/HIVE-14139
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li


To reproduce:
1. Start a CLI session and create a permanent function.
2. Exit current CLI session.
3. Start a new CLI session and drop the function.

Stack trace:
{noformat}
FAILED: error during drop function: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.Registry.removePersistentFunctionUnderLock(Registry.java:513)
at 
org.apache.hadoop.hive.ql.exec.Registry.unregisterFunction(Registry.java:501)
at 
org.apache.hadoop.hive.ql.exec.FunctionRegistry.unregisterPermanentFunction(FunctionRegistry.java:1532)
at 
org.apache.hadoop.hive.ql.exec.FunctionTask.dropPermanentFunction(FunctionTask.java:228)
at 
org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:95)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1860)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1564)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1316)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1085)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1073)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13997) Insert overwrite directory doesn't overwrite existing files

2016-06-11 Thread Rui Li (JIRA)
Rui Li created HIVE-13997:
-

 Summary: Insert overwrite directory doesn't overwrite existing 
files
 Key: HIVE-13997
 URL: https://issues.apache.org/jira/browse/HIVE-13997
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li


Can be easily reproduced by running {{INSERT OVERWRITE DIRECTORY}} to the same 
dir twice.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13921) Fix spark on yarn tests for HoS

2016-06-01 Thread Rui Li (JIRA)
Rui Li created HIVE-13921:
-

 Summary: Fix spark on yarn tests for HoS
 Key: HIVE-13921
 URL: https://issues.apache.org/jira/browse/HIVE-13921
 Project: Hive
  Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li


{{index_bitmap3}} and {{constprog_partitioner}} have been failing. Let's fix 
them here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13895) HoS start-up overhead in yarn-client mode

2016-05-29 Thread Rui Li (JIRA)
Rui Li created HIVE-13895:
-

 Summary: HoS start-up overhead in yarn-client mode
 Key: HIVE-13895
 URL: https://issues.apache.org/jira/browse/HIVE-13895
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li


To avoid an overly verbose app state report, HIVE-13376 increased the state check 
interval to a default of 60s. However, a bigger interval brings considerable 
start-up wait time in yarn-client mode.
Since the state report only exists in yarn-cluster mode, we can disable it 
using {{spark.yarn.submit.waitAppCompletion}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13843) Re-enable the HoS tests disabled in HIVE-13402

2016-05-25 Thread Rui Li (JIRA)
Rui Li created HIVE-13843:
-

 Summary: Re-enable the HoS tests disabled in HIVE-13402
 Key: HIVE-13843
 URL: https://issues.apache.org/jira/browse/HIVE-13843
 Project: Hive
  Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li


With HIVE-13525, we can now fix and re-enable the tests for Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13789) Repeatedly checking configuration in TextRecordWriter/Reader hurts performance

2016-05-19 Thread Rui Li (JIRA)
Rui Li created HIVE-13789:
-

 Summary: Repeatedly checking configuration in 
TextRecordWriter/Reader hurts performance
 Key: HIVE-13789
 URL: https://issues.apache.org/jira/browse/HIVE-13789
 Project: Hive
  Issue Type: Improvement
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor


We check the configuration to decide whether to escape certain characters each time 
we write or read a record for custom scripts.
In our benchmark this becomes a hot-spot method, and fixing it improves the 
execution of the custom script by 7% (3TB TPCx-BB dataset).
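
A minimal sketch of the idea, reading the flag once at initialization and reusing it per record. Everything here is hypothetical: the class, the {{example.script.escape}} key (not a real Hive property), the plain Map standing in for the configuration object, and the escaping rule.

```java
import java.util.Map;

// Hypothetical sketch; the class, the config key, and the escaping rule are illustrative only.
final class CachedFlagWriterSketch {

    static final String ESCAPE_KEY = "example.script.escape"; // not a real Hive property

    private final boolean escape; // read once, reused for every record

    CachedFlagWriterSketch(Map<String, String> conf) {
        this.escape = Boolean.parseBoolean(conf.getOrDefault(ESCAPE_KEY, "false"));
    }

    // Per-record path: no configuration lookup happens here.
    String format(String record) {
        return escape ? record.replace("\n", "\\n") : record;
    }
}
```

The trade-off is that a configuration change after construction is not picked up, which should be acceptable if the setting is fixed for the lifetime of the writer or reader.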



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13662) Set file permission and ACL in file sink operator

2016-04-30 Thread Rui Li (JIRA)
Rui Li created HIVE-13662:
-

 Summary: Set file permission and ACL in file sink operator
 Key: HIVE-13662
 URL: https://issues.apache.org/jira/browse/HIVE-13662
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13572) Redundant setting full file status in Hive::copyFiles

2016-04-20 Thread Rui Li (JIRA)
Rui Li created HIVE-13572:
-

 Summary: Redundant setting full file status in Hive::copyFiles
 Key: HIVE-13572
 URL: https://issues.apache.org/jira/browse/HIVE-13572
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13525) HoS hangs when job is empty

2016-04-15 Thread Rui Li (JIRA)
Rui Li created HIVE-13525:
-

 Summary: HoS hangs when job is empty
 Key: HIVE-13525
 URL: https://issues.apache.org/jira/browse/HIVE-13525
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li


Observed in local tests. This should be the cause of HIVE-13402.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13066) Hive on Spark gives incorrect results when speculation is on

2016-02-16 Thread Rui Li (JIRA)
Rui Li created HIVE-13066:
-

 Summary: Hive on Spark gives incorrect results when speculation is 
on
 Key: HIVE-13066
 URL: https://issues.apache.org/jira/browse/HIVE-13066
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li


The issue was reported by users. One possible reason is that we always append 0 
as the attempt ID for each task, so Hive can't distinguish 
between speculative tasks and original ones.
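
A rough sketch of why a constant attempt ID is a problem. The naming scheme ({{taskId_attemptId}}, zero-padded) and the class below are assumptions made for illustration, not the actual Hive code path.

```java
// Hypothetical sketch of task output naming; not Hive code.
final class AttemptNameSketch {

    // Builds an output name from a task id and an attempt id.
    static String outputName(int taskId, int attemptId) {
        return String.format("%06d_%d", taskId, attemptId);
    }

    // True if two attempts of the same task would produce the same output name.
    static boolean collides(int taskId, int attemptA, int attemptB) {
        return outputName(taskId, attemptA).equals(outputName(taskId, attemptB));
    }
}
```

If every attempt is reported as 0, the original and the speculative copy of a task map to the same name; passing the real attempt number keeps them apart.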



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12940) Merge master into spark [Spark Branch]

2016-01-26 Thread Rui Li (JIRA)
Rui Li created HIVE-12940:
-

 Summary: Merge master into spark [Spark Branch]
 Key: HIVE-12940
 URL: https://issues.apache.org/jira/browse/HIVE-12940
 Project: Hive
  Issue Type: Task
Reporter: Rui Li
Assignee: Rui Li






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12493) HIVE-11180 didn't merge cleanly to branch-1

2015-11-22 Thread Rui Li (JIRA)
Rui Li created HIVE-12493:
-

 Summary: HIVE-11180 didn't merge cleanly to branch-1
 Key: HIVE-12493
 URL: https://issues.apache.org/jira/browse/HIVE-12493
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12466) SparkCounter not initialized error

2015-11-19 Thread Rui Li (JIRA)
Rui Li created HIVE-12466:
-

 Summary: SparkCounter not initialized error
 Key: HIVE-12466
 URL: https://issues.apache.org/jira/browse/HIVE-12466
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Assignee: Xuefu Zhang


During a query, many instances of the following error are found in the executor's log:
{noformat}
03:47:28.759 [Executor task launch worker-0] ERROR 
org.apache.hive.spark.counter.SparkCounters - counter[HIVE, RECORDS_OUT_0] has 
not initialized before.
03:47:28.762 [Executor task launch worker-1] ERROR 
org.apache.hive.spark.counter.SparkCounters - counter[HIVE, RECORDS_OUT_0] has 
not initialized before.
03:47:30.707 [Executor task launch worker-1] ERROR 
org.apache.hive.spark.counter.SparkCounters - counter[HIVE, 
RECORDS_OUT_1_default.tmp_tmp] has not initialized before.
03:47:33.385 [Executor task launch worker-1] ERROR 
org.apache.hive.spark.counter.SparkCounters - counter[HIVE, 
RECORDS_OUT_1_default.test_table] has not initialized before.
03:47:33.388 [Executor task launch worker-0] ERROR 
org.apache.hive.spark.counter.SparkCounters - counter[HIVE, 
RECORDS_OUT_1_default.test_table] has not initialized before.
03:47:33.495 [Executor task launch worker-0] ERROR 
org.apache.hive.spark.counter.SparkCounters - counter[HIVE, 
RECORDS_OUT_1_default.test_table] has not initialized before.
03:47:35.141 [Executor task launch worker-1] ERROR 
org.apache.hive.spark.counter.SparkCounters - counter[HIVE, 
RECORDS_OUT_1_default.test_table] has not initialized before.

...
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12283) Fix test failures after HIVE-11844 [Spark Branch]

2015-10-27 Thread Rui Li (JIRA)
Rui Li created HIVE-12283:
-

 Summary: Fix test failures after HIVE-11844 [Spark Branch]
 Key: HIVE-12283
 URL: https://issues.apache.org/jira/browse/HIVE-12283
 Project: Hive
  Issue Type: Sub-task
Reporter: Rui Li






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11183) Enable optimized hash tables for spark [Spark Branch]

2015-07-06 Thread Rui Li (JIRA)
Rui Li created HIVE-11183:
-

 Summary: Enable optimized hash tables for spark [Spark Branch]
 Key: HIVE-11183
 URL: https://issues.apache.org/jira/browse/HIVE-11183
 Project: Hive
  Issue Type: Improvement
Reporter: Rui Li
Assignee: Rui Li






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11182) Enable optimized hash tables for spark [Spark Branch]

2015-07-06 Thread Rui Li (JIRA)
Rui Li created HIVE-11182:
-

 Summary: Enable optimized hash tables for spark [Spark Branch]
 Key: HIVE-11182
 URL: https://issues.apache.org/jira/browse/HIVE-11182
 Project: Hive
  Issue Type: Improvement
Reporter: Rui Li
Assignee: Rui Li






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11180) Enable native vectorized map join for spark [Spark Branch]

2015-07-03 Thread Rui Li (JIRA)
Rui Li created HIVE-11180:
-

 Summary: Enable native vectorized map join for spark [Spark Branch]
 Key: HIVE-11180
 URL: https://issues.apache.org/jira/browse/HIVE-11180
 Project: Hive
  Issue Type: Sub-task
Reporter: Rui Li
Assignee: Rui Li


The improvement was introduced in HIVE-9824. Let's use this task to track how 
we can enable that for spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11138) Query fails when there isn't a comparator for an operator [Spark Branch]

2015-06-28 Thread Rui Li (JIRA)
Rui Li created HIVE-11138:
-

 Summary: Query fails when there isn't a comparator for an operator 
[Spark Branch]
 Key: HIVE-11138
 URL: https://issues.apache.org/jira/browse/HIVE-11138
 Project: Hive
  Issue Type: Sub-task
Reporter: Rui Li
Assignee: Rui Li


In such cases, OperatorComparatorFactory should default to false instead of 
throwing exceptions.
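
A minimal sketch of the proposed fallback. This is not the actual OperatorComparatorFactory; the registry shape, the class name, and the use of {{BiPredicate}} are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch; this is not the actual OperatorComparatorFactory.
final class ComparatorLookupSketch {

    // Fallback used when no comparator is registered: never reports two operators as equal.
    private static final BiPredicate<Object, Object> ALWAYS_FALSE = (a, b) -> false;

    private final Map<Class<?>, BiPredicate<Object, Object>> registry = new HashMap<>();

    void register(Class<?> operatorClass, BiPredicate<Object, Object> comparator) {
        registry.put(operatorClass, comparator);
    }

    // Compares two operators; unknown operator types yield false rather than an exception.
    boolean sameOperator(Object left, Object right) {
        if (left.getClass() != right.getClass()) {
            return false;
        }
        return registry.getOrDefault(left.getClass(), ALWAYS_FALSE).test(left, right);
    }
}
```

Returning false is the conservative choice: the worst case is a missed optimization, while the query still runs.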



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11109) Replication factor is not properly set in SparkHashTableSinkOperator [Spark Branch]

2015-06-25 Thread Rui Li (JIRA)
Rui Li created HIVE-11109:
-

 Summary: Replication factor is not properly set in 
SparkHashTableSinkOperator [Spark Branch]
 Key: HIVE-11109
 URL: https://issues.apache.org/jira/browse/HIVE-11109
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li
Priority: Trivial


The replication factor only gets set in some abnormal cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11108) HashTableSinkOperator doesn't support vectorization [Spark Branch]

2015-06-25 Thread Rui Li (JIRA)
Rui Li created HIVE-11108:
-

 Summary: HashTableSinkOperator doesn't support vectorization 
[Spark Branch]
 Key: HIVE-11108
 URL: https://issues.apache.org/jira/browse/HIVE-11108
 Project: Hive
  Issue Type: Sub-task
Reporter: Rui Li


This prevents any BaseWork containing HTS from being vectorized. It's basically 
specific to spark, because Tez doesn't use HTS and MR runs HTS in local tasks.
We should verify if it makes sense to make HTS support vectorization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11032) Enable more tests for grouping by skewed data [Spark Branch]

2015-06-16 Thread Rui Li (JIRA)
Rui Li created HIVE-11032:
-

 Summary: Enable more tests for grouping by skewed data [Spark 
Branch]
 Key: HIVE-11032
 URL: https://issues.apache.org/jira/browse/HIVE-11032
 Project: Hive
  Issue Type: Sub-task
Reporter: Rui Li
Priority: Minor


Not all such tests are enabled, e.g. {{groupby1_map_skew.q}}. We can use 
this JIRA to track whether we need more of them.
Basically, we need to look at all tests with {{set 
hive.groupby.skewindata=true;}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10989) Spark can't control number of map tasks for runtime skew join [Spark Branch]

2015-06-12 Thread Rui Li (JIRA)
Rui Li created HIVE-10989:
-

 Summary: Spark can't control number of map tasks for runtime skew 
join [Spark Branch]
 Key: HIVE-10989
 URL: https://issues.apache.org/jira/browse/HIVE-10989
 Project: Hive
  Issue Type: Sub-task
Reporter: Rui Li
Assignee: Rui Li


Flags {{hive.skewjoin.mapjoin.map.tasks}} and 
{{hive.skewjoin.mapjoin.min.split}} are used to control the number of map tasks 
for the map join of runtime skew join. They work well for MR but have no effect 
for spark.
This makes runtime skew join less useful, i.e. we just end up with slow mappers 
instead of reducers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10903) Add hive.in.test for HoS tests [Spark Branch]

2015-06-03 Thread Rui Li (JIRA)
Rui Li created HIVE-10903:
-

 Summary: Add hive.in.test for HoS tests [Spark Branch]
 Key: HIVE-10903
 URL: https://issues.apache.org/jira/browse/HIVE-10903
 Project: Hive
  Issue Type: Test
Reporter: Rui Li






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10816) NPE in ExecDriver::handleSampling when submitted via child JVM

2015-05-25 Thread Rui Li (JIRA)
Rui Li created HIVE-10816:
-

 Summary: NPE in ExecDriver::handleSampling when submitted via 
child JVM
 Key: HIVE-10816
 URL: https://issues.apache.org/jira/browse/HIVE-10816
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li


When {{hive.exec.submitviachild = true}}, parallel order by fails with NPE and 
falls back to single-reducer mode. Stack trace:
{noformat}
2015-05-25 08:41:04,446 ERROR [main]: mr.ExecDriver 
(ExecDriver.java:execute(386)) - Sampling error
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.handleSampling(ExecDriver.java:513)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:379)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:750)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10527) NPE in SparkUtilities::isDedicatedCluster

2015-04-28 Thread Rui Li (JIRA)
Rui Li created HIVE-10527:
-

 Summary: NPE in SparkUtilities::isDedicatedCluster
 Key: HIVE-10527
 URL: https://issues.apache.org/jira/browse/HIVE-10527
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li


We should add {{spark.master}} to HiveConf when it doesn't exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10458) Enable parallel order by for spark [Spark Branch]

2015-04-22 Thread Rui Li (JIRA)
Rui Li created HIVE-10458:
-

 Summary: Enable parallel order by for spark [Spark Branch]
 Key: HIVE-10458
 URL: https://issues.apache.org/jira/browse/HIVE-10458
 Project: Hive
  Issue Type: Sub-task
Reporter: Rui Li
Assignee: Rui Li


We don't have to force reducer# to 1 as spark supports parallel sorting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10261) Data size can be underestimated when computed with partial column stats

2015-04-08 Thread Rui Li (JIRA)
Rui Li created HIVE-10261:
-

 Summary: Data size can be underestimated when computed with 
partial column stats
 Key: HIVE-10261
 URL: https://issues.apache.org/jira/browse/HIVE-10261
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li


With {{hive.stats.fetch.column.stats=true}}, we'll estimate data size with 
column stats when annotating operators with statistics. However, when column 
stats are partial, we're likely to underestimate data size, which may hurt 
performance, e.g. picking an inappropriate small table for map join.
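
A small arithmetic sketch of the effect. It assumes, purely for illustration, that data size is estimated as row count times the sum of the average widths of columns that have stats, with missing columns contributing nothing; this is not Hive's statistics code.

```java
// Hypothetical arithmetic sketch; not Hive's statistics code.
final class PartialStatsSketch {

    // Estimates data size as rowCount times the sum of known average column widths.
    // A negative width marks a column without stats, which contributes nothing.
    static long estimateFromColumnStats(long rowCount, double[] avgColWidths) {
        double rowWidth = 0;
        for (double w : avgColWidths) {
            if (w >= 0) {
                rowWidth += w;
            }
        }
        return Math.round(rowCount * rowWidth);
    }
}
```

With 1000 rows and columns of average width 8 and 100 bytes, full stats give 108000 bytes, but if the second column has no stats the estimate drops to 8000 bytes, which could make a large table look small enough for map join.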



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9969) Avoid Utilities.getMapRedWork for spark [Spark Branch]

2015-03-15 Thread Rui Li (JIRA)
Rui Li created HIVE-9969:


 Summary: Avoid Utilities.getMapRedWork for spark [Spark Branch]
 Key: HIVE-9969
 URL: https://issues.apache.org/jira/browse/HIVE-9969
 Project: Hive
  Issue Type: Sub-task
Reporter: Rui Li
Priority: Minor


The method shouldn't be used in spark mode. Specifically, map work and reduce 
work have different plan paths in spark. Calling this method leaves lots of 
errors in the executor's log:
{noformat}
15/03/16 02:57:23 INFO Utilities: Open file to read in plan: 
hdfs://node13-1:8020/tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml
15/03/16 02:57:23 INFO Utilities: File not found: File does not exist: 
/tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9924) Add SORT_QUERY_RESULT to union12.q

2015-03-11 Thread Rui Li (JIRA)
Rui Li created HIVE-9924:


 Summary: Add SORT_QUERY_RESULT to union12.q
 Key: HIVE-9924
 URL: https://issues.apache.org/jira/browse/HIVE-9924
 Project: Hive
  Issue Type: Test
Reporter: Rui Li
Assignee: Rui Li






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9927) MR doesn't produce correct result for runtime_skewjoin_mapjoin_spark

2015-03-11 Thread Rui Li (JIRA)
Rui Li created HIVE-9927:


 Summary: MR doesn't produce correct result for 
runtime_skewjoin_mapjoin_spark
 Key: HIVE-9927
 URL: https://issues.apache.org/jira/browse/HIVE-9927
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9869) Trunk doesn't build with hadoop-1

2015-03-04 Thread Rui Li (JIRA)
Rui Li created HIVE-9869:


 Summary: Trunk doesn't build with hadoop-1
 Key: HIVE-9869
 URL: https://issues.apache.org/jira/browse/HIVE-9869
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9855) Runtime skew join doesn't work when skewed data only exists in big table

2015-03-04 Thread Rui Li (JIRA)
Rui Li created HIVE-9855:


 Summary: Runtime skew join doesn't work when skewed data only 
exists in big table
 Key: HIVE-9855
 URL: https://issues.apache.org/jira/browse/HIVE-9855
 Project: Hive
  Issue Type: Bug
Reporter: Rui Li
Assignee: Rui Li


To reproduce, enable runtime skew join and then join two tables where skewed 
data exists in only one of them. The task will fail with the following 
exception:
{noformat}
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: 
java.io.IOException: Unable to rename output to: hdfs://..
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-18 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326921#comment-14326921
 ] 

Rui Li commented on HIVE-9561:
--

Thank you Xuefu!

 SHUFFLE_SORT should only be used for order by query [Spark Branch]
 --

 Key: HIVE-9561
 URL: https://issues.apache.org/jira/browse/HIVE-9561
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Fix For: spark-branch

 Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, 
 HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.5-spark.patch, 
 HIVE-9561.6-spark.patch


 The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
 and are difficult to control. So we should limit the use of {{sortByKey}} to 
 order by query only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-17 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14325120#comment-14325120
 ] 

Rui Li commented on HIVE-9561:
--

Hi [~xuefuz], thanks very much for taking care of this. I can't really work on 
it due to limited network access. Sorry for the inconvenience.

 SHUFFLE_SORT should only be used for order by query [Spark Branch]
 --

 Key: HIVE-9561
 URL: https://issues.apache.org/jira/browse/HIVE-9561
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, 
 HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.5-spark.patch, 
 HIVE-9561.6-spark.patch


 The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
 and are difficult to control. So we should limit the use of {{sortByKey}} to 
 order by query only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-16 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9561:
-
Attachment: (was: HIVE-9561.4-spark.patch)

 SHUFFLE_SORT should only be used for order by query [Spark Branch]
 --

 Key: HIVE-9561
 URL: https://issues.apache.org/jira/browse/HIVE-9561
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, 
 HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch


 The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
 and are difficult to control. So we should limit the use of {{sortByKey}} to 
 order by query only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-16 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9561:
-
Attachment: HIVE-9561.4-spark.patch

Try again

 SHUFFLE_SORT should only be used for order by query [Spark Branch]
 --

 Key: HIVE-9561
 URL: https://issues.apache.org/jira/browse/HIVE-9561
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, 
 HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.4-spark.patch, 
 HIVE-9561.4-spark.patch


 The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
 and are difficult to control. So we should limit the use of {{sortByKey}} to 
 order by query only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9696) Address RB comments for HIVE-9425 [Spark Branch]

2015-02-16 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323565#comment-14323565
 ] 

Rui Li commented on HIVE-9696:
--

The failure of union3 was introduced when I merged HIVE-9666 into spark.
It should be fixed in HIVE-9561.

 Address RB comments for HIVE-9425 [Spark Branch]
 

 Key: HIVE-9696
 URL: https://issues.apache.org/jira/browse/HIVE-9696
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Priority: Trivial
 Attachments: HIVE-9696.1-spark.patch, HIVE-9696.1-spark.patch, 
 HIVE-9696.1-spark.patch


 A followup task of HIVE-9425.
 The pending RB comment can be found 
 [here|https://reviews.apache.org/r/30984/#comment118482].





[jira] [Updated] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-16 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9561:
-
Attachment: (was: HIVE-9561.4-spark.patch)

 SHUFFLE_SORT should only be used for order by query [Spark Branch]
 --

 Key: HIVE-9561
 URL: https://issues.apache.org/jira/browse/HIVE-9561
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, 
 HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.4-spark.patch


 The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
 and are difficult to control. So we should limit the use of {{sortByKey}} to 
 order by query only.





[jira] [Updated] (HIVE-9666) Improve some qtests

2015-02-15 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9666:
-
Fix Version/s: 1.2.0

 Improve some qtests
 ---

 Key: HIVE-9666
 URL: https://issues.apache.org/jira/browse/HIVE-9666
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor
 Fix For: 1.2.0

 Attachments: HIVE-9666.1.patch, HIVE-9666.2.patch


 {code}
 groupby7_noskew_multi_single_reducer.q
 groupby_multi_single_reducer3.q
 parallel_join0.q
 union3.q
 union4.q
 {code}





[jira] [Updated] (HIVE-9696) Address RB comments for HIVE-9425 [Spark Branch]

2015-02-15 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9696:
-
Description: 
A followup task of HIVE-9425.
The pending RB comment can be found 
[here|https://reviews.apache.org/r/30984/#comment118482].

  was:
A followup task of HIVE-9425.
Then pending RB comment can be found 
[here|https://reviews.apache.org/r/30984/#comment118482].


 Address RB comments for HIVE-9425 [Spark Branch]
 

 Key: HIVE-9696
 URL: https://issues.apache.org/jira/browse/HIVE-9696
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Priority: Trivial
 Attachments: HIVE-9696.1-spark.patch


 A followup task of HIVE-9425.
 The pending RB comment can be found 
 [here|https://reviews.apache.org/r/30984/#comment118482].





[jira] [Updated] (HIVE-9696) Address RB comments for HIVE-9425 [Spark Branch]

2015-02-15 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9696:
-
Summary: Address RB comments for HIVE-9425 [Spark Branch]  (was: Address RB 
comments for HIVE-9425)

 Address RB comments for HIVE-9425 [Spark Branch]
 

 Key: HIVE-9696
 URL: https://issues.apache.org/jira/browse/HIVE-9696
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Priority: Trivial

 A followup task of HIVE-9425.
 Then pending RB comment can be found 
 [here|https://reviews.apache.org/r/30984/#comment118482].





[jira] [Created] (HIVE-9696) Address RB comments for HIVE-9425

2015-02-15 Thread Rui Li (JIRA)
Rui Li created HIVE-9696:


 Summary: Address RB comments for HIVE-9425
 Key: HIVE-9696
 URL: https://issues.apache.org/jira/browse/HIVE-9696
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Priority: Trivial


A followup task of HIVE-9425.
Then pending RB comment can be found 
[here|https://reviews.apache.org/r/30984/#comment118482].





[jira] [Commented] (HIVE-9666) Improve some qtests

2015-02-15 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321795#comment-14321795
 ] 

Rui Li commented on HIVE-9666:
--

Committed to trunk and merged into spark. Hope I didn't screw up anything.
Thanks [~xuefuz] for the review.

 Improve some qtests
 ---

 Key: HIVE-9666
 URL: https://issues.apache.org/jira/browse/HIVE-9666
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor
 Attachments: HIVE-9666.1.patch, HIVE-9666.2.patch


 {code}
 groupby7_noskew_multi_single_reducer.q
 groupby_multi_single_reducer3.q
 parallel_join0.q
 union3.q
 union4.q
 {code}





[jira] [Updated] (HIVE-9696) Address RB comments for HIVE-9425 [Spark Branch]

2015-02-15 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9696:
-
Status: Patch Available  (was: Open)

 Address RB comments for HIVE-9425 [Spark Branch]
 

 Key: HIVE-9696
 URL: https://issues.apache.org/jira/browse/HIVE-9696
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Priority: Trivial
 Attachments: HIVE-9696.1-spark.patch


 A followup task of HIVE-9425.
 Then pending RB comment can be found 
 [here|https://reviews.apache.org/r/30984/#comment118482].





[jira] [Commented] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]

2015-02-15 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321958#comment-14321958
 ] 

Rui Li commented on HIVE-9659:
--

Hi [~jxiang], could you elaborate on how you reproduced this?

 'Error while trying to create table container' occurs during hive query case 
 execution when hive.optimize.skewjoin set to 'true' [Spark Branch]
 ---

 Key: HIVE-9659
 URL: https://issues.apache.org/jira/browse/HIVE-9659
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xin Hao

 We found that 'Error while trying to create table container' occurs during 
 Big-Bench Q12 case execution when hive.optimize.skewjoin is set to 'true'.
 If hive.optimize.skewjoin is set to 'false', the case passes.
 How to reproduce:
 1. set hive.optimize.skewjoin=true;
 2. Run BigBench case Q12 and it will fail. 
 Check the executor log (e.g. /usr/lib/spark/work/app-/2/stderr) and you 
 will find the error 'Error while trying to create table container' in the log 
 and also a NullPointerException near the end of the log.
 (a) Detailed error message for 'Error while trying to create table container':
 {noformat}
 15/02/12 01:29:49 ERROR SparkMapRecordHandler: Error processing row: 
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
 create table container
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
 create table container
   at 
 org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:118)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:193)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:219)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
   at 
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
   at 
 org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217)
   at 
 org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65)
   at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
   at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
   at org.apache.spark.scheduler.Task.run(Task.scala:56)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error while 
 trying to create table container
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:158)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:115)
   ... 21 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error, not a 
 directory: 
 hdfs://bhx1:8020/tmp/hive/root/d22ef465-bff5-4edb-a822-0a9f1c25b66c/hive_2015-02-12_01-28-10_008_6897031694580088767-1/-mr-10009/HashTable-Stage-6/MapJoin-mapfile01--.hashtable
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:106)
   ... 22 more
 15/02/12 01:29:49 INFO SparkRecordHandler: maximum memory = 40939028480
 15/02/12 01:29:49 INFO PerfLogger: PERFLOG method=SparkInitializeOperators 
 from=org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler
 {noformat}
 (b) Detailed error message for NullPointerException:
 {noformat}
 5/02/12 01:29:50 ERROR MapJoinOperator: 

[jira] [Updated] (HIVE-9666) Improve some qtests

2015-02-15 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9666:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Improve some qtests
 ---

 Key: HIVE-9666
 URL: https://issues.apache.org/jira/browse/HIVE-9666
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor
 Fix For: 1.2.0

 Attachments: HIVE-9666.1.patch, HIVE-9666.2.patch


 {code}
 groupby7_noskew_multi_single_reducer.q
 groupby_multi_single_reducer3.q
 parallel_join0.q
 union3.q
 union4.q
 {code}





[jira] [Updated] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-15 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9561:
-
Attachment: HIVE-9561.4-spark.patch

The failures seem strange. Try again.

 SHUFFLE_SORT should only be used for order by query [Spark Branch]
 --

 Key: HIVE-9561
 URL: https://issues.apache.org/jira/browse/HIVE-9561
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, 
 HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.4-spark.patch


 The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
 and are difficult to control. So we should limit the use of {{sortByKey}} to 
 order by query only.





[jira] [Updated] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-15 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9561:
-
Attachment: HIVE-9561.4-spark.patch

Rebase my patch.

 SHUFFLE_SORT should only be used for order by query [Spark Branch]
 --

 Key: HIVE-9561
 URL: https://issues.apache.org/jira/browse/HIVE-9561
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, 
 HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch


 The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
 and are difficult to control. So we should limit the use of {{sortByKey}} to 
 order by query only.




