[jira] [Commented] (HIVE-12063) Pad Decimal numbers with trailing zeros to the scale of the column

2015-10-28 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979282#comment-14979282
 ] 

Xuefu Zhang commented on HIVE-12063:


Yeah. It's a change, but I wouldn't call it out as an incompatibility. A 
release note makes sense. Thanks.

> Pad Decimal numbers with trailing zeros to the scale of the column
> --
>
> Key: HIVE-12063
> URL: https://issues.apache.org/jira/browse/HIVE-12063
> Project: Hive
>  Issue Type: Improvement
>  Components: Types
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-12063.1.patch, HIVE-12063.2.patch, HIVE-12063.patch
>
>
> HIVE-7373 was to address the problems caused by Hive trimming trailing 
> zeros, which included treating 0.0, 0.00, and so on as 0, even though they 
> have different precision/scale. Please refer to the HIVE-7373 description. 
> However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems 
> remained. HIVE-11835 was resolved recently to address one of those problems, 
> where 0.0, 0.00, and so on could not be read into decimal(1,1).
> However, HIVE-11835 didn't address the problem that any decimal value such 
> as 0.0, 0.00, etc. shows as 0 in query results. This causes confusion, as 
> 0.0 and 0.00 have different precision/scale than 0.
> The proposal here is to pad query results with zeros up to the type's scale. 
> This not only removes the confusion described above, but also aligns with 
> many other DBs. The internal decimal representation doesn't change, however.
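
For illustration, a minimal Java sketch of the proposed display-side padding 
for a hypothetical decimal(5,2) column; this sketches the idea only and is 
not Hive's actual formatting code:

{code}
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalPadDemo {
    public static void main(String[] args) {
        int columnScale = 2; // scale of the hypothetical decimal(5,2) column
        BigDecimal stored = new BigDecimal("0.0");
        // Pad with trailing zeros up to the column's scale for display only;
        // increasing the scale never rounds, so the value is unchanged.
        BigDecimal displayed = stored.setScale(columnScale, RoundingMode.UNNECESSARY);
        System.out.println(displayed); // prints 0.00 instead of 0.0
    }
}
{code}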



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11616) DelegationTokenSecretManager reuses the same ObjectStore, which has a concurrency issue

2015-10-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-11616:
---
Description: 
Sometimes the metastore log shows the exception below. After analysis, we 
found that when HiveMetaStore starts, the DelegationTokenSecretManager keeps 
using the same ObjectStore; see here:
{code}
saslServer.startDelegationTokenSecretManager(conf, *baseHandler.getMS()*, 
ServerMode.METASTORE);
{code}
This leads to the concurrency issue:
{code}
2015-08-18 20:59:10,520 | ERROR | pool-6-thread-200 | Error occurred during 
processing of message. | 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:296)
org.apache.hadoop.hive.thrift.DelegationTokenStore$TokenStoreException: 
org.datanucleus.transaction.NucleusTransactionException: Invalid state. 
Transaction has already started
at 
org.apache.hadoop.hive.thrift.DBTokenStore.invokeOnRawStore(DBTokenStore.java:154)
at 
org.apache.hadoop.hive.thrift.DBTokenStore.getToken(DBTokenStore.java:88)
at 
org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:112)
at 
org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:56)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$SaslDigestCallbackHandler.getPassword(HadoopThriftAuthBridge.java:565)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$SaslDigestCallbackHandler.handle(HadoopThriftAuthBridge.java:596)
at 
com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
at 
com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
at 
org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:539)
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283)
at 
org.apache.thrift.transport.HiveTSaslServerTransport.open(HiveTSaslServerTransport.java:133)
at 
org.apache.thrift.transport.HiveTSaslServerTransport$Factory.getTransport(HiveTSaslServerTransport.java:261)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:739)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:736)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1652)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:736)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:268)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.datanucleus.transaction.NucleusTransactionException: Invalid 
state. Transaction has already started
at 
org.datanucleus.transaction.TransactionManager.begin(TransactionManager.java:47)
at org.datanucleus.TransactionImpl.begin(TransactionImpl.java:131)
at 
org.datanucleus.api.jdo.JDOTransaction.internalBegin(JDOTransaction.java:88)
at org.datanucleus.api.jdo.JDOTransaction.begin(JDOTransaction.java:80)
at 
org.apache.hadoop.hive.metastore.ObjectStore.openTransaction(ObjectStore.java:420)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getToken(ObjectStore.java:6455)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:98)
at com.sun.proxy.$Proxy4.getToken(Unknown Source)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hadoop.hive.thrift.DBTokenStore.invokeOnRawStore(DBTokenStore.java:146)
... 21 more
{code}
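
For illustration, a minimal, self-contained Java sketch of the usual remedy 
for this kind of sharing: give each handler thread its own store instead of 
one shared instance. Class and method names are hypothetical stand-ins, not 
Hive's actual fix:

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PerThreadStoreDemo {
    /** Stand-in for ObjectStore: allows only one open transaction at a time. */
    static class Store {
        private boolean txOpen;
        String getToken(String id) {
            if (txOpen) {
                throw new IllegalStateException(
                    "Invalid state. Transaction has already started");
            }
            txOpen = true;                 // openTransaction()
            try { return "token-" + id; }  // run the query
            finally { txOpen = false; }    // commitTransaction()
        }
    }

    // One Store per thread, so concurrent requests no longer race on the
    // shared transaction state that the stack trace above complains about.
    private static final ThreadLocal<Store> STORE =
        ThreadLocal.withInitial(Store::new);

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 100; i++) {
            final int n = i;
            pool.submit(() -> STORE.get().getToken("id-" + n));
        }
        pool.shutdown();
    }
}
{code}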

  was:
Sometimes the metastore log shows the exception below. After analysis, we 
found that when HiveMetaStore starts, the DelegationTokenSecretManager keeps 
using the same ObjectStore; see here:
saslServer.startDelegationTokenSecretManager(conf, *baseHandler.getMS()*, 
ServerMode.METASTORE);
This leads to the

[jira] [Commented] (HIVE-11616) DelegationTokenSecretManager reuses the same ObjectStore, which has a concurrency issue

2015-10-28 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979306#comment-14979306
 ] 

Xuefu Zhang commented on HIVE-11616:


Agreed for users who already hit the bug, but as a preventive measure, it's 
good for a user to know whether their version is affected, so they can move 
to a newer version or backport the fix.

I understand it's sometimes hard to list all the affected versions. Sometimes 
we do, when we know when a piece of code or a feature was introduced; in that 
case, it doesn't take much to do so. I guess backtracking the last few 
releases is reasonable.

> DelegationTokenSecretManager reuses the same ObjectStore, which has a 
> concurrency issue
> --
>
> Key: HIVE-11616
> URL: https://issues.apache.org/jira/browse/HIVE-11616
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1
>Reporter: wangwenli
>Assignee: Cody Fu
> Attachments: HIVE-11616.01.patch, HIVE-11616.02.patch, 
> HIVE-11616.patch
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Sometimes the metastore log shows the exception below. After analysis, we 
> found that when HiveMetaStore starts, the DelegationTokenSecretManager 
> keeps using the same ObjectStore; see here:
> {code}
> saslServer.startDelegationTokenSecretManager(conf, *baseHandler.getMS()*, 
> ServerMode.METASTORE);
> {code}
> This leads to the concurrency issue:
> {code}
> 2015-08-18 20:59:10,520 | ERROR | pool-6-thread-200 | Error occurred during 
> processing of message. | 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:296)
> org.apache.hadoop.hive.thrift.DelegationTokenStore$TokenStoreException: 
> org.datanucleus.transaction.NucleusTransactionException: Invalid state. 
> Transaction has already started
>   at 
> org.apache.hadoop.hive.thrift.DBTokenStore.invokeOnRawStore(DBTokenStore.java:154)
>   at 
> org.apache.hadoop.hive.thrift.DBTokenStore.getToken(DBTokenStore.java:88)
>   at 
> org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:112)
>   at 
> org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:56)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$SaslDigestCallbackHandler.getPassword(HadoopThriftAuthBridge.java:565)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$SaslDigestCallbackHandler.handle(HadoopThriftAuthBridge.java:596)
>   at 
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
>   at 
> com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>   at 
> org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:539)
>   at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283)
>   at 
> org.apache.thrift.transport.HiveTSaslServerTransport.open(HiveTSaslServerTransport.java:133)
>   at 
> org.apache.thrift.transport.HiveTSaslServerTransport$Factory.getTransport(HiveTSaslServerTransport.java:261)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:739)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:736)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1652)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:736)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:268)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.datanucleus.transaction.NucleusTransactionException: Invalid 
> state. Transaction has already started
>   at 
> org.datanucleus.transaction.TransactionManager.begin(TransactionManager.java:47)
>   at org.datanucleus.TransactionImpl.begin(TransactionImpl.java:131)
>   at 
> org.datanucleus.api.jdo.JDOTransaction.internalBegin(JDOTransaction.java:88)
>   at org.datanucleus.api.jdo.JDOTransaction.begin(JDOTransaction.java:80)
>   at 
> 

[jira] [Updated] (HIVE-9882) Add jar/file doesn't work with yarn-cluster mode [Spark Branch]

2015-10-22 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9882:
--
Component/s: (was: spark-branch)
 Spark

> Add jar/file doesn't work with yarn-cluster mode [Spark Branch]
> ---
>
> Key: HIVE-9882
> URL: https://issues.apache.org/jira/browse/HIVE-9882
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Spark
>Affects Versions: spark-branch
>Reporter: Xiaomin Zhang
>Assignee: Rui Li
> Fix For: 1.2.0
>
> Attachments: HIVE-9882.1-spark.patch, HIVE-9882.1-spark.patch
>
>
> It seems the current fix for HIVE-9425 only uploads the jars/files to HDFS; 
> however, they are not accessible by the driver/executor.
> I found the following in the AM log:
> {noformat}
> 15/02/26 15:10:36 INFO Configuration.deprecation: mapred.min.split.size is 
> deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
> 15/02/26 15:10:36 INFO client.SparkClientUtilities: Added 
> jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/hive-exec-1.2.0-SNAPSHOT.jar]
>  to classpath.
> 15/02/26 15:10:36 INFO client.SparkClientUtilities: Added 
> jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/opennlp-maxent-3.0.3.jar]
>  to classpath.
> 15/02/26 15:10:36 INFO client.SparkClientUtilities: Added 
> jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/bigbenchqueriesmr.jar]
>  to classpath.
> 15/02/26 15:10:36 INFO client.SparkClientUtilities: Added 
> jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/opennlp-tools-1.5.3.jar]
>  to classpath.
> 15/02/26 15:10:36 INFO client.SparkClientUtilities: Added 
> jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/jcl-over-slf4j-1.7.5.jar]
>  to classpath.
> 15/02/26 15:10:36 INFO client.RemoteDriver: Failed to run job 
> 6886df05-f430-456c-a0ff-c7621db712d6
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: de.bankmark.bigbench.queries.q10.SentimentUDF 
> {noformat}
> As shown above, the file path that was added to the classpath is invalid, 
> so all the uploaded jars/files are still unavailable to the 
> driver/executor.
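
For illustration only (a sketch of the symptom, not the actual HIVE-9882 
fix): the bad entry comes from prefixing a local container directory onto an 
hdfs: URI, so the URI scheme has to be inspected before a classpath entry is 
built. The jar path below is hypothetical:

{code}
import java.net.URI;

public class JarPathDemo {
    public static void main(String[] args) throws Exception {
        URI uri = new URI("hdfs://localhost:8020/tmp/hive/user/hive-exec.jar");
        if (uri.getScheme() != null && !"file".equals(uri.getScheme())) {
            // A remote jar must first be copied to a local file before it can
            // go on the classpath; naively prefixing the container directory,
            // as the log above shows, produces an invalid path.
            System.out.println("download " + uri + ", then add the local copy");
        } else {
            System.out.println("add " + uri.getPath() + " to the classpath");
        }
    }
}
{code}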



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-12222) Define port range in property for RPCServer

2015-10-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967996#comment-14967996
 ] 

Xuefu Zhang edited comment on HIVE-12222 at 10/21/15 9:38 PM:
--

[~alee526], are you interested in contributing to this?


was (Author: xuefuz):
[~alee526], are you going to contribute to this?

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-12222
> URL: https://issues.apache.org/jira/browse/HIVE-12222
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. I 
> would need some help reviewing and updating the fields in this JIRA ticket, 
> thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is used every time 
> the RPC server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the HiveCLI RPC 
> server and Spark due to the unpredictable port numbers. In other words, 
> users need to open the whole range of Hive ports 
> from the data nodes to the HiveCLI (edge) node.
> {code}
>  this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer() {
>   @Override
>   public void initChannel(SocketChannel ch) throws Exception {
> SaslServerHandler saslHandler = new SaslServerHandler(config);
> final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, 
> group);
> saslHandler.rpc = newRpc;
> Runnable cancelTask = new Runnable() {
> @Override
> public void run() {
>   LOG.warn("Timed out waiting for hello from client.");
>   newRpc.close();
> }
> };
> saslHandler.cancelTask = group.schedule(cancelTask,
> RpcServer.this.config.getServerConnectTimeoutMs(),
> TimeUnit.MILLISECONDS);
>   }
>   })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command-line 
> tool, and in order to use it, they need to log in to the edge node (via 
> SSH). Now, here comes the interesting part.
> True or not, this is what I observe and encounter from time to time: many 
> users abuse the resources on that edge node (increasing HADOOP_HEAPSIZE, 
> dumping output to local disk, running huge Python workflows, etc.), which 
> may cause the HS2 process to run into OOME, choke, and die, among various 
> other resource issues (login, etc.).
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run it on a gateway or service node, separated 
> from the HiveCLI.
> The logs are in a different location, and monitoring and auditing are 
> easier when HS2 runs under a daemon user account, so we don't want users to 
> run HiveCLI where HS2 is running.
> It's better to isolate resources this way to avoid memory, file-handle, and 
> disk-space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge 
> node needs to be fortified and enhanced; hence all the firewalling and 
> auditing.
> - Regulation/compliance auditing is another requirement to monitor all 
> traffic; specifying and locking down ports makes this easier since we can 
> focus on one range to monitor and audit.
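
For illustration, a minimal Java sketch of the requested behavior: bind to 
the first free port in a configured range instead of port 0. The property 
name in the comment is hypothetical:

{code}
import java.io.IOException;
import java.net.ServerSocket;

public class PortRangeBindDemo {
    /** Try each port in [lo, hi] and return the first successful bind. */
    static ServerSocket bindInRange(int lo, int hi) throws IOException {
        for (int port = lo; port <= hi; port++) {
            try {
                return new ServerSocket(port);
            } catch (IOException busy) {
                // Port already in use; try the next one in the range.
            }
        }
        throw new IOException("no free port in range " + lo + "-" + hi);
    }

    public static void main(String[] args) throws IOException {
        // e.g. a hypothetical hive.spark.client.rpc.server.port.range=30000-30010
        try (ServerSocket server = bindInRange(30000, 30010)) {
            System.out.println("RPC server bound to port " + server.getLocalPort());
        }
    }
}
{code}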



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12222) Define port range in property for RPCServer

2015-10-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967996#comment-14967996
 ] 

Xuefu Zhang commented on HIVE-12222:


[~alee526], are you going to contribute to this?

> Define port range in property for RPCServer
> ---
>
> Key: HIVE-12222
> URL: https://issues.apache.org/jira/browse/HIVE-12222
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.2.1
> Environment: Apache Hadoop 2.7.0
> Apache Hive 1.2.1
> Apache Spark 1.5.1
>Reporter: Andrew Lee
>
> Creating this JIRA after discussing with Xuefu on the dev mailing list. I 
> would need some help reviewing and updating the fields in this JIRA ticket, 
> thanks.
> I notice that in 
> ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> the port number is assigned 0, which means a random port is used every time 
> the RPC server is created to talk to Spark in the same session.
> Because of this, it is hard to configure a firewall between the HiveCLI RPC 
> server and Spark due to the unpredictable port numbers. In other words, 
> users need to open the whole range of Hive ports 
> from the data nodes to the HiveCLI (edge) node.
> {code}
>  this.channel = new ServerBootstrap()
>   .group(group)
>   .channel(NioServerSocketChannel.class)
>   .childHandler(new ChannelInitializer() {
>   @Override
>   public void initChannel(SocketChannel ch) throws Exception {
> SaslServerHandler saslHandler = new SaslServerHandler(config);
> final Rpc newRpc = Rpc.createServer(saslHandler, config, ch, 
> group);
> saslHandler.rpc = newRpc;
> Runnable cancelTask = new Runnable() {
> @Override
> public void run() {
>   LOG.warn("Timed out waiting for hello from client.");
>   newRpc.close();
> }
> };
> saslHandler.cancelTask = group.schedule(cancelTask,
> RpcServer.this.config.getServerConnectTimeoutMs(),
> TimeUnit.MILLISECONDS);
>   }
>   })
> {code}
> Two main reasons:
> - Most users (from what I see and encounter) use HiveCLI as a command-line 
> tool, and in order to use it, they need to log in to the edge node (via 
> SSH). Now, here comes the interesting part.
> True or not, this is what I observe and encounter from time to time: many 
> users abuse the resources on that edge node (increasing HADOOP_HEAPSIZE, 
> dumping output to local disk, running huge Python workflows, etc.), which 
> may cause the HS2 process to run into OOME, choke, and die, among various 
> other resource issues (login, etc.).
> - Analysts connect to Hive via HS2 + ODBC, so HS2 needs to be highly 
> available. It makes sense to run it on a gateway or service node, separated 
> from the HiveCLI.
> The logs are in a different location, and monitoring and auditing are 
> easier when HS2 runs under a daemon user account, so we don't want users to 
> run HiveCLI where HS2 is running.
> It's better to isolate resources this way to avoid memory, file-handle, and 
> disk-space issues.
> From a security standpoint:
> - Since users can log in to the edge node (via SSH), security on the edge 
> node needs to be fortified and enhanced; hence all the firewalling and 
> auditing.
> - Regulation/compliance auditing is another requirement to monitor all 
> traffic; specifying and locking down ports makes this easier since we can 
> focus on one range to monitor and audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12063) Pad Decimal numbers with trailing zeros to the scale of the column

2015-10-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968082#comment-14968082
 ] 

Xuefu Zhang commented on HIVE-12063:


Thanks, Szehon. Please note, this is actually not that far from my original 
thought in HIVE-7373. My point there was that we should neither append zeros 
nor trim trailing zeros. The patch here doesn't append zeros internally; it 
mainly formats output according to the output schema. (HIVE-7373 failed in 
this regard because it changed the internal representation.) This is in line 
with other DBs, though I'm not aware of any SQL standard on this. Yes, I said 
that the practice of appending zeros to output was questionable, but it makes 
sense in Hive's case, as Hive aggressively trims 0.0, 0.00, and so on all the 
way to 0, which is too confusing.

BTW, all vectorization tests passed. [~jdere] or [~hagleitn], please review and 
comment. Thanks.

> Pad Decimal numbers with trailing zeros to the scale of the column
> --
>
> Key: HIVE-12063
> URL: https://issues.apache.org/jira/browse/HIVE-12063
> Project: Hive
>  Issue Type: Improvement
>  Components: Types
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-12063.1.patch, HIVE-12063.2.patch, HIVE-12063.patch
>
>
> HIVE-7373 was to address the problems caused by Hive trimming trailing 
> zeros, which included treating 0.0, 0.00, and so on as 0, even though they 
> have different precision/scale. Please refer to the HIVE-7373 description. 
> However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems 
> remained. HIVE-11835 was resolved recently to address one of those problems, 
> where 0.0, 0.00, and so on could not be read into decimal(1,1).
> However, HIVE-11835 didn't address the problem that any decimal value such 
> as 0.0, 0.00, etc. shows as 0 in query results. This causes confusion, as 
> 0.0 and 0.00 have different precision/scale than 0.
> The proposal here is to pad query results with zeros up to the type's scale. 
> This not only removes the confusion described above, but also aligns with 
> many other DBs. The internal decimal representation doesn't change, however.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11985) don't store type names in metastore when metastore type names are not used

2015-10-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968111#comment-14968111
 ] 

Xuefu Zhang commented on HIVE-11985:


[~sershe], could you explain your new approach a little? I cannot follow the 
patch well enough to fully understand it. It would be nice if an RB entry 
could be provided, as the changes have become non-trivial.

> don't store type names in metastore when metastore type names are not used
> --
>
> Key: HIVE-11985
> URL: https://issues.apache.org/jira/browse/HIVE-11985
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11985.01.patch, HIVE-11985.02.patch, 
> HIVE-11985.03.patch, HIVE-11985.05.patch, HIVE-11985.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11100) Beeline should escape semi-colon in queries

2015-10-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968388#comment-14968388
 ] 

Xuefu Zhang commented on HIVE-11100:


What makes ';' start appearing in the connection string, and did that happen 
before this change? Since ';' is reserved as the query terminator, it seems 
reasonable to require escaping it all the time.

> Beeline should escape semi-colon in queries
> ---
>
> Key: HIVE-11100
> URL: https://issues.apache.org/jira/browse/HIVE-11100
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.2.0, 1.1.0
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>Priority: Minor
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11100.patch
>
>
> Beeline should escape the semicolon in queries. For example, queries like 
> the following:
> CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY ';' LINES TERMINATED BY '\n';
> or 
> CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY '\;' LINES TERMINATED BY '\n';
> both fail.
> But the second query, with the semicolon escaped with "\", works in the CLI.
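
For illustration, a minimal Java sketch of quote-aware statement splitting, 
the kind of handling this escaping needs; it is a sketch, not Beeline's 
actual implementation:

{code}
import java.util.ArrayList;
import java.util.List;

public class SemicolonSplitDemo {
    /** Split a script on ';' only when outside single quotes. */
    static List<String> splitStatements(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuote = false;
        for (char c : script.toCharArray()) {
            if (c == '\'') {
                inQuote = !inQuote;  // toggle on each single quote
            }
            if (c == ';' && !inQuote) {
                statements.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            statements.add(current.toString().trim());
        }
        return statements;
    }

    public static void main(String[] args) {
        String sql = "CREATE TABLE beeline_tb (c1 int, c2 string) ROW FORMAT "
            + "DELIMITED FIELDS TERMINATED BY ';' LINES TERMINATED BY '\\n';";
        // The ';' inside quotes is preserved; only the final one terminates.
        System.out.println(splitStatements(sql));
    }
}
{code}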



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11473) Upgrade Spark dependency to 1.5 [Spark Branch]

2015-10-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968430#comment-14968430
 ] 

Xuefu Zhang commented on HIVE-11473:


Unfortunately no. However, I think we can commit the patch as it is. It's 
just that we need to verify the test failures locally.

[~spena], have you made any progress in recreating the precommit instance for 
the Spark branch? Thanks.

> Upgrade Spark dependency to 1.5 [Spark Branch]
> --
>
> Key: HIVE-11473
> URL: https://issues.apache.org/jira/browse/HIVE-11473
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Jimmy Xiang
>Assignee: Rui Li
> Attachments: HIVE-11473.1-spark.patch, HIVE-11473.2-spark.patch, 
> HIVE-11473.3-spark.patch, HIVE-11473.3-spark.patch
>
>
> In Spark 1.5, the SparkListener interface changed, so HoS may fail to 
> create the Spark client if an unimplemented event callback method is 
> invoked.
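
For illustration, a toy Java sketch (hypothetical names, not Spark's API) of 
the compatibility pattern involved: a listener that implements the interface 
directly breaks when the interface grows, while one extending a no-op adapter 
class keeps working:

{code}
public class ListenerCompatDemo {
    /** Hypothetical listener interface: onStart is old, onExecutorAdded is
     *  newly added in a later release. */
    interface Listener {
        void onStart();
        void onExecutorAdded();
    }

    /** No-op adapter: subclasses keep compiling and running even when the
     *  interface gains methods, since missing callbacks fall back to these. */
    static class ListenerAdapter implements Listener {
        public void onStart() {}
        public void onExecutorAdded() {}
    }

    /** A client listener written against the old API, via the adapter. */
    static class JobListener extends ListenerAdapter {
        @Override public void onStart() { System.out.println("job started"); }
    }

    public static void main(String[] args) {
        Listener l = new JobListener();
        l.onStart();
        l.onExecutorAdded(); // safe: inherited no-op, no AbstractMethodError
    }
}
{code}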



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11473) Upgrade Spark dependency to 1.5 [Spark Branch]

2015-10-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968452#comment-14968452
 ] 

Xuefu Zhang commented on HIVE-11473:


Yes, it passed on my side.

Please also feel free to work on master directly for any Spark-related 
JIRAs, as the job queue isn't long these days.

> Upgrade Spark dependency to 1.5 [Spark Branch]
> --
>
> Key: HIVE-11473
> URL: https://issues.apache.org/jira/browse/HIVE-11473
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Jimmy Xiang
>Assignee: Rui Li
> Attachments: HIVE-11473.1-spark.patch, HIVE-11473.2-spark.patch, 
> HIVE-11473.3-spark.patch, HIVE-11473.3-spark.patch
>
>
> In Spark 1.5, the SparkListener interface changed, so HoS may fail to 
> create the Spark client if an unimplemented event callback method is 
> invoked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12370) Hive query fails with larger scale data set when enabling sampling order optimization

2015-11-09 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996629#comment-14996629
 ] 

Xuefu Zhang commented on HIVE-12370:


Have you tried other data formats? The stack trace seems to suggest a problem 
with that.

> Hive query fails with larger scale data set when enabling sampling order 
> optimization
> --
>
> Key: HIVE-12370
> URL: https://issues.apache.org/jira/browse/HIVE-12370
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
>Reporter: Yi Zhou
>
> Found that Hive on MR would fail with larger scale data (e.g., 3TB/10TB) 
> when enabling the sampling optimization (it passed with a 1GB data set):
> hive.optimize.sampling.orderby=true
> hive.optimize.sampling.orderby.number=2
> hive.optimize.sampling.orderby.percent=0.1
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:52)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
> ... 8 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:121)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
> ... 9 more
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12045) ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)

2015-11-09 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996587#comment-14996587
 ] 

Xuefu Zhang commented on HIVE-12045:


Thanks, Rui. I didn't know that. Yes, I think we should if it doesn't cause too 
much trouble.

> ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)
> --
>
> Key: HIVE-12045
> URL: https://issues.apache.org/jira/browse/HIVE-12045
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
> Environment: Cloudera QuickStart VM - CDH5.4.2
> beeline
>Reporter: Zsolt Tóth
>Assignee: Rui Li
> Attachments: HIVE-12045.1-spark.patch, example.jar, genUDF.patch
>
>
> If I execute the following query in beeline, I get ClassNotFoundException for 
> the UDF class.
> {code}
> drop function myGenericUdf;
> create function myGenericUdf as 'org.example.myGenericUdf' using jar 
> 'hdfs:///tmp/myudf.jar';
> select distinct myGenericUdf(1,2,1) from mytable;
> {code}
> In my example, myGenericUdf just looks for the 1st argument's value in the 
> others and returns the index. I don't think this is related to the actual 
> GenericUDF function.
> Note that:
> "select myGenericUdf(1,2,1) from mytable;" succeeds
> If I use the non-generic implementation of the same UDF, the select distinct 
> call succeeds.
> StackTrace:
> {code}
> 15/10/06 05:20:25 ERROR exec.Utilities: Failed to load plan: 
> hdfs://quickstart.cloudera:8020/tmp/hive/hive/f9de3f09-c12d-4528-9ee6-1f12932a14ae/hive_2015-10-06_05-20-07_438_6519207588897968406-20/-mr-10003/27cd7226-3e22-46f4-bddd-fb8fd4aa4b8d/map.xml:
>  org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> 

[jira] [Updated] (HIVE-12184) DESCRIBE of fully qualified table fails when db and table name match and non-default database is in use

2015-11-09 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12184:
---
Hadoop Flags: Incompatible change

> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use
> ---
>
> Key: HIVE-12184
> URL: https://issues.apache.org/jira/browse/HIVE-12184
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1
>Reporter: Lenni Kuff
>Assignee: Naveen Gangam
> Attachments: HIVE-12184.2.patch, HIVE-12184.3.patch, 
> HIVE-12184.4.patch, HIVE-12184.5.patch, HIVE-12184.patch
>
>
> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use.
> Repro:
> {code}
> 0: jdbc:hive2://localhost:10000/default> create database foo;
> No rows affected (0.116 seconds)
> 0: jdbc:hive2://localhost:10000/default> create table foo.foo(i int);
> 0: jdbc:hive2://localhost:10000/default> describe foo.foo;
> +-----------+------------+----------+--+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+--+
> | i         | int        |          |
> +-----------+------------+----------+--+
> 1 row selected (0.049 seconds)
> 0: jdbc:hive2://localhost:10000/default> use foo;
> 0: jdbc:hive2://localhost:10000/default> describe foo.foo;
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error in getting fields from 
> serde.Invalid Field foo (state=08S01,code=1)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12184) DESCRIBE of fully qualified table fails when db and table name match and non-default database is in use

2015-11-09 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996613#comment-14996613
 ] 

Xuefu Zhang commented on HIVE-12184:


Could you please provide an RB for this? Thanks.


> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use
> ---
>
> Key: HIVE-12184
> URL: https://issues.apache.org/jira/browse/HIVE-12184
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1
>Reporter: Lenni Kuff
>Assignee: Naveen Gangam
> Attachments: HIVE-12184.2.patch, HIVE-12184.3.patch, 
> HIVE-12184.4.patch, HIVE-12184.5.patch, HIVE-12184.patch
>
>
> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use.
> Repro:
> {code}
> 0: jdbc:hive2://localhost:10000/default> create database foo;
> No rows affected (0.116 seconds)
> 0: jdbc:hive2://localhost:10000/default> create table foo.foo(i int);
> 0: jdbc:hive2://localhost:10000/default> describe foo.foo;
> +-----------+------------+----------+--+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+--+
> | i         | int        |          |
> +-----------+------------+----------+--+
> 1 row selected (0.049 seconds)
> 0: jdbc:hive2://localhost:10000/default> use foo;
> 0: jdbc:hive2://localhost:10000/default> describe foo.foo;
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error in getting fields from 
> serde.Invalid Field foo (state=08S01,code=1)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12330) Fix precommit Spark test part2

2015-11-10 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999516#comment-14999516
 ] 

Xuefu Zhang commented on HIVE-12330:


Yes, it's failing on my local box too, but it passes on master. I guess this 
could be because the Spark branch doesn't have all the changes from master. I 
can do another merge to resolve the issue. In the meantime, if those are the 
only failures, I think your patch is good.

> Fix precommit Spark test part2
> --
>
> Key: HIVE-12330
> URL: https://issues.apache.org/jira/browse/HIVE-12330
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Szehon Ho
>Assignee: Sergio Peña
> Attachments: HIVE-12229.3-spark.patch, HIVE-12330.4-spark.patch, 
> HIVE-12330.5-spark.patch
>
>
> Regression because of HIVE-11489



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12355) Keep Obj Inspectors in Sync with RowSchema

2015-11-09 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997319#comment-14997319
 ] 

Xuefu Zhang commented on HIVE-12355:


Removed the fix version; it can be filled in once this gets resolved.

> Keep Obj Inspectors in Sync with RowSchema
> --
>
> Key: HIVE-12355
> URL: https://issues.apache.org/jira/browse/HIVE-12355
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.0.0, 1.1.0, 1.2.1
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
>
> Currently, not all operators match their output object inspectors to the 
> row schema.
> Many times the OutputObjectInspectors may contain more than is needed.
> This causes problems, especially with union.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12355) Keep Obj Inspectors in Sync with RowSchema

2015-11-09 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12355:
---
Fix Version/s: (was: 2.0.0)

> Keep Obj Inspectors in Sync with RowSchema
> --
>
> Key: HIVE-12355
> URL: https://issues.apache.org/jira/browse/HIVE-12355
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.0.0, 1.1.0, 1.2.1
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
>
> Currently, not all operators match their output object inspectors to the 
> row schema.
> Many times the OutputObjectInspectors may contain more than is needed.
> This causes problems, especially with union.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12390) Merge master to Spark branch 11/11/2015 [Spark Branch]

2015-11-11 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12390:
---
Attachment: HIVE-12390.1-spark.patch

> Merge master to Spark branch 11/11/2015 [Spark Branch]
> --
>
> Key: HIVE-12390
> URL: https://issues.apache.org/jira/browse/HIVE-12390
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-12390.1-spark.patch
>
>
> To fix some test failures such as those for Llap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12390) Merge master to Spark branch 11/11/2015 [Spark Branch]

2015-11-11 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001662#comment-15001662
 ] 

Xuefu Zhang commented on HIVE-12390:


Minor conflicts, about slf4j logging imports. I committed, but will attach a 
dummy patch to verify that the tests are okay. If not, I will address them 
with a separate patch.

> Merge master to Spark branch 11/11/2015 [Spark Branch]
> --
>
> Key: HIVE-12390
> URL: https://issues.apache.org/jira/browse/HIVE-12390
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>
> To fix some test failures such as those for Llap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11120) Generic interface for file format validation

2015-11-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002798#comment-15002798
 ] 

Xuefu Zhang commented on HIVE-11120:


Patch looks good. One comment on RB.

> Generic interface for file format validation
> 
>
> Key: HIVE-11120
> URL: https://issues.apache.org/jira/browse/HIVE-11120
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11120.2.patch, HIVE-11120.patch
>
>
> https://issues.apache.org/jira/browse/HIVE-8?focusedCommentId=14602302&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14602302
> We need generic interfaces for verifying whether a specified file is of a 
> valid format, so that the load data statement can perform a sanity check 
> before copying the file to destination.
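
For illustration, a minimal Java sketch of what such a validation interface 
could look like; the names are hypothetical, not the API this patch 
introduces:

{code}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class FormatCheckDemo {
    /** Hypothetical validator: can this stream be read as the format? */
    interface FileFormatValidator {
        boolean isValid(InputStream in) throws IOException;
    }

    /** ORC files begin with the 3-byte magic "ORC". */
    static class OrcMagicValidator implements FileFormatValidator {
        public boolean isValid(InputStream in) throws IOException {
            byte[] magic = new byte[3];
            return in.read(magic) == 3
                && magic[0] == 'O' && magic[1] == 'R' && magic[2] == 'C';
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream candidate = new ByteArrayInputStream("ORC...".getBytes());
        System.out.println(new OrcMagicValidator().isValid(candidate)); // true
    }
}
{code}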



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11120) Generic interface for file format validation

2015-11-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003104#comment-15003104
 ] 

Xuefu Zhang commented on HIVE-11120:


+1

> Generic interface for file format validation
> 
>
> Key: HIVE-11120
> URL: https://issues.apache.org/jira/browse/HIVE-11120
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-11120.2.patch, HIVE-11120.3.patch, 
> HIVE-11120.4.patch, HIVE-11120.patch
>
>
> https://issues.apache.org/jira/browse/HIVE-8?focusedCommentId=14602302&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14602302
> We need generic interfaces for verifying whether a specified file is of a 
> valid format, so that the load data statement can perform a sanity check 
> before copying the file to destination.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12330) Fix precommit Spark test part2

2015-11-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003158#comment-15003158
 ] 

Xuefu Zhang commented on HIVE-12330:


spark-branch

> Fix precommit Spark test part2
> --
>
> Key: HIVE-12330
> URL: https://issues.apache.org/jira/browse/HIVE-12330
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Szehon Ho
>Assignee: Sergio Peña
> Fix For: 2.0.0
>
> Attachments: HIVE-12229.3-spark.patch, HIVE-12330.4-spark.patch, 
> HIVE-12330.5-spark.patch, HIVE-12330.6-spark.patch, HIVE-12330.7-spark.patch
>
>
> Regression because of HIVE-11489



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12045) ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)

2015-11-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003375#comment-15003375
 ] 

Xuefu Zhang commented on HIVE-12045:


Hi Rui, for more info about the test run failures, please take a look at 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/996/artifact/.
 Another thing to try is to run the test locally to see if the failure can be 
reproduced. I guess the tests have been run with yarn-client until now, so 
switching to yarn-cluster can cause some headaches. However, it seems a good 
thing to do. Thanks.

> ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)
> --
>
> Key: HIVE-12045
> URL: https://issues.apache.org/jira/browse/HIVE-12045
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
> Environment: Cloudera QuickStart VM - CDH5.4.2
> beeline
>Reporter: Zsolt Tóth
>Assignee: Rui Li
> Attachments: HIVE-12045.1-spark.patch, HIVE-12045.2-spark.patch, 
> example.jar, genUDF.patch
>
>
> If I execute the following query in beeline, I get ClassNotFoundException for 
> the UDF class.
> {code}
> drop function myGenericUdf;
> create function myGenericUdf as 'org.example.myGenericUdf' using jar 
> 'hdfs:///tmp/myudf.jar';
> select distinct myGenericUdf(1,2,1) from mytable;
> {code}
> In my example, myGenericUdf just looks for the 1st argument's value in the 
> others and returns the index. I don't think this is related to the actual 
> GenericUDF function.
> Note that:
> "select myGenericUdf(1,2,1) from mytable;" succeeds
> If I use the non-generic implementation of the same UDF, the select distinct 
> call succeeds.
> StackTrace:
> {code}
> 15/10/06 05:20:25 ERROR exec.Utilities: Failed to load plan: 
> hdfs://quickstart.cloudera:8020/tmp/hive/hive/f9de3f09-c12d-4528-9ee6-1f12932a14ae/hive_2015-10-06_05-20-07_438_6519207588897968406-20/-mr-10003/27cd7226-3e22-46f4-bddd-fb8fd4aa4b8d/map.xml:
>  org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> 

[jira] [Commented] (HIVE-12330) Fix precommit Spark test part2

2015-11-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003097#comment-15003097
 ] 

Xuefu Zhang commented on HIVE-12330:


Yeah. It's a little unpredictable as well. Let's commit this and investigate 
those failures in a separate JIRA. Thanks.

> Fix precommit Spark test part2
> --
>
> Key: HIVE-12330
> URL: https://issues.apache.org/jira/browse/HIVE-12330
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Szehon Ho
>Assignee: Sergio Peña
> Attachments: HIVE-12229.3-spark.patch, HIVE-12330.4-spark.patch, 
> HIVE-12330.5-spark.patch, HIVE-12330.6-spark.patch, HIVE-12330.7-spark.patch
>
>
> Regression because of HIVE-11489



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12330) Fix precommit Spark test part2

2015-11-12 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12330:
---
Fix Version/s: (was: 2.0.0)
   spark-branch

> Fix precommit Spark test part2
> --
>
> Key: HIVE-12330
> URL: https://issues.apache.org/jira/browse/HIVE-12330
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Szehon Ho
>Assignee: Sergio Peña
> Fix For: spark-branch
>
> Attachments: HIVE-12229.3-spark.patch, HIVE-12330.4-spark.patch, 
> HIVE-12330.5-spark.patch, HIVE-12330.6-spark.patch, HIVE-12330.7-spark.patch
>
>
> Regression because of HIVE-11489



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12045) ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)

2015-11-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003412#comment-15003412
 ] 

Xuefu Zhang commented on HIVE-12045:


I just realized it didn't give any useful info either. Hive.log is supposed 
to be here: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-996/failed/TestMiniSparkOnYarnCliDriver/.
 Not sure why it's missing. For other tests, it's there. For instance: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-996/failed/TestCliDriver-show_conf.q-nonblock_op_deduplicate.q-avro_joins.q-and-12-more/.

> ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)
> --
>
> Key: HIVE-12045
> URL: https://issues.apache.org/jira/browse/HIVE-12045
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
> Environment: Cloudera QuickStart VM - CDH5.4.2
> beeline
>Reporter: Zsolt Tóth
>Assignee: Rui Li
> Attachments: HIVE-12045.1-spark.patch, HIVE-12045.2-spark.patch, 
> example.jar, genUDF.patch
>
>
> If I execute the following query in beeline, I get ClassNotFoundException for 
> the UDF class.
> {code}
> drop function myGenericUdf;
> create function myGenericUdf as 'org.example.myGenericUdf' using jar 
> 'hdfs:///tmp/myudf.jar';
> select distinct myGenericUdf(1,2,1) from mytable;
> {code}
> In my example, myGenericUdf just looks for the 1st argument's value in the 
> others and returns the index. I don't think this is related to the actual 
> GenericUDF function.
> Note that:
> "select myGenericUdf(1,2,1) from mytable;" succeeds
> If I use the non-generic implementation of the same UDF, the select distinct 
> call succeeds.
> StackTrace:
> {code}
> 15/10/06 05:20:25 ERROR exec.Utilities: Failed to load plan: 
> hdfs://quickstart.cloudera:8020/tmp/hive/hive/f9de3f09-c12d-4528-9ee6-1f12932a14ae/hive_2015-10-06_05-20-07_438_6519207588897968406-20/-mr-10003/27cd7226-3e22-46f4-bddd-fb8fd4aa4b8d/map.xml:
>  org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> 
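For readers less familiar with the API involved: a minimal GenericUDF along the lines described in the report (look up the first argument among the rest and return its index) might look like the sketch below. This is an illustration only; the actual org.example.myGenericUdf is not attached to this thread, and proper type checking via ObjectInspectors is omitted.

{code}
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.IntWritable;

// Returns the position of the first argument among the remaining ones, or -1.
public class MyGenericUdf extends GenericUDF {

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length < 2) {
      throw new UDFArgumentException("myGenericUdf expects at least two arguments");
    }
    return PrimitiveObjectInspectorFactory.writableIntObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    Object needle = arguments[0].get();
    for (int i = 1; i < arguments.length; i++) {
      // A real implementation would compare through the ObjectInspectors
      // captured in initialize(); direct equals() is enough for this sketch.
      if (needle != null && needle.equals(arguments[i].get())) {
        return new IntWritable(i);
      }
    }
    return new IntWritable(-1);
  }

  @Override
  public String getDisplayString(String[] children) {
    return "myGenericUdf(" + String.join(", ", children) + ")";
  }
}
{code}

Judging from the stack trace, the class itself is not the problem: it loads fine for the plain query, but Kryo deserialization of the map-side plan (map.xml) cannot resolve it once the distinct introduces a group-by stage, which points at a classloader issue rather than at the UDF.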

[jira] [Comment Edited] (HIVE-12045) ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)

2015-11-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003412#comment-15003412
 ] 

Xuefu Zhang edited comment on HIVE-12045 at 11/13/15 2:07 AM:
--

I just realized that didn't give any useful info either. Hive.log is supposed 
to be here: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-996/failed/TestMiniSparkOnYarnCliDriver/.
 Not sure why it's missing. For other tests, it's there. For instance: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-996/failed/TestCliDriver-show_conf.q-nonblock_op_deduplicate.q-avro_joins.q-and-12-more/.

[~spena], [~szehon], do you have any idea?


was (Author: xuefuz):
I just realized that didn't give any useful info either. Hive.log is supposed 
to be here: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-996/failed/TestMiniSparkOnYarnCliDriver/.
 Not sure why it's missing. For other tests, it's there. For instance: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-996/failed/TestCliDriver-show_conf.q-nonblock_op_deduplicate.q-avro_joins.q-and-12-more/.

> ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)
> --
>
> Key: HIVE-12045
> URL: https://issues.apache.org/jira/browse/HIVE-12045
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
> Environment: Cloudera QuickStart VM - CDH5.4.2
> beeline
>Reporter: Zsolt Tóth
>Assignee: Rui Li
> Attachments: HIVE-12045.1-spark.patch, HIVE-12045.2-spark.patch, 
> example.jar, genUDF.patch
>
>
> If I execute the following query in beeline, I get ClassNotFoundException for 
> the UDF class.
> {code}
> drop function myGenericUdf;
> create function myGenericUdf as 'org.example.myGenericUdf' using jar 
> 'hdfs:///tmp/myudf.jar';
> select distinct myGenericUdf(1,2,1) from mytable;
> {code}
> In my example, myGenericUdf just looks for the 1st argument's value in the 
> others and returns the index. I don't think this is related to the actual 
> GenericUDF function.
> Note that:
> "select myGenericUdf(1,2,1) from mytable;" succeeds
> If I use the non-generic implementation of the same UDF, the select distinct 
> call succeeds.
> StackTrace:
> {code}
> 15/10/06 05:20:25 ERROR exec.Utilities: Failed to load plan: 
> hdfs://quickstart.cloudera:8020/tmp/hive/hive/f9de3f09-c12d-4528-9ee6-1f12932a14ae/hive_2015-10-06_05-20-07_438_6519207588897968406-20/-mr-10003/27cd7226-3e22-46f4-bddd-fb8fd4aa4b8d/map.xml:
>  org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> 

[jira] [Commented] (HIVE-12045) ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)

2015-11-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005747#comment-15005747
 ] 

Xuefu Zhang commented on HIVE-12045:


[~lirui], the log problem seems to be caused by HIVE-11304. However, even if I 
revert that, hive.log doesn't give any useful information. For your reference, 
I attached the hive.log from when I ran "mvn test -Dtest=TestMiniSparkOnYarnCliDriver 
-Dqfile=orc_merge1.q". Please check if there is anything suspicious. Thanks.


> ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)
> --
>
> Key: HIVE-12045
> URL: https://issues.apache.org/jira/browse/HIVE-12045
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
> Environment: Cloudera QuickStart VM - CDH5.4.2
> beeline
>Reporter: Zsolt Tóth
>Assignee: Rui Li
> Attachments: HIVE-12045.1-spark.patch, HIVE-12045.2-spark.patch, 
> example.jar, genUDF.patch
>
>
> If I execute the following query in beeline, I get ClassNotFoundException for 
> the UDF class.
> {code}
> drop function myGenericUdf;
> create function myGenericUdf as 'org.example.myGenericUdf' using jar 
> 'hdfs:///tmp/myudf.jar';
> select distinct myGenericUdf(1,2,1) from mytable;
> {code}
> In my example, myGenericUdf just looks for the 1st argument's value in the 
> others and returns the index. I don't think this is related to the actual 
> GenericUDF function.
> Note that:
> "select myGenericUdf(1,2,1) from mytable;" succeeds
> If I use the non-generic implementation of the same UDF, the select distinct 
> call succeeds.
> StackTrace:
> {code}
> 15/10/06 05:20:25 ERROR exec.Utilities: Failed to load plan: 
> hdfs://quickstart.cloudera:8020/tmp/hive/hive/f9de3f09-c12d-4528-9ee6-1f12932a14ae/hive_2015-10-06_05-20-07_438_6519207588897968406-20/-mr-10003/27cd7226-3e22-46f4-bddd-fb8fd4aa4b8d/map.xml:
>  org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> 

[jira] [Updated] (HIVE-12045) ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)

2015-11-14 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12045:
---
Attachment: hive.log.gz

> ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)
> --
>
> Key: HIVE-12045
> URL: https://issues.apache.org/jira/browse/HIVE-12045
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
> Environment: Cloudera QuickStart VM - CDH5.4.2
> beeline
>Reporter: Zsolt Tóth
>Assignee: Rui Li
> Attachments: HIVE-12045.1-spark.patch, HIVE-12045.2-spark.patch, 
> example.jar, genUDF.patch, hive.log.gz
>
>
> If I execute the following query in beeline, I get ClassNotFoundException for 
> the UDF class.
> {code}
> drop function myGenericUdf;
> create function myGenericUdf as 'org.example.myGenericUdf' using jar 
> 'hdfs:///tmp/myudf.jar';
> select distinct myGenericUdf(1,2,1) from mytable;
> {code}
> In my example, myGenericUdf just looks for the 1st argument's value in the 
> others and returns the index. I don't think this is related to the actual 
> GenericUDF function.
> Note that:
> "select myGenericUdf(1,2,1) from mytable;" succeeds
> If I use the non-generic implementation of the same UDF, the select distinct 
> call succeeds.
> StackTrace:
> {code}
> 15/10/06 05:20:25 ERROR exec.Utilities: Failed to load plan: 
> hdfs://quickstart.cloudera:8020/tmp/hive/hive/f9de3f09-c12d-4528-9ee6-1f12932a14ae/hive_2015-10-06_05-20-07_438_6519207588897968406-20/-mr-10003/27cd7226-3e22-46f4-bddd-fb8fd4aa4b8d/map.xml:
>  org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>   at 
> 

[jira] [Commented] (HIVE-11304) Migrate to Log4j2 from Log4j 1.x

2015-11-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006789#comment-15006789
 ] 

Xuefu Zhang commented on HIVE-11304:


Thanks for looking at it. I will retry and post my new findings.

> Migrate to Log4j2 from Log4j 1.x
> 
>
> Key: HIVE-11304
> URL: https://issues.apache.org/jira/browse/HIVE-11304
> Project: Hive
>  Issue Type: Improvement
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>  Labels: TODOC2.0, incompatibleChange
> Fix For: 2.0.0
>
> Attachments: HIVE-11304.10.patch, HIVE-11304.11.patch, 
> HIVE-11304.2.patch, HIVE-11304.3.patch, HIVE-11304.4.patch, 
> HIVE-11304.5.patch, HIVE-11304.6.patch, HIVE-11304.7.patch, 
> HIVE-11304.8.patch, HIVE-11304.9.patch, HIVE-11304.patch
>
>
> Log4J2 has some great benefits and can benefit hive significantly. Some 
> notable features include
> 1) Performance (parametrized logging, performance when logging is disabled 
> etc.) More details can be found here 
> https://logging.apache.org/log4j/2.x/performance.html
> 2) RoutingAppender - Route logs to different log files based on MDC context 
> (useful for HS2, LLAP etc.)
> 3) Asynchronous logging
> This is an umbrella jira to track changes related to Log4j2 migration.
> Log4J1 EOL - 
> https://blogs.apache.org/foundation/entry/apache_logging_services_project_announces



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
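As a small illustration of the "parametrized logging" benefit listed in the description above: with Log4j2's parameterized messages, argument formatting is only performed when the level is enabled, whereas eager string concatenation pays that cost unconditionally. A minimal sketch:

{code}
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class LoggingExample {
  private static final Logger LOG = LogManager.getLogger(LoggingExample.class);

  void process(long rowCount, long elapsedMs) {
    // Eager concatenation: the message string is built even if DEBUG is disabled.
    // LOG.debug("Processed " + rowCount + " rows in " + elapsedMs + " ms");

    // Parameterized form: the placeholders are substituted only when DEBUG is enabled.
    LOG.debug("Processed {} rows in {} ms", rowCount, elapsedMs);
  }
}
{code}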


[jira] [Commented] (HIVE-12045) ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)

2015-11-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007630#comment-15007630
 ] 

Xuefu Zhang commented on HIVE-12045:


[~lirui], it seems that hive.log is generated using master. Could you migrate 
your work to master instead? The Spark branch seems to have some test-related 
issues. Thanks.

> ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)
> --
>
> Key: HIVE-12045
> URL: https://issues.apache.org/jira/browse/HIVE-12045
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
> Environment: Cloudera QuickStart VM - CDH5.4.2
> beeline
>Reporter: Zsolt Tóth
>Assignee: Rui Li
> Attachments: HIVE-12045.1-spark.patch, HIVE-12045.2-spark.patch, 
> example.jar, genUDF.patch, hive.log.gz
>
>
> If I execute the following query in beeline, I get ClassNotFoundException for 
> the UDF class.
> {code}
> drop function myGenericUdf;
> create function myGenericUdf as 'org.example.myGenericUdf' using jar 
> 'hdfs:///tmp/myudf.jar';
> select distinct myGenericUdf(1,2,1) from mytable;
> {code}
> In my example, myGenericUdf just looks for the 1st argument's value in the 
> others and returns the index. I don't think this is related to the actual 
> GenericUDF function.
> Note that:
> "select myGenericUdf(1,2,1) from mytable;" succeeds
> If I use the non-generic implementation of the same UDF, the select distinct 
> call succeeds.
> StackTrace:
> {code}
> 15/10/06 05:20:25 ERROR exec.Utilities: Failed to load plan: 
> hdfs://quickstart.cloudera:8020/tmp/hive/hive/f9de3f09-c12d-4528-9ee6-1f12932a14ae/hive_2015-10-06_05-20-07_438_6519207588897968406-20/-mr-10003/27cd7226-3e22-46f4-bddd-fb8fd4aa4b8d/map.xml:
>  org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> 

[jira] [Commented] (HIVE-11304) Migrate to Log4j2 from Log4j 1.x

2015-11-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007636#comment-15007636
 ] 

Xuefu Zhang commented on HIVE-11304:


I retried and confirm what you observed. I don't know why I didn't get it 
first. Thanks.

> Migrate to Log4j2 from Log4j 1.x
> 
>
> Key: HIVE-11304
> URL: https://issues.apache.org/jira/browse/HIVE-11304
> Project: Hive
>  Issue Type: Improvement
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>  Labels: TODOC2.0, incompatibleChange
> Fix For: 2.0.0
>
> Attachments: HIVE-11304.10.patch, HIVE-11304.11.patch, 
> HIVE-11304.2.patch, HIVE-11304.3.patch, HIVE-11304.4.patch, 
> HIVE-11304.5.patch, HIVE-11304.6.patch, HIVE-11304.7.patch, 
> HIVE-11304.8.patch, HIVE-11304.9.patch, HIVE-11304.patch
>
>
> Log4J2 has some great benefits and can benefit hive significantly. Some 
> notable features include
> 1) Performance (parametrized logging, performance when logging is disabled 
> etc.) More details can be found here 
> https://logging.apache.org/log4j/2.x/performance.html
> 2) RoutingAppender - Route logs to different log files based on MDC context 
> (useful for HS2, LLAP etc.)
> 3) Asynchronous logging
> This is an umbrella jira to track changes related to Log4j2 migration.
> Log4J1 EOL - 
> https://blogs.apache.org/foundation/entry/apache_logging_services_project_announces



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-11304) Migrate to Log4j2 from Log4j 1.x

2015-11-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007636#comment-15007636
 ] 

Xuefu Zhang edited comment on HIVE-11304 at 11/16/15 11:40 PM:
---

I retried and confirmed what you observed. I don't know why I didn't get it the 
first time. Thanks.


was (Author: xuefuz):
I retried and confirm what you observed. I don't know why I didn't get it 
first. Thanks.

> Migrate to Log4j2 from Log4j 1.x
> 
>
> Key: HIVE-11304
> URL: https://issues.apache.org/jira/browse/HIVE-11304
> Project: Hive
>  Issue Type: Improvement
>  Components: Logging
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>  Labels: TODOC2.0, incompatibleChange
> Fix For: 2.0.0
>
> Attachments: HIVE-11304.10.patch, HIVE-11304.11.patch, 
> HIVE-11304.2.patch, HIVE-11304.3.patch, HIVE-11304.4.patch, 
> HIVE-11304.5.patch, HIVE-11304.6.patch, HIVE-11304.7.patch, 
> HIVE-11304.8.patch, HIVE-11304.9.patch, HIVE-11304.patch
>
>
> Log4J2 has some great benefits and can benefit hive significantly. Some 
> notable features include
> 1) Performance (parametrized logging, performance when logging is disabled 
> etc.) More details can be found here 
> https://logging.apache.org/log4j/2.x/performance.html
> 2) RoutingAppender - Route logs to different log files based on MDC context 
> (useful for HS2, LLAP etc.)
> 3) Asynchronous logging
> This is an umbrella jira to track changes related to Log4j2 migration.
> Log4J1 EOL - 
> https://blogs.apache.org/foundation/entry/apache_logging_services_project_announces



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12390) Merge master to Spark branch 11/11/2015 [Spark Branch]

2015-11-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002487#comment-15002487
 ] 

Xuefu Zhang commented on HIVE-12390:


cbo_rp_annotate_stats_groupby.q also fails on master. I cannot reproduce the 
groupby3_map_multi_distinct.q and explainuser_3.q failures. The others seem 
unrelated. Thus, I'm going to close this merge JIRA and will deal with the test 
failures in separate JIRAs if needed.

> Merge master to Spark branch 11/11/2015 [Spark Branch]
> --
>
> Key: HIVE-12390
> URL: https://issues.apache.org/jira/browse/HIVE-12390
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-12390.1-spark.patch
>
>
> To fix some test failures such as those for Llap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12330) Fix precommit Spark test part2

2015-11-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002667#comment-15002667
 ] 

Xuefu Zhang commented on HIVE-12330:


[~spena], thanks for working on this. I'll leave the code review to Szehon, but 
I have two questions: 1. To confirm, these changes are applicable to master, 
right? That will happen the next time we merge the Spark branch to master. 2. 
Test run #998, which ran against the latest Spark branch w/ a dummy patch, 
seems to have fewer failures than this one. I'd just like to make sure that the 
env is indeed okay. Could you double check? We can reattach the patch to have 
another run.

> Fix precommit Spark test part2
> --
>
> Key: HIVE-12330
> URL: https://issues.apache.org/jira/browse/HIVE-12330
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Szehon Ho
>Assignee: Sergio Peña
> Attachments: HIVE-12229.3-spark.patch, HIVE-12330.4-spark.patch, 
> HIVE-12330.5-spark.patch, HIVE-12330.6-spark.patch
>
>
> Regression because of HIVE-11489



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12433) Merge master into spark 11/17/2015 [Spark Branch]

2015-11-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12433:
---
Summary: Merge master into spark 11/17/2015 [Spark Branch]  (was: Merge 
trunk into spark 11/17/2015 [Spark Branch])

> Merge master into spark 11/17/2015 [Spark Branch]
> -
>
> Key: HIVE-12433
> URL: https://issues.apache.org/jira/browse/HIVE-12433
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Xuefu Zhang
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-12433) Merge master into spark 11/17/2015 [Spark Branch]

2015-11-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang resolved HIVE-12433.

   Resolution: Fixed
Fix Version/s: spark-branch

Clean merge. Pushed to Spark branch.

> Merge master into spark 11/17/2015 [Spark Branch]
> -
>
> Key: HIVE-12433
> URL: https://issues.apache.org/jira/browse/HIVE-12433
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Xuefu Zhang
> Fix For: spark-branch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12433) Merge trunk into spark 11/17/2015 [Spark Branch]

2015-11-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12433:
---
Fix Version/s: (was: 1.1.0)

> Merge trunk into spark 11/17/2015 [Spark Branch]
> 
>
> Key: HIVE-12433
> URL: https://issues.apache.org/jira/browse/HIVE-12433
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Xuefu Zhang
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-12433) Merge trunk into spark 11/17/2015 [Spark Branch]

2015-11-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-12433:
--

Assignee: Xuefu Zhang  (was: Brock Noland)

> Merge trunk into spark 11/17/2015 [Spark Branch]
> 
>
> Key: HIVE-12433
> URL: https://issues.apache.org/jira/browse/HIVE-12433
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Xuefu Zhang
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12434) Merge spark into master 11/17/2015

2015-11-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12434:
---
Attachment: HIVE-12434.patch

> Merge spark into master 11/17/2015
> --
>
> Key: HIVE-12434
> URL: https://issues.apache.org/jira/browse/HIVE-12434
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Affects Versions: 2.0.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-12434.patch
>
>
> There are still a few patches that are in Spark branch only. We need to merge 
> them to master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12370) Hive Query got failure with larger scale data set with enabling sampling order optimization

2015-11-10 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14998612#comment-14998612
 ] 

Xuefu Zhang commented on HIVE-12370:


Could you try your case with other data formats, such as text, sequence file, 
or parquet?

> Hive Query got failure with larger scale data set with enabling sampling order 
> optimization
> --
>
> Key: HIVE-12370
> URL: https://issues.apache.org/jira/browse/HIVE-12370
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
>Reporter: Yi Zhou
>
> Found that Hive gets failures on Hive on MR with larger-scale data (e.g., 
> 3TB/10TB) when the sampling optimization is enabled (the same setup passed 
> with a 1GB data set).
> hive.optimize.sampling.orderby=true
> hive.optimize.sampling.orderby.number=2
> hive.optimize.sampling.orderby.percent=0.1
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:52)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
> ... 8 more
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to 
> org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:121)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
> ... 9 more
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
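To make the suggestion above concrete: one way to try the same query against several formats is to toggle the settings from the report and vary the source table per run, e.g. over Hive JDBC. A rough sketch, in which the connection URL, table names, and sort column are placeholders rather than anything from the report:

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SamplingOrderByRepro {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default", "hive", "");
         Statement stmt = conn.createStatement()) {
      // Values as given in this report.
      stmt.execute("SET hive.optimize.sampling.orderby=true");
      stmt.execute("SET hive.optimize.sampling.orderby.number=2");
      stmt.execute("SET hive.optimize.sampling.orderby.percent=0.1");
      // Run the same ORDER BY against text/sequencefile/parquet copies of the
      // ORC table to see whether the ClassCastException is ORC-specific.
      for (String table : new String[] {"t_text", "t_seq", "t_parquet", "t_orc"}) {
        stmt.executeQuery("SELECT * FROM " + table + " ORDER BY k").close();
      }
    }
  }
}
{code}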


[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-02 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985244#comment-14985244
 ] 

Xuefu Zhang commented on HIVE-12229:


Hi [~szehon]/[~spena], could you please take a look to see if there is some 
problem with the env? At one point the issue went away, but now it seems to 
have resurfaced. Thanks.

> Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
> --
>
> Key: HIVE-12229
> URL: https://issues.apache.org/jira/browse/HIVE-12229
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Lifeng Wang
>Assignee: Rui Li
> Attachments: HIVE-12229.1-spark.patch, HIVE-12229.2-spark.patch, 
> HIVE-12229.3-spark.patch
>
>
> Added one Python script to the query, and the script cannot be found 
> during execution in yarn-cluster mode.
> {noformat}
> 15/10/21 21:10:55 INFO exec.ScriptOperator: Executing [/usr/bin/python, 
> q2-sessionize.py, 3600]
> 15/10/21 21:10:55 INFO exec.ScriptOperator: tablename=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: partname=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: alias=null
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 10 rows: used 
> memory = 324896224
> 15/10/21 21:10:55 INFO exec.ScriptOperator: ErrorStreamProcessor calling 
> reporter.progress()
> /usr/bin/python: can't open file 'q2-sessionize.py': [Errno 2] No such file 
> or directory
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread OutputProcessor done
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread ErrorProcessor done
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 100 rows: used 
> memory = 325619920
> 15/10/21 21:10:55 ERROR exec.ScriptOperator: Error in writing to script: 
> Stream closed
> 15/10/21 21:10:55 INFO exec.ScriptOperator: The script did not consume all 
> input data. This is considered as an error.
> 15/10/21 21:10:55 INFO exec.ScriptOperator: set 
> hive.exec.script.allow.partial.consumption=true; to ignore it.
> 15/10/21 21:10:55 ERROR spark.SparkReduceRecordHandler: Fatal error: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
> (tag=0) 
> {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
> (tag=0) 
> {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:340)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:289)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
> at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:99)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20001]: 
> An error occurred while reading or writing to your custom script. It may have 
> crashed with an error.
> at 
> org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:453)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:331)
> ... 14 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
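For orientation, the failing pattern amounts to shipping a script with the query and streaming rows through it. A hedged reproduction sketch over Hive JDBC follows; the URL, table, columns, and script path are placeholders, not values from the report:

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CustomScriptExample {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default", "hive", "");
         Statement stmt = conn.createStatement()) {
      // Ship the script to the cluster. In yarn-cluster mode it has to reach the
      // working directory of every executor, which is what this issue is about.
      stmt.execute("ADD FILE hdfs:///tmp/q2-sessionize.py");
      stmt.executeQuery(
          "SELECT TRANSFORM (uid, ts) USING 'python q2-sessionize.py 3600' "
          + "AS (uid, session_id) FROM clicks").close();
    }
  }
}
{code}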


[jira] [Commented] (HIVE-12215) Exchange partition does not show outputs field for post/pre execute hooks

2015-10-30 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983264#comment-14983264
 ] 

Xuefu Zhang commented on HIVE-12215:


+1, patch looks good. There is a trailing space/tab; removing it would be 
nice.

> Exchange partition does not show outputs field for post/pre execute hooks
> -
>
> Key: HIVE-12215
> URL: https://issues.apache.org/jira/browse/HIVE-12215
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-12215.2.patch, HIVE-12215.patch
>
>
> The pre/post execute hook interface has fields that indicate which Hive 
> objects were read / written to as a result of running the query. For the 
> exchange partition operation, these fields (ReadEntity and WriteEntity) are 
> empty. 
> This is an important issue as the hook interface may be configured to perform 
> critical warehouse operations.
> See
> {noformat}
> ql/src/test/results/clientpositive/exchange_partition3.q.out
> {noformat}
> {noformat}
> PREHOOK: query: -- This will exchange both partitions hr=1 and hr=2
> ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH 
> TABLE exchange_part_test2
> PREHOOK: type: ALTERTABLE_EXCHANGEPARTITION
> POSTHOOK: query: -- This will exchange both partitions hr=1 and hr=2
> ALTER TABLE exchange_part_test1 EXCHANGE PARTITION (ds='2013-04-05') WITH 
> TABLE exchange_part_test2
> POSTHOOK: type: ALTERTABLE_EXCHANGEPARTITION
> {noformat}
> It seems it should also print the output fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
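For context, a pre/post execute hook observes the query's inputs and outputs roughly as in the following sketch against the org.apache.hadoop.hive.ql.hooks API (the class name and the println auditing are illustrative only):

{code}
import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
import org.apache.hadoop.hive.ql.hooks.HookContext;
import org.apache.hadoop.hive.ql.hooks.ReadEntity;
import org.apache.hadoop.hive.ql.hooks.WriteEntity;

// Registered via hive.exec.pre.hooks / hive.exec.post.hooks.
public class AuditHook implements ExecuteWithHookContext {
  @Override
  public void run(HookContext context) throws Exception {
    // Per this issue, both sets come back empty for ALTERTABLE_EXCHANGEPARTITION.
    for (ReadEntity input : context.getInputs()) {
      System.out.println("read: " + input.getName());
    }
    for (WriteEntity output : context.getOutputs()) {
      System.out.println("written: " + output.getName());
    }
  }
}
{code}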


[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-04 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990788#comment-14990788
 ] 

Xuefu Zhang commented on HIVE-12229:


+1 for the patch. I have a followup question though. Based on your previous 
comment, what happens if spark.files.overwrite is false? Will the user see an 
error? The user should be able to set this property to true in Hive (say, via 
Beeline), right?

> Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
> --
>
> Key: HIVE-12229
> URL: https://issues.apache.org/jira/browse/HIVE-12229
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Lifeng Wang
>Assignee: Rui Li
> Attachments: HIVE-12229.1-spark.patch, HIVE-12229.2-spark.patch, 
> HIVE-12229.3-spark.patch, HIVE-12229.3-spark.patch
>
>
> Added one Python script to the query, and the script cannot be found 
> during execution in yarn-cluster mode.
> {noformat}
> 15/10/21 21:10:55 INFO exec.ScriptOperator: Executing [/usr/bin/python, 
> q2-sessionize.py, 3600]
> 15/10/21 21:10:55 INFO exec.ScriptOperator: tablename=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: partname=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: alias=null
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 10 rows: used 
> memory = 324896224
> 15/10/21 21:10:55 INFO exec.ScriptOperator: ErrorStreamProcessor calling 
> reporter.progress()
> /usr/bin/python: can't open file 'q2-sessionize.py': [Errno 2] No such file 
> or directory
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread OutputProcessor done
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread ErrorProcessor done
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 100 rows: used 
> memory = 325619920
> 15/10/21 21:10:55 ERROR exec.ScriptOperator: Error in writing to script: 
> Stream closed
> 15/10/21 21:10:55 INFO exec.ScriptOperator: The script did not consume all 
> input data. This is considered as an error.
> 15/10/21 21:10:55 INFO exec.ScriptOperator: set 
> hive.exec.script.allow.partial.consumption=true; to ignore it.
> 15/10/21 21:10:55 ERROR spark.SparkReduceRecordHandler: Fatal error: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
> (tag=0) 
> {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
> (tag=0) 
> {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:340)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:289)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
> at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:99)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20001]: 
> An error occurred while reading or writing to your custom script. It may have 
> crashed with an error.
> at 
> org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:453)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:331)
> ... 14 more
> {noformat}



--
This message was sent by Atlassian JIRA
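On the spark.files.overwrite question above: this is a standard Spark configuration property that controls whether a file re-added through SparkContext.addFile() may replace an existing copy with different contents. A minimal sketch of plain SparkConf usage (not Hive's internal wiring) is below; in a Hive session the equivalent would presumably be a set spark.files.overwrite=true; issued before the Spark session starts.

{code}
import org.apache.spark.SparkConf;

public class OverwriteConfExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf();
    // Allow a file added via SparkContext.addFile() to replace a previously
    // added file of the same name instead of failing.
    conf.set("spark.files.overwrite", "true");
    System.out.println(conf.get("spark.files.overwrite"));
  }
}
{code}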

[jira] [Updated] (HIVE-12063) Pad Decimal numbers with trailing zeros to the scale of the column

2015-11-03 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12063:
---
Attachment: HIVE-12063.3.patch

Rebased the patch against the latest master.

> Pad Decimal numbers with trailing zeros to the scale of the column
> --
>
> Key: HIVE-12063
> URL: https://issues.apache.org/jira/browse/HIVE-12063
> Project: Hive
>  Issue Type: Improvement
>  Components: Types
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-12063.1.patch, HIVE-12063.2.patch, 
> HIVE-12063.3.patch, HIVE-12063.patch
>
>
> HIVE-7373 was to address the problems of trimming trailing zeros by Hive, 
> which caused many problems including treating 0.0, 0.00 and so on as 0, which 
> has different precision/scale. Please refer to HIVE-7373 description. 
> However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems 
> remained. HIVE-11835 was resolved recently to address one of the problems, 
> where 0.0, 0.00, and so on cannot be read into decimal(1,1).
> However, HIVE-11835 didn't address the problem of any decimal value such as 
> 0.0, 0.00, etc. showing as 0 in query results. This causes confusion, 
> as 0.0 and 0.00 have different precision/scale than 0.
> The proposal here is to pad zeros for query result to the type's scale. This 
> not only removes the confusion described above, but also aligns with many 
> other DBs. Internal decimal number representation doesn't change, however.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
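The padding proposed here is purely a display-time concern; in plain Java terms the idea corresponds to the following (an illustration of the behavior, not Hive's actual implementation):

{code}
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalPaddingExample {
  public static void main(String[] args) {
    BigDecimal value = new BigDecimal("0.5"); // internal value, scale 1
    int columnScale = 3;                      // e.g. a decimal(10,3) column
    // Raising the scale is always exact, so no rounding can occur here.
    String display = value.setScale(columnScale, RoundingMode.UNNECESSARY).toPlainString();
    System.out.println(display); // prints 0.500
  }
}
{code}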


[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-04 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991067#comment-14991067
 ] 

Xuefu Zhang commented on HIVE-12229:


Then the real question is: when the new RSC comes up, should we make all jars 
that have been added so far in the user session available to the new executors, 
including new jars that overwrite previous ones?

It seems to be a usability issue if a user adds a jar and then changes some 
configuration (possibly unrelated to adding jars) that ends up creating a new 
RSC, which leaves the new executors with no knowledge of the added jar.

> Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
> --
>
> Key: HIVE-12229
> URL: https://issues.apache.org/jira/browse/HIVE-12229
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Lifeng Wang
>Assignee: Rui Li
> Attachments: HIVE-12229.1-spark.patch, HIVE-12229.2-spark.patch, 
> HIVE-12229.3-spark.patch, HIVE-12229.3-spark.patch
>
>
> Added one Python script to the query, and the script cannot be found 
> during execution in yarn-cluster mode.
> {noformat}
> 15/10/21 21:10:55 INFO exec.ScriptOperator: Executing [/usr/bin/python, 
> q2-sessionize.py, 3600]
> 15/10/21 21:10:55 INFO exec.ScriptOperator: tablename=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: partname=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: alias=null
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 10 rows: used 
> memory = 324896224
> 15/10/21 21:10:55 INFO exec.ScriptOperator: ErrorStreamProcessor calling 
> reporter.progress()
> /usr/bin/python: can't open file 'q2-sessionize.py': [Errno 2] No such file 
> or directory
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread OutputProcessor done
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread ErrorProcessor done
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 100 rows: used 
> memory = 325619920
> 15/10/21 21:10:55 ERROR exec.ScriptOperator: Error in writing to script: 
> Stream closed
> 15/10/21 21:10:55 INFO exec.ScriptOperator: The script did not consume all 
> input data. This is considered as an error.
> 15/10/21 21:10:55 INFO exec.ScriptOperator: set 
> hive.exec.script.allow.partial.consumption=true; to ignore it.
> 15/10/21 21:10:55 ERROR spark.SparkReduceRecordHandler: Fatal error: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
> (tag=0) 
> {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
> (tag=0) 
> {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:340)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:289)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
> at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:99)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20001]: 
> An error occurred while reading or writing to your custom script. It may have 
> crashed with an error.
> at 
> org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:453)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at 
> 

[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-04 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991017#comment-14991017
 ] 

Xuefu Zhang commented on HIVE-12229:


When the new executors are launched, shouldn't the added jars, which are 
available in the context, be added to the classpath? On the Hive side, we 
should be able to update the context in a deterministic manner, right?

> Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
> --
>
> Key: HIVE-12229
> URL: https://issues.apache.org/jira/browse/HIVE-12229
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Lifeng Wang
>Assignee: Rui Li
> Attachments: HIVE-12229.1-spark.patch, HIVE-12229.2-spark.patch, 
> HIVE-12229.3-spark.patch, HIVE-12229.3-spark.patch
>
>
> Added one Python script to the query, and the script cannot be found 
> during execution in yarn-cluster mode.
> {noformat}
> 15/10/21 21:10:55 INFO exec.ScriptOperator: Executing [/usr/bin/python, 
> q2-sessionize.py, 3600]
> 15/10/21 21:10:55 INFO exec.ScriptOperator: tablename=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: partname=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: alias=null
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 10 rows: used 
> memory = 324896224
> 15/10/21 21:10:55 INFO exec.ScriptOperator: ErrorStreamProcessor calling 
> reporter.progress()
> /usr/bin/python: can't open file 'q2-sessionize.py': [Errno 2] No such file 
> or directory
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread OutputProcessor done
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread ErrorProcessor done
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 100 rows: used 
> memory = 325619920
> 15/10/21 21:10:55 ERROR exec.ScriptOperator: Error in writing to script: 
> Stream closed
> 15/10/21 21:10:55 INFO exec.ScriptOperator: The script did not consume all 
> input data. This is considered as an error.
> 15/10/21 21:10:55 INFO exec.ScriptOperator: set 
> hive.exec.script.allow.partial.consumption=true; to ignore it.
> 15/10/21 21:10:55 ERROR spark.SparkReduceRecordHandler: Fatal error: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
> (tag=0) 
> {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
> (tag=0) 
> {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:340)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:289)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
> at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:99)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20001]: 
> An error occurred while reading or writing to your custom script. It may have 
> crashed with an error.
> at 
> org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:453)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:331)
> ... 14 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-04 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991155#comment-14991155
 ] 

Xuefu Zhang commented on HIVE-12229:


Okay, thanks for all the explanation. I guess the same problem can happen with 
MR as well. That problem can be handled in a separate JIRA if it ever comes up.

> Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
> --
>
> Key: HIVE-12229
> URL: https://issues.apache.org/jira/browse/HIVE-12229
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Lifeng Wang
>Assignee: Rui Li
> Attachments: HIVE-12229.1-spark.patch, HIVE-12229.2-spark.patch, 
> HIVE-12229.3-spark.patch, HIVE-12229.3-spark.patch
>
>
> Added one python script in the query and the python script cannot be found 
> during execution in yarn-cluster mode.
> {noformat}
> 15/10/21 21:10:55 INFO exec.ScriptOperator: Executing [/usr/bin/python, 
> q2-sessionize.py, 3600]
> 15/10/21 21:10:55 INFO exec.ScriptOperator: tablename=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: partname=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: alias=null
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 10 rows: used 
> memory = 324896224
> 15/10/21 21:10:55 INFO exec.ScriptOperator: ErrorStreamProcessor calling 
> reporter.progress()
> /usr/bin/python: can't open file 'q2-sessionize.py': [Errno 2] No such file 
> or directory
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread OutputProcessor done
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread ErrorProcessor done
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 100 rows: used 
> memory = 325619920
> 15/10/21 21:10:55 ERROR exec.ScriptOperator: Error in writing to script: 
> Stream closed
> 15/10/21 21:10:55 INFO exec.ScriptOperator: The script did not consume all 
> input data. This is considered as an error.
> 15/10/21 21:10:55 INFO exec.ScriptOperator: set 
> hive.exec.script.allow.partial.consumption=true; to ignore it.
> 15/10/21 21:10:55 ERROR spark.SparkReduceRecordHandler: Fatal error: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
> (tag=0) 
> {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
> (tag=0) 
> {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:340)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:289)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
> at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:99)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20001]: 
> An error occurred while reading or writing to your custom script. It may have 
> crashed with an error.
> at 
> org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:453)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:331)
> ... 14 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12045) ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)

2015-11-06 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12045:
---
Attachment: genUDF.patch

Hi [~ruili], the attached genUDF.patch contains an example generic UDF that can 
be used for testing in a similar way to how we tested the non-generic UDF. Please 
see if you can put it to any use. Thanks.
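(In case the attachment is hard to get at, here is a hypothetical sketch of a 
minimal generic UDF of this kind; it is illustrative only, not the contents of 
genUDF.patch:)
{code}
// Hypothetical minimal GenericUDF that returns its first argument unchanged.
// Illustrative only; not the contents of genUDF.patch.
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;

public class MyGenericIdentity extends GenericUDF {
  @Override
  public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
    if (args.length != 1) {
      throw new UDFArgumentException("exactly one argument expected");
    }
    return args[0];                  // output type mirrors the input type
  }

  @Override
  public Object evaluate(DeferredObject[] args) throws HiveException {
    return args[0].get();            // pass the value straight through
  }

  @Override
  public String getDisplayString(String[] children) {
    return "my_generic_identity(" + children[0] + ")";
  }
}
{code}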

> ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)
> --
>
> Key: HIVE-12045
> URL: https://issues.apache.org/jira/browse/HIVE-12045
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
> Environment: Cloudera QuickStart VM - CDH5.4.2
> beeline
>Reporter: Zsolt Tóth
>Assignee: Rui Li
> Attachments: HIVE-12045.1-spark.patch, example.jar, genUDF.patch
>
>
> If I execute the following query in beeline, I get ClassNotFoundException for 
> the UDF class.
> {code}
> drop function myGenericUdf;
> create function myGenericUdf as 'org.example.myGenericUdf' using jar 
> 'hdfs:///tmp/myudf.jar';
> select distinct myGenericUdf(1,2,1) from mytable;
> {code}
> In my example, myGenericUdf just looks for the 1st argument's value in the 
> others and returns the index. I don't think this is related to the actual 
> GenericUDF function.
> Note that:
> "select myGenericUdf(1,2,1) from mytable;" succeeds
> If I use the non-generic implementation of the same UDF, the select distinct 
> call succeeds.
> StackTrace:
> {code}
> 15/10/06 05:20:25 ERROR exec.Utilities: Failed to load plan: 
> hdfs://quickstart.cloudera:8020/tmp/hive/hive/f9de3f09-c12d-4528-9ee6-1f12932a14ae/hive_2015-10-06_05-20-07_438_6519207588897968406-20/-mr-10003/27cd7226-3e22-46f4-bddd-fb8fd4aa4b8d/map.xml:
>  org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   

[jira] [Commented] (HIVE-12365) Added resource path is sent to cluster as an empty string when externally removed

2015-11-07 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995497#comment-14995497
 ] 

Xuefu Zhang commented on HIVE-12365:


+1

> Added resource path is sent to cluster as an empty string when externally 
> removed
> -
>
> Key: HIVE-12365
> URL: https://issues.apache.org/jira/browse/HIVE-12365
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-12365.patch, HIVE-12365.patch
>
>
> Sometimes the resources (e.g., jars) added via a command like "add jars 
> " are removed externally from their file paths for some reason. 
> Their paths are then sent to the cluster as empty strings, which causes 
> failures even for queries that do not need these jars during execution. The 
> error looks like the following:
> {code}
> 15/11/06 21:56:44 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> file:/tmp/hadoop-ctang/mapred/staging/ctang734817191/.staging/job_local734817191_0003
> java.lang.IllegalArgumentException: Can not create a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.(Path.java:135)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:215)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
> {code}
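(The shape of such a fix, as I read it, is a guard like the following hypothetical 
sketch: drop added resources whose backing files have vanished instead of shipping 
empty path strings. Names are illustrative, not from the actual patch.)
{code}
// Hypothetical guard, not the actual patch: keep only added resources whose
// files still exist, instead of letting their paths degrade to "" and then
// fail Path construction at job submission time.
import java.io.File;
import java.util.List;
import java.util.stream.Collectors;

public class ResourcePaths {
  static List<String> usableResources(List<String> added) {
    return added.stream()
        .filter(p -> p != null && !p.isEmpty() && new File(p).exists())
        .collect(Collectors.toList());   // only ship paths that still resolve
  }
}
{code}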



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12206) ClassNotFound Exception during query compilation with Tez and Union query and GenericUDFs

2015-11-03 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988823#comment-14988823
 ] 

Xuefu Zhang commented on HIVE-12206:


It seems that the added q test case is failing, and it is reproducible on my 
machine. 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5908/testReport

> ClassNotFound Exception during query compilation with Tez and Union query and 
> GenericUDFs
> -
>
> Key: HIVE-12206
> URL: https://issues.apache.org/jira/browse/HIVE-12206
> Project: Hive
>  Issue Type: Bug
>  Components: Tez, UDF
>Reporter: Jason Dere
>Assignee: Jason Dere
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12206.1.patch, HIVE-12206.2.patch
>
>
> {noformat}
> -- union query without UDF
> explain
> select * from (select key + key from src limit 1) a
> union all
> select * from (select key + key from src limit 1) b;
> add jar /tmp/udf-2.2.0-snapshot.jar;
> create temporary function myudf as 
> 'com.aginity.amp.hive.udf.UniqueNumberGenerator';
> -- Now try the query with the UDF
> explain
> select myudf()from (select key from src limit 1) a
> union all
> select myudf() from (select key from src limit 1) a;
> {noformat}
> Got error:
> {noformat}
> 2015-10-16 17:00:55,557 ERROR ql.Driver (SessionState.java:printError(963)) - 
> FAILED: KryoException Unable to find class: 
> com.aginity.amp.hive.udf.UniqueNumberGenerator
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
> parentOperators (org.apache.hadoop.hive.ql.exec.UnionOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.LimitOperator)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: com.aginity.amp.hive.udf.UniqueNumberGenerator
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
> parentOperators (org.apache.hadoop.hive.ql.exec.UnionOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.LimitOperator)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
> at 
> 

[jira] [Comment Edited] (HIVE-12063) Pad Decimal numbers with trailing zeros to the scale of the column

2015-11-03 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988825#comment-14988825
 ] 

Xuefu Zhang edited comment on HIVE-12063 at 11/4/15 3:38 AM:
-

The test failures are unrelated to this patch. Specifically, the union test 
case failure is reproducible w/o the patch, possibly caused by HIVE-12206.


was (Author: xuefuz):
The test failures are unrelated to this patch. Specifically, the union test 
case failure is reproducible w/o the patch.

> Pad Decimal numbers with trailing zeros to the scale of the column
> --
>
> Key: HIVE-12063
> URL: https://issues.apache.org/jira/browse/HIVE-12063
> Project: Hive
>  Issue Type: Improvement
>  Components: Types
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-12063.1.patch, HIVE-12063.2.patch, 
> HIVE-12063.3.patch, HIVE-12063.patch
>
>
> HIVE-7373 was to address the problems of trimming trailing zeros by Hive, 
> which caused many problems including treating 0.0, 0.00 and so on as 0, which 
> has different precision/scale. Please refer to HIVE-7373 description. 
> However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems 
> remained. HIVE-11835 was resolved recently to address one of the problems, 
> where 0.0, 0.00, and so on cannot be read into decimal(1,1).
> However, HIVE-11835 didn't address the problem of showing as 0 in query 
> result for any decimal values such as 0.0, 0.00, etc. This causes confusion 
> as 0 and 0.0 have different precision/scale than 0.
> The proposal here is to pad zeros for query result to the type's scale. This 
> not only removes the confusion described above, but also aligns with many 
> other DBs. Internal decimal number representation doesn't change, however.
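(Illustration: the proposed padding amounts to rendering each value at the column's 
declared scale. A hypothetical Java sketch, not Hive's code:)
{code}
// Hypothetical illustration of the proposal, not Hive's implementation:
// render each value at the column's declared scale, so decimal(3,2)
// displays 0 as 0.00. Only the display changes; storage is untouched.
import java.math.BigDecimal;

public class PadDemo {
  // Safe only when colScale >= the value's own scale, as it is for
  // values that already fit the column type.
  static String display(BigDecimal v, int colScale) {
    return v.setScale(colScale).toPlainString();  // pads trailing zeros
  }

  public static void main(String[] args) {
    System.out.println(display(new BigDecimal("0"), 2));    // 0.00
    System.out.println(display(new BigDecimal("1.5"), 2));  // 1.50
  }
}
{code}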



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12063) Pad Decimal numbers with trailing zeros to the scale of the column

2015-11-03 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988825#comment-14988825
 ] 

Xuefu Zhang commented on HIVE-12063:


The test failures are unrelated to this patch. Specifically, the union test 
case failure is reproducible w/o the patch.

> Pad Decimal numbers with trailing zeros to the scale of the column
> --
>
> Key: HIVE-12063
> URL: https://issues.apache.org/jira/browse/HIVE-12063
> Project: Hive
>  Issue Type: Improvement
>  Components: Types
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-12063.1.patch, HIVE-12063.2.patch, 
> HIVE-12063.3.patch, HIVE-12063.patch
>
>
> HIVE-7373 was to address the problems of trimming trailing zeros by Hive, 
> which caused many problems including treating 0.0, 0.00 and so on as 0, which 
> has different precision/scale. Please refer to HIVE-7373 description. 
> However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems 
> remained. HIVE-11835 was resolved recently to address one of the problems, 
> where 0.0, 0.00, and so on cannot be read into decimal(1,1).
> However, HIVE-11835 didn't address the problem of showing as 0 in query 
> result for any decimal values such as 0.0, 0.00, etc. This causes confusion 
> as 0 and 0.0 have different precision/scale than 0.
> The proposal here is to pad zeros for query result to the type's scale. This 
> not only removes the confusion described above, but also aligns with many 
> other DBs. Internal decimal number representation doesn't change, however.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

2015-11-02 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985517#comment-14985517
 ] 

Xuefu Zhang commented on HIVE-12229:


Thanks, [~spena]. Is the following problem a consequence of that, or something else?
{noformat}
TestSparkCliDriver-bucketmapjoin12.q-avro_decimal_native.q-udf_percentile.q-and-12-more
 - did not produce a TEST-*.xml file
{noformat}

> Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
> --
>
> Key: HIVE-12229
> URL: https://issues.apache.org/jira/browse/HIVE-12229
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Lifeng Wang
>Assignee: Rui Li
> Attachments: HIVE-12229.1-spark.patch, HIVE-12229.2-spark.patch, 
> HIVE-12229.3-spark.patch
>
>
> Added one python script in the query and the python script cannot be found 
> during execution in yarn-cluster mode.
> {noformat}
> 15/10/21 21:10:55 INFO exec.ScriptOperator: Executing [/usr/bin/python, 
> q2-sessionize.py, 3600]
> 15/10/21 21:10:55 INFO exec.ScriptOperator: tablename=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: partname=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: alias=null
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 10 rows: used 
> memory = 324896224
> 15/10/21 21:10:55 INFO exec.ScriptOperator: ErrorStreamProcessor calling 
> reporter.progress()
> /usr/bin/python: can't open file 'q2-sessionize.py': [Errno 2] No such file 
> or directory
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread OutputProcessor done
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread ErrorProcessor done
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 100 rows: used 
> memory = 325619920
> 15/10/21 21:10:55 ERROR exec.ScriptOperator: Error in writing to script: 
> Stream closed
> 15/10/21 21:10:55 INFO exec.ScriptOperator: The script did not consume all 
> input data. This is considered as an error.
> 15/10/21 21:10:55 INFO exec.ScriptOperator: set 
> hive.exec.script.allow.partial.consumption=true; to ignore it.
> 15/10/21 21:10:55 ERROR spark.SparkReduceRecordHandler: Fatal error: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
> (tag=0) 
> {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
> (tag=0) 
> {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:340)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:289)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
> at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:99)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20001]: 
> An error occurred while reading or writing to your custom script. It may have 
> crashed with an error.
> at 
> org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:453)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:331)
> ... 14 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12045) ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)

2015-11-05 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992646#comment-14992646
 ] 

Xuefu Zhang commented on HIVE-12045:


[~ztoth], finally I got a chance to do some research on this and was able to 
reproduce the problem with the example.jar you provided. CDH has HIVE-9882, so 
that doesn't seem to be a solution.

[~lirui], you mentioned some known issues. I'd like to know what kind of issues 
they are and whether there is a way to address them. It's a little sad that our 
tests covered only non-generic UDFs, and as a result the problem has survived up 
to now. Thanks.

> ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)
> --
>
> Key: HIVE-12045
> URL: https://issues.apache.org/jira/browse/HIVE-12045
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
> Environment: Cloudera QuickStart VM - CDH5.4.2
> beeline
>Reporter: Zsolt Tóth
> Attachments: example.jar
>
>
> If I execute the following query in beeline, I get ClassNotFoundException for 
> the UDF class.
> {code}
> drop function myGenericUdf;
> create function myGenericUdf as 'org.example.myGenericUdf' using jar 
> 'hdfs:///tmp/myudf.jar';
> select distinct myGenericUdf(1,2,1) from mytable;
> {code}
> In my example, myGenericUdf just looks for the 1st argument's value in the 
> others and returns the index. I don't think this is related to the actual 
> GenericUDF function.
> Note that:
> "select myGenericUdf(1,2,1) from mytable;" succeeds
> If I use the non-generic implementation of the same UDF, the select distinct 
> call succeeds.
> StackTrace:
> {code}
> 15/10/06 05:20:25 ERROR exec.Utilities: Failed to load plan: 
> hdfs://quickstart.cloudera:8020/tmp/hive/hive/f9de3f09-c12d-4528-9ee6-1f12932a14ae/hive_2015-10-06_05-20-07_438_6519207588897968406-20/-mr-10003/27cd7226-3e22-46f4-bddd-fb8fd4aa4b8d/map.xml:
>  org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> 

[jira] [Commented] (HIVE-12045) ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)

2015-11-06 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994047#comment-14994047
 ] 

Xuefu Zhang commented on HIVE-12045:


+1. Thanks a lot for the fix.

BTW, it might be good to add a test for a generic UDF, similar to the non-generic 
one. I will attach a Java file for a simple generic UDF to see if it can be 
included in /contrib.

> ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)
> --
>
> Key: HIVE-12045
> URL: https://issues.apache.org/jira/browse/HIVE-12045
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
> Environment: Cloudera QuickStart VM - CDH5.4.2
> beeline
>Reporter: Zsolt Tóth
>Assignee: Rui Li
> Attachments: HIVE-12045.1-spark.patch, example.jar
>
>
> If I execute the following query in beeline, I get ClassNotFoundException for 
> the UDF class.
> {code}
> drop function myGenericUdf;
> create function myGenericUdf as 'org.example.myGenericUdf' using jar 
> 'hdfs:///tmp/myudf.jar';
> select distinct myGenericUdf(1,2,1) from mytable;
> {code}
> In my example, myGenericUdf just looks for the 1st argument's value in the 
> others and returns the index. I don't think this is related to the actual 
> GenericUDF function.
> Note that:
> "select myGenericUdf(1,2,1) from mytable;" succeeds
> If I use the non-generic implementation of the same UDF, the select distinct 
> call succeeds.
> StackTrace:
> {code}
> 15/10/06 05:20:25 ERROR exec.Utilities: Failed to load plan: 
> hdfs://quickstart.cloudera:8020/tmp/hive/hive/f9de3f09-c12d-4528-9ee6-1f12932a14ae/hive_2015-10-06_05-20-07_438_6519207588897968406-20/-mr-10003/27cd7226-3e22-46f4-bddd-fb8fd4aa4b8d/map.xml:
>  org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   

[jira] [Commented] (HIVE-12184) DESCRIBE of fully qualified table fails when db and table name match and non-default database is in use

2015-11-06 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994498#comment-14994498
 ] 

Xuefu Zhang commented on HIVE-12184:


Actually, I'm not sure we need the .$...$ part after the column. Since the 
structure can be nested to an arbitrary depth, going down one layer doesn't 
help much. Plus, it seems to make things complicated.

> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use
> ---
>
> Key: HIVE-12184
> URL: https://issues.apache.org/jira/browse/HIVE-12184
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1
>Reporter: Lenni Kuff
>Assignee: Naveen Gangam
> Attachments: HIVE-12184.2.patch, HIVE-12184.3.patch, 
> HIVE-12184.4.patch, HIVE-12184.patch
>
>
> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use.
> Repro:
> {code}
> : jdbc:hive2://localhost:1/default> create database foo;
> No rows affected (0.116 seconds)
> 0: jdbc:hive2://localhost:1/default> create table foo.foo(i int);
> 0: jdbc:hive2://localhost:1/default> describe foo.foo;
> +-----------+------------+----------+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+
> | i         | int        |          |
> +-----------+------------+----------+
> 1 row selected (0.049 seconds)
> 0: jdbc:hive2://localhost:1/default> use foo;
> 0: jdbc:hive2://localhost:1/default> describe foo.foo;
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error in getting fields from 
> serde.Invalid Field foo (state=08S01,code=1)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-12184) DESCRIBE of fully qualified table fails when db and table name match and non-default database is in use

2015-11-06 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12184:
---
Comment: was deleted

(was: typo in the prior comment. it was clientnegative/describe_xpath{1- 4}.q )

> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use
> ---
>
> Key: HIVE-12184
> URL: https://issues.apache.org/jira/browse/HIVE-12184
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1
>Reporter: Lenni Kuff
>Assignee: Naveen Gangam
> Attachments: HIVE-12184.2.patch, HIVE-12184.3.patch, 
> HIVE-12184.4.patch, HIVE-12184.patch
>
>
> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use.
> Repro:
> {code}
> : jdbc:hive2://localhost:1/default> create database foo;
> No rows affected (0.116 seconds)
> 0: jdbc:hive2://localhost:1/default> create table foo.foo(i int);
> 0: jdbc:hive2://localhost:1/default> describe foo.foo;
> +-----------+------------+----------+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+
> | i         | int        |          |
> +-----------+------------+----------+
> 1 row selected (0.049 seconds)
> 0: jdbc:hive2://localhost:1/default> use foo;
> 0: jdbc:hive2://localhost:1/default> describe foo.foo;
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error in getting fields from 
> serde.Invalid Field foo (state=08S01,code=1)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-12184) DESCRIBE of fully qualified table fails when db and table name match and non-default database is in use

2015-11-06 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994522#comment-14994522
 ] 

Xuefu Zhang edited comment on HIVE-12184 at 11/6/15 9:57 PM:
-

It was already supported prior to this change. I didn't have to do anything 
specific in my change, except be mindful that the tree could contain other 
children for such usage. There are existing unit tests that test this behavior: 
clientpositive/describe_xpath.q and clientnegative/describe_xpath{1..4}.q



was (Author: ngangam):
It was already supported prior to this change. I dint have to do anything 
specific with my change, except had to be mindful that the tree could contain 
other children for such usage.
There are existing unit tests that tested this behavior 
clientpositive/describe_xpath.q and clientnegative/describe{1..4}.q


> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use
> ---
>
> Key: HIVE-12184
> URL: https://issues.apache.org/jira/browse/HIVE-12184
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1
>Reporter: Lenni Kuff
>Assignee: Naveen Gangam
> Attachments: HIVE-12184.2.patch, HIVE-12184.3.patch, 
> HIVE-12184.4.patch, HIVE-12184.patch
>
>
> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use.
> Repro:
> {code}
> : jdbc:hive2://localhost:1/default> create database foo;
> No rows affected (0.116 seconds)
> 0: jdbc:hive2://localhost:1/default> create table foo.foo(i int);
> 0: jdbc:hive2://localhost:1/default> describe foo.foo;
> +-----------+------------+----------+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+
> | i         | int        |          |
> +-----------+------------+----------+
> 1 row selected (0.049 seconds)
> 0: jdbc:hive2://localhost:1/default> use foo;
> 0: jdbc:hive2://localhost:1/default> describe foo.foo;
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error in getting fields from 
> serde.Invalid Field foo (state=08S01,code=1)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12184) DESCRIBE of fully qualified table fails when db and table name match and non-default database is in use

2015-11-06 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994533#comment-14994533
 ] 

Xuefu Zhang commented on HIVE-12184:


Thanks for the clarification. It's good to know that Hive always goes beyond 
expectations.

> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use
> ---
>
> Key: HIVE-12184
> URL: https://issues.apache.org/jira/browse/HIVE-12184
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1
>Reporter: Lenni Kuff
>Assignee: Naveen Gangam
> Attachments: HIVE-12184.2.patch, HIVE-12184.3.patch, 
> HIVE-12184.4.patch, HIVE-12184.patch
>
>
> DESCRIBE of fully qualified table fails when db and table name match and 
> non-default database is in use.
> Repro:
> {code}
> : jdbc:hive2://localhost:1/default> create database foo;
> No rows affected (0.116 seconds)
> 0: jdbc:hive2://localhost:1/default> create table foo.foo(i int);
> 0: jdbc:hive2://localhost:1/default> describe foo.foo;
> +-----------+------------+----------+
> | col_name  | data_type  | comment  |
> +-----------+------------+----------+
> | i         | int        |          |
> +-----------+------------+----------+
> 1 row selected (0.049 seconds)
> 0: jdbc:hive2://localhost:1/default> use foo;
> 0: jdbc:hive2://localhost:1/default> describe foo.foo;
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error in getting fields from 
> serde.Invalid Field foo (state=08S01,code=1)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-12299) Hive Column Data Type definition in schema limited to 4000 characters - too small

2015-10-29 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981709#comment-14981709
 ] 

Xuefu Zhang edited comment on HIVE-12299 at 10/30/15 1:49 AM:
--

I think we are seeing how increasingly prominent this issue is. Linking the 
related issues together: HIVE-12274, HIVE-11985.


was (Author: xuefuz):
I think we are seeing how increasingly prominent the issue is. Link the related 
issue together: HVIE-12274, HIVE-11985.

> Hive Column Data Type definition in schema limited to 4000 characters - too 
> small
> -
>
> Key: HIVE-12299
> URL: https://issues.apache.org/jira/browse/HIVE-12299
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Lakshmi Ramakrishnan
>
> The data type definitions in the table schema are limited to 4K characters - 
> as per the code here: 
> https://github.com/apache/hive/tree/master/metastore/scripts/upgrade/derby
> I checked quickly and all meta stores have similar schema definition for the 
> column type. 
> Is there any reason why this limit is low? We had a table that had defined a 
> struct, which had over 200 columns and the column names were rather verbose 
> (for readability). This caused a non-obvious failure like 
> FAILED: IllegalArgumentException Error: : expected at the end of 
> 'string:array...'. Can this be made configurable or at least increased to 
> something much higher?
> Additionally, there is no validation error that communicates this limitation 
> to the user, it required non-trivial debugging and looking into the table 
> definitions when it failed trying to parse what was essentially a truncated 
> type. 
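(A hypothetical sketch of the kind of validation the reporter asks for; the name 
and placement are illustrative, not from Hive:)
{code}
// Hypothetical guard that would surface the limit as a real validation error
// instead of a truncated-type parse failure later on. The constant 4000
// matches the VARCHAR width used by the metastore scripts linked above.
public class TypeNameCheck {
  static final int METASTORE_TYPE_NAME_LIMIT = 4000;  // assumption: schema column width

  static void checkTypeNameFits(String typeName) {
    if (typeName.length() > METASTORE_TYPE_NAME_LIMIT) {
      throw new IllegalArgumentException(
          "Column type definition is " + typeName.length()
          + " characters; the metastore schema only stores "
          + METASTORE_TYPE_NAME_LIMIT + ". Shorten field names or raise the limit.");
    }
  }
}
{code}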



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12299) Hive Column Data Type definition in schema limited to 4000 characters - too small

2015-10-29 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981709#comment-14981709
 ] 

Xuefu Zhang commented on HIVE-12299:


I think we are seeing how increasingly prominent the issue is. Link the related 
issue together: HVIE-12274, HIVE-11985.

> Hive Column Data Type definition in schema limited to 4000 characters - too 
> small
> -
>
> Key: HIVE-12299
> URL: https://issues.apache.org/jira/browse/HIVE-12299
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Lakshmi Ramakrishnan
>
> The data type definitions in the table schema are limited to 4K characters - 
> as per the code here: 
> https://github.com/apache/hive/tree/master/metastore/scripts/upgrade/derby
> I checked quickly and all meta stores have similar schema definition for the 
> column type. 
> Is there any reason why this limit is low? We had a table that had defined a 
> struct, which had over 200 columns and the column names were rather verbose 
> (for readability). This caused a non-obvious failure like 
> FAILED: IllegalArgumentException Error: : expected at the end of 
> 'string:array...'. Can this be made configurable or at least increased to 
> something much higher?
> Additionally, there is no validation error that communicates this limitation 
> to the user, it required non-trivial debugging and looking into the table 
> definitions when it failed trying to parse what was essentially a truncated 
> type. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12063) Pad Decimal numbers with trailing zeros to the scale of the column

2015-10-19 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12063:
---
Attachment: HIVE-12063.2.patch

> Pad Decimal numbers with trailing zeros to the scale of the column
> --
>
> Key: HIVE-12063
> URL: https://issues.apache.org/jira/browse/HIVE-12063
> Project: Hive
>  Issue Type: Improvement
>  Components: Types
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-12063.1.patch, HIVE-12063.2.patch, HIVE-12063.patch
>
>
> HIVE-7373 was to address the problems of trimming trailing zeros by Hive, 
> which caused many problems including treating 0.0, 0.00 and so on as 0, which 
> has different precision/scale. Please refer to HIVE-7373 description. 
> However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems 
> remained. HIVE-11835 was resolved recently to address one of the problems, 
> where 0.0, 0.00, and so on cannot be read into decimal(1,1).
> However, HIVE-11835 didn't address the problem of showing as 0 in query 
> result for any decimal values such as 0.0, 0.00, etc. This causes confusion 
> as 0 and 0.0 have different precision/scale than 0.
> The proposal here is to pad zeros for query result to the type's scale. This 
> not only removes the confusion described above, but also aligns with many 
> other DBs. Internal decimal number representation doesn't change, however.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12082) Null comparison for greatest and least operator

2015-10-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957394#comment-14957394
 ] 

Xuefu Zhang commented on HIVE-12082:


+1

> Null comparison for greatest and least operator
> ---
>
> Key: HIVE-12082
> URL: https://issues.apache.org/jira/browse/HIVE-12082
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-12082.2.patch, HIVE-12082.patch
>
>
> In mysql comparisons if any of the entries are null, then the result is null.
> [https://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html|https://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html]
>  and 
> [https://dev.mysql.com/doc/refman/5.0/en/type-conversion.html|https://dev.mysql.com/doc/refman/5.0/en/type-conversion.html].
> This can be demonstrated by the following mysql query:
> {noformat}
> mysql> select greatest(1, null) from test;
> +---+
> | greatest(1, null) |
> +---+
> |  NULL |
> +---+
> 1 row in set (0.00 sec)
> mysql> select greatest(-1, null) from test;
> ++
> | greatest(-1, null) |
> ++
> |   NULL |
> ++
> 1 row in set (0.00 sec)
> {noformat}
> This is in contrast to Hive, where null are ignored in the comparisons.
> {noformat}
> hive> select greatest(null, 1) from test;
> OK
> 1
> {noformat}
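(A hypothetical reduction of the MySQL-style rule to plain Java, for illustration 
only:)
{code}
// Hypothetical reduction of the MySQL-style rule the patch adopts:
// any NULL argument makes GREATEST(...) return NULL.
public class GreatestDemo {
  static Integer greatest(Integer... args) {
    Integer max = null;
    for (Integer a : args) {
      if (a == null) {
        return null;                 // NULL propagates immediately
      }
      if (max == null || a > max) {
        max = a;
      }
    }
    return max;
  }

  public static void main(String[] s) {
    System.out.println(greatest(1, null));   // null, matching MySQL
    System.out.println(greatest(-1, 3));     // 3
  }
}
{code}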



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11892) UDTF run in local fetch task does not return rows forwarded during GenericUDTF.close()

2015-10-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957517#comment-14957517
 ] 

Xuefu Zhang commented on HIVE-11892:


Can we get "affected version" filled? Thanks.

> UDTF run in local fetch task does not return rows forwarded during 
> GenericUDTF.close()
> --
>
> Key: HIVE-11892
> URL: https://issues.apache.org/jira/browse/HIVE-11892
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-11892.1.patch, HIVE-11892.2.patch
>
>
> Using the example UDTF GenericUDTFCount2, which is part of hive-contrib:
> {noformat}
> create temporary function udtfCount2 as 
> 'org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount2';
> set hive.fetch.task.conversion=minimal;
> -- Task created, correct output (2 rows)
> select udtfCount2() from src;
> set hive.fetch.task.conversion=more;
> -- Runs in local task, incorrect output (0 rows)
> select udtfCount2() from src;
> {noformat}
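(For context, GenericUDTFCount2 emits its rows only from close(); a hypothetical 
sketch of that shape, not the hive-contrib source:)
{code}
// Hypothetical sketch of a UDTF that forwards only from close(), which is
// exactly the case the local fetch-task path was dropping. Illustrative only;
// not the hive-contrib source of GenericUDTFCount2.
import java.util.Arrays;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class CountInCloseUDTF extends GenericUDTF {
  private long count = 0;

  @Override
  public StructObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
    return ObjectInspectorFactory.getStandardStructObjectInspector(
        Arrays.asList("cnt"),
        Arrays.asList((ObjectInspector) PrimitiveObjectInspectorFactory.javaLongObjectInspector));
  }

  @Override
  public void process(Object[] args) {
    count++;                               // nothing forwarded per input row
  }

  @Override
  public void close() throws HiveException {
    forward(new Object[] { count });       // the only output row; must not be lost
  }
}
{code}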



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12179) Add option to not add spark-assembly.jar to Hive classpath

2015-10-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957890#comment-14957890
 ] 

Xuefu Zhang commented on HIVE-12179:


I think the option should be the other way around: not adding it if a flag is 
set. Otherwise, existing users would have to set the flag in order to keep the 
existing behavior.

> Add option to not add spark-assembly.jar to Hive classpath
> --
>
> Key: HIVE-12179
> URL: https://issues.apache.org/jira/browse/HIVE-12179
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-12179.1.patch
>
>
> After running the following Hive script:
> {noformat}
> add jar hdfs:///tmp/junit-4.11.jar;
> show tables;
> {noformat}
> I can see the following lines getting printed to stdout when Hive exits:
> {noformat}
> WARN: The method class 
> org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.
> WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.
> {noformat}
> Also seeing the following warnings in stderr:
> {noformat}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/2.3.3.0-2981/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/2.3.3.0-2981/spark/lib/spark-assembly-1.4.1.2.3.3.0-2981-hadoop2.7.1.2.3.3.0-2981.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/2.3.3.0-2981/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/2.3.3.0-2981/spark/lib/spark-assembly-1.4.1.2.3.3.0-2981-hadoop2.7.1.2.3.3.0-2981.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> {noformat}
> It looks like this is due to the addition of the shaded spark-assembly.jar to 
> the classpath, which contains classes from icl-over-slf4j.jar (which is 
> causing the stdout messages) and slf4j-log4j12.jar.
> Removing spark-assembly.jar from being added to the classpath causes these 
> messages to go away. It would be good to have a way to specify that Hive not 
> add spark-assembly.jar to the class path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12063) Pad Decimal numbers with trailing zeros to the scale of the column

2015-10-07 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12063:
---
Description: 
HIVE-7373 was to address the problems of trimming trailing zeros by Hive, which 
caused many problems including treating 0.0, 0.00 and so on as 0, which has 
different precision/scale. Please refer to HIVE-7373 description. However, 
HIVE-7373 was reverted by HIVE-8745 while the underlying problems remained. 
HIVE-11835 was resolved recently to address one of the problems, where 0.0, 
0.00, and so on cannot be read into decimal(1,1).

However, HIVE-11835 didn't address the problem of showing as 0 in query result 
for any decimal values such as 0.0, 0.00, etc. This causes confusion as 0 and 
0.0 have different precision/scale than 0.

The proposal here is to pad zeros for query result to the type's scale. This 
not only removes the confusion described above, but also aligns with many other 
DBs. Internal decimal number representation doesn't change, however.

  was:
HIVE-7373 was to address the problem of trimming trailing zeros by Hive, which 
caused many problems including treating 0.0, 0.00 and so on as 0, which has 
different precision/scale. Please refer to HIVE-7373 description. However, 
HIVE-7373 was reverted by HIVE-8745 while the underlying problems remained. 
HIVE-11835 was resolved recently to address one of the problems, where 0.0, 
0.00, and so cannot be read into decimal(1,1).

However, HIVE-11835 didn't address the problem of showing as 0 in query result 
for any decimal values such as 0.0, 0.00, etc. This causes confusion as 0 and 
0.0 have different precision/scale than 0.

The proposal here is to pad zeros for query result to the type's scale. This 
not only removes the confusion described above, but also aligns with many other 
DBs. Internal decimal number representation doesn't change, however.


> Pad Decimal numbers with trailing zeros to the scale of the column
> --
>
> Key: HIVE-12063
> URL: https://issues.apache.org/jira/browse/HIVE-12063
> Project: Hive
>  Issue Type: Improvement
>  Components: Types
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>
> HIVE-7373 was to address the problems of trimming trailing zeros by Hive, 
> which caused many problems including treating 0.0, 0.00 and so on as 0, which 
> has different precision/scale. Please refer to HIVE-7373 description. 
> However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems 
> remained. HIVE-11835 was resolved recently to address one of the problems, 
> where 0.0, 0.00, and so on cannot be read into decimal(1,1).
> However, HIVE-11835 didn't address the problem of showing as 0 in query 
> result for any decimal values such as 0.0, 0.00, etc. This causes confusion 
> as 0 and 0.0 have different precision/scale than 0.
> The proposal here is to pad zeros for query result to the type's scale. This 
> not only removes the confusion described above, but also aligns with many 
> other DBs. Internal decimal number representation doesn't change, however.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12063) Pad Decimal numbers with trailing zeros to the scale of the column

2015-10-15 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12063:
---
Attachment: HIVE-12063.1.patch

> Pad Decimal numbers with trailing zeros to the scale of the column
> --
>
> Key: HIVE-12063
> URL: https://issues.apache.org/jira/browse/HIVE-12063
> Project: Hive
>  Issue Type: Improvement
>  Components: Types
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-12063.1.patch, HIVE-12063.patch
>
>
> HIVE-7373 was to address the problems of trimming trailing zeros by Hive, 
> which caused many problems including treating 0.0, 0.00 and so on as 0, which 
> has different precision/scale. Please refer to HIVE-7373 description. 
> However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems 
> remained. HIVE-11835 was resolved recently to address one of the problems, 
> where 0.0, 0.00, and so on cannot be read into decimal(1,1).
> However, HIVE-11835 didn't address the problem of showing as 0 in query 
> result for any decimal values such as 0.0, 0.00, etc. This causes confusion 
> as 0 and 0.0 have different precision/scale than 0.
> The proposal here is to pad zeros for query result to the type's scale. This 
> not only removes the confusion described above, but also aligns with many 
> other DBs. Internal decimal number representation doesn't change, however.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10178) DateWritable incorrectly calculates daysSinceEpoch for negative Unix time

2015-10-15 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959201#comment-14959201
 ] 

Xuefu Zhang commented on HIVE-10178:


Could we update "Affects version/s"? Thanks.

> DateWritable incorrectly calculates daysSinceEpoch for negative Unix time
> -
>
> Key: HIVE-10178
> URL: https://issues.apache.org/jira/browse/HIVE-10178
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
> Fix For: 1.2.0, 1.0.2
>
> Attachments: HIVE-10178.01.patch, HIVE-10178.02.patch, 
> HIVE-10178.03-branch-1.0.patch, HIVE-10178.03.patch
>
>
> For example:
> {code}
> select cast(cast('1966-01-01 00:00:01' as timestamp) as date);
> 1966-01-02
> {code}
> Another example:
> {code}
> select last_day(cast('1966-01-31 00:00:01' as timestamp));
> OK
> 1966-02-28
> {code}
> more details:
> Date: 1966-01-01 00:00:01
> unix time UTC: -126230399
> daysSinceEpoch = -126230399000 / 86400000 = -1460.9999...
> int daysSinceEpoch = -1460 (the cast truncates toward zero)
> DateWritable having daysSinceEpoch=-1460 is 1966-01-02
> daysSinceEpoch should be -1461 instead (1966-01-01)
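
A minimal sketch of the truncation-vs-floor issue described above (plain Java, 
not the DateWritable code itself):

{code}
public class DaysSinceEpoch {
  public static void main(String[] args) {
    long millis = -126230399000L;                      // 1966-01-01 00:00:01 UTC
    int truncated = (int) (millis / 86400000L);        // -1460: integer division rounds toward zero
    long floored  = Math.floorDiv(millis, 86400000L);  // -1461: the correct day for 1966-01-01
    System.out.println(truncated + " vs " + floored);
  }
}
{code}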



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11915) BoneCP returns closed connections from the pool

2015-10-15 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959177#comment-14959177
 ] 

Xuefu Zhang commented on HIVE-11915:


Could we update the "affected version" please?

> BoneCP returns closed connections from the pool
> ---
>
> Key: HIVE-11915
> URL: https://issues.apache.org/jira/browse/HIVE-11915
> Project: Hive
>  Issue Type: Bug
>Reporter: Takahiko Saito
>Assignee: Sergey Shelukhin
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11915.01.patch, HIVE-11915.02.patch, 
> HIVE-11915.03.patch, HIVE-11915.WIP.patch, HIVE-11915.patch
>
>
> It's a very old bug in BoneCP and it will never be fixed... There are 
> multiple workarounds on the internet but according to responses they are all 
> unreliable. We should upgrade to HikariCP (which in turn is only supported by 
> DN 4), meanwhile try some shamanic rituals. In this JIRA we will try a 
> relatively weak drum.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12191) Hive timestamp problems

2015-10-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960939#comment-14960939
 ] 

Xuefu Zhang commented on HIVE-12191:


Thanks for reporting the problems, [~b...@cloudera.com]. It seems time to 
realign Hive's timestamp implementations. 

> Hive timestamp problems
> ---
>
> Key: HIVE-12191
> URL: https://issues.apache.org/jira/browse/HIVE-12191
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.1
>Reporter: Ryan Blue
>
> This is an umbrella JIRA for problems found with Hive's timestamp (without 
> time zone) implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11721) non-ascii characters shows improper with "insert into"

2015-10-19 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964461#comment-14964461
 ] 

Xuefu Zhang commented on HIVE-11721:


You will need to reattach the patch to trigger another test run.

> non-ascii characters shows improper with "insert into"
> --
>
> Key: HIVE-11721
> URL: https://issues.apache.org/jira/browse/HIVE-11721
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 1.1.0, 1.2.1, 2.0.0
>Reporter: Jun Yin
>Assignee: Aleksei S
> Attachments: HIVE-11721.patch
>
>
> Hive: 1.1.0
> hive> create table char_255_noascii as select cast("Garçu 谢谢 Kôkaku 
> ありがとうございますkidôtai한국어" as char(255));
> hive> select * from char_255_noascii;
> OK
> Garçu 谢谢 Kôkaku ありがとうございますkidôtai>한국어
> it shows correctly, and it also works well with "LOAD DATA"; 
> but when I try another way to insert data as below:
> hive> create table nonascii(t1 char(255));
> OK
> Time taken: 0.125 seconds
> hive> insert into nonascii values("Garçu 谢谢 Kôkaku ありがとうございますkidôtai한국어");
> hive> select * from nonascii;
> OK
> Gar�u "" K�kaku B�LhFTVD~Ykid�tai\m� 
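
A small illustration of this class of symptom (an editor's sketch, not the 
actual Hive code path): when UTF-8 bytes are decoded with a single-byte 
charset, every multi-byte character turns into garbage, which is what the 
output above looks like.

{code}
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
  public static void main(String[] args) {
    byte[] utf8 = "Garçu 谢谢".getBytes(StandardCharsets.UTF_8);
    // Decoding UTF-8 bytes as ISO-8859-1 mangles the non-ASCII characters.
    System.out.println(new String(utf8, StandardCharsets.ISO_8859_1));
  }
}
{code}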



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6712) HS2 JDBC driver is inconsistent w.r.t. auto commit

2015-10-20 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965717#comment-14965717
 ] 

Xuefu Zhang commented on HIVE-6712:
---

+1, looks good to me.

> HS2 JDBC driver is inconsistent w.r.t. auto commit
> --
>
> Key: HIVE-6712
> URL: https://issues.apache.org/jira/browse/HIVE-6712
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Xuefu Zhang
>Assignee: David McWhorter
>  Labels: jdbc
> Fix For: 2.0.0
>
> Attachments: HIVE-6712.patch
>
>
> I see an inconsistency in HS2 JDBC driver code:
> {code}
>   @Override
>   public void setAutoCommit(boolean autoCommit) throws SQLException {
> if (autoCommit) {
>   throw new SQLException("enabling autocommit is not supported");
> }
>   }
> {code}
> From above, it seems that auto commit is not supported. However, 
> {code}
>   @Override
>   public boolean getAutoCommit() throws SQLException {
> return true;
>   }
> {code}
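
One consistent pairing would be to treat enabling autocommit as a no-op (since 
Hive always auto-commits) and reject disabling it instead. A sketch of that 
direction, not necessarily the committed patch:

{code}
@Override
public void setAutoCommit(boolean autoCommit) throws SQLException {
  // Hive is always in auto-commit mode, so only disabling it is unsupported.
  if (!autoCommit) {
    throw new SQLException("disabling autocommit is not supported");
  }
}

@Override
public boolean getAutoCommit() throws SQLException {
  return true;
}
{code}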



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-11528) incrementally read query results when there's no ORDER BY

2015-10-20 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-11528:
--

Assignee: Keisuke Ogiwara

> incrementally read query results when there's no ORDER BY
> -
>
> Key: HIVE-11528
> URL: https://issues.apache.org/jira/browse/HIVE-11528
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Keisuke Ogiwara
>
> May require HIVE-11527. When there's no ORDER BY and there's more than one 
> reducer on the last stage of the query, it should be possible to return data 
> to the user as it is produced, instead of waiting for all reducers to finish.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12218) Unable to create a like table for an hbase backed table

2015-10-20 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14966193#comment-14966193
 ] 

Xuefu Zhang commented on HIVE-12218:


+1

> Unable to create a like table for an hbase backed table
> ---
>
> Key: HIVE-12218
> URL: https://issues.apache.org/jira/browse/HIVE-12218
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.1
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-12218.patch
>
>
> For an HBase backed table:
> {code}
> CREATE TABLE hbasetbl (key string, state string, country string, country_id 
> int)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" = "info:state,info:country,info:country_id"
> );
> {code}
> Create its like table using a query such as:
> create table hbasetbl_like like hbasetbl;
> It fails with the error:
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. 
> org.apache.hadoop.hive.ql.metadata.HiveException: must specify an InputFormat 
> class



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12063) Pad Decimal numbers with trailing zeros to the scale of the column

2015-10-14 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12063:
---
Attachment: HIVE-12063.patch

Initial patch. There could be more test results that need to be updated.

> Pad Decimal numbers with trailing zeros to the scale of the column
> --
>
> Key: HIVE-12063
> URL: https://issues.apache.org/jira/browse/HIVE-12063
> Project: Hive
>  Issue Type: Improvement
>  Components: Types
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-12063.patch
>
>
> HIVE-7373 was to address the problems of trimming trailing zeros by Hive, 
> which caused many problems including treating 0.0, 0.00 and so on as 0, which 
> has different precision/scale. Please refer to HIVE-7373 description. 
> However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems 
> remained. HIVE-11835 was resolved recently to address one of the problems, 
> where 0.0, 0.00, and so on cannot be read into decimal(1,1).
> However, HIVE-11835 didn't address the problem of showing as 0 in query 
> result for any decimal values such as 0.0, 0.00, etc. This causes confusion 
> as 0 and 0.0 have different precision/scale than 0.
> The proposal here is to pad zeros for query result to the type's scale. This 
> not only removes the confusion described above, but also aligns with many 
> other DBs. Internal decimal number representation doesn't change, however.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12175) Upgrade Kryo version to 3.0.x

2015-10-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956939#comment-14956939
 ] 

Xuefu Zhang commented on HIVE-12175:


Is 3.0.x compatible with 2.22? Any incompatibility might break Hive on Spark, 
so I think we need to be careful about this upgrade.

> Upgrade Kryo version to 3.0.x
> -
>
> Key: HIVE-12175
> URL: https://issues.apache.org/jira/browse/HIVE-12175
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> The current version of Kryo (2.22) has an issue with serializing the ArrayLists 
> generated by Arrays.asList(). We need to either replace all occurrences of 
> Arrays.asList() or change the current StdInstantiatorStrategy. This issue is 
> fixed in later versions, and the Kryo community recommends using 
> DefaultInstantiatorStrategy with a fallback to StdInstantiatorStrategy. More 
> discussion about this issue is here: 
> https://github.com/EsotericSoftware/kryo/issues/216. Alternatively, a custom 
> serialization/deserialization class can be provided for Arrays.asList.
> Also, Kryo 3.0 introduced unsafe-based serialization, which claims much better 
> performance for certain types of serialization. 
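
The recommended configuration mentioned above looks roughly like this against 
the Kryo 3.x API (a sketch; where Hive would wire this in is not shown):

{code}
import com.esotericsoftware.kryo.Kryo;
import org.objenesis.strategy.StdInstantiatorStrategy;

public class KryoFactory {
  public static Kryo newKryo() {
    Kryo kryo = new Kryo();
    // Use a declared no-arg constructor when available, and fall back to
    // Objenesis for classes without one, such as the list type returned
    // by Arrays.asList().
    kryo.setInstantiatorStrategy(
        new Kryo.DefaultInstantiatorStrategy(new StdInstantiatorStrategy()));
    return kryo;
  }
}
{code}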



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-11710) Beeline embedded mode doesn't output query progress after setting any session property

2015-10-19 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-11710:
---
Comment: was deleted

(was: OK. Seems we don't need to flush the string manually since autoFlush is 
set to true in PrintStream {{PrintStream(OutputStream out, boolean autoFlush, 
String encoding)}}.)
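
For reference, a minimal illustration of the constructor named in the deleted 
comment: with autoFlush set to true, println() flushes on its own, so no 
manual flush is needed.

{code}
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class AutoFlushDemo {
  public static void main(String[] args) throws UnsupportedEncodingException {
    PrintStream info = new PrintStream(System.err, true, "UTF-8");
    info.println("Map 1: 0/1  Reducer 2: 0/1"); // flushed immediately, no flush() call needed
  }
}
{code}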

> Beeline embedded mode doesn't output query progress after setting any session 
> property
> --
>
> Key: HIVE-11710
> URL: https://issues.apache.org/jira/browse/HIVE-11710
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11710.2.patch, HIVE-11710.3.patch, 
> HIVE-11710.4.patch, HIVE-11710.patch
>
>
> Connect to beeline embedded mode {{beeline -u jdbc:hive2://}}. Then set 
> anything in the session like {{set aa=true;}}.
> After that, any query like {{select count(*) from src;}} will only output the 
> result but no query progress.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11710) Beeline embedded mode doesn't output query progress after setting any session property

2015-10-19 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963592#comment-14963592
 ] 

Xuefu Zhang commented on HIVE-11710:


+1

> Beeline embedded mode doesn't output query progress after setting any session 
> property
> --
>
> Key: HIVE-11710
> URL: https://issues.apache.org/jira/browse/HIVE-11710
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-11710.2.patch, HIVE-11710.3.patch, 
> HIVE-11710.4.patch, HIVE-11710.patch
>
>
> Connect to beeline embedded mode {{beeline -u jdbc:hive2://}}. Then set 
> anything in the session like {{set aa=true;}}.
> After that, any query like {{select count(*) from src;}} will only output the 
> result but no query progress.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11919) Hive Union Type Mismatch

2015-10-08 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949401#comment-14949401
 ] 

Xuefu Zhang commented on HIVE-11919:


Could we please update the affected and fixed versions?

> Hive Union Type Mismatch
> 
>
> Key: HIVE-11919
> URL: https://issues.apache.org/jira/browse/HIVE-11919
> Project: Hive
>  Issue Type: Bug
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-11919.1.patch, HIVE-11919.2.patch
>
>
> In Hive, for a union, the right-most type wins out for most primitive types 
> during plan generation. However, when the union operator gets initialized, the 
> type gets switched.
> This could result in bad data & type exceptions.
> This happens only in non-CBO mode.
> In CBO mode, Hive would add explicit type casts that would prevent such type 
> issues.
> Sample Query: 
> select cd/sum(cd) over() from(select cd from u1 union all select cd from u2 
> union all select cd from u3)u4;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12091) HiveException (Failed to close AbstractFileMergeOperator) occurs during loading data to ORC file, when hive.merge.sparkfiles is set to true. [Spark Branch]

2015-10-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14953316#comment-14953316
 ] 

Xuefu Zhang commented on HIVE-12091:


+1.
Also +1 to the idea of a test case covering this.

> HiveException (Failed to close AbstractFileMergeOperator) occurs during 
> loading data to ORC file, when hive.merge.sparkfiles is set to true. [Spark 
> Branch]
> ---
>
> Key: HIVE-12091
> URL: https://issues.apache.org/jira/browse/HIVE-12091
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xin Hao
>Assignee: Rui Li
> Attachments: HIVE-12091.1-spark.patch
>
>
> This issue occurs when hive.merge.sparkfiles is set to true, and can be 
> worked around by setting hive.merge.sparkfiles to false.
> BTW, we did a local experiment running the case with the MR engine (set 
> hive.merge.mapfiles=true; set hive.merge.mapredfiles=true;), and it passes.
> (1)Component Version:
> -- Hive Spark Branch 70eeadd2f019dcb2e301690290c8807731eab7a1  +  Hive-11473 
> patch (HIVE-11473.3-spark.patch)  ---> This is to support Spark 1.5 for Hive 
> on Spark
> -- Spark 1.5.1
> (2)Case used:
> -- Big-Bench  Data Load (load data from HDFS to Hive warehouse, scored as ORC 
> format). The related HiveQL:
> {noformat}
> DROP TABLE IF EXISTS customer_temporary;
> CREATE EXTERNAL TABLE customer_temporary
>   ( c_customer_sk bigint  --not null
>   , c_customer_id string  --not null
>   , c_current_cdemo_skbigint
>   , c_current_hdemo_skbigint
>   , c_current_addr_sk bigint
>   , c_first_shipto_date_skbigint
>   , c_first_sales_date_sk bigint
>   , c_salutation  string
>   , c_first_name  string
>   , c_last_name   string
>   , c_preferred_cust_flag string
>   , c_birth_day   int
>   , c_birth_month int
>   , c_birth_year  int
>   , c_birth_country   string
>   , c_login   string
>   , c_email_address   string
>   , c_last_review_datestring
>   )
>   ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
>   STORED AS TEXTFILE LOCATION 
> '/user/root/benchmarks/bigbench_n1t/data/customer'
> ;
> DROP TABLE IF EXISTS customer;
> CREATE TABLE customer
> STORED AS ORC
> AS
> SELECT * FROM customer_temporary
> ;
> {noformat}
> (3)Error/Exception Message:
> {noformat}
> 15/10/12 14:28:38 INFO exec.Utilities: PLAN PATH = 
> hdfs://bhx2:8020/tmp/hive/root/4e145415-d4ea-4751-9e16-ff31edb0c258/hive_2015-10-12_14-28-12_485_2093357701513622173-1/-mr-10005/d891fdec-eacc-4f66-8827-e2b650c24810/map.xml
> 15/10/12 14:28:38 INFO OrcFileMergeOperator: ORC merge file input path: 
> hdfs://bhx2:8020/user/hive/warehouse/bigbench_n100g.db/.hive-staging_hive_2015-10-12_14-28-12_485_2093357701513622173-1/-ext-10003/01_0
> 15/10/12 14:28:38 INFO OrcFileMergeOperator: Merged stripe from file 
> hdfs://bhx2:8020/user/hive/warehouse/bigbench_n100g.db/.hive-staging_hive_2015-10-12_14-28-12_485_2093357701513622173-1/-ext-10003/01_0
>  [ offset : 3 length: 10525754 row: 247500 ]
> 15/10/12 14:28:38 INFO spark.SparkMergeFileRecordHandler: Closing Merge 
> Operator OFM
> 15/10/12 14:28:38 ERROR executor.Executor: Exception in task 1.0 in stage 1.0 
> (TID 4)
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Failed to close AbstractFileMergeOperator
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMergeFileRecordHandler.close(SparkMergeFileRecordHandler.java:115)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:58)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:106)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:118)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:118)
>   at 
> org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
>   at 
> org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:88)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at 

[jira] [Commented] (HIVE-12028) An empty array is of type Array and incompatible with other array types

2015-10-12 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14953484#comment-14953484
 ] 

Xuefu Zhang commented on HIVE-12028:


Just for my understanding, what does "INT(NULL)" mean in this case?

> An empty array is of type Array and incompatible with other array 
> types
> ---
>
> Key: HIVE-12028
> URL: https://issues.apache.org/jira/browse/HIVE-12028
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 1.2.1
>Reporter: Furcy Pin
>
> How to reproduce:
> ```sql
> SELECT ARRAY(ARRAY(1),ARRAY()) ;
> FAILED: SemanticException [Error 10016]: Line 1:22 Argument type mismatch 
> 'ARRAY': Argument type "array" is different from preceding arguments. 
> Previous type was "array"
> SELECT COALESCE(ARRAY(1),ARRAY()) ;
> FAILED: SemanticException [Error 10016]: Line 1:25 Argument type mismatch 
> 'ARRAY': The expressions after COALESCE should all have the same type: 
> "array" is expected but "array" is found
> ```
> This is especially painful for COALESCE, as we cannot
> remove NULLS after doing a JOIN.
> The same problem holds with maps.
> The only workaround I could think of (except adding my own UDF)
> is quite ugly :
> ```sql
> SELECT ARRAY(ARRAY(1),empty.arr) FROM (SELECT collect_set(id) as arr FROM 
> (SELECT 1 as id) T WHERE id=0) empty ;
> ```



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12046) Re-create spark client if connection is dropped

2015-10-07 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946785#comment-14946785
 ] 

Xuefu Zhang commented on HIVE-12046:


Looking at the patch again, it seems that it handles the case where the remote 
spark client is in a bad state when trying to submit a spark job. This is good. 
However, it's unclear what's going to happen when either 
getDefaultParallelism() or getExecutorCount() is called in such a situation. 
Also, even in the case that execute() is called, the remote client can become 
bad right after the isActive() check.

Therefore, I think we need to define a scope for this JIRA. If we want to be 
resilient to connection loss, then we need to consider more cases and how to 
handle them. However, it's also acceptable in my opinion to detect the error 
and ask the user to log out and log in again to get a valid session. The 
latter seems simpler and easier.
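
A sketch of the race being described, with hypothetical client/factory names 
rather than the actual Hive spark-client API: a liveness check alone is not 
enough, because the connection can drop right after the check, so the submit 
path also needs a catch-and-recreate fallback.

{code}
import java.util.function.Supplier;

public class ResilientSubmit {
  interface SparkClient {                       // hypothetical stand-in for the remote client
    boolean isActive();
    String submit(Runnable job) throws Exception;
  }

  private SparkClient client;

  String submitWithReconnect(Runnable job, Supplier<SparkClient> factory) throws Exception {
    if (client == null || !client.isActive()) {
      client = factory.get();                   // re-create the dropped client up front
    }
    try {
      return client.submit(job);
    } catch (Exception e) {                     // client may have died after the isActive() check
      client = factory.get();
      return client.submit(job);
    }
  }
}
{code}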

> Re-create spark client if connection is dropped
> ---
>
> Key: HIVE-12046
> URL: https://issues.apache.org/jira/browse/HIVE-12046
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12046.1.patch
>
>
> Currently, if the connection to the spark cluster is dropped, the spark 
> client will stay in a bad state. A new Hive session is needed to re-establish 
> the connection. It is better to auto reconnect in this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12035) branch-1 build broken

2015-10-05 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943903#comment-14943903
 ] 

Xuefu Zhang commented on HIVE-12035:


+1

> branch-1 build broken
> -
>
> Key: HIVE-12035
> URL: https://issues.apache.org/jira/browse/HIVE-12035
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-12035.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11473) Upgrade Spark dependency to 1.5 [Spark Branch]

2015-10-05 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944018#comment-14944018
 ] 

Xuefu Zhang commented on HIVE-11473:


Hi [~lirui], precommit-test has been suffering from some env-related issues and 
we are looking into them. I will take a look at the parquet test problem. Thanks.

> Upgrade Spark dependency to 1.5 [Spark Branch]
> --
>
> Key: HIVE-11473
> URL: https://issues.apache.org/jira/browse/HIVE-11473
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Jimmy Xiang
>Assignee: Rui Li
> Attachments: HIVE-11473.1-spark.patch, HIVE-11473.2-spark.patch, 
> HIVE-11473.3-spark.patch, HIVE-11473.3-spark.patch
>
>
> In Spark 1.5, the SparkListener interface changed, so HoS may fail to create 
> the spark client if an unimplemented event callback method is invoked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12046) Re-create spark client if connection is dropped

2015-10-06 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945913#comment-14945913
 ] 

Xuefu Zhang commented on HIVE-12046:


+1 pending on test.

> Re-create spark client if connection is dropped
> ---
>
> Key: HIVE-12046
> URL: https://issues.apache.org/jira/browse/HIVE-12046
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12046.1.patch
>
>
> Currently, if the connection to the spark cluster is dropped, the spark 
> client will stay in a bad state. A new Hive session is needed to re-establish 
> the connection. It is better to auto reconnect in this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12011) unable to create temporary table using CTAS if regular table with that name already exists

2015-10-06 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945915#comment-14945915
 ] 

Xuefu Zhang commented on HIVE-12011:


Could we please update the affected version and the fix version? Also, does 
this apply to branch-1? Thanks.

> unable to create temporary table using CTAS if regular table with that name 
> already exists
> --
>
> Key: HIVE-12011
> URL: https://issues.apache.org/jira/browse/HIVE-12011
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12011.01.patch
>
>
> A CTAS temporary table query fails if a regular table with the same name already 
> exists. 
> Steps to reproduce the issue:
> {noformat}
> hive> use dbtemptable;
> OK
> Time taken: 0.273 seconds
> hive> create table a(i int);
> OK
> Time taken: 0.297 seconds
> hive> create temporary table a(i int);
> OK
> Time taken: 0.165 seconds
> hive> create table b(i int);
> OK
> Time taken: 0.212 seconds
> hive> create temporary table b as select * from a;
> FAILED: SemanticException org.apache.hadoop.hive.ql.parse.SemanticException: 
> Table already exists: dbtemptable.b
> hive> create table c(i int);
> OK
> Time taken: 0.264 seconds
> hive> create temporary table b as select * from c;
> FAILED: SemanticException org.apache.hadoop.hive.ql.parse.SemanticException: 
> Table already exists: dbtemptable.b
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12045) ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)

2015-10-06 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946030#comment-14946030
 ] 

Xuefu Zhang commented on HIVE-12045:


[~ztoth], thanks for reporting the problem. Would it be okay for you to attach 
the jars (both the generic and non-generic versions) that you used to reproduce 
the problem? Thanks.

> ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)
> --
>
> Key: HIVE-12045
> URL: https://issues.apache.org/jira/browse/HIVE-12045
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
> Environment: Cloudera QuickStart VM, beeline
>Reporter: Zsolt Tóth
>
> If I execute the following query in beeline, I get ClassNotFoundException for 
> the UDF class.
> {code}
> drop function myGenericUdf;
> create function myGenericUdf as 'org.example.myGenericUdf' using jar 
> 'hdfs:///tmp/myudf.jar';
> select distinct myGenericUdf(1,2,1) from mytable;
> {code}
> In my example, myGenericUdf just looks for the 1st argument's value in the 
> others and returns the index. I don't think this is related to the actual 
> GenericUDF function.
> Note that:
> "select myGenericUdf(1,2,1) from mytable;" succeeds
> If I use the non-generic implementation of the same UDF, the select distinct 
> call succeeds.
> StackTrace:
> {code}
> 15/10/06 05:20:25 ERROR exec.Utilities: Failed to load plan: 
> hdfs://quickstart.cloudera:8020/tmp/hive/hive/f9de3f09-c12d-4528-9ee6-1f12932a14ae/hive_2015-10-06_05-20-07_438_6519207588897968406-20/-mr-10003/27cd7226-3e22-46f4-bddd-fb8fd4aa4b8d/map.xml:
>  org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>   at 

[jira] [Commented] (HIVE-11985) don't store type names in metastore when metastore type names are not used

2015-10-08 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949665#comment-14949665
 ] 

Xuefu Zhang commented on HIVE-11985:


First, I have to admit that I don't have enough knowledge to conclude whether 
the approach here causes a problem. As far as I know, the 4000-character limit 
is only a problem for Oracle, yet the patch seems to reject any schema that is 
more than 2000 characters long. This sounds rather harsh, and a lot of times 
users get around the problem by changing Oracle settings.

On a high level, I'd echo [~ashutoshc] and [~alangates]'s concerns. If we spend 
time on this, I'd rather solve the problem in a generic way, regardless of the 
serde type and db type. The obvious inconsistency I see here is that for avro 
we store the schema if it's less than 2000 characters while storing a constant 
string for anything over that. If we determine that it's not necessary to store 
it for avro, don't store it at all. Or if we can solve the length problem for 
all serdes, then that's probably the right way to go.

> don't store type names in metastore when metastore type names are not used
> --
>
> Key: HIVE-11985
> URL: https://issues.apache.org/jira/browse/HIVE-11985
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11985.01.patch, HIVE-11985.02.patch, 
> HIVE-11985.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12082) Null comparison for greatest and least operator

2015-10-13 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955307#comment-14955307
 ] 

Xuefu Zhang commented on HIVE-12082:


Patch looks good. Some minor comments on RB.

> Null comparison for greatest and least operator
> ---
>
> Key: HIVE-12082
> URL: https://issues.apache.org/jira/browse/HIVE-12082
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-12082.patch
>
>
> In mysql comparisons if any of the entries are null, then the result is null.
> [https://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html|https://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html]
>  and 
> [https://dev.mysql.com/doc/refman/5.0/en/type-conversion.html|https://dev.mysql.com/doc/refman/5.0/en/type-conversion.html].
> This can be demonstrated by the following mysql query:
> {noformat}
> mysql> select greatest(1, null) from test;
> +---+
> | greatest(1, null) |
> +---+
> |  NULL |
> +---+
> 1 row in set (0.00 sec)
> mysql> select greatest(-1, null) from test;
> ++
> | greatest(-1, null) |
> ++
> |   NULL |
> ++
> 1 row in set (0.00 sec)
> {noformat}
> This is in contrast to Hive, where nulls are ignored in the comparisons.
> {noformat}
> hive> select greatest(null, 1) from test;
> OK
> 1
> {noformat}
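
The MySQL semantics being proposed can be summarized in plain Java (an 
editor's sketch, not Hive's GenericUDF code): any NULL argument makes the 
whole result NULL.

{code}
public class Greatest {
  static Integer greatest(Integer... values) {
    Integer max = null;
    for (Integer v : values) {
      if (v == null) {
        return null;               // NULL propagates, matching the MySQL behavior above
      }
      if (max == null || v > max) {
        max = v;
      }
    }
    return max;
  }

  public static void main(String[] args) {
    System.out.println(greatest(1, null));  // null
    System.out.println(greatest(-1, 5));    // 5
  }
}
{code}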



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10438) Architecture for ResultSet Compression via external plugin

2015-10-13 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955856#comment-14955856
 ] 

Xuefu Zhang commented on HIVE-10438:


Some additional comments on RB.

> Architecture for  ResultSet Compression via external plugin
> ---
>
> Key: HIVE-10438
> URL: https://issues.apache.org/jira/browse/HIVE-10438
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive, Thrift API
>Affects Versions: 1.2.0
>Reporter: Rohit Dholakia
>Assignee: Rohit Dholakia
>  Labels: patch
> Attachments: HIVE-10438-1.patch, HIVE-10438.patch, 
> Proposal-rscompressor.pdf, README.txt, 
> Results_Snappy_protobuf_TBinary_TCompact.pdf, hs2ResultSetCompressor.zip, 
> hs2driver-master.zip
>
>
> This JIRA proposes an architecture for enabling ResultSet compression which 
> uses an external plugin. 
> The patch has three aspects to it: 
> 0. An architecture for enabling ResultSet compression with external plugins
> 1. An example plugin to demonstrate end-to-end functionality 
> 2. A container to allow everyone to write and test ResultSet compressors with 
> a query submitter (https://github.com/xiaom/hs2driver) 
> Also attaching a design document explaining the changes, experimental results 
> document, and a pdf explaining how to setup the docker container to observe 
> end-to-end functionality of ResultSet compression. 
> https://reviews.apache.org/r/35792/ Review board link. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12046) Re-create spark client if connection is dropped

2015-10-09 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950635#comment-14950635
 ] 

Xuefu Zhang commented on HIVE-12046:


+1 to the latest patch.

> Re-create spark client if connection is dropped
> ---
>
> Key: HIVE-12046
> URL: https://issues.apache.org/jira/browse/HIVE-12046
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12046.1.patch, HIVE-12046.2.patch
>
>
> Currently, if the connection to the spark cluster is dropped, the spark 
> client will stay in a bad state. A new Hive session is needed to re-establish 
> the connection. It is better to auto reconnect in this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11203) Beeline force option doesn't force execution when errors occurred in a script.

2015-07-08 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14618540#comment-14618540
 ] 

Xuefu Zhang commented on HIVE-11203:


+1

 Beeline force option doesn't force execution when errors occurred in a script.
 --

 Key: HIVE-11203
 URL: https://issues.apache.org/jira/browse/HIVE-11203
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-11203.patch


 The force option doesn't function as the wiki describes.  
 https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10927) Add number of HMS/HS2 connection metrics

2015-07-09 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620977#comment-14620977
 ] 

Xuefu Zhang commented on HIVE-10927:


Are this patch and those from previous JIRAs also applicable to branch-1? If 
so, we should probably commit them to that branch as well.

 Add number of HMS/HS2 connection metrics
 

 Key: HIVE-10927
 URL: https://issues.apache.org/jira/browse/HIVE-10927
 Project: Hive
  Issue Type: Sub-task
  Components: Diagnosability
Reporter: Szehon Ho
Assignee: Szehon Ho
  Labels: TODOC2.0
 Fix For: 2.0.0

 Attachments: HIVE-10927.2.patch, HIVE-10927.2.patch, HIVE-10927.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10515) Create tests to cover existing (supported) Hive CLI functionality

2015-07-09 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621627#comment-14621627
 ] 

Xuefu Zhang commented on HIVE-10515:


[~Ferd], sure, if you think the coverage has reached an acceptable level. 
Thanks.

 Create tests to cover existing (supported) Hive CLI functionality
 -

 Key: HIVE-10515
 URL: https://issues.apache.org/jira/browse/HIVE-10515
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Affects Versions: 0.10.0
Reporter: Xuefu Zhang
Assignee: Ferdinand Xu

 After removing HiveServer1, Hive CLI's functionality is reduced to its 
 original use case, a thick client application. Let's identify this so that we 
 maintain it when the implementation is changed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10791) Beeline-CLI: Implement in-place update UI for CLI compatibility

2015-07-07 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616678#comment-14616678
 ] 

Xuefu Zhang commented on HIVE-10791:


[~Ferd], I think [~gopalv] meant the job status tracking shown by Hive CLI. 
Refer to HIVE-8495.

[~gopalv], did you mean that HIVE-8495 was only implemented for Hive CLI? If 
so, don't you think the feature was incomplete in a certain sense, and that it 
might be a better idea for the original dev to support that feature for BeeLine 
as well? I knew of the feature and saw it in Hive CLI, but I'm not sure if the 
feature is also in BeeLine as it should be.

 Beeline-CLI: Implement in-place update UI for CLI compatibility
 ---

 Key: HIVE-10791
 URL: https://issues.apache.org/jira/browse/HIVE-10791
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Affects Versions: beeline-cli-branch
Reporter: Gopal V
Priority: Critical

 The current CLI implementation has an in-place updating UI which offers a 
 clear picture of execution runtime and failures.
 This is designed for large DAGs which have more than 10 vertices, where the 
 old UI would scroll sideways.
 The new CLI implementation needs to keep up the usability standards set by 
 the old one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11182) Enable optimized hash tables for spark [Spark Branch]

2015-07-07 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617940#comment-14617940
 ] 

Xuefu Zhang commented on HIVE-11182:


+1

 Enable optimized hash tables for spark [Spark Branch]
 -

 Key: HIVE-11182
 URL: https://issues.apache.org/jira/browse/HIVE-11182
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-11182.1-spark.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11191) Beeline-cli: support hive.cli.errors.ignore in new CLI

2015-07-09 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621434#comment-14621434
 ] 

Xuefu Zhang commented on HIVE-11191:


+1

 Beeline-cli: support hive.cli.errors.ignore in new CLI
 --

 Key: HIVE-11191
 URL: https://issues.apache.org/jira/browse/HIVE-11191
 Project: Hive
  Issue Type: Sub-task
  Components: CLI
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-11191.1-beeline-cli.patch, 
 HIVE-11191.2-beeline-cli.patch


 In the old CLI, it uses hive.cli.errors.ignore from the hive configuration 
 to force execution of a script when errors occur. In Beeline, there is a 
 similar option called force. We need to support the previous configuration 
 using beeline functionality. More details about the force option are available 
 in https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12433) Merge master into spark 11/17/2015 [Spark Branch]

2015-11-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12433:
---
Attachment: HIVE-9202.1-spark.patch

> Merge master into spark 11/17/2015 [Spark Branch]
> -
>
> Key: HIVE-12433
> URL: https://issues.apache.org/jira/browse/HIVE-12433
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Brock Noland
>Assignee: Xuefu Zhang
> Fix For: spark-branch
>
> Attachments: HIVE-12433.1-spark.branch, HIVE-9202.1-spark.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

