[jira] [Issue Comment Deleted] (HIVE-9948) SparkUtilities.getFileName passes File.separator to String.split() method
[ https://issues.apache.org/jira/browse/HIVE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9948: -- Comment: was deleted (was: Hi Xuefu, can you suggest on this, hive show roles; FAILED: SemanticException The current builtin authorization in Hive is incomplete and disabled. Error from Hive: error code: '0' error message: 'ExecuteStatement finished with operation state: CLOSED_STATE' ) SparkUtilities.getFileName passes File.separator to String.split() method - Key: HIVE-9948 URL: https://issues.apache.org/jira/browse/HIVE-9948 Project: Hive Issue Type: Bug Components: Spark Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Fix For: 1.2.0 Attachments: HIVE-9948.1.patch The String.split() method expects a regex, which is why File.separator cannot be passed to split(). In this particular case we can use FilenameUtils.getName to get the file name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
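For illustration, a minimal sketch of the pitfall follows (method names here are illustrative, not the actual SparkUtilities code). String.split() compiles its argument as a regex, so on Windows, where File.separator is a single backslash, the call throws PatternSyntaxException; commons-io avoids regexes entirely.

{code}
import java.io.File;
import java.util.regex.Pattern;
import org.apache.commons.io.FilenameUtils;

public class FileNameDemo {
  // Buggy: split() treats File.separator as a regex. On Windows the
  // separator is "\", an invalid pattern, so this throws
  // PatternSyntaxException; on Unix "/" only works by coincidence.
  static String getFileNameBuggy(String path) {
    String[] parts = path.split(File.separator);
    return parts[parts.length - 1];
  }

  // Workaround if split() must stay: quote the separator first.
  static String getFileNameQuoted(String path) {
    String[] parts = path.split(Pattern.quote(File.separator));
    return parts[parts.length - 1];
  }

  // The approach suggested in the issue: commons-io handles both Unix
  // and Windows separators with no regex involved.
  static String getFileName(String path) {
    return FilenameUtils.getName(path);
  }

  public static void main(String[] args) {
    System.out.println(getFileName("/tmp/hive/query.jar")); // prints query.jar
  }
}
{code}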
[jira] [Commented] (HIVE-9994) Hive query plan returns sensitive data to external applications
[ https://issues.apache.org/jira/browse/HIVE-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367683#comment-14367683 ] Xuefu Zhang commented on HIVE-9994: --- Patch looks good. One question: do we need to check null for the input in redactLogString(), as it's a public method? Hive query plan returns sensitive data to external applications --- Key: HIVE-9994 URL: https://issues.apache.org/jira/browse/HIVE-9994 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9994.1.patch Some applications use the getQueryString() method from the QueryPlan class to get the query that is being executed by Hive. The query string returned is not redacted, so it exposes sensitive information that is then logged in Navigator. We need to return redacted data from the QueryPlan to prevent other applications from logging sensitive data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
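The null guard being asked about might look like the sketch below. QueryRedactor is a hypothetical stand-in for Hive's actual redaction machinery, since the class housing redactLogString() isn't shown in this thread; only the shape of the guard matters here.

{code}
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in for Hive's redaction utility. */
public final class QueryRedactor {
  private final List<Pattern> rules;
  private final String mask;

  public QueryRedactor(List<Pattern> rules, String mask) {
    this.rules = rules;
    this.mask = mask;
  }

  /** Public entry point, so defend against null input rather than NPE-ing. */
  public String redactLogString(String logString) {
    if (logString == null) {
      return null;
    }
    String redacted = logString;
    for (Pattern rule : rules) {
      redacted = rule.matcher(redacted).replaceAll(mask);
    }
    return redacted;
  }
}
{code}

A caller such as QueryPlan.getQueryString() would then return the redacted string instead of the raw query.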
[jira] [Commented] (HIVE-9974) Sensitive data redaction: data appears in name of mapreduce job
[ https://issues.apache.org/jira/browse/HIVE-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14363397#comment-14363397 ] Xuefu Zhang commented on HIVE-9974: --- +1 Sensitive data redaction: data appears in name of mapreduce job --- Key: HIVE-9974 URL: https://issues.apache.org/jira/browse/HIVE-9974 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9974.1.patch Set up a cluster, configured a redaction rule to redact B0096EZHM2, and ran Hive queries on the cluster. Looking at the YARN RM web UI and Job History Server web UI, I see that the mapreduce jobs spawned by the Hive queries have the sensitive data (B0096EZHM2) showing in the job names: e.g., select product, useri...product='B0096EZHM2'(Stage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
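The fix direction implied above is to run the same redaction over the string the job name is derived from, before the job is submitted. A hedged sketch, reusing the hypothetical QueryRedactor from the previous note; the exact placement in Hive is assumed, though hive.jobname.length is Hive's existing job-name length setting (default 50):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

final class JobNameRedaction {
  // Redact the query-derived job name before submission so the YARN RM
  // and Job History UIs never see raw query text such as B0096EZHM2.
  static void setRedactedJobName(Job job, Configuration conf,
                                 QueryRedactor redactor, String query) {
    String name = redactor.redactLogString(query);
    int maxLen = conf.getInt("hive.jobname.length", 50);
    job.setJobName(name.length() > maxLen ? name.substring(0, maxLen) : name);
  }
}
{code}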
[jira] [Commented] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364539#comment-14364539 ] Xuefu Zhang commented on HIVE-9697: --- [~lirui], I don't think we had closure on this. totalSize is closer to the file size, while rawDataSize is closer to the memory required. Using totalSize is more aggressive in taking the map join, but some file formats, such as ORC/Parquet, are very good at compression (10x is common). Thus, if the decision to do a map join is based on file size, the executor can run OOM. On the other hand, rawDataSize is more conservative on memory estimation, which also gives less opportunity for map join. I'm not sure which one is better for Hive on Spark. File size is what hive.auto.convert.join.noconditionaltask.size implies and what the user can see, while rawDataSize is closer to the memory required. However, once OOM happens, the user gets no result, which is worse than a result that comes slower, right? Any thoughts? Hive on Spark is not as aggressive as MR on map join [Spark Branch] --- Key: HIVE-9697 URL: https://issues.apache.org/jira/browse/HIVE-9697 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao We found while running some Big-Bench cases that when the same small-table size threshold is used, the Map Join operator will not be generated in stage plans for Hive on Spark, while it will be generated for Hive on MR. For example, when we run BigBench Q25, the meta info of one input ORC table is as below: totalSize=1748955 (about 1.5M) rawDataSize=123050375 (about 120M) If we use the following parameter settings, set hive.auto.convert.join=true; set hive.mapjoin.smalltable.filesize=2500; set hive.auto.convert.join.noconditionaltask=true; set hive.auto.convert.join.noconditionaltask.size=100000000; (100M) Map Join will be enabled for Hive on MR mode, but will not be enabled for Hive on Spark. We found that for Hive on MR, the HDFS file size for the table (ContentSummary.getLength(), which should approximate the value of ‘totalSize’) is compared with the 100M threshold (smaller than 100M), while for Hive on Spark 'rawDataSize' is compared with the 100M threshold (larger than 100M). That's why MapJoin is not enabled for Hive on Spark in this case, and as a result Hive on Spark gets much lower performance than Hive on MR here. When we set hive.auto.convert.join.noconditionaltask.size=150000000; (150M), MapJoin will be enabled for Hive on Spark mode, and Hive on Spark then shows performance similar to Hive on MR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
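To make the trade-off concrete, a small illustrative sketch (TableStats is invented for this note; it is not a Hive class). Plugging in the Q25 numbers from the description, totalSize=1748955 passes a 100M threshold while rawDataSize=123050375 does not, which is exactly the divergence reported:

{code}
/** Illustrative only; not a Hive class. */
final class TableStats {
  final long totalSize;   // on-disk bytes; small for ORC/Parquet (~10x compression is common)
  final long rawDataSize; // estimated uncompressed, in-memory bytes

  TableStats(long totalSize, long rawDataSize) {
    this.totalSize = totalSize;
    this.rawDataSize = rawDataSize;
  }

  // Map-join eligibility against hive.auto.convert.join.noconditionaltask.size.
  // Comparing totalSize converts more joins but risks executor OOM once the
  // compressed data is expanded in memory; rawDataSize is the conservative choice.
  boolean fitsInMemory(long thresholdBytes, boolean useRawDataSize) {
    long estimate = useRawDataSize ? rawDataSize : totalSize;
    return estimate > 0 && estimate <= thresholdBytes;
  }

  public static void main(String[] args) {
    TableStats q25 = new TableStats(1748955L, 123050375L);
    System.out.println(q25.fitsInMemory(100_000_000L, false)); // true  -> MR converts
    System.out.println(q25.fitsInMemory(100_000_000L, true));  // false -> Spark doesn't
  }
}
{code}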
[jira] [Commented] (HIVE-9934) Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password
[ https://issues.apache.org/jira/browse/HIVE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364092#comment-14364092 ] Xuefu Zhang commented on HIVE-9934: --- +1 Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password -- Key: HIVE-9934 URL: https://issues.apache.org/jira/browse/HIVE-9934 Project: Hive Issue Type: Bug Components: Security Affects Versions: 1.1.0 Reporter: Chao Assignee: Chao Attachments: HIVE-9934.1.patch, HIVE-9934.2.patch Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password. See: http://docs.oracle.com/javase/jndi/tutorial/ldap/security/simple.html “If you supply an empty string, an empty byte/char array, or null to the Context.SECURITY_CREDENTIALS environment property, then the authentication mechanism will be none. This is because the LDAP requires the password to be nonempty for simple authentication. The protocol automatically converts the authentication to none if a password is not supplied.” Since the LdapAuthenticationProviderImpl.Authenticate method relies on a NamingException being thrown during creation of the initial context, it does not fail when the context result is an “unauthenticated” positive response from the LDAP server. The end result is that one can authenticate with HiveServer2 using the LdapAuthenticationProviderImpl with only a user name and an empty password. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
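The pitfall and the natural guard can be shown with plain JNDI. This is a sketch of the general fix direction under the documented LDAP behavior quoted above, not the actual HIVE-9934 patch:

{code}
import java.util.Hashtable;
import javax.naming.AuthenticationException;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.InitialDirContext;

final class LdapBindSketch {
  // Reject empty/null passwords *before* binding: per the JNDI docs, an
  // empty SECURITY_CREDENTIALS silently downgrades "simple" auth to
  // "none", and the anonymous bind succeeds without a NamingException.
  static void authenticate(String ldapUrl, String userDn, String password)
      throws NamingException {
    if (password == null || password.isEmpty()) {
      throw new AuthenticationException("Empty passwords are not allowed");
    }
    Hashtable<String, Object> env = new Hashtable<>();
    env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
    env.put(Context.PROVIDER_URL, ldapUrl);
    env.put(Context.SECURITY_AUTHENTICATION, "simple");
    env.put(Context.SECURITY_PRINCIPAL, userDn);
    env.put(Context.SECURITY_CREDENTIALS, password);
    new InitialDirContext(env).close(); // throws on bad credentials
  }
}
{code}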
[jira] [Commented] (HIVE-9934) Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password
[ https://issues.apache.org/jira/browse/HIVE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365832#comment-14365832 ] Xuefu Zhang commented on HIVE-9934: --- [~prasadm], I think the lack of @Test annotations is fine in this case, as the class extends TestCase. I also saw that the added test case was run in the previous test results. Thus, patch #3 is good as far as I can see. Let me know if you see differently. Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password -- Key: HIVE-9934 URL: https://issues.apache.org/jira/browse/HIVE-9934 Project: Hive Issue Type: Bug Components: Security Affects Versions: 1.1.0 Reporter: Chao Assignee: Chao Attachments: HIVE-9934.1.patch, HIVE-9934.2.patch, HIVE-9934.3.patch, HIVE-9934.3.patch, HIVE-9934.4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9991) Cannot do a SELECT on external tables that are on S3 due to Encryption error
[ https://issues.apache.org/jira/browse/HIVE-9991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365606#comment-14365606 ] Xuefu Zhang commented on HIVE-9991: --- +1 pending tests Cannot do a SELECT on external tables that are on S3 due to Encryption error Key: HIVE-9991 URL: https://issues.apache.org/jira/browse/HIVE-9991 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9991.1.patch I cannot do any select query on external tables that are not part of HDFS, for example S3. {code} Select * from my_table limit 10; FAILED: SemanticException Unable to determine if s3n://my-bucket/is encrypted: java.lang.IllegalArgumentException: Wrong FS: s3n://my-bucket/, expected: hdfs://0.0.0.0:8020 {code} This error is due to an internal function that checks whether a table is encrypted or not. The check is only supported for HDFS files, but it happens on any external table as well, causing the above error. To fix this, we should check for encrypted tables only for HDFS tables and skip the check for any other file scheme. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
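The proposed guard amounts to a scheme check ahead of the encryption lookup. A minimal sketch of that idea (not the actual patch):

{code}
import org.apache.hadoop.fs.Path;

final class EncryptionCheckSketch {
  // HDFS encryption zones only exist on hdfs:// paths, so skip the check
  // for s3n://, s3a://, file://, and other schemes, which previously hit
  // the "Wrong FS" error above. Only when this returns true should the
  // encryption-zone lookup run.
  static boolean mightBeEncrypted(Path path) {
    String scheme = path.toUri().getScheme();
    return "hdfs".equalsIgnoreCase(scheme); // false for a null scheme too
  }
}
{code}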
[jira] [Updated] (HIVE-9934) Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password
[ https://issues.apache.org/jira/browse/HIVE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9934: -- Attachment: HIVE-9934.4.patch Updated the patch, adding the @Test annotation. Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password -- Key: HIVE-9934 URL: https://issues.apache.org/jira/browse/HIVE-9934 Project: Hive Issue Type: Bug Components: Security Affects Versions: 1.1.0 Reporter: Chao Assignee: Chao Attachments: HIVE-9934.1.patch, HIVE-9934.2.patch, HIVE-9934.3.patch, HIVE-9934.3.patch, HIVE-9934.4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (HIVE-9934) Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password
[ https://issues.apache.org/jira/browse/HIVE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9934: -- Comment: was deleted (was: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12705130/HIVE-9934.4.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3059/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3059/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3059/ Messages: {noformat} This message was trimmed, see log for full details {noformat})
[jira] [Updated] (HIVE-9934) Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password
[ https://issues.apache.org/jira/browse/HIVE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9934: -- Attachment: (was: HIVE-9934.4.patch) Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password -- Key: HIVE-9934 URL: https://issues.apache.org/jira/browse/HIVE-9934 Project: Hive Issue Type: Bug Components: Security Affects Versions: 1.1.0 Reporter: Chao Assignee: Chao Attachments: HIVE-9934.1.patch, HIVE-9934.2.patch, HIVE-9934.3.patch, HIVE-9934.3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9991) Cannot do a SELECT on external tables that are on S3 due to Encryption error
[ https://issues.apache.org/jira/browse/HIVE-9991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366213#comment-14366213 ] Xuefu Zhang commented on HIVE-9991: --- [~spena], it seems the above failed test has a result diff. You might need to regenerate the test output. Cannot do a SELECT on external tables that are on S3 due to Encryption error Key: HIVE-9991 URL: https://issues.apache.org/jira/browse/HIVE-9991 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9991.1.patch, HIVE-9991.2.patch, HIVE-9991.3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others
[ https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365134#comment-14365134 ] Xuefu Zhang commented on HIVE-7018: --- Patch looks fine. However, I don't quite understand why we are also removing the following: {code} - CONSTRAINT `PARTITIONS_FK2` FOREIGN KEY (`SD_ID`) REFERENCES `SDS` (`SD_ID`), ... - CONSTRAINT `TBLS_FK2` FOREIGN KEY (`DB_ID`) REFERENCES `DBS` (`DB_ID`), {code} This doesn't seem related to LINK_TARGET_ID. Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others - Key: HIVE-7018 URL: https://issues.apache.org/jira/browse/HIVE-7018 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Yongzhi Chen Attachments: HIVE-7018.1.patch It appears that at least postgres and oracle do not have the LINK_TARGET_ID column while mysql does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365114#comment-14365114 ] Xuefu Zhang commented on HIVE-9697: --- It seems that we all agree that rawDataSize is more practical for Spark. Could anyone summarize whether it's the default, or how to make it the default? If a code change is required, we can propose a patch here. Thanks. Hive on Spark is not as aggressive as MR on map join [Spark Branch] --- Key: HIVE-9697 URL: https://issues.apache.org/jira/browse/HIVE-9697 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370647#comment-14370647 ] Xuefu Zhang commented on HIVE-9697: --- Thanks, Rui/Chao. So here is what we recommend/conclude for Spark: Spark prefers rawDataSize for map-join memory estimation. Thus, hive.stats.collect.rawdatasize should be set true, which is the default. If this configuration is set to false, then fileSize will be used instead for memory estimation, which may not be as accurate. Agree? Hive on Spark is not as aggressive as MR on map join [Spark Branch] --- Key: HIVE-9697 URL: https://issues.apache.org/jira/browse/HIVE-9697 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370647#comment-14370647 ] Xuefu Zhang edited comment on HIVE-9697 at 3/20/15 3:30 AM: Thanks, Rui/Chao. So here is what we recommend/conclude for Spark: {quote} Spark prefers rawDataSize for map-join memory estimation. Thus, hive.stats.collect.rawdatasize should be set true, which is the default. If this configuration is set to false, then fileSize will be used instead for memory estimation, which may not be as accurate. {quote} Agree? was (Author: xuefuz): Thanks, Rui/Chao. So here is what we recommend/conclude for Spark: Spark prefers rawDataSize for map-join memory estimation. Thus, hive.stats.collect.rawdatasize should be set true, which is the default. If this configuration is set to false, then fileSize will be used instead for memory estimation, which may not be as accurate. Agree? Hive on Spark is not as aggressive as MR on map join [Spark Branch] --- Key: HIVE-9697 URL: https://issues.apache.org/jira/browse/HIVE-9697 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370647#comment-14370647 ] Xuefu Zhang edited comment on HIVE-9697 at 3/20/15 3:31 AM: Thanks, Rui/Chao. So here is what we recommend/conclude for Spark: {quote} Spark prefers rawDataSize for map-join memory estimation. Thus, hive.stats.collect.rawdatasize should be set true, which is the default. If this configuration is set to false, then fileSize will be used instead for estimation, which may not be as accurate. {quote} Agree? was (Author: xuefuz): Thanks, Rui/Chao. So here is what we recommend/conclude for Spark: {quote} Spark prefers rawDataSize for map-join memory estimation. Thus, hive.stats.collect.rawdatasize should be set true, which is the default. If this configuration is set to false, then fileSize will be used instead for memory estimation, which may not be as accurate. {quote} Agree? Hive on Spark is not as aggressive as MR on map join [Spark Branch] --- Key: HIVE-9697 URL: https://issues.apache.org/jira/browse/HIVE-9697 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370669#comment-14370669 ] Xuefu Zhang commented on HIVE-9697: --- Yes. We should. Hive on Spark is not as aggressive as MR on map join [Spark Branch] --- Key: HIVE-9697 URL: https://issues.apache.org/jira/browse/HIVE-9697 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao Labels: TODOC1.2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9697: -- Labels: TODOC1.2 (was: ) Hive on Spark is not as aggressive as MR on map join [Spark Branch] --- Key: HIVE-9697 URL: https://issues.apache.org/jira/browse/HIVE-9697 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao Labels: TODOC1.2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-9697. --- Resolution: Won't Fix This should be just a doc fix, as discussed above. Hive on Spark is not as aggressive as MR on map join [Spark Branch] --- Key: HIVE-9697 URL: https://issues.apache.org/jira/browse/HIVE-9697 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao Labels: TODOC-SPARK -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370615#comment-14370615 ] Xuefu Zhang commented on HIVE-9697: --- Can we put a closure on this? Basically we'd like to confirm/understand: 1. MR always uses file size. 2. Spark should always use rawDataSize. If this is the case, what configs need to be set to make rawDataSize available, and what happens if it's not available? Thanks, Xuefu Hive on Spark is not as aggressive as MR on map join [Spark Branch] --- Key: HIVE-9697 URL: https://issues.apache.org/jira/browse/HIVE-9697 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10006) RSC has a memory leak while executing multiple queries [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14371009#comment-14371009 ] Xuefu Zhang commented on HIVE-10006: Re: "Besides, is this ThreadLocal MapWork/ReduceWork cache a newly introduced optimization?" Yes, it was introduced in HIVE-9127. Looks like we need to be careful about this thread-local map, indeed. RSC has a memory leak while executing multiple queries [Spark Branch] -- Key: HIVE-10006 URL: https://issues.apache.org/jira/browse/HIVE-10006 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 1.1.0 Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Critical Labels: Spark-M5 Attachments: HIVE-10006.1-spark.patch, HIVE-10006.2-spark.patch, HIVE-10006.2-spark.patch, HIVE-10006.3-spark.patch, HIVE-10006.4-spark.patch, HIVE-10006.5-spark.patch, HIVE-10006.6-spark.patch, HIVE-10006.7-spark.patch While executing queries with RSC, the number of MapWork/ReduceWork objects grows over time and eventually leads to OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
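For readers unfamiliar with the leak pattern: a ThreadLocal map on pooled executor threads keeps every cached entry reachable for as long as the thread lives, and pooled threads never die. A generic sketch of the pattern and the explicit-cleanup fix (illustrative only, not the HIVE-10006 patch):

{code}
import java.util.HashMap;
import java.util.Map;

final class WorkCacheSketch {
  // A per-thread cache like the one introduced in HIVE-9127: entries pin
  // MapWork/ReduceWork objects to long-lived executor threads, so nothing
  // is ever collected unless the cache is cleared explicitly.
  private static final ThreadLocal<Map<String, Object>> WORK_CACHE =
      ThreadLocal.withInitial(HashMap::new);

  static void cache(String planPath, Object work) {
    WORK_CACHE.get().put(planPath, work);
  }

  // The fix pattern: clear per-query state when the task finishes.
  static void clearAfterQuery() {
    WORK_CACHE.get().clear();
    WORK_CACHE.remove(); // drop the map itself from the thread
  }
}
{code}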
[jira] [Commented] (HIVE-10006) RSC has a memory leak while executing multiple queries [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14372803#comment-14372803 ] Xuefu Zhang commented on HIVE-10006: +1 for patch #8. One nit: it would be great if we could put a similar comment on the changes in SparkPlanGenerator.java. Also, we can create a JIRA for HiveInputFormat to track the issue, but no fix is necessary at the moment. RSC has a memory leak while executing multiple queries [Spark Branch] -- Key: HIVE-10006 URL: https://issues.apache.org/jira/browse/HIVE-10006 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 1.1.0 Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Critical Labels: Spark-M5 Attachments: HIVE-10006.1-spark.patch, HIVE-10006.2-spark.patch, HIVE-10006.2-spark.patch, HIVE-10006.3-spark.patch, HIVE-10006.4-spark.patch, HIVE-10006.5-spark.patch, HIVE-10006.6-spark.patch, HIVE-10006.7-spark.patch, HIVE-10006.8-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others
[ https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369337#comment-14369337 ] Xuefu Zhang commented on HIVE-7018: --- [~ctang.ma], what are your thoughts on the latest patch? Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others - Key: HIVE-7018 URL: https://issues.apache.org/jira/browse/HIVE-7018 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Yongzhi Chen Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10017) SparkTask log improvement [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369288#comment-14369288 ] Xuefu Zhang commented on HIVE-10017: +1 SparkTask log improvement [Spark Branch] Key: HIVE-10017 URL: https://issues.apache.org/jira/browse/HIVE-10017 Project: Hive Issue Type: Bug Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Priority: Minor Fix For: spark-branch Attachments: HIVE-10017.1-spark.patch Initialize the log object in its own class for better log messages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9934) Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password
[ https://issues.apache.org/jira/browse/HIVE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370174#comment-14370174 ] Xuefu Zhang commented on HIVE-9934: --- Apache has special guidelines regarding security vulnerabilities. Here is the link: http://www.apache.org/security/committers We are all new to this, so what we have done so far may not comply with them. However, we should try to do so from now on. For docs, please also refer to that page. As to the vulnerability, discussion is still ongoing in the community. Thus, we will act based on the conclusions. Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password -- Key: HIVE-9934 URL: https://issues.apache.org/jira/browse/HIVE-9934 Project: Hive Issue Type: Bug Components: Security Affects Versions: 1.1.0 Reporter: Chao Assignee: Chao Fix For: 1.2.0 Attachments: HIVE-9934.1.patch, HIVE-9934.2.patch, HIVE-9934.3.patch, HIVE-9934.3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9990) TestMultiSessionsHS2WithLocalClusterSpark is failing
[ https://issues.apache.org/jira/browse/HIVE-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9990: -- Description: At least sometimes. I can reproduce it with mvn test -Dtest=TestMultiSessionsHS2WithLocalClusterSpark -Phadoop-2 consistently on my local box (both trunk and spark branch). {code} --- T E S T S --- Running org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 92.438 sec FAILURE! - in org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark testSparkQuery(org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark) Time elapsed: 21.514 sec ERROR! java.util.concurrent.ExecutionException: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296) at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:392) at org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.verifyResult(TestMultiSessionsHS2WithLocalClusterSpark.java:244) at org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testKvQuery(TestMultiSessionsHS2WithLocalClusterSpark.java:220) at org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.access$000(TestMultiSessionsHS2WithLocalClusterSpark.java:53) {code} The error was also seen in HIVE-9934 test run. was: At least sometimes. I can reproduce it with mvn test -Dtest=TestMultiSessionsHS2WithLocalClusterSpark -Phadoop-2 consistently on my local box. (The rest of the description is unchanged.) TestMultiSessionsHS2WithLocalClusterSpark is failing Key: HIVE-9990 URL: https://issues.apache.org/jira/browse/HIVE-9990 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 1.2.0 Reporter: Xuefu Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9934) Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password
[ https://issues.apache.org/jira/browse/HIVE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9934: -- Attachment: HIVE-9934.3.patch Attached the same patch for another test run. Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password -- Key: HIVE-9934 URL: https://issues.apache.org/jira/browse/HIVE-9934 Project: Hive Issue Type: Bug Components: Security Affects Versions: 1.1.0 Reporter: Chao Assignee: Chao Attachments: HIVE-9934.1.patch, HIVE-9934.2.patch, HIVE-9934.3.patch, HIVE-9934.3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9302) Beeline add commands to register local jdbc driver names and jars
[ https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14351207#comment-14351207 ] Xuefu Zhang commented on HIVE-9302: --- Thank you, [~leftylev]. Beeline add commands to register local jdbc driver names and jars - Key: HIVE-9302 URL: https://issues.apache.org/jira/browse/HIVE-9302 Project: Hive Issue Type: New Feature Reporter: Brock Noland Assignee: Ferdinand Xu Labels: TODOC1.2 Fix For: 1.2.0 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, HIVE-9302.2.patch, HIVE-9302.3.patch, HIVE-9302.3.patch, HIVE-9302.4.patch, HIVE-9302.patch, mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar At present if a beeline user uses {{add jar}} the path they give is actually on the HS2 server. It'd be great to allow beeline users to add local jdbc driver jars and register custom jdbc driver names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9889) Merge trunk to Spark branch 3/6/2015 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9889: -- Attachment: HIVE-9889.2-spark.patch Regenerated the patch since some patches were merged individually. Merge trunk to Spark branch 3/6/2015 [Spark Branch] --- Key: HIVE-9889 URL: https://issues.apache.org/jira/browse/HIVE-9889 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-9889.1-spark.patch, HIVE-9889.2-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9871) Print spark job id in history file [spark branch]
[ https://issues.apache.org/jira/browse/HIVE-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353468#comment-14353468 ] Xuefu Zhang commented on HIVE-9871: --- [~chinnalalam], thanks for working on this. Patch looks good, but I'm wondering if you can come up with a better name for the private method added. Something like recordJobId() or addToHistory(), etc. Print spark job id in history file [spark branch] - Key: HIVE-9871 URL: https://issues.apache.org/jira/browse/HIVE-9871 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-9871.1-spark.patch Maintain the spark job id in history file for the corresponding queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin is set to 'true' [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354331#comment-14354331 ] Xuefu Zhang commented on HIVE-9659: --- [~ruili], let's create a JIRA for MR and move on. We'll enable the test only for Spark. 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin is set to 'true' [Spark Branch] --- Key: HIVE-9659 URL: https://issues.apache.org/jira/browse/HIVE-9659 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao Assignee: Rui Li Attachments: HIVE-9659.1-spark.patch, HIVE-9659.2-spark.patch, HIVE-9659.3-spark.patch We found that 'Error while trying to create table container' occurs during Big-Bench Q12 case execution when hive.optimize.skewjoin is set to 'true'. If hive.optimize.skewjoin is set to 'false', the case passes. How to reproduce: 1. set hive.optimize.skewjoin=true; 2. Run BigBench case Q12 and it will fail. Check the executor log (e.g. /usr/lib/spark/work/app-/2/stderr) and you will find the error 'Error while trying to create table container' in the log and also a NullPointerException near the end of the log. (a) Detailed error message for 'Error while trying to create table container': {noformat} 15/02/12 01:29:49 ERROR SparkMapRecordHandler: Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:118) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:193) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:219) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:158) at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:115) ... 21 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error, not a directory: hdfs://bhx1:8020/tmp/hive/root/d22ef465-bff5-4edb-a822-0a9f1c25b66c/hive_2015-02-12_01-28-10_008_6897031694580088767-1/-mr-10009/HashTable-Stage-6/MapJoin-mapfile01--.hashtable at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:106) ... 22 more 15/02/12 01:29:49 INFO SparkRecordHandler: maximum memory = 40939028480 15/02/12 01:29:49 INFO PerfLogger: PERFLOG method=SparkInitializeOperators
[jira] [Commented] (HIVE-9569) Enable more unit tests for UNION ALL [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354339#comment-14354339 ] Xuefu Zhang commented on HIVE-9569: --- +1 Enable more unit tests for UNION ALL [Spark Branch] --- Key: HIVE-9569 URL: https://issues.apache.org/jira/browse/HIVE-9569 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Attachments: HIVE-9569.1-spark.patch, HIVE-9569.1.patch, HIVE-9569.2.patch, HIVE-9569.3.patch, HIVE-9569.4.patch, HIVE-9569.5.patch Currently, we only enabled a subset of all the union tests. We should try to enable the rest, and see if there's any issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9924) Add SORT_QUERY_RESULTS to union12.q
[ https://issues.apache.org/jira/browse/HIVE-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357035#comment-14357035 ] Xuefu Zhang commented on HIVE-9924: --- Yes. Let's fix for Spark branch first. Add SORT_QUERY_RESULTS to union12.q --- Key: HIVE-9924 URL: https://issues.apache.org/jira/browse/HIVE-9924 Project: Hive Issue Type: Test Reporter: Rui Li Assignee: Rui Li Priority: Minor Attachments: HIVE-9924.1-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
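A note for readers unfamiliar with the qtest convention: SORT_QUERY_RESULTS is a directive placed as a comment at the top of a .q test file; the test driver then sorts the query output before diffing it against the expected .q.out file. A sketch of what the change to union12.q amounts to (the comment lines are ours, not the actual file contents):
{noformat}
-- SORT_QUERY_RESULTS

-- ...rest of union12.q unchanged. With the directive present, row-order
-- differences between engines (e.g. MR vs. Spark) no longer fail the test.
{noformat}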
[jira] [Commented] (HIVE-9516) Enable CBO related tests [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355347#comment-14355347 ] Xuefu Zhang commented on HIVE-9516: --- +1 Enable CBO related tests [Spark Branch] --- Key: HIVE-9516 URL: https://issues.apache.org/jira/browse/HIVE-9516 Project: Hive Issue Type: Sub-task Components: spark-branch Affects Versions: spark-branch Reporter: Chao Assignee: Chinna Rao Lalam Attachments: HIVE-9516.1-spark.patch, HIVE-9516.2-spark.patch, HIVE-9516.3-spark.patch In the Spark branch we enabled CBO but haven't turned on the CBO-related unit tests. We should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9871) Print spark job id in history file [spark branch]
[ https://issues.apache.org/jira/browse/HIVE-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355369#comment-14355369 ] Xuefu Zhang commented on HIVE-9871: --- +1 Print spark job id in history file [spark branch] - Key: HIVE-9871 URL: https://issues.apache.org/jira/browse/HIVE-9871 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-9871.1-spark.patch, HIVE-9871.2-spark.patch Maintain the spark job id in history file for the corresponding queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9813) Hive JDBC - DatabaseMetaData.getColumns method cannot find classes added with add jar command
[ https://issues.apache.org/jira/browse/HIVE-9813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357697#comment-14357697 ] Xuefu Zhang commented on HIVE-9813: --- +1 Hive JDBC - DatabaseMetaData.getColumns method cannot find classes added with add jar command --- Key: HIVE-9813 URL: https://issues.apache.org/jira/browse/HIVE-9813 Project: Hive Issue Type: Bug Components: Metastore Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-9813.1.patch, HIVE-9813.3.patch Execute the following JDBC client program:
{code}
import java.sql.*;

public class TestAddJar {
    private static Connection makeConnection(String connString, String classPath)
            throws ClassNotFoundException, SQLException {
        System.out.println("Current Connection info: " + connString);
        Class.forName(classPath);
        System.out.println("Current driver info: " + classPath);
        return DriverManager.getConnection(connString);
    }

    public static void main(String[] args) {
        if (2 != args.length) {
            System.out.println("Two arguments needed: connection string, path to jar to be added (include jar name)");
            System.out.println("Example: java -jar TestApp.jar jdbc:hive2://192.168.111.111 /tmp/json-serde-1.3-jar-with-dependencies.jar");
            return;
        }
        Connection conn;
        try {
            conn = makeConnection(args[0], "org.apache.hive.jdbc.HiveDriver");
            System.out.println("---");
            System.out.println("DONE");
            System.out.println("---");
            System.out.println("Execute query: add jar " + args[1] + ";");
            Statement stmt = conn.createStatement();
            int c = stmt.executeUpdate("add jar " + args[1]);
            System.out.println("Returned value is: [" + c + "]\n");
            System.out.println("---");
            final String createTableQry = "Create table if not exists json_test(id int, content string) "
                    + "row format serde 'org.openx.data.jsonserde.JsonSerDe'";
            System.out.println("Execute query: " + createTableQry + ";");
            stmt.execute(createTableQry);
            System.out.println("---");
            System.out.println("getColumn() Call---\n");
            DatabaseMetaData md = conn.getMetaData();
            System.out.println("Test get all column in a schema:");
            ResultSet rs = md.getColumns("Hive", "default", "json_test", null);
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
            conn.close();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
{code}
Get an exception, and from the metastore log:
{noformat}
7:41:30.316 PM ERROR hive.log
error in initSerDe: java.lang.ClassNotFoundException Class org.openx.data.jsonserde.JsonSerDe not found
java.lang.ClassNotFoundException: Class org.openx.data.jsonserde.JsonSerDe not found
  at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1803)
  at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:183)
  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_fields(HiveMetaStore.java:2487)
  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_schema(HiveMetaStore.java:2542)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
  at com.sun.proxy.$Proxy5.get_schema(Unknown Source)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema.getResult(ThriftHiveMetastore.java:6425)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema.getResult(ThriftHiveMetastore.java:6409)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
  at
{noformat}
[jira] [Commented] (HIVE-9916) Fix TestSparkSessionManagerImpl [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357739#comment-14357739 ] Xuefu Zhang commented on HIVE-9916: --- +1 Fix TestSparkSessionManagerImpl [Spark Branch] -- Key: HIVE-9916 URL: https://issues.apache.org/jira/browse/HIVE-9916 Project: Hive Issue Type: Bug Components: spark-branch Affects Versions: spark-branch Reporter: Chao Assignee: Chao Attachments: HIVE-9916.1-spark.patch, HIVE-9916.2-spark.patch It looks like the wrong patch was committed in HIVE-9872, and therefore TestSparkSessionManagerImpl will still fail. This JIRA should fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9924) Add SORT_QUERY_RESULTS to union12.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9924: -- Component/s: Spark Add SORT_QUERY_RESULTS to union12.q [Spark Branch] -- Key: HIVE-9924 URL: https://issues.apache.org/jira/browse/HIVE-9924 Project: Hive Issue Type: Test Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Minor Attachments: HIVE-9924.1-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9924) Add SORT_QUERY_RESULTS to union12.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9924: -- Summary: Add SORT_QUERY_RESULTS to union12.q [Spark Branch] (was: Add SORT_QUERY_RESULTS to union12.q) Add SORT_QUERY_RESULTS to union12.q [Spark Branch] -- Key: HIVE-9924 URL: https://issues.apache.org/jira/browse/HIVE-9924 Project: Hive Issue Type: Test Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Minor Attachments: HIVE-9924.1-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-9916) Fix TestSparkSessionManagerImpl [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357739#comment-14357739 ] Xuefu Zhang edited comment on HIVE-9916 at 3/11/15 10:43 PM: - +1 The union-related test failures will be addressed in HIVE-9924. was (Author: xuefuz): +1 Fix TestSparkSessionManagerImpl [Spark Branch] -- Key: HIVE-9916 URL: https://issues.apache.org/jira/browse/HIVE-9916 Project: Hive Issue Type: Bug Components: spark-branch Affects Versions: spark-branch Reporter: Chao Assignee: Chao Attachments: HIVE-9916.1-spark.patch, HIVE-9916.2-spark.patch It looks like the wrong patch was committed in HIVE-9872, and therefore TestSparkSessionManagerImpl will still fail. This JIRA should fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9924) Add SORT_QUERY_RESULTS to union12.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357746#comment-14357746 ] Xuefu Zhang commented on HIVE-9924: --- We need to address the two union-related test failures. The other two will be fixed in HIVE-9916. Add SORT_QUERY_RESULTS to union12.q [Spark Branch] -- Key: HIVE-9924 URL: https://issues.apache.org/jira/browse/HIVE-9924 Project: Hive Issue Type: Test Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Minor Attachments: HIVE-9924.1-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9939) Code cleanup for redundant if check in ExplainTask [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9939: -- Component/s: Spark Summary: Code cleanup for redundant if check in ExplainTask [Spark Branch] (was: Code cleanup for redundant if check in ExplainTask) Code cleanup for redundant if check in ExplainTask [Spark Branch] - Key: HIVE-9939 URL: https://issues.apache.org/jira/browse/HIVE-9939 Project: Hive Issue Type: Bug Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: spark-branch Attachments: HIVE-9939.1-spark.patch The ExplainTask.execute() method has a redundant if check. The same applies to trunk as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9939) Code cleanup for redundant if check in ExplainTask [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358596#comment-14358596 ] Xuefu Zhang commented on HIVE-9939: --- +1 Code cleanup for redundant if check in ExplainTask [Spark Branch] - Key: HIVE-9939 URL: https://issues.apache.org/jira/browse/HIVE-9939 Project: Hive Issue Type: Bug Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: spark-branch Attachments: HIVE-9939.1-spark.patch The ExplainTask.execute() method has a redundant if check. The same applies to trunk as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-9882) Add jar/file doesn't work with yarn-cluster mode [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350734#comment-14350734 ] Xuefu Zhang edited comment on HIVE-9882 at 3/6/15 7:07 PM: --- +1. I have a related question on RB, whose answer doesn't block this. was (Author: xuefuz): +1. I have a related question on RB. Add jar/file doesn't work with yarn-cluster mode [Spark Branch] --- Key: HIVE-9882 URL: https://issues.apache.org/jira/browse/HIVE-9882 Project: Hive Issue Type: Sub-task Components: Hive, spark-branch Affects Versions: spark-branch Reporter: Xiaomin Zhang Assignee: Rui Li Attachments: HIVE-9882.1-spark.patch, HIVE-9882.1.patch It seems the current fix for HIVE-9425 only uploads the jars/files to HDFS; however, they are not accessible to the Driver/Executor. I found the following in the AM log:
{noformat}
15/02/26 15:10:36 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
15/02/26 15:10:36 INFO client.SparkClientUtilities: Added jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/hive-exec-1.2.0-SNAPSHOT.jar] to classpath.
15/02/26 15:10:36 INFO client.SparkClientUtilities: Added jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/opennlp-maxent-3.0.3.jar] to classpath.
15/02/26 15:10:36 INFO client.SparkClientUtilities: Added jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/bigbenchqueriesmr.jar] to classpath.
15/02/26 15:10:36 INFO client.SparkClientUtilities: Added jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/opennlp-tools-1.5.3.jar] to classpath.
15/02/26 15:10:36 INFO client.SparkClientUtilities: Added jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/jcl-over-slf4j-1.7.5.jar] to classpath.
15/02/26 15:10:36 INFO client.RemoteDriver: Failed to run job 6886df05-f430-456c-a0ff-c7621db712d6
org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: de.bankmark.bigbench.queries.q10.SentimentUDF
{noformat}
As the above shows, the file path that was added to the classpath is invalid, so all uploaded jars/files are still unavailable to the Driver/Executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9882) Add jar/file doesn't work with yarn-cluster mode [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350734#comment-14350734 ] Xuefu Zhang commented on HIVE-9882: --- +1. I have a related question on RB. Add jar/file doesn't work with yarn-cluster mode [Spark Branch] --- Key: HIVE-9882 URL: https://issues.apache.org/jira/browse/HIVE-9882 Project: Hive Issue Type: Sub-task Components: Hive, spark-branch Affects Versions: spark-branch Reporter: Xiaomin Zhang Assignee: Rui Li Attachments: HIVE-9882.1-spark.patch, HIVE-9882.1.patch It seems the current fix for HIVE-9425 only uploads the jars/files to HDFS; however, they are not accessible to the Driver/Executor. I found the following in the AM log:
{noformat}
15/02/26 15:10:36 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
15/02/26 15:10:36 INFO client.SparkClientUtilities: Added jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/hive-exec-1.2.0-SNAPSHOT.jar] to classpath.
15/02/26 15:10:36 INFO client.SparkClientUtilities: Added jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/opennlp-maxent-3.0.3.jar] to classpath.
15/02/26 15:10:36 INFO client.SparkClientUtilities: Added jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/bigbenchqueriesmr.jar] to classpath.
15/02/26 15:10:36 INFO client.SparkClientUtilities: Added jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/opennlp-tools-1.5.3.jar] to classpath.
15/02/26 15:10:36 INFO client.SparkClientUtilities: Added jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/jcl-over-slf4j-1.7.5.jar] to classpath.
15/02/26 15:10:36 INFO client.RemoteDriver: Failed to run job 6886df05-f430-456c-a0ff-c7621db712d6
org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: de.bankmark.bigbench.queries.q10.SentimentUDF
{noformat}
As the above shows, the file path that was added to the classpath is invalid, so all uploaded jars/files are still unavailable to the Driver/Executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
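For context on the log above: the executor tried to open an hdfs: URI as if it were a file inside the container's working directory, producing the invalid file:/...container.../hdfs:/... path. A minimal sketch of the localization step that avoids this, using a helper class of our own invention (this is not the committed HIVE-9882 patch):
{code}
import java.io.File;
import java.net.URI;
import java.net.URL;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class ClasspathLocalizer {
  // Classloaders can only read file: URLs, so remote URIs must be copied down first.
  static URL toLocalUrl(String pathString, Configuration conf, File localTmpDir) throws Exception {
    URI uri = new URI(pathString);
    if (uri.getScheme() == null || "file".equals(uri.getScheme())) {
      return new File(uri.getPath()).toURI().toURL(); // already local
    }
    Path remote = new Path(uri);
    File local = new File(localTmpDir, remote.getName());
    FileSystem.get(uri, conf).copyToLocalFile(remote, new Path(local.getAbsolutePath()));
    return local.toURI().toURL();
  }
}
{code}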
[jira] [Commented] (HIVE-9302) Beeline add commands to register local jdbc driver names and jars
[ https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14351657#comment-14351657 ] Xuefu Zhang commented on HIVE-9302: --- [~Ferd], I think I didn't check in the jar files. Could you please specify which jar(s) you need and the locations? Thanks. Beeline add commands to register local jdbc driver names and jars - Key: HIVE-9302 URL: https://issues.apache.org/jira/browse/HIVE-9302 Project: Hive Issue Type: New Feature Reporter: Brock Noland Assignee: Ferdinand Xu Labels: TODOC1.2 Fix For: 1.2.0 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, HIVE-9302.2.patch, HIVE-9302.3.patch, HIVE-9302.3.patch, HIVE-9302.4.patch, HIVE-9302.patch, mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar At present if a beeline user uses {{add jar}} the path they give is actually on the HS2 server. It'd be great to allow beeline users to add local jdbc driver jars and register custom jdbc driver names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9302) Beeline add commands to register local jdbc driver names and jars
[ https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14351774#comment-14351774 ] Xuefu Zhang commented on HIVE-9302: --- These two jar files are added to the trunk. Beeline add commands to register local jdbc driver names and jars - Key: HIVE-9302 URL: https://issues.apache.org/jira/browse/HIVE-9302 Project: Hive Issue Type: New Feature Reporter: Brock Noland Assignee: Ferdinand Xu Labels: TODOC1.2 Fix For: 1.2.0 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, HIVE-9302.2.patch, HIVE-9302.3.patch, HIVE-9302.3.patch, HIVE-9302.4.patch, HIVE-9302.patch, mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar At present if a beeline user uses {{add jar}} the path they give is actually on the HS2 server. It'd be great to allow beeline users to add local jdbc driver jars and register custom jdbc driver names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
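For anyone trying the feature out, an illustrative Beeline session; the command names below are per this feature's documentation (double-check against the committed patch), the jar matches one of the attachments above, and the connection string is an example:
{noformat}
beeline> !addlocaldriverjar /tmp/postgresql-9.3.jdbc3.jar
beeline> !addlocaldrivername org.postgresql.Driver
beeline> !connect jdbc:postgresql://localhost:5432/testdb
{noformat}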
[jira] [Commented] (HIVE-9961) HookContext for view should return a table type of VIRTUAL_VIEW
[ https://issues.apache.org/jira/browse/HIVE-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360837#comment-14360837 ] Xuefu Zhang commented on HIVE-9961: --- +1 pending on test. HookContext for view should return a table type of VIRTUAL_VIEW --- Key: HIVE-9961 URL: https://issues.apache.org/jira/browse/HIVE-9961 Project: Hive Issue Type: Bug Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-9961.patch Run a 'create view' statement. The view entity (which is in the hook's outputs) has a table with tableType 'MANAGED_TABLE'. It should be of type 'VIRTUAL_VIEW' so that auditing tools can correctly identify it as a view. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
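To make the auditing angle concrete, a minimal sketch of a post-execution hook fragment that reads the table type off the output entities (hook wiring omitted; the class and method names of the fragment are ours, not Hive's):
{code}
import org.apache.hadoop.hive.ql.hooks.Entity;
import org.apache.hadoop.hive.ql.hooks.HookContext;
import org.apache.hadoop.hive.ql.hooks.WriteEntity;

public class TableTypeAudit {
  static void inspect(HookContext hookContext) {
    for (WriteEntity out : hookContext.getOutputs()) {
      if (out.getType() == Entity.Type.TABLE) {
        // Before the fix a CREATE VIEW output reported MANAGED_TABLE here;
        // with the fix it reports VIRTUAL_VIEW.
        System.out.println(out.getTable().getTableName() + " -> "
            + out.getTable().getTableType().name());
      }
    }
  }
}
{code}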
[jira] [Commented] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356155#comment-14356155 ] Xuefu Zhang commented on HIVE-9659: --- HIVE-9918 is resolved. [~lirui], could you reattach the patch to have another test run? Thanks. 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch] --- Key: HIVE-9659 URL: https://issues.apache.org/jira/browse/HIVE-9659 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao Assignee: Rui Li Attachments: HIVE-9659.1-spark.patch, HIVE-9659.2-spark.patch, HIVE-9659.3-spark.patch, HIVE-9659.4-spark.patch We found that 'Error while trying to create table container' occurs during Big-Bench Q12 case execution when hive.optimize.skewjoin is set to 'true'. If hive.optimize.skewjoin is set to 'false', the case passes. How to reproduce: 1. set hive.optimize.skewjoin=true; 2. Run BigBench case Q12 and it will fail. Check the executor log (e.g. /usr/lib/spark/work/app-/2/stderr) and you will find the error 'Error while trying to create table container' in the log, and also a NullPointerException near the end of the log. (a) Detailed error message for 'Error while trying to create table container':
{noformat}
15/02/12 01:29:49 ERROR SparkMapRecordHandler: Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container
  at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:118)
  at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:193)
  at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:219)
  at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
  at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
  at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
  at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
  at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
  at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
  at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47)
  at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
  at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
  at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
  at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217)
  at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
  at org.apache.spark.scheduler.Task.run(Task.scala:56)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container
  at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:158)
  at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:115)
  ... 21 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error, not a directory: hdfs://bhx1:8020/tmp/hive/root/d22ef465-bff5-4edb-a822-0a9f1c25b66c/hive_2015-02-12_01-28-10_008_6897031694580088767-1/-mr-10009/HashTable-Stage-6/MapJoin-mapfile01--.hashtable
  at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:106)
  ... 22 more
15/02/12 01:29:49 INFO SparkRecordHandler: maximum memory = 40939028480
15/02/12 01:29:49 INFO PerfLogger: PERFLOG
{noformat}
[jira] [Commented] (HIVE-9828) Semantic analyzer does not capture view parent entity for tables referred in view with union all
[ https://issues.apache.org/jira/browse/HIVE-9828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355173#comment-14355173 ] Xuefu Zhang commented on HIVE-9828: --- +1 Semantic analyzer does not capture view parent entity for tables referred in view with union all - Key: HIVE-9828 URL: https://issues.apache.org/jira/browse/HIVE-9828 Project: Hive Issue Type: Bug Components: Parser Affects Versions: 1.1.0 Reporter: Prasad Mujumdar Fix For: 1.2.0 Attachments: HIVE-9828.1-npf.patch The Hive compiler adds tables used in a view definition to the input entity list, with the view as the parent entity for each table. In the case of a view with a union all query, this is not being done properly. For example, {noformat} create view view1 as select t.id from (select tab1.id from db.tab1 union all select tab2.id from db.tab2 ) t; {noformat} This query will capture tab1 and tab2 as read entities without view1 as their parent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9918) Spark branch build is failing due to unknown url [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9918: -- Summary: Spark branch build is failing due to unknown url [Spark Branch] (was: Spark branch build is failing due to unknown url) Spark branch build is failing due to unknown url [Spark Branch] --- Key: HIVE-9918 URL: https://issues.apache.org/jira/browse/HIVE-9918 Project: Hive Issue Type: Bug Components: Spark, spark-branch Reporter: Sergio Peña Assignee: Sergio Peña Priority: Blocker Attachments: HIVE-9918.1-spark.patch, HIVE-9918.1.patch The Spark branch is failing due to a URL that does not exist anymore. This URL contains all the Spark jars used in the build. These Spark jar versions are not in the official Maven repository. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9956) use BigDecimal.valueOf instead of new in TestFileDump
[ https://issues.apache.org/jira/browse/HIVE-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360408#comment-14360408 ] Xuefu Zhang commented on HIVE-9956: --- +1 use BigDecimal.valueOf instead of new in TestFileDump - Key: HIVE-9956 URL: https://issues.apache.org/jira/browse/HIVE-9956 Project: Hive Issue Type: Bug Components: File Formats Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: HIVE-9956.1.patch TestFileDump builds a data row where one of the columns is a BigDecimal. The test adds the value 2. There are two ways to create a BigDecimal object: 1. use new; 2. use valueOf. In this particular case: 1. new will create 2.222153; 2. valueOf will use the canonical String representation and the result will be 2. Probably we should use valueOf to create the BigDecimal object. TestTimestampWritable and TestHCatStores use valueOf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
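The difference is standard java.math behavior, demonstrated below with 2.2 as an example value (the exact literal used by the test is garbled in the description above):
{code}
import java.math.BigDecimal;

public class BigDecimalDemo {
  public static void main(String[] args) {
    // new BigDecimal(double) preserves the exact binary value of the double:
    System.out.println(new BigDecimal(2.2));
    // -> 2.20000000000000017763568394002504646778106689453125
    // BigDecimal.valueOf(double) goes through Double.toString(), i.e. the
    // canonical short representation:
    System.out.println(BigDecimal.valueOf(2.2));
    // -> 2.2
  }
}
{code}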
[jira] [Commented] (HIVE-9957) Hive 1.1.0 not compatible with Hadoop 2.4.0
[ https://issues.apache.org/jira/browse/HIVE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360386#comment-14360386 ] Xuefu Zhang commented on HIVE-9957: --- cc: [~spena] Hive 1.1.0 not compatible with Hadoop 2.4.0 --- Key: HIVE-9957 URL: https://issues.apache.org/jira/browse/HIVE-9957 Project: Hive Issue Type: Bug Components: Encryption Reporter: Vivek Shrivastava Getting this exception while accessing data through Hive:
{noformat}
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.DFSClient.getKeyProvider()Lorg/apache/hadoop/crypto/key/KeyProvider;
  at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.init(Hadoop23Shims.java:1152)
  at org.apache.hadoop.hive.shims.Hadoop23Shims.createHdfsEncryptionShim(Hadoop23Shims.java:1279)
  at org.apache.hadoop.hive.ql.session.SessionState.getHdfsEncryptionShim(SessionState.java:392)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1756)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStagingDirectoryPathname(SemanticAnalyzer.java:1875)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1689)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1427)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10132)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10147)
  at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192)
  at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
  at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112)
  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
  at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
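The missing method belongs to the HDFS encryption API that only newer Hadoop releases expose. A hypothetical defensive probe (not the actual Hive shim code) that detects its absence at runtime instead of dying with NoSuchMethodError:
{code}
import org.apache.hadoop.hdfs.DFSClient;

public class EncryptionSupport {
  static boolean hdfsEncryptionApiAvailable() {
    try {
      DFSClient.class.getMethod("getKeyProvider");
      return true;
    } catch (NoSuchMethodException e) {
      return false; // e.g. Hadoop 2.4.0: encryption API not present
    }
  }
}
{code}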
[jira] [Updated] (HIVE-9813) Hive JDBC - DatabaseMetaData.getColumns method cannot find classes added with add jar command
[ https://issues.apache.org/jira/browse/HIVE-9813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9813: -- Labels: TODOC1.2 (was: ) Hive JDBC - DatabaseMetaData.getColumns method cannot find classes added with add jar command --- Key: HIVE-9813 URL: https://issues.apache.org/jira/browse/HIVE-9813 Project: Hive Issue Type: Bug Components: Metastore Reporter: Yongzhi Chen Assignee: Yongzhi Chen Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-9813.1.patch, HIVE-9813.3.patch Execute the following JDBC client program:
{code}
import java.sql.*;

public class TestAddJar {
    private static Connection makeConnection(String connString, String classPath)
            throws ClassNotFoundException, SQLException {
        System.out.println("Current Connection info: " + connString);
        Class.forName(classPath);
        System.out.println("Current driver info: " + classPath);
        return DriverManager.getConnection(connString);
    }

    public static void main(String[] args) {
        if (2 != args.length) {
            System.out.println("Two arguments needed: connection string, path to jar to be added (include jar name)");
            System.out.println("Example: java -jar TestApp.jar jdbc:hive2://192.168.111.111 /tmp/json-serde-1.3-jar-with-dependencies.jar");
            return;
        }
        Connection conn;
        try {
            conn = makeConnection(args[0], "org.apache.hive.jdbc.HiveDriver");
            System.out.println("---");
            System.out.println("DONE");
            System.out.println("---");
            System.out.println("Execute query: add jar " + args[1] + ";");
            Statement stmt = conn.createStatement();
            int c = stmt.executeUpdate("add jar " + args[1]);
            System.out.println("Returned value is: [" + c + "]\n");
            System.out.println("---");
            final String createTableQry = "Create table if not exists json_test(id int, content string) "
                    + "row format serde 'org.openx.data.jsonserde.JsonSerDe'";
            System.out.println("Execute query: " + createTableQry + ";");
            stmt.execute(createTableQry);
            System.out.println("---");
            System.out.println("getColumn() Call---\n");
            DatabaseMetaData md = conn.getMetaData();
            System.out.println("Test get all column in a schema:");
            ResultSet rs = md.getColumns("Hive", "default", "json_test", null);
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
            conn.close();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
{code}
Get an exception, and from the metastore log:
{noformat}
7:41:30.316 PM ERROR hive.log
error in initSerDe: java.lang.ClassNotFoundException Class org.openx.data.jsonserde.JsonSerDe not found
java.lang.ClassNotFoundException: Class org.openx.data.jsonserde.JsonSerDe not found
  at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1803)
  at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:183)
  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_fields(HiveMetaStore.java:2487)
  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_schema(HiveMetaStore.java:2542)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
  at com.sun.proxy.$Proxy5.get_schema(Unknown Source)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema.getResult(ThriftHiveMetastore.java:6425)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema.getResult(ThriftHiveMetastore.java:6409)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
  at
{noformat}
[jira] [Commented] (HIVE-9918) Spark branch build is failing due to unknown url
[ https://issues.apache.org/jira/browse/HIVE-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355839#comment-14355839 ] Xuefu Zhang commented on HIVE-9918: --- +1 pending on test. Spark branch build is failing due to unknown url Key: HIVE-9918 URL: https://issues.apache.org/jira/browse/HIVE-9918 Project: Hive Issue Type: Bug Components: Spark, spark-branch Reporter: Sergio Peña Assignee: Sergio Peña Priority: Blocker Attachments: HIVE-9918.1.patch The Spark branch is failing due to a URL that does not exist anymore. This URL contains all the Spark jars used in the build. These Spark jar versions are not in the official Maven repository. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9924) Add SORT_QUERY_RESULTS to union12.q
[ https://issues.apache.org/jira/browse/HIVE-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9924: -- Attachment: HIVE-9924.1-spark.patch Attached a dummy patch to trigger a clean test run for Spark branch to find out any test failures. Add SORT_QUERY_RESULTS to union12.q --- Key: HIVE-9924 URL: https://issues.apache.org/jira/browse/HIVE-9924 Project: Hive Issue Type: Test Reporter: Rui Li Assignee: Rui Li Priority: Minor Attachments: HIVE-9924.1-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9625) Delegation tokens for HMS are not renewed
[ https://issues.apache.org/jira/browse/HIVE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358052#comment-14358052 ] Xuefu Zhang commented on HIVE-9625: --- [~brocknoland], [~prasadm], could we move this forward? Delegation tokens for HMS are not renewed - Key: HIVE-9625 URL: https://issues.apache.org/jira/browse/HIVE-9625 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9625.1.patch AFAICT the delegation tokens stored in [HiveSessionImplwithUGI |https://github.com/apache/hive/blob/trunk/service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java#L45] for HMS + Impersonation are never renewed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10087) Beeline's --silent option should suppress query from being echoed when running with -f option
[ https://issues.apache.org/jira/browse/HIVE-10087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14380336#comment-14380336 ] Xuefu Zhang commented on HIVE-10087: Patch looks good. One minor thing: I noticed there is a blank line in the console output for -f when --silent=true. Is there a way to get rid of that? Beeline's --silent option should suppress query from being echoed when running with -f option - Key: HIVE-10087 URL: https://issues.apache.org/jira/browse/HIVE-10087 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 0.13.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Attachments: HIVE-10087.patch The {{-e}} and the {{-f}} options behave differently.
{code}
beeline -u jdbc:hive2://localhost:1/default --showHeader=false --silent=true -f select.sql
0: jdbc:hive2://localhost:1/default select * from sample_07 limit 5;
--
00- All Occupations 134354250 40690
11- Management occupations 6003930 96150
11-1011 Chief executives 299160 151370
11-1021 General and operations managers 1655410 103780
11-1031 Legislators 61110 33880
--

beeline -u jdbc:hive2://localhost:1/default --showHeader=false --silent=true -e select * from sample_07 limit 5;
--
00- All Occupations 134354250 40690
11- Management occupations 6003930 96150
11-1011 Chief executives 299160 151370
11-1021 General and operations managers 1655410 103780
11-1031 Legislators 61110 33880
--
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
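The shape of the requested behavior, as a hypothetical fragment (names are ours, not BeeLine's internals):
{code}
// Echo statements read from a -f script only when --silent is off,
// matching what the -e path already does.
void runScriptLine(String line, boolean silent) {
  if (!silent) {
    System.out.println(line); // echo the statement before executing it
  }
  // ... execute the statement ...
}
{code}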
[jira] [Commented] (HIVE-10087) Beeline's --silent option should suppress query from being echoed when running with -f option
[ https://issues.apache.org/jira/browse/HIVE-10087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14380945#comment-14380945 ] Xuefu Zhang commented on HIVE-10087: +1 Beeline's --silent option should suppress query from being echoed when running with -f option - Key: HIVE-10087 URL: https://issues.apache.org/jira/browse/HIVE-10087 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 0.13.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Attachments: HIVE-10087.patch The {{-e}} and the {{-f}} options behave differently.
{code}
beeline -u jdbc:hive2://localhost:1/default --showHeader=false --silent=true -f select.sql
0: jdbc:hive2://localhost:1/default select * from sample_07 limit 5;
--
00- All Occupations 134354250 40690
11- Management occupations 6003930 96150
11-1011 Chief executives 299160 151370
11-1021 General and operations managers 1655410 103780
11-1031 Legislators 61110 33880
--

beeline -u jdbc:hive2://localhost:1/default --showHeader=false --silent=true -e select * from sample_07 limit 5;
--
00- All Occupations 134354250 40690
11- Management occupations 6003930 96150
11-1011 Chief executives 299160 151370
11-1021 General and operations managers 1655410 103780
11-1031 Legislators 61110 33880
--
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8858) Visualize generated Spark plan [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387344#comment-14387344 ] Xuefu Zhang commented on HIVE-8858: --- Hi Chinna, thanks for working on this. I haven't checked your patch, but the output looks nice. I have a few suggestions:
1. We need numbering in the Trans. Otherwise, it's hard to visualize the graph.
2. Other information, such as the number of partitions in ShuffleTran, is also important to show.
3. It would be better if we log this graph in one line. The easiest way is to have a toString() method in SparkPlan; then we can just log the string representation of SparkPlan.
4. To avoid long lines, we can show the graph in the same way as we show the work graph. For instance:
{code}
MapTran 1 - MapInput 1 (cache off)
Shuffle1 (cache on) - MapTran 1
Reduce 1 - Shuffle1 (cache on)
Reduce 2 - Shuffle1 (cache on)
{code}
Please note that this may not represent a valid plan. [~jxiang]/[~csun], please feel free to share your thoughts. Visualize generated Spark plan [Spark Branch] - Key: HIVE-8858 URL: https://issues.apache.org/jira/browse/HIVE-8858 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chinna Rao Lalam Attachments: HIVE-8858-spark.patch The Spark plan generated by SparkPlanGenerator contains info which isn't available in Hive's explain plan, such as RDD caching. Also, the graph is slightly different from the original SparkWork. Thus, it would be nice to visualize the plan as is done for SparkWork. Preferably, the visualization can happen as part of Hive explain extended. If not feasible, we can at least log this at info level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
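A self-contained sketch of suggestion 3 above; the child-to-parents map is a stand-in for SparkPlan's internal structures, and the Tran names follow the example graph:
{code}
import java.util.List;
import java.util.Map;

public class SparkPlanRenderer {
  // Render every edge of the Tran graph as "child <- parent" on a single line,
  // so the whole plan is greppable in the driver log.
  static String render(Map<String, List<String>> childToParents) {
    StringBuilder sb = new StringBuilder("SparkPlan: ");
    for (Map.Entry<String, List<String>> e : childToParents.entrySet()) {
      for (String parent : e.getValue()) {
        sb.append(e.getKey()).append(" <- ").append(parent).append("; ");
      }
    }
    return sb.toString();
  }
}
{code}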
[jira] [Commented] (HIVE-10143) HS2 fails to clean up Spark client state on timeout [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387276#comment-14387276 ] Xuefu Zhang commented on HIVE-10143: +1 pending on tests. HS2 fails to clean up Spark client state on timeout [Spark Branch] -- Key: HIVE-10143 URL: https://issues.apache.org/jira/browse/HIVE-10143 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-10143.1-spark.patch When a new client is registered with the Spark client and fails to connect back in time, the code will time out the future and HS2 will give up on that client. But the RSC backend does not clean up all the state, and the client is still allowed to connect back. That can lead to the client staying alive indefinitely and holding on to cluster resources, since HS2 doesn't know it's alive but the connection still exists. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10143) HS2 fails to clean up Spark client state on timeout [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387437#comment-14387437 ] Xuefu Zhang commented on HIVE-10143: That's correct. These failures are known, captured in HIVE-10134. Please ignore for now. HS2 fails to clean up Spark client state on timeout [Spark Branch] -- Key: HIVE-10143 URL: https://issues.apache.org/jira/browse/HIVE-10143 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-10143.1-spark.patch When a new client is registered with the Spark client and fails to connect back in time, the code will time out the future and HS2 will give up on that client. But the RSC backend does not clean up all the state, and the client is still allowed to connect back. That can lead to the client staying alive indefinitely and holding on to cluster resources, since HS2 doesn't know it's alive but the connection still exists. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383219#comment-14383219 ] Xuefu Zhang commented on HIVE-10073: Okay. Makes sense. Runtime exception when querying HBase with Spark [Spark Branch] --- Key: HIVE-10073 URL: https://issues.apache.org/jira/browse/HIVE-10073 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10073.1-spark.patch, HIVE-10073.2-spark.patch, HIVE-10073.3-spark.patch When querying HBase with Spark, we got {noformat} Caused by: java.lang.IllegalArgumentException: Must specify table name at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:188) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:276) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:266) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:331) {noformat} But it works fine for MapReduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382162#comment-14382162 ] Xuefu Zhang commented on HIVE-10073: Hi [~jxiang] and [~chengxiang li], before we patch this on Hive side, I think it's better to find the root cause. If the problem is due to Spark, we can bring up the problem to that community. So far, I'm not convinced that the problem is on hive side. Runtime exception when querying HBase with Spark [Spark Branch] --- Key: HIVE-10073 URL: https://issues.apache.org/jira/browse/HIVE-10073 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10073.1-spark.patch When querying HBase with Spark, we got {noformat} Caused by: java.lang.IllegalArgumentException: Must specify table name at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:188) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:276) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:266) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:331) {noformat} But it works fine for MapReduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
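Background on the exception itself (HBase API, simplified; the table name is an example): TableOutputFormat.setConf() throws "Must specify table name" unless the output table is present in the Configuration handed to ReflectionUtils.newInstance().
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;

public class HBaseOutputConf {
  static Configuration withOutputTable(Configuration base) {
    Configuration conf = new Configuration(base);
    // Without this property, instantiating TableOutputFormat via
    // ReflectionUtils.newInstance() fails in setConf() as in the stack trace above.
    conf.set(TableOutputFormat.OUTPUT_TABLE, "example_hbase_table");
    return conf;
  }
}
{code}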
[jira] [Commented] (HIVE-9969) Avoid Utilities.getMapRedWork for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389226#comment-14389226 ] Xuefu Zhang commented on HIVE-9969: --- +1 Avoid Utilities.getMapRedWork for spark [Spark Branch] -- Key: HIVE-9969 URL: https://issues.apache.org/jira/browse/HIVE-9969 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Priority: Minor Attachments: HIVE-9969.1-spark.patch The method shouldn't be used for spark mode. Specifically, map work and reduce work have different plan paths in spark. Calling this method will leave lots of errors in executor's log: {noformat} 15/03/16 02:57:23 INFO Utilities: Open file to read in plan: hdfs://node13-1:8020/tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml 15/03/16 02:57:23 INFO Utilities: File not found: File does not exist: /tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
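A hedged sketch of the suggested direction (not the committed patch): look up only the plan that exists for the task at hand, rather than Utilities.getMapRedWork(), which probes MR's reduce.xml path even when no reduce work was serialized:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.ql.exec.Utilities;
import org.apache.hadoop.hive.ql.plan.BaseWork;

public class PlanLookup {
  // In Spark mode, map work and reduce work live under separate plan paths,
  // so fetch exactly the one this task needs.
  static BaseWork lookup(Configuration jobConf, boolean mapSide) {
    return mapSide ? Utilities.getMapWork(jobConf) : Utilities.getReduceWork(jobConf);
  }
}
{code}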
[jira] [Updated] (HIVE-10130) Merge from Spark branch to trunk 03/27/2015 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10130: --- Summary: Merge from Spark branch to trunk 03/27/2015 [Spark Branch] (was: Merge from Spark branch to trunk 03/27/2015) Merge from Spark branch to trunk 03/27/2015 [Spark Branch] -- Key: HIVE-10130 URL: https://issues.apache.org/jira/browse/HIVE-10130 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-10130.1-spark.patch, HIVE-10130.2-spark.patch, HIVE-10130.2-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10058) Log the information of cached RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375896#comment-14375896 ] Xuefu Zhang commented on HIVE-10058: Hi Chinna, do you agree that if we fulfill HIVE-8858 we don't need this one? My concern is that RDD id helps little in understanding Spark plan. Log the information of cached RDD [Spark Branch] Key: HIVE-10058 URL: https://issues.apache.org/jira/browse/HIVE-10058 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: spark-branch Attachments: HIVE-10058.1-spark.patch Log the cached RDD Id's at info level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
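A hypothetical variant of the logging in question, pairing the RDD id with Spark's lineage string so the id carries some context (the logging sink is illustrative):
{code}
import org.apache.spark.api.java.JavaPairRDD;

public class RddCacheLogging {
  static void logBeforeCache(JavaPairRDD<?, ?> rdd) {
    // toDebugString() prints the RDD lineage, which ties the numeric id
    // back to the operations that produced it.
    System.out.println("Caching RDD " + rdd.id() + ":\n" + rdd.toDebugString());
  }
}
{code}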
[jira] [Commented] (HIVE-9990) TestMultiSessionsHS2WithLocalClusterSpark is failing
[ https://issues.apache.org/jira/browse/HIVE-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375345#comment-14375345 ] Xuefu Zhang commented on HIVE-9990: --- [~Ferd], thanks for looking into this. My desktop always has problems running Spark tests due to the Snappy native library, so I guess my problem could be different from Jenkins. If you cannot reproduce it, I think it could be just a transient failure. You may close the issue as not reproducible. Thanks. TestMultiSessionsHS2WithLocalClusterSpark is failing Key: HIVE-9990 URL: https://issues.apache.org/jira/browse/HIVE-9990 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 1.2.0 Reporter: Xuefu Zhang Assignee: Ferdinand Xu At least sometimes. I can reproduce it with mvn test -Dtest=TestMultiSessionsHS2WithLocalClusterSpark -Phadoop-2 consistently on my local box (both trunk and spark branch).
{code}
---
 T E S T S
---
Running org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 92.438 sec FAILURE! - in org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark
testSparkQuery(org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark) Time elapsed: 21.514 sec ERROR!
java.util.concurrent.ExecutionException: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
  at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296)
  at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:392)
  at org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.verifyResult(TestMultiSessionsHS2WithLocalClusterSpark.java:244)
  at org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testKvQuery(TestMultiSessionsHS2WithLocalClusterSpark.java:220)
  at org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.access$000(TestMultiSessionsHS2WithLocalClusterSpark.java:53)
{code}
The error was also seen in the HIVE-9934 test run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9793) Remove hard coded paths from cli driver tests
[ https://issues.apache.org/jira/browse/HIVE-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14340258#comment-14340258 ] Xuefu Zhang commented on HIVE-9793: --- +1 Remove hard coded paths from cli driver tests - Key: HIVE-9793 URL: https://issues.apache.org/jira/browse/HIVE-9793 Project: Hive Issue Type: Improvement Components: Tests Affects Versions: 1.2.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9793.patch, HIVE-9793.patch, HIVE-9793.patch At some point a change that generates a hard-coded path into the test files snuck in. Instead we should use the {{HIVE_ROOT}} directory, as this is better for ptest environments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347821#comment-14347821 ] Xuefu Zhang edited comment on HIVE-9863 at 3/5/15 12:18 AM: cc: [~rdblue] [~spena] was (Author: xuefuz): cc: [~rdblue] Querying parquet tables fails with IllegalStateException [Spark Branch] --- Key: HIVE-9863 URL: https://issues.apache.org/jira/browse/HIVE-9863 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang This does not necessarily happen only in the Spark branch; queries such as select count(*) from table_name fail with an error:
{code}
hive> select * from content limit 2;
OK
Failed with exception java.io.IOException:java.lang.IllegalStateException: All the offsets listed in the split should be found in the file. expected: [4, 4] found: [BlockMetaData{69644, 881917418 [ColumnMetaData{GZIP [guid] BINARY [PLAIN, BIT_PACKED], 4}, ColumnMetaData{GZIP [collection_name] BINARY [PLAIN_DICTIONARY, BIT_PACKED], 389571}, ColumnMetaData{GZIP [doc_type] BINARY [PLAIN_DICTIONARY, BIT_PACKED], 389790}, ColumnMetaData{GZIP [stage] INT64 [PLAIN_DICTIONARY, BIT_PACKED], 389887}, ColumnMetaData{GZIP [meta_timestamp] INT64 [RLE, PLAIN_DICTIONARY, BIT_PACKED], 397673}, ColumnMetaData{GZIP [doc_timestamp] INT64 [RLE, PLAIN_DICTIONARY, BIT_PACKED], 422161}, ColumnMetaData{GZIP [meta_size] INT32 [RLE, PLAIN_DICTIONARY, BIT_PACKED], 460215}, ColumnMetaData{GZIP [content_size] INT32 [RLE, PLAIN_DICTIONARY, BIT_PACKED], 521728}, ColumnMetaData{GZIP [source] BINARY [RLE, PLAIN, BIT_PACKED], 683740}, ColumnMetaData{GZIP [delete_flag] BOOLEAN [RLE, PLAIN, BIT_PACKED], 683787}, ColumnMetaData{GZIP [meta] BINARY [RLE, PLAIN, BIT_PACKED], 683834}, ColumnMetaData{GZIP [content] BINARY [RLE, PLAIN, BIT_PACKED], 6992365}]}] out of: [4, 129785482, 260224757] in range 0, 134217728
Time taken: 0.253 seconds
hive>
{code}
I can reproduce the problem with either local or yarn-cluster. It seems to happen with MR also. Thus, I suspect this is a Parquet problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347821#comment-14347821 ] Xuefu Zhang commented on HIVE-9863: --- cc: [~rdblue] Querying parquet tables fails with IllegalStateException [Spark Branch] --- Key: HIVE-9863 URL: https://issues.apache.org/jira/browse/HIVE-9863 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang This doesn't necessarily happen only in the Spark branch; queries such as select count(*) from table_name fail with this error: {code} hive> select * from content limit 2; OK Failed with exception java.io.IOException:java.lang.IllegalStateException: All the offsets listed in the split should be found in the file. expected: [4, 4] found: [BlockMetaData{69644, 881917418 [ColumnMetaData{GZIP [guid] BINARY [PLAIN, BIT_PACKED], 4}, ColumnMetaData{GZIP [collection_name] BINARY [PLAIN_DICTIONARY, BIT_PACKED], 389571}, ColumnMetaData{GZIP [doc_type] BINARY [PLAIN_DICTIONARY, BIT_PACKED], 389790}, ColumnMetaData{GZIP [stage] INT64 [PLAIN_DICTIONARY, BIT_PACKED], 389887}, ColumnMetaData{GZIP [meta_timestamp] INT64 [RLE, PLAIN_DICTIONARY, BIT_PACKED], 397673}, ColumnMetaData{GZIP [doc_timestamp] INT64 [RLE, PLAIN_DICTIONARY, BIT_PACKED], 422161}, ColumnMetaData{GZIP [meta_size] INT32 [RLE, PLAIN_DICTIONARY, BIT_PACKED], 460215}, ColumnMetaData{GZIP [content_size] INT32 [RLE, PLAIN_DICTIONARY, BIT_PACKED], 521728}, ColumnMetaData{GZIP [source] BINARY [RLE, PLAIN, BIT_PACKED], 683740}, ColumnMetaData{GZIP [delete_flag] BOOLEAN [RLE, PLAIN, BIT_PACKED], 683787}, ColumnMetaData{GZIP [meta] BINARY [RLE, PLAIN, BIT_PACKED], 683834}, ColumnMetaData{GZIP [content] BINARY [RLE, PLAIN, BIT_PACKED], 6992365}]}] out of: [4, 129785482, 260224757] in range 0, 134217728 Time taken: 0.253 seconds hive> {code} I can reproduce the problem with either local or yarn-cluster mode. It also seems to happen with MR, so I suspect this is a Parquet problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9869) Trunk doesn't build with hadoop-1
[ https://issues.apache.org/jira/browse/HIVE-9869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang reassigned HIVE-9869: - Assignee: Rui Li Trunk doesn't build with hadoop-1 - Key: HIVE-9869 URL: https://issues.apache.org/jira/browse/HIVE-9869 Project: Hive Issue Type: Bug Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-9869.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9855) Runtime skew join doesn't work when skewed data only exists in big table
[ https://issues.apache.org/jira/browse/HIVE-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346896#comment-14346896 ] Xuefu Zhang commented on HIVE-9855: --- +1 pending tests. Runtime skew join doesn't work when skewed data only exists in big table Key: HIVE-9855 URL: https://issues.apache.org/jira/browse/HIVE-9855 Project: Hive Issue Type: Bug Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-9855.1.patch To reproduce, enable runtime skew join and then join two tables where the skewed data exists in only one of them. The task will fail with the following exception: {noformat} Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: java.io.IOException: Unable to rename output to: hdfs://.. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9302) Beeline add commands to register local jdbc driver names and jars
[ https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348841#comment-14348841 ] Xuefu Zhang commented on HIVE-9302: --- +1 Beeline add commands to register local jdbc driver names and jars - Key: HIVE-9302 URL: https://issues.apache.org/jira/browse/HIVE-9302 Project: Hive Issue Type: New Feature Reporter: Brock Noland Assignee: Ferdinand Xu Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, HIVE-9302.2.patch, HIVE-9302.3.patch, HIVE-9302.3.patch, HIVE-9302.4.patch, HIVE-9302.patch, mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar At present if a beeline user uses {{add jar}} the path they give is actually on the HS2 server. It'd be great to allow beeline users to add local jdbc driver jars and register custom jdbc driver names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
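For context, the feature adds Beeline commands along the lines of the following session sketch; the jar path, driver class, and connection URL below are hypothetical examples, not values from the patch:
{code}
-- hypothetical jar path and driver class, for illustration only
beeline> !addlocaldriverjar /home/user/jdbc/mysql-connector-java-bin.jar
beeline> !addlocaldrivername com.mysql.jdbc.Driver
beeline> !connect jdbc:mysql://localhost:3306/testdb
{code}
The point of the design is that the jar is loaded on the Beeline client side, unlike {{add jar}}, which resolves the path on the HS2 server.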
[jira] [Commented] (HIVE-9793) Remove hard coded paths from cli driver tests
[ https://issues.apache.org/jira/browse/HIVE-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337810#comment-14337810 ] Xuefu Zhang commented on HIVE-9793: --- Looks good to me. What about the result directory, which is also using basedir? Remove hard coded paths from cli driver tests - Key: HIVE-9793 URL: https://issues.apache.org/jira/browse/HIVE-9793 Project: Hive Issue Type: Improvement Components: Tests Affects Versions: 1.2.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9793.patch At some point a change which generates a hard-coded path into the test files snuck in. Instead, we should use the {{HIVE_ROOT}} directory, as this is better for ptest environments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9086) Add language support to PURGE data while dropping partitions.
[ https://issues.apache.org/jira/browse/HIVE-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337437#comment-14337437 ] Xuefu Zhang commented on HIVE-9086: --- Could we get a summary of the disagreement here? If the syntax for tables is to add PURGE after the table name, we should add PURGE after the partition spec, just to be consistent. Add language support to PURGE data while dropping partitions. - Key: HIVE-9086 URL: https://issues.apache.org/jira/browse/HIVE-9086 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.15.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-9086.1.patch HIVE-9083 adds metastore-support to skip-trash while dropping partitions. This patch includes language support to do the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
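For illustration, a minimal sketch of the two forms being compared; the table and partition names are hypothetical, and the partition-level statement shows the syntax under discussion rather than something already committed at the time of this comment:
{code}
-- table-level skip-trash form already supported by Hive:
DROP TABLE page_views PURGE;
-- proposed partition-level form, with PURGE after the partition spec:
ALTER TABLE page_views DROP PARTITION (ds='2015-03-01') PURGE;
{code}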
[jira] [Commented] (HIVE-9794) java.lang.NoSuchMethodError occurs during hive query execution which has 'ADD FILE XXXX.jar' sentence
[ https://issues.apache.org/jira/browse/HIVE-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339492#comment-14339492 ] Xuefu Zhang commented on HIVE-9794: --- cc: [~chengxiang li], [~lirui] java.lang.NoSuchMethodError occurs during hive query execution which has 'ADD FILE XXXX.jar' sentence - Key: HIVE-9794 URL: https://issues.apache.org/jira/browse/HIVE-9794 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao We updated our code to the latest revision on the Spark branch (i.e. fd0f638a8d481a9a98b34d3dd08236d6d591812f), rebuilt and deployed Hive in our cluster, and ran the BigBench cases again. Many cases (e.g. Q1, Q2, Q3, Q4, Q8) failed due to a common 'NoSuchMethodError'. The root cause in these queries should be the 'ADD FILE XXXX.jar' statement. Detailed error message: Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.session.SessionState.add_resources(Lorg/apache/hadoop/hive/ql/session/SessionState$ResourceType;Ljava/util/List;)Ljava/util/List; at org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:67) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:262) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:403) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:419) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:708) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others
[ https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369575#comment-14369575 ] Xuefu Zhang commented on HIVE-7018: --- +1, Thanks, Chaoyu! Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others - Key: HIVE-7018 URL: https://issues.apache.org/jira/browse/HIVE-7018 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Yongzhi Chen Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch It appears that at least postgres and oracle do not have the LINK_TARGET_ID column while mysql does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9994) Hive query plan returns sensitive data to external applications
[ https://issues.apache.org/jira/browse/HIVE-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367771#comment-14367771 ] Xuefu Zhang commented on HIVE-9994: --- +1 Hive query plan returns sensitive data to external applications --- Key: HIVE-9994 URL: https://issues.apache.org/jira/browse/HIVE-9994 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9994.1.patch, HIVE-9994.2.patch, HIVE-9994.3.patch Some applications are using getQueryString() method from the QueryPlan class to get the query that is being executed by Hive. The query string returned is not redacted, and it is returning sensitive information that is logged in Navigator. We need to return data redacted from the QueryPlan to avoid other applications to log sensitive data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10058) Log the information of cached RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376552#comment-14376552 ] Xuefu Zhang commented on HIVE-10058: [~chinnalalam], Sorry that I wasn't clear. I was thinking more about HIVE-8858, where I'd like to have a visual representation of the SparkPlan. As you can see in the class definition, a SparkPlan consists of a graph of SparkTrans. SparkTran has a few subclasses, and some of them, such as MapInput, have properties such as toCache. What is desirable is to log a SparkPlan in a graphical way, similar to what's shown for the work graph in an explain plan, such as:
{code}
MapInput (cache off) -> Shuffle (cache on) -> Reduce
                                           \-> Reduce
{code}
This will give us some idea about the SparkPlan we are executing. Let me know if you have any questions. Log the information of cached RDD [Spark Branch] Key: HIVE-10058 URL: https://issues.apache.org/jira/browse/HIVE-10058 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: spark-branch Attachments: HIVE-10058.1-spark.patch, HIVE-10058.2-spark.patch Log the cached RDD IDs at INFO level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-10058) Log the information of cached RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376552#comment-14376552 ] Xuefu Zhang edited comment on HIVE-10058 at 3/23/15 8:16 PM: - [~chinnalalam], Sorry that I wasn't clear. I was thinking more about HIVE-8858, where I'd like to have a visual representation of the SparkPlan. As you can see in the class definition, a SparkPlan consists of a graph of SparkTrans. SparkTran has a few subclasses, and some of them, such as MapInput, have properties such as toCache. What is desirable is to log a SparkPlan in a graphical way, similar to what's shown for the work graph in an explain plan, such as:
{code}
MapInput (cache off) -> Shuffle (cache on) -> Reduce
                                           \-> Reduce
{code}
This will give us some idea about the SparkPlan we are executing. Let me know if you have any questions. was (Author: xuefuz): [~chinnalalam], Sorry that I wasn't clear. I was thinking more about HIVE-8858, where I'd like to have a visual representation of the SparkPlan. As you can see in the class definition, a SparkPlan consists of a graph of SparkTrans. SparkTran has a few subclasses, and some of them, such as MapInput, have properties such as toCache. What is desirable is to log a SparkPlan in a graphical way, similar to what's shown for the work graph in an explain plan, such as:
{code}
MapInput (cache off) -> Shuffle (cache on) -> Reduce
                                           \-> Reduce
{code}
This will give us some idea about the SparkPlan we are executing. Let me know if you have any questions. Log the information of cached RDD [Spark Branch] Key: HIVE-10058 URL: https://issues.apache.org/jira/browse/HIVE-10058 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: spark-branch Attachments: HIVE-10058.1-spark.patch, HIVE-10058.2-spark.patch Log the cached RDD IDs at INFO level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10130) Merge from Spark branch to trunk 03/27/2015
[ https://issues.apache.org/jira/browse/HIVE-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10130: --- Attachment: HIVE-10130.1-spark.patch Merge from Spark branch to trunk 03/27/2015 --- Key: HIVE-10130 URL: https://issues.apache.org/jira/browse/HIVE-10130 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-10130.1-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10130) Merge from Spark branch to trunk 03/27/2015
[ https://issues.apache.org/jira/browse/HIVE-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10130: --- Attachment: HIVE-10130.2-spark.patch Merge from Spark branch to trunk 03/27/2015 --- Key: HIVE-10130 URL: https://issues.apache.org/jira/browse/HIVE-10130 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-10130.1-spark.patch, HIVE-10130.2-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10146) Not count session as idle if query is running
[ https://issues.apache.org/jira/browse/HIVE-10146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10146: --- Labels: TODOC1.2 (was: ) Not count session as idle if query is running - Key: HIVE-10146 URL: https://issues.apache.org/jira/browse/HIVE-10146 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-10146.1.patch, HIVE-10146.2.patch Currently, as long as there is no activity, we think the HS2 session is idle. This makes it very hard to set HIVE_SERVER2_IDLE_SESSION_TIMEOUT. If we don't set it long enough, an unattended query could be killed. We should provide an option not to count the session as idle if some query is still running. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
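As a sketch of how the resulting server configuration might look: the first property is the existing idle-session timeout, and the second is assumed to be the switch this patch introduces (treat the name as an assumption until the wiki is updated, per the TODOC1.2 label); values are examples only.
{code}
# in HiveServer2's hive-site.xml; example value: consider a session idle after 1 hour
hive.server2.idle.session.timeout=3600000
# assumed new property from this patch: a session with a running query
# is not counted as idle, so it is not timed out mid-query
hive.server2.idle.session.check.operation=true
{code}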
[jira] [Commented] (HIVE-10385) Optionally disable partition creation to speedup ETL jobs
[ https://issues.apache.org/jira/browse/HIVE-10385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504116#comment-14504116 ] Xuefu Zhang commented on HIVE-10385: Not sure if I understand the request correctly. If we load a table with dynamic partitioning w/o creating these partitions at the end, why do we even bother using dynamic partitioning at all? A use case would help. Optionally disable partition creation to speedup ETL jobs - Key: HIVE-10385 URL: https://issues.apache.org/jira/browse/HIVE-10385 Project: Hive Issue Type: Improvement Components: Hive Reporter: Slava Markeyev Priority: Minor Attachments: HIVE-10385.patch ETL jobs that create dynamic partitions with high cardinality perform the expensive step of metastore partition creation after query completion. Until bulk partition creation can be optimized, there should be a way to optionally skip this step. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10464) How i find the kryo version
[ https://issues.apache.org/jira/browse/HIVE-10464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-10464. Resolution: Invalid How i find the kryo version Key: HIVE-10464 URL: https://issues.apache.org/jira/browse/HIVE-10464 Project: Hive Issue Type: Improvement Reporter: ankush Could you please let me know how I can find the Kryo version that I am using? Please help me on this; we are just running HQL (Hive) queries -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10454) Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified.
[ https://issues.apache.org/jira/browse/HIVE-10454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509403#comment-14509403 ] Xuefu Zhang commented on HIVE-10454: I think the point of strict mode is to prevent a full scan over all partitions of a table. In your case, while rows are filtered, the scanner still has to scan all partitions, which is exactly what strict mode is meant to prevent. Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified. --- Key: HIVE-10454 URL: https://issues.apache.org/jira/browse/HIVE-10454 Project: Hive Issue Type: Bug Reporter: Aihua Xu Assignee: Aihua Xu The following queries fail: {noformat} create table t1 (c1 int) PARTITIONED BY (c2 string); set hive.mapred.mode=strict; select * from t1 where t1.c2 to_date(date_add(from_unixtime( unix_timestamp() ),1)); {noformat} The query failed with No partition predicate found for alias t1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
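To make the behavior concrete, here is a small illustrative session; the comparison operator in the failing query was lost in the original report's formatting, and the literal date below is an example, so treat both as assumptions:
{code}
set hive.mapred.mode=strict;
-- rejected: the predicate's value is computed from unix_timestamp(), which is
-- non-deterministic, so the planner finds no partition predicate it can prune on
select * from t1 where t1.c2 > to_date(date_add(from_unixtime(unix_timestamp()), 1));
-- accepted: a constant predicate on the partition column c2 allows pruning
select * from t1 where t1.c2 = '2015-04-23';
{code}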
[jira] [Commented] (HIVE-5672) Insert with custom separator not supported for non-local directory
[ https://issues.apache.org/jira/browse/HIVE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509440#comment-14509440 ] Xuefu Zhang commented on HIVE-5672: --- Looking at the patch, I'm not sure if I understand the changes correctly. I can see that we modified the grammar to make local optional, and the rest is refactoring. I'm not sure if this is sufficient. Did I miss anything? Also, instead of adding a new grammar rule, we should combine it with the old one. We just need to make KW_LOCAL optional. Insert with custom separator not supported for non-local directory -- Key: HIVE-5672 URL: https://issues.apache.org/jira/browse/HIVE-5672 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 1.0.0 Reporter: Romain Rigaux Assignee: Nemon Lou Attachments: HIVE-5672.1.patch, HIVE-5672.2.patch, HIVE-5672.3.patch, HIVE-5672.4.patch, HIVE-5672.5.patch, HIVE-5672.5.patch.tar.gz https://issues.apache.org/jira/browse/HIVE-3682 is great, but non-local directories don't seem to be supported: {code} insert overwrite directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select description FROM sample_07 {code} {code} Error while compiling statement: FAILED: ParseException line 2:0 cannot recognize input near 'row' 'format' 'delimited' in select clause {code} This works (with 'local'): {code} insert overwrite local directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select code, description FROM sample_07 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10302) Load small tables (for map join) in executor memory only once[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10302: --- Summary: Load small tables (for map join) in executor memory only once[Spark Branch] (was: Cache small tables in memory [Spark Branch]) Load small tables (for map join) in executor memory only once[Spark Branch] --- Key: HIVE-10302 URL: https://issues.apache.org/jira/browse/HIVE-10302 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10302.spark-1.patch If we can cache small tables in executor memory, we could save some time in loading them from HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10302: --- Summary: Load small tables (for map join) in executor memory only once [Spark Branch] (was: Load small tables (for map join) in executor memory only once[Spark Branch]) Load small tables (for map join) in executor memory only once [Spark Branch] Key: HIVE-10302 URL: https://issues.apache.org/jira/browse/HIVE-10302 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10302.spark-1.patch If we can cache small tables in executor memory, we could save some time in loading them from HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10302: --- Description: Usually there are multiple cores in a Spark executor, and thus it's possible that multiple map-join tasks can be running in the same executor (concurrently or sequentially). Currently, each task will load its own copy of the small tables for map join into memory, ending up with inefficiency. Ideally, we only load the small tables once and share them among the tasks running in that executor. (was: If we can cache small tables in executor memory, we could save some time in loading them from HDFS.) Load small tables (for map join) in executor memory only once [Spark Branch] Key: HIVE-10302 URL: https://issues.apache.org/jira/browse/HIVE-10302 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10302.spark-1.patch Usually there are multiple cores in a Spark executor, and thus it's possible that multiple map-join tasks can be running in the same executor (concurrently or sequentially). Currently, each task will load its own copy of the small tables for map join into memory, ending up with inefficiency. Ideally, we only load the small tables once and share them among the tasks running in that executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10477) Provide option to disable Spark tests in Windows OS
[ https://issues.apache.org/jira/browse/HIVE-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511038#comment-14511038 ] Xuefu Zhang commented on HIVE-10477: I'm wondering if it's possible to detect the OS type in pom.xml and skip the Spark tests automatically if the OS is Windows. Provide option to disable Spark tests in Windows OS --- Key: HIVE-10477 URL: https://issues.apache.org/jira/browse/HIVE-10477 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-10477.1.patch In the current master branch, unit tests fail on Windows because of the dependency on the bash executable in itests/hive-unit/pom.xml around these lines:
{code}
<target>
  <exec executable="bash" dir="${basedir}" failonerror="true">
    <arg line="../target/download.sh"/>
  </exec>
</target>
{code}
We should provide an option to disable the Spark tests in OSes like Windows where bash might be absent -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5672) Insert with custom separator not supported for non-local directory
[ https://issues.apache.org/jira/browse/HIVE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511084#comment-14511084 ] Xuefu Zhang commented on HIVE-5672: --- Here is what I have for the combined grammar:
{code}
(local = KW_LOCAL)? KW_DIRECTORY StringLiteral tableRowFormat? tableFileFormat?
  -> ^(TOK_DIR StringLiteral $local? tableRowFormat? tableFileFormat?)
{code}
With this, I'm sure SemanticAnalyzer has the information about whether the directory is local. Insert with custom separator not supported for non-local directory -- Key: HIVE-5672 URL: https://issues.apache.org/jira/browse/HIVE-5672 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 1.0.0 Reporter: Romain Rigaux Assignee: Nemon Lou Attachments: HIVE-5672.1.patch, HIVE-5672.2.patch, HIVE-5672.3.patch, HIVE-5672.4.patch, HIVE-5672.5.patch, HIVE-5672.5.patch.tar.gz, HIVE-5672.6.patch, HIVE-5672.6.patch.tar.gz https://issues.apache.org/jira/browse/HIVE-3682 is great, but non-local directories don't seem to be supported: {code} insert overwrite directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select description FROM sample_07 {code} {code} Error while compiling statement: FAILED: ParseException line 2:0 cannot recognize input near 'row' 'format' 'delimited' in select clause {code} This works (with 'local'): {code} insert overwrite local directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select code, description FROM sample_07 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
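With KW_LOCAL optional in the combined rule, the non-local variant from the issue description should parse exactly like the local one; for example:
{code}
-- previously a ParseException; with the combined rule this should be accepted
INSERT OVERWRITE DIRECTORY '/tmp/test-02'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ':'
SELECT description FROM sample_07;
{code}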
[jira] [Commented] (HIVE-10312) SASL.QOP in JDBC URL is ignored for Delegation token Authentication
[ https://issues.apache.org/jira/browse/HIVE-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513029#comment-14513029 ] Xuefu Zhang commented on HIVE-10312: [~mkazia], please feel free to update the doc. SASL.QOP in JDBC URL is ignored for Delegation token Authentication --- Key: HIVE-10312 URL: https://issues.apache.org/jira/browse/HIVE-10312 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 1.2.0 Reporter: Mubashir Kazia Assignee: Mubashir Kazia Fix For: 1.2.0 Attachments: HIVE-10312.1.patch, HIVE-10312.1.patch When HS2 is configured for QOP other than auth (auth-int or auth-conf), Kerberos client connection works fine when the JDBC URL specifies the matching QOP, however when this HS2 is accessed through Oozie (Delegation token / Digest authentication), connections fails because the JDBC driver ignores the SASL.QOP parameters in the JDBC URL. SASL.QOP setting should be valid for DIGEST Auth mech. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10487) remove non-ISO restriction that projections in a union have identical column names
[ https://issues.apache.org/jira/browse/HIVE-10487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513028#comment-14513028 ] Xuefu Zhang commented on HIVE-10487: Interesting. If the restriction is lifted, what's the column name of the result schema then? Does ISO say anything? remove non-ISO restriction that projections in a union have identical column names -- Key: HIVE-10487 URL: https://issues.apache.org/jira/browse/HIVE-10487 Project: Hive Issue Type: Improvement Components: SQL Affects Versions: 0.13.1 Reporter: N Campbell Priority: Critical As documented at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Union, an application should be able to perform a union query where the projections are union compatible; union compatibility in ISO-SQL 20xx does not require the projected column names to be identical, which Hive imposes. I.e., rejected: select c1 from t1 union all select c2 from t2 (fails with Schema of both sides of union should match. _u1-subquery2); accepted: select c1 from t1 union all select c2 c1 from t2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10312) SASL.QOP in JDBC URL is ignored for Delegation token Authentication
[ https://issues.apache.org/jira/browse/HIVE-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-10312. Resolution: Fixed Committed to master. Thanks, Mubashir! SASL.QOP in JDBC URL is ignored for Delegation token Authentication --- Key: HIVE-10312 URL: https://issues.apache.org/jira/browse/HIVE-10312 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 1.2.0 Reporter: Mubashir Kazia Assignee: Mubashir Kazia Fix For: 1.2.0 Attachments: HIVE-10312.1.patch, HIVE-10312.1.patch When HS2 is configured for QOP other than auth (auth-int or auth-conf), Kerberos client connection works fine when the JDBC URL specifies the matching QOP, however when this HS2 is accessed through Oozie (Delegation token / Digest authentication), connections fails because the JDBC driver ignores the SASL.QOP parameters in the JDBC URL. SASL.QOP setting should be valid for DIGEST Auth mech. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10312) SASL.QOP in JDBC URL is ignored for Delegation token Authentication
[ https://issues.apache.org/jira/browse/HIVE-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508124#comment-14508124 ] Xuefu Zhang commented on HIVE-10312: +1 SASL.QOP in JDBC URL is ignored for Delegation token Authentication --- Key: HIVE-10312 URL: https://issues.apache.org/jira/browse/HIVE-10312 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 1.2.0 Reporter: Mubashir Kazia Assignee: Mubashir Kazia Fix For: 1.2.0 Attachments: HIVE-10312.1.patch, HIVE-10312.1.patch When HS2 is configured for QOP other than auth (auth-int or auth-conf), Kerberos client connection works fine when the JDBC URL specifies the matching QOP, however when this HS2 is accessed through Oozie (Delegation token / Digest authentication), connections fails because the JDBC driver ignores the SASL.QOP parameters in the JDBC URL. SASL.QOP setting should be valid for DIGEST Auth mech. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10454) Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified.
[ https://issues.apache.org/jira/browse/HIVE-10454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509084#comment-14509084 ] Xuefu Zhang commented on HIVE-10454: I'm not sure whether strict mode expects an equality condition on the partition column. Otherwise, the query can still span all or a large number of partitions. Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified. --- Key: HIVE-10454 URL: https://issues.apache.org/jira/browse/HIVE-10454 Project: Hive Issue Type: Bug Reporter: Aihua Xu Assignee: Aihua Xu The following queries fail: {noformat} create table t1 (c1 int) PARTITIONED BY (c2 string); set hive.mapred.mode=strict; select * from t1 where t1.c2 to_date(date_add(from_unixtime( unix_timestamp() ),1)); {noformat} The query failed with No partition predicate found for alias t1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)