[jira] [Issue Comment Deleted] (HIVE-9948) SparkUtilities.getFileName passes File.separator to String.split() method
[ https://issues.apache.org/jira/browse/HIVE-9948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9948: -- Comment: was deleted (was: Hi Xuefu, can you suggest on this, hive show roles; FAILED: SemanticException The current builtin authorization in Hive is incomplete and disabled. Error from Hive: error code: '0' error message: 'ExecuteStatement finished with operation state: CLOSED_STATE' ) SparkUtilities.getFileName passes File.separator to String.split() method - Key: HIVE-9948 URL: https://issues.apache.org/jira/browse/HIVE-9948 Project: Hive Issue Type: Bug Components: Spark Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Fix For: 1.2.0 Attachments: HIVE-9948.1.patch The String.split() method expects a regex, which is why File.separator cannot be passed to split(). In this particular case we can use FilenameUtils.getName to get the file name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
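For illustration, a minimal sketch of the pitfall follows (method names here are illustrative, not the actual SparkUtilities code). String.split() compiles its argument as a regex, so on Windows, where File.separator is a single backslash, the call throws PatternSyntaxException; commons-io avoids regexes entirely.

{code}
import java.io.File;
import java.util.regex.Pattern;
import org.apache.commons.io.FilenameUtils;

public class FileNameDemo {
  // Buggy: split() treats File.separator as a regex. On Windows the
  // separator is "\", an invalid pattern, so this throws
  // PatternSyntaxException; on Unix "/" only works by coincidence.
  static String getFileNameBuggy(String path) {
    String[] parts = path.split(File.separator);
    return parts[parts.length - 1];
  }

  // Workaround if split() must stay: quote the separator first.
  static String getFileNameQuoted(String path) {
    String[] parts = path.split(Pattern.quote(File.separator));
    return parts[parts.length - 1];
  }

  // The approach suggested in the issue: commons-io handles both Unix
  // and Windows separators with no regex involved.
  static String getFileName(String path) {
    return FilenameUtils.getName(path);
  }

  public static void main(String[] args) {
    System.out.println(getFileName("/tmp/hive/query.jar")); // prints query.jar
  }
}
{code}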
[jira] [Commented] (HIVE-9994) Hive query plan returns sensitive data to external applications
[ https://issues.apache.org/jira/browse/HIVE-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367683#comment-14367683 ] Xuefu Zhang commented on HIVE-9994: --- Patch looks good. One question: do we need to check null for the input in redactLogString(), as it's a public method? Hive query plan returns sensitive data to external applications --- Key: HIVE-9994 URL: https://issues.apache.org/jira/browse/HIVE-9994 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9994.1.patch Some applications use the getQueryString() method from the QueryPlan class to get the query that is being executed by Hive. The query string returned is not redacted, so it exposes sensitive information that is then logged in Navigator. We need to return redacted data from the QueryPlan to prevent other applications from logging sensitive data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
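The null guard being asked about might look like the sketch below. QueryRedactor is a hypothetical stand-in for Hive's actual redaction machinery, since the class housing redactLogString() isn't shown in this thread; only the shape of the guard matters here.

{code}
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in for Hive's redaction utility. */
public final class QueryRedactor {
  private final List<Pattern> rules;
  private final String mask;

  public QueryRedactor(List<Pattern> rules, String mask) {
    this.rules = rules;
    this.mask = mask;
  }

  /** Public entry point, so defend against null input rather than NPE-ing. */
  public String redactLogString(String logString) {
    if (logString == null) {
      return null;
    }
    String redacted = logString;
    for (Pattern rule : rules) {
      redacted = rule.matcher(redacted).replaceAll(mask);
    }
    return redacted;
  }
}
{code}

A caller such as QueryPlan.getQueryString() would then return the redacted string instead of the raw query.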
[jira] [Commented] (HIVE-9974) Sensitive data redaction: data appears in name of mapreduce job
[ https://issues.apache.org/jira/browse/HIVE-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14363397#comment-14363397 ] Xuefu Zhang commented on HIVE-9974: --- +1 Sensitive data redaction: data appears in name of mapreduce job --- Key: HIVE-9974 URL: https://issues.apache.org/jira/browse/HIVE-9974 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9974.1.patch Set up a cluster, configured a redaction rule to redact B0096EZHM2, and ran Hive queries on the cluster. Looking at the YARN RM web UI and Job History Server web UI, I see that the mapreduce jobs spawned by the Hive queries have the sensitive data (B0096EZHM2) showing in the job names: e.g., select product, useri...product='B0096EZHM2'(Stage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
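The fix direction implied above is to run the same redaction over the string the job name is derived from, before the job is submitted. A hedged sketch, reusing the hypothetical QueryRedactor from the previous note; the exact placement in Hive is assumed, though hive.jobname.length is Hive's existing job-name length setting (default 50):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

final class JobNameRedaction {
  // Redact the query-derived job name before submission so the YARN RM
  // and Job History UIs never see raw query text such as B0096EZHM2.
  static void setRedactedJobName(Job job, Configuration conf,
                                 QueryRedactor redactor, String query) {
    String name = redactor.redactLogString(query);
    int maxLen = conf.getInt("hive.jobname.length", 50);
    job.setJobName(name.length() > maxLen ? name.substring(0, maxLen) : name);
  }
}
{code}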
[jira] [Commented] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364539#comment-14364539 ] Xuefu Zhang commented on HIVE-9697: --- [~lirui], I don't think we had closure on this. totalSize is closer to the file size, while rawDataSize is closer to the memory required. Using totalSize is more aggressive in taking the map join, but some file formats, such as ORC/Parquet, are very good at compression (10x is common). Thus, if the decision to do a map join is based on file size, the executor can run OOM. On the other hand, rawDataSize is more conservative on memory estimation, which also gives less opportunity for map join. I'm not sure which one is better for Hive on Spark. File size is what hive.auto.convert.join.noconditionaltask.size implies and what the user can see, while rawDataSize is closer to the memory required. However, once OOM happens, the user gets no result, which is worse than a result that comes slower, right? Any thoughts? Hive on Spark is not as aggressive as MR on map join [Spark Branch] --- Key: HIVE-9697 URL: https://issues.apache.org/jira/browse/HIVE-9697 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao We found while running some Big-Bench cases that when the same small-table size threshold is used, the Map Join operator will not be generated in stage plans for Hive on Spark, while it will be generated for Hive on MR. For example, when we run BigBench Q25, the meta info of one input ORC table is as below: totalSize=1748955 (about 1.5M) rawDataSize=123050375 (about 120M) If we use the following parameter settings, set hive.auto.convert.join=true; set hive.mapjoin.smalltable.filesize=2500; set hive.auto.convert.join.noconditionaltask=true; set hive.auto.convert.join.noconditionaltask.size=100000000; (100M) Map Join will be enabled for Hive on MR mode, but will not be enabled for Hive on Spark. We found that for Hive on MR, the HDFS file size for the table (ContentSummary.getLength(), which should approximate the value of ‘totalSize’) is compared with the 100M threshold (smaller than 100M), while for Hive on Spark 'rawDataSize' is compared with the 100M threshold (larger than 100M). That's why MapJoin is not enabled for Hive on Spark in this case, and as a result Hive on Spark gets much lower performance than Hive on MR here. When we set hive.auto.convert.join.noconditionaltask.size=150000000; (150M), MapJoin will be enabled for Hive on Spark mode, and Hive on Spark then shows performance similar to Hive on MR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
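To make the trade-off concrete, a small illustrative sketch (TableStats is invented for this note; it is not a Hive class). Plugging in the Q25 numbers from the description, totalSize=1748955 passes a 100M threshold while rawDataSize=123050375 does not, which is exactly the divergence reported:

{code}
/** Illustrative only; not a Hive class. */
final class TableStats {
  final long totalSize;   // on-disk bytes; small for ORC/Parquet (~10x compression is common)
  final long rawDataSize; // estimated uncompressed, in-memory bytes

  TableStats(long totalSize, long rawDataSize) {
    this.totalSize = totalSize;
    this.rawDataSize = rawDataSize;
  }

  // Map-join eligibility against hive.auto.convert.join.noconditionaltask.size.
  // Comparing totalSize converts more joins but risks executor OOM once the
  // compressed data is expanded in memory; rawDataSize is the conservative choice.
  boolean fitsInMemory(long thresholdBytes, boolean useRawDataSize) {
    long estimate = useRawDataSize ? rawDataSize : totalSize;
    return estimate > 0 && estimate <= thresholdBytes;
  }

  public static void main(String[] args) {
    TableStats q25 = new TableStats(1748955L, 123050375L);
    System.out.println(q25.fitsInMemory(100_000_000L, false)); // true  -> MR converts
    System.out.println(q25.fitsInMemory(100_000_000L, true));  // false -> Spark doesn't
  }
}
{code}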
[jira] [Commented] (HIVE-9934) Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password
[ https://issues.apache.org/jira/browse/HIVE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364092#comment-14364092 ] Xuefu Zhang commented on HIVE-9934: --- +1 Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password -- Key: HIVE-9934 URL: https://issues.apache.org/jira/browse/HIVE-9934 Project: Hive Issue Type: Bug Components: Security Affects Versions: 1.1.0 Reporter: Chao Assignee: Chao Attachments: HIVE-9934.1.patch, HIVE-9934.2.patch Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password. See: http://docs.oracle.com/javase/jndi/tutorial/ldap/security/simple.html “If you supply an empty string, an empty byte/char array, or null to the Context.SECURITY_CREDENTIALS environment property, then the authentication mechanism will be none. This is because the LDAP requires the password to be nonempty for simple authentication. The protocol automatically converts the authentication to none if a password is not supplied.” Since the LdapAuthenticationProviderImpl.Authenticate method relies on a NamingException being thrown during creation of the initial context, it does not fail when the context result is an “unauthenticated” positive response from the LDAP server. The end result is that one can authenticate with HiveServer2 using the LdapAuthenticationProviderImpl with only a user name and an empty password. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
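The pitfall and the natural guard can be shown with plain JNDI. This is a sketch of the general fix direction under the documented LDAP behavior quoted above, not the actual HIVE-9934 patch:

{code}
import java.util.Hashtable;
import javax.naming.AuthenticationException;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.InitialDirContext;

final class LdapBindSketch {
  // Reject empty/null passwords *before* binding: per the JNDI docs, an
  // empty SECURITY_CREDENTIALS silently downgrades "simple" auth to
  // "none", and the anonymous bind succeeds without a NamingException.
  static void authenticate(String ldapUrl, String userDn, String password)
      throws NamingException {
    if (password == null || password.isEmpty()) {
      throw new AuthenticationException("Empty passwords are not allowed");
    }
    Hashtable<String, Object> env = new Hashtable<>();
    env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
    env.put(Context.PROVIDER_URL, ldapUrl);
    env.put(Context.SECURITY_AUTHENTICATION, "simple");
    env.put(Context.SECURITY_PRINCIPAL, userDn);
    env.put(Context.SECURITY_CREDENTIALS, password);
    new InitialDirContext(env).close(); // throws on bad credentials
  }
}
{code}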
[jira] [Commented] (HIVE-9934) Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password
[ https://issues.apache.org/jira/browse/HIVE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365832#comment-14365832 ] Xuefu Zhang commented on HIVE-9934: --- [~prasadm], I think the lack of @Test annotations is fine in this case, as the class extends TestCase. I also saw that the added test case was run in the previous test results. Thus, patch #3 is good as far as I can see. Let me know if you see differently. Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password -- Key: HIVE-9934 URL: https://issues.apache.org/jira/browse/HIVE-9934 Project: Hive Issue Type: Bug Components: Security Affects Versions: 1.1.0 Reporter: Chao Assignee: Chao Attachments: HIVE-9934.1.patch, HIVE-9934.2.patch, HIVE-9934.3.patch, HIVE-9934.3.patch, HIVE-9934.4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9991) Cannot do a SELECT on external tables that are on S3 due to Encryption error
[ https://issues.apache.org/jira/browse/HIVE-9991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365606#comment-14365606 ] Xuefu Zhang commented on HIVE-9991: --- +1 pending tests Cannot do a SELECT on external tables that are on S3 due to Encryption error Key: HIVE-9991 URL: https://issues.apache.org/jira/browse/HIVE-9991 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9991.1.patch I cannot do any select query on external tables that are not part of HDFS, for example S3. {code} Select * from my_table limit 10; FAILED: SemanticException Unable to determine if s3n://my-bucket/is encrypted: java.lang.IllegalArgumentException: Wrong FS: s3n://my-bucket/, expected: hdfs://0.0.0.0:8020 {code} This error is due to an internal function that checks whether a table is encrypted or not. The check is only supported for HDFS files, but it happens on any external table as well, causing the above error. To fix this, we should check for encrypted tables only for HDFS tables and skip the check for any other file scheme. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
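The proposed guard amounts to a scheme check ahead of the encryption lookup. A minimal sketch of that idea (not the actual patch):

{code}
import org.apache.hadoop.fs.Path;

final class EncryptionCheckSketch {
  // HDFS encryption zones only exist on hdfs:// paths, so skip the check
  // for s3n://, s3a://, file://, and other schemes, which previously hit
  // the "Wrong FS" error above. Only when this returns true should the
  // encryption-zone lookup run.
  static boolean mightBeEncrypted(Path path) {
    String scheme = path.toUri().getScheme();
    return "hdfs".equalsIgnoreCase(scheme); // false for a null scheme too
  }
}
{code}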
[jira] [Updated] (HIVE-9934) Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password
[ https://issues.apache.org/jira/browse/HIVE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9934: -- Attachment: HIVE-9934.4.patch Updated the patch, adding the @Test annotation. Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password -- Key: HIVE-9934 URL: https://issues.apache.org/jira/browse/HIVE-9934 Project: Hive Issue Type: Bug Components: Security Affects Versions: 1.1.0 Reporter: Chao Assignee: Chao Attachments: HIVE-9934.1.patch, HIVE-9934.2.patch, HIVE-9934.3.patch, HIVE-9934.3.patch, HIVE-9934.4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (HIVE-9934) Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password
[ https://issues.apache.org/jira/browse/HIVE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9934: -- Comment: was deleted (was: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12705130/HIVE-9934.4.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3059/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3059/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3059/ Messages: {noformat} This message was trimmed, see log for full details {noformat})
[jira] [Updated] (HIVE-9934) Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password
[ https://issues.apache.org/jira/browse/HIVE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9934: -- Attachment: (was: HIVE-9934.4.patch) Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password -- Key: HIVE-9934 URL: https://issues.apache.org/jira/browse/HIVE-9934 Project: Hive Issue Type: Bug Components: Security Affects Versions: 1.1.0 Reporter: Chao Assignee: Chao Attachments: HIVE-9934.1.patch, HIVE-9934.2.patch, HIVE-9934.3.patch, HIVE-9934.3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9991) Cannot do a SELECT on external tables that are on S3 due to Encryption error
[ https://issues.apache.org/jira/browse/HIVE-9991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366213#comment-14366213 ] Xuefu Zhang commented on HIVE-9991: --- [~spena], it seems the above failed test has a result diff. You might need to regenerate the test output. Cannot do a SELECT on external tables that are on S3 due to Encryption error Key: HIVE-9991 URL: https://issues.apache.org/jira/browse/HIVE-9991 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9991.1.patch, HIVE-9991.2.patch, HIVE-9991.3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others
[ https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365134#comment-14365134 ] Xuefu Zhang commented on HIVE-7018: --- Patch looks fine. However, I don't quite understand why we are also removing the following: {code} - CONSTRAINT `PARTITIONS_FK2` FOREIGN KEY (`SD_ID`) REFERENCES `SDS` (`SD_ID`), ... - CONSTRAINT `TBLS_FK2` FOREIGN KEY (`DB_ID`) REFERENCES `DBS` (`DB_ID`), {code} This doesn't seem related to LINK_TARGET_ID. Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others - Key: HIVE-7018 URL: https://issues.apache.org/jira/browse/HIVE-7018 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Yongzhi Chen Attachments: HIVE-7018.1.patch It appears that at least postgres and oracle do not have the LINK_TARGET_ID column while mysql does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365114#comment-14365114 ] Xuefu Zhang commented on HIVE-9697: --- It seems that we all agree that rawDataSize is more practical for Spark. Could anyone summarize whether it's the default, or how to make it the default? If a code change is required, we can propose a patch here. Thanks. Hive on Spark is not as aggressive as MR on map join [Spark Branch] --- Key: HIVE-9697 URL: https://issues.apache.org/jira/browse/HIVE-9697 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370647#comment-14370647 ] Xuefu Zhang commented on HIVE-9697: --- Thanks, Rui/Chao. So here is what we recommend/conclude for Spark: Spark prefers rawDataSize for map-join memory estimation. Thus, hive.stats.collect.rawdatasize should be set true, which is the default. If this configuration is set to false, then fileSize will be used instead for memory estimation, which may not be as accurate. Agree? Hive on Spark is not as aggressive as MR on map join [Spark Branch] --- Key: HIVE-9697 URL: https://issues.apache.org/jira/browse/HIVE-9697 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370647#comment-14370647 ] Xuefu Zhang edited comment on HIVE-9697 at 3/20/15 3:30 AM: Thanks, Rui/Chao. So here is what we recommend/conclude for Spark: {quote} Spark prefers rawDataSize for map-join memory estimation. Thus, hive.stats.collect.rawdatasize should be set true, which is the default. If this configuration is set to false, then fileSize will be used instead for memory estimation, which may not be as accurate. {quote} Agree? was (Author: xuefuz): Thanks, Rui/Chao. So here is what we recommend/conclude for Spark: Spark prefers rawDataSize for map-join memory estimation. Thus, hive.stats.collect.rawdatasize should be set true, which is the default. If this configuration is set to false, then fileSize will be used instead for memory estimation, which may not be as accurate. Agree? Hive on Spark is not as aggressive as MR on map join [Spark Branch] --- Key: HIVE-9697 URL: https://issues.apache.org/jira/browse/HIVE-9697 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370647#comment-14370647 ] Xuefu Zhang edited comment on HIVE-9697 at 3/20/15 3:31 AM: Thanks, Rui/Chao. So here is what we recommend/conclude for Spark: {quote} Spark prefers rawDataSize for map-join memory estimation. Thus, hive.stats.collect.rawdatasize should be set true, which is the default. If this configuration is set to false, then fileSize will be used instead for estimation, which may not be as accurate. {quote} Agree? was (Author: xuefuz): Thanks, Rui/Chao. So here is what we recommend/conclude for Spark: {quote} Spark prefers rawDataSize for map-join memory estimation. Thus, hive.stats.collect.rawdatasize should be set true, which is the default. If this configuration is set to false, then fileSize will be used instead for memory estimation, which may not be as accurate. {quote} Agree? Hive on Spark is not as aggressive as MR on map join [Spark Branch] --- Key: HIVE-9697 URL: https://issues.apache.org/jira/browse/HIVE-9697 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370669#comment-14370669 ] Xuefu Zhang commented on HIVE-9697: --- Yes. We should. Hive on Spark is not as aggressive as MR on map join [Spark Branch] --- Key: HIVE-9697 URL: https://issues.apache.org/jira/browse/HIVE-9697 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao Labels: TODOC1.2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9697: -- Labels: TODOC1.2 (was: ) Hive on Spark is not as aggressive as MR on map join [Spark Branch] --- Key: HIVE-9697 URL: https://issues.apache.org/jira/browse/HIVE-9697 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao Labels: TODOC1.2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-9697. --- Resolution: Won't Fix This should be just a doc fix, as discussed above. Hive on Spark is not as aggressive as MR on map join [Spark Branch] --- Key: HIVE-9697 URL: https://issues.apache.org/jira/browse/HIVE-9697 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao Labels: TODOC-SPARK -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370615#comment-14370615 ] Xuefu Zhang commented on HIVE-9697: --- Can we put a closure on this? Basically we'd like to confirm/understand: 1. MR always uses file size. 2. Spark should always use rawDataSize. If this is the case, what configs need to be set to make rawDataSize available, and what happens if it's not available? Thanks, Xuefu Hive on Spark is not as aggressive as MR on map join [Spark Branch] --- Key: HIVE-9697 URL: https://issues.apache.org/jira/browse/HIVE-9697 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10006) RSC has a memory leak while executing multiple queries [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14371009#comment-14371009 ] Xuefu Zhang commented on HIVE-10006: Re: "Besides, is this ThreadLocal MapWork/ReduceWork cache a newly introduced optimization?" Yes, it was introduced in HIVE-9127. Looks like we need to be careful about this thread-local map, indeed. RSC has a memory leak while executing multiple queries [Spark Branch] -- Key: HIVE-10006 URL: https://issues.apache.org/jira/browse/HIVE-10006 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 1.1.0 Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Critical Labels: Spark-M5 Attachments: HIVE-10006.1-spark.patch, HIVE-10006.2-spark.patch, HIVE-10006.2-spark.patch, HIVE-10006.3-spark.patch, HIVE-10006.4-spark.patch, HIVE-10006.5-spark.patch, HIVE-10006.6-spark.patch, HIVE-10006.7-spark.patch While executing queries with RSC, the number of MapWork/ReduceWork objects grows over time and eventually leads to OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
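For readers unfamiliar with the leak pattern: a ThreadLocal map on pooled executor threads keeps every cached entry reachable for as long as the thread lives, and pooled threads never die. A generic sketch of the pattern and the explicit-cleanup fix (illustrative only, not the HIVE-10006 patch):

{code}
import java.util.HashMap;
import java.util.Map;

final class WorkCacheSketch {
  // A per-thread cache like the one introduced in HIVE-9127: entries pin
  // MapWork/ReduceWork objects to long-lived executor threads, so nothing
  // is ever collected unless the cache is cleared explicitly.
  private static final ThreadLocal<Map<String, Object>> WORK_CACHE =
      ThreadLocal.withInitial(HashMap::new);

  static void cache(String planPath, Object work) {
    WORK_CACHE.get().put(planPath, work);
  }

  // The fix pattern: clear per-query state when the task finishes.
  static void clearAfterQuery() {
    WORK_CACHE.get().clear();
    WORK_CACHE.remove(); // drop the map itself from the thread
  }
}
{code}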
[jira] [Commented] (HIVE-10006) RSC has a memory leak while executing multiple queries [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14372803#comment-14372803 ] Xuefu Zhang commented on HIVE-10006: +1 for patch #8. One nit: it would be great if we could put a similar comment on the changes in SparkPlanGenerator.java. Also, we can create a JIRA for HiveInputFormat to track the issue, but no fix is necessary at the moment. RSC has a memory leak while executing multiple queries [Spark Branch] -- Key: HIVE-10006 URL: https://issues.apache.org/jira/browse/HIVE-10006 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 1.1.0 Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Critical Labels: Spark-M5 Attachments: HIVE-10006.1-spark.patch, HIVE-10006.2-spark.patch, HIVE-10006.2-spark.patch, HIVE-10006.3-spark.patch, HIVE-10006.4-spark.patch, HIVE-10006.5-spark.patch, HIVE-10006.6-spark.patch, HIVE-10006.7-spark.patch, HIVE-10006.8-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others
[ https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369337#comment-14369337 ] Xuefu Zhang commented on HIVE-7018: --- [~ctang.ma], what are your thoughts on the latest patch? Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others - Key: HIVE-7018 URL: https://issues.apache.org/jira/browse/HIVE-7018 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Yongzhi Chen Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10017) SparkTask log improvement [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369288#comment-14369288 ] Xuefu Zhang commented on HIVE-10017: +1 SparkTask log improvement [Spark Branch] Key: HIVE-10017 URL: https://issues.apache.org/jira/browse/HIVE-10017 Project: Hive Issue Type: Bug Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Priority: Minor Fix For: spark-branch Attachments: HIVE-10017.1-spark.patch Initialize the log object in its own class for better log messages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9934) Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password
[ https://issues.apache.org/jira/browse/HIVE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370174#comment-14370174 ] Xuefu Zhang commented on HIVE-9934: --- Apache has special guidelines regarding security vulnerabilities. Here is the link: http://www.apache.org/security/committers We are all new to this, so what we have done so far may not comply with them. However, we should try to do so from now on. For docs, please also refer to that page. As to the vulnerability, discussion is still ongoing in the community. Thus, we will act based on the conclusions. Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password -- Key: HIVE-9934 URL: https://issues.apache.org/jira/browse/HIVE-9934 Project: Hive Issue Type: Bug Components: Security Affects Versions: 1.1.0 Reporter: Chao Assignee: Chao Fix For: 1.2.0 Attachments: HIVE-9934.1.patch, HIVE-9934.2.patch, HIVE-9934.3.patch, HIVE-9934.3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9990) TestMultiSessionsHS2WithLocalClusterSpark is failing
[ https://issues.apache.org/jira/browse/HIVE-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9990: -- Description: At least sometimes. I can reproduce it with mvn test -Dtest=TestMultiSessionsHS2WithLocalClusterSpark -Phadoop-2 consistently on my local box (both trunk and spark branch). {code} --- T E S T S --- Running org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 92.438 sec FAILURE! - in org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark testSparkQuery(org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark) Time elapsed: 21.514 sec ERROR! java.util.concurrent.ExecutionException: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296) at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:392) at org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.verifyResult(TestMultiSessionsHS2WithLocalClusterSpark.java:244) at org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testKvQuery(TestMultiSessionsHS2WithLocalClusterSpark.java:220) at org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.access$000(TestMultiSessionsHS2WithLocalClusterSpark.java:53) {code} The error was also seen in HIVE-9934 test run. was: At least sometimes. I can reproduce it with mvn test -Dtest=TestMultiSessionsHS2WithLocalClusterSpark -Phadoop-2 consistently on my local box. (The rest of the description is unchanged.) TestMultiSessionsHS2WithLocalClusterSpark is failing Key: HIVE-9990 URL: https://issues.apache.org/jira/browse/HIVE-9990 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 1.2.0 Reporter: Xuefu Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9934) Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password
[ https://issues.apache.org/jira/browse/HIVE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9934: -- Attachment: HIVE-9934.3.patch Attached the same patch for another test run. Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to none, allowing authentication without password -- Key: HIVE-9934 URL: https://issues.apache.org/jira/browse/HIVE-9934 Project: Hive Issue Type: Bug Components: Security Affects Versions: 1.1.0 Reporter: Chao Assignee: Chao Attachments: HIVE-9934.1.patch, HIVE-9934.2.patch, HIVE-9934.3.patch, HIVE-9934.3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9302) Beeline add commands to register local jdbc driver names and jars
[ https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14351207#comment-14351207 ] Xuefu Zhang commented on HIVE-9302: --- Thank you, [~leftylev]. Beeline add commands to register local jdbc driver names and jars - Key: HIVE-9302 URL: https://issues.apache.org/jira/browse/HIVE-9302 Project: Hive Issue Type: New Feature Reporter: Brock Noland Assignee: Ferdinand Xu Labels: TODOC1.2 Fix For: 1.2.0 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, HIVE-9302.2.patch, HIVE-9302.3.patch, HIVE-9302.3.patch, HIVE-9302.4.patch, HIVE-9302.patch, mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar At present if a beeline user uses {{add jar}} the path they give is actually on the HS2 server. It'd be great to allow beeline users to add local jdbc driver jars and register custom jdbc driver names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9889) Merge trunk to Spark branch 3/6/2015 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9889: -- Attachment: HIVE-9889.2-spark.patch Regenerated the patch since some patches were merged individually. Merge trunk to Spark branch 3/6/2015 [Spark Branch] --- Key: HIVE-9889 URL: https://issues.apache.org/jira/browse/HIVE-9889 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-9889.1-spark.patch, HIVE-9889.2-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9871) Print spark job id in history file [spark branch]
[ https://issues.apache.org/jira/browse/HIVE-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353468#comment-14353468 ] Xuefu Zhang commented on HIVE-9871: --- [~chinnalalam], thanks for working on this. Patch looks good, but I'm wondering if you can come up with a better name for the private method added. Something like recordJobId() or addToHistory(), etc. Print spark job id in history file [spark branch] - Key: HIVE-9871 URL: https://issues.apache.org/jira/browse/HIVE-9871 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-9871.1-spark.patch Maintain the spark job id in history file for the corresponding queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin is set to 'true' [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354331#comment-14354331 ] Xuefu Zhang commented on HIVE-9659: --- [~ruili], let's create a JIRA for MR and move on. We'll enable the test only for Spark. 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin is set to 'true' [Spark Branch] --- Key: HIVE-9659 URL: https://issues.apache.org/jira/browse/HIVE-9659 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao Assignee: Rui Li Attachments: HIVE-9659.1-spark.patch, HIVE-9659.2-spark.patch, HIVE-9659.3-spark.patch We found that 'Error while trying to create table container' occurs during Big-Bench Q12 case execution when hive.optimize.skewjoin is set to 'true'. If hive.optimize.skewjoin is set to 'false', the case passes. How to reproduce: 1. set hive.optimize.skewjoin=true; 2. Run BigBench case Q12 and it will fail. Check the executor log (e.g. /usr/lib/spark/work/app-/2/stderr) and you will find the error 'Error while trying to create table container' in the log and also a NullPointerException near the end of the log. (a) Detailed error message for 'Error while trying to create table container': {noformat} 15/02/12 01:29:49 ERROR SparkMapRecordHandler: Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:118) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:193) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:219) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:158) at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:115) ... 21 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error, not a directory: hdfs://bhx1:8020/tmp/hive/root/d22ef465-bff5-4edb-a822-0a9f1c25b66c/hive_2015-02-12_01-28-10_008_6897031694580088767-1/-mr-10009/HashTable-Stage-6/MapJoin-mapfile01--.hashtable at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:106) ... 22 more 15/02/12 01:29:49 INFO SparkRecordHandler: maximum memory = 40939028480 15/02/12 01:29:49 INFO PerfLogger: PERFLOG method=SparkInitializeOperators
[jira] [Commented] (HIVE-9569) Enable more unit tests for UNION ALL [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14354339#comment-14354339 ] Xuefu Zhang commented on HIVE-9569: --- +1 Enable more unit tests for UNION ALL [Spark Branch] --- Key: HIVE-9569 URL: https://issues.apache.org/jira/browse/HIVE-9569 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Attachments: HIVE-9569.1-spark.patch, HIVE-9569.1.patch, HIVE-9569.2.patch, HIVE-9569.3.patch, HIVE-9569.4.patch, HIVE-9569.5.patch Currently, we only enabled a subset of all the union tests. We should try to enable the rest, and see if there's any issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9924) Add SORT_QUERY_RESULTS to union12.q
[ https://issues.apache.org/jira/browse/HIVE-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357035#comment-14357035 ] Xuefu Zhang commented on HIVE-9924: --- Yes. Let's fix for Spark branch first. Add SORT_QUERY_RESULTS to union12.q --- Key: HIVE-9924 URL: https://issues.apache.org/jira/browse/HIVE-9924 Project: Hive Issue Type: Test Reporter: Rui Li Assignee: Rui Li Priority: Minor Attachments: HIVE-9924.1-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
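A note for readers unfamiliar with the qtest convention: SORT_QUERY_RESULTS is a directive placed as a comment at the top of a .q test file; the test driver then sorts the query output before diffing it against the expected .q.out file. A sketch of what the change to union12.q amounts to (the comment lines are ours, not the actual file contents):
{noformat}
-- SORT_QUERY_RESULTS

-- ...rest of union12.q unchanged. With the directive present, row-order
-- differences between engines (e.g. MR vs. Spark) no longer fail the test.
{noformat}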
[jira] [Commented] (HIVE-9516) Enable CBO related tests [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355347#comment-14355347 ] Xuefu Zhang commented on HIVE-9516: --- +1 Enable CBO related tests [Spark Branch] --- Key: HIVE-9516 URL: https://issues.apache.org/jira/browse/HIVE-9516 Project: Hive Issue Type: Sub-task Components: spark-branch Affects Versions: spark-branch Reporter: Chao Assignee: Chinna Rao Lalam Attachments: HIVE-9516.1-spark.patch, HIVE-9516.2-spark.patch, HIVE-9516.3-spark.patch In the Spark branch we enabled CBO but haven't turned on the CBO-related unit tests. We should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9871) Print spark job id in history file [spark branch]
[ https://issues.apache.org/jira/browse/HIVE-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355369#comment-14355369 ] Xuefu Zhang commented on HIVE-9871: --- +1 Print spark job id in history file [spark branch] - Key: HIVE-9871 URL: https://issues.apache.org/jira/browse/HIVE-9871 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Attachments: HIVE-9871.1-spark.patch, HIVE-9871.2-spark.patch Maintain the spark job id in history file for the corresponding queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9813) Hive JDBC - DatabaseMetaData.getColumns method cannot find classes added with add jar command
[ https://issues.apache.org/jira/browse/HIVE-9813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357697#comment-14357697 ] Xuefu Zhang commented on HIVE-9813: --- +1 Hive JDBC - DatabaseMetaData.getColumns method cannot find classes added with add jar command --- Key: HIVE-9813 URL: https://issues.apache.org/jira/browse/HIVE-9813 Project: Hive Issue Type: Bug Components: Metastore Reporter: Yongzhi Chen Assignee: Yongzhi Chen Attachments: HIVE-9813.1.patch, HIVE-9813.3.patch Execute the following JDBC client program:
{code}
import java.sql.*;

public class TestAddJar {
    private static Connection makeConnection(String connString, String classPath)
            throws ClassNotFoundException, SQLException {
        System.out.println("Current Connection info: " + connString);
        Class.forName(classPath);
        System.out.println("Current driver info: " + classPath);
        return DriverManager.getConnection(connString);
    }

    public static void main(String[] args) {
        if (2 != args.length) {
            System.out.println("Two arguments needed: connection string, path to jar to be added (include jar name)");
            System.out.println("Example: java -jar TestApp.jar jdbc:hive2://192.168.111.111 /tmp/json-serde-1.3-jar-with-dependencies.jar");
            return;
        }
        Connection conn;
        try {
            conn = makeConnection(args[0], "org.apache.hive.jdbc.HiveDriver");
            System.out.println("---");
            System.out.println("DONE");
            System.out.println("---");
            System.out.println("Execute query: add jar " + args[1] + ";");
            Statement stmt = conn.createStatement();
            int c = stmt.executeUpdate("add jar " + args[1]);
            System.out.println("Returned value is: [" + c + "]\n");
            System.out.println("---");
            final String createTableQry = "Create table if not exists json_test(id int, content string) "
                    + "row format serde 'org.openx.data.jsonserde.JsonSerDe'";
            System.out.println("Execute query: " + createTableQry + ";");
            stmt.execute(createTableQry);
            System.out.println("---");
            System.out.println("getColumn() Call---\n");
            DatabaseMetaData md = conn.getMetaData();
            System.out.println("Test get all column in a schema:");
            ResultSet rs = md.getColumns("Hive", "default", "json_test", null);
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
            conn.close();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
{code}
Get an exception, and from the metastore log:
{noformat}
7:41:30.316 PM ERROR hive.log
error in initSerDe: java.lang.ClassNotFoundException Class org.openx.data.jsonserde.JsonSerDe not found
java.lang.ClassNotFoundException: Class org.openx.data.jsonserde.JsonSerDe not found
  at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1803)
  at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:183)
  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_fields(HiveMetaStore.java:2487)
  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_schema(HiveMetaStore.java:2542)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
  at com.sun.proxy.$Proxy5.get_schema(Unknown Source)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema.getResult(ThriftHiveMetastore.java:6425)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema.getResult(ThriftHiveMetastore.java:6409)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
  at
{noformat}
[jira] [Commented] (HIVE-9916) Fix TestSparkSessionManagerImpl [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357739#comment-14357739 ] Xuefu Zhang commented on HIVE-9916: --- +1 Fix TestSparkSessionManagerImpl [Spark Branch] -- Key: HIVE-9916 URL: https://issues.apache.org/jira/browse/HIVE-9916 Project: Hive Issue Type: Bug Components: spark-branch Affects Versions: spark-branch Reporter: Chao Assignee: Chao Attachments: HIVE-9916.1-spark.patch, HIVE-9916.2-spark.patch It looks like the wrong patch was committed in HIVE-9872, and therefore TestSparkSessionManagerImpl will still fail. This JIRA should fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9924) Add SORT_QUERY_RESULTS to union12.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9924: -- Component/s: Spark Add SORT_QUERY_RESULTS to union12.q [Spark Branch] -- Key: HIVE-9924 URL: https://issues.apache.org/jira/browse/HIVE-9924 Project: Hive Issue Type: Test Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Minor Attachments: HIVE-9924.1-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9924) Add SORT_QUERY_RESULTS to union12.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9924: -- Summary: Add SORT_QUERY_RESULTS to union12.q [Spark Branch] (was: Add SORT_QUERY_RESULTS to union12.q) Add SORT_QUERY_RESULTS to union12.q [Spark Branch] -- Key: HIVE-9924 URL: https://issues.apache.org/jira/browse/HIVE-9924 Project: Hive Issue Type: Test Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Minor Attachments: HIVE-9924.1-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-9916) Fix TestSparkSessionManagerImpl [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357739#comment-14357739 ] Xuefu Zhang edited comment on HIVE-9916 at 3/11/15 10:43 PM: - +1 The union-related test failures will be addressed in HIVE-9924. was (Author: xuefuz): +1 Fix TestSparkSessionManagerImpl [Spark Branch] -- Key: HIVE-9916 URL: https://issues.apache.org/jira/browse/HIVE-9916 Project: Hive Issue Type: Bug Components: spark-branch Affects Versions: spark-branch Reporter: Chao Assignee: Chao Attachments: HIVE-9916.1-spark.patch, HIVE-9916.2-spark.patch It looks like the wrong patch was committed in HIVE-9872, and therefore TestSparkSessionManagerImpl will still fail. This JIRA should fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9924) Add SORT_QUERY_RESULTS to union12.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357746#comment-14357746 ] Xuefu Zhang commented on HIVE-9924: --- We need to address the two union-related test failures. The other two will be fixed in HIVE-9916. Add SORT_QUERY_RESULTS to union12.q [Spark Branch] -- Key: HIVE-9924 URL: https://issues.apache.org/jira/browse/HIVE-9924 Project: Hive Issue Type: Test Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Minor Attachments: HIVE-9924.1-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9939) Code cleanup for redundant if check in ExplainTask [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9939: -- Component/s: Spark Summary: Code cleanup for redundant if check in ExplainTask [Spark Branch] (was: Code cleanup for redundant if check in ExplainTask) Code cleanup for redundant if check in ExplainTask [Spark Branch] - Key: HIVE-9939 URL: https://issues.apache.org/jira/browse/HIVE-9939 Project: Hive Issue Type: Bug Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: spark-branch Attachments: HIVE-9939.1-spark.patch The ExplainTask.execute() method has a redundant if check. The same applies to trunk as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9939) Code cleanup for redundant if check in ExplainTask [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358596#comment-14358596 ] Xuefu Zhang commented on HIVE-9939: --- +1 Code cleanup for redundant if check in ExplainTask [Spark Branch] - Key: HIVE-9939 URL: https://issues.apache.org/jira/browse/HIVE-9939 Project: Hive Issue Type: Bug Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: spark-branch Attachments: HIVE-9939.1-spark.patch The ExplainTask.execute() method has a redundant if check. The same applies to trunk as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-9882) Add jar/file doesn't work with yarn-cluster mode [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350734#comment-14350734 ] Xuefu Zhang edited comment on HIVE-9882 at 3/6/15 7:07 PM: --- +1. I have a related question on RB, whose answer doesn't block this. was (Author: xuefuz): +1. I have a related question on RB. Add jar/file doesn't work with yarn-cluster mode [Spark Branch] --- Key: HIVE-9882 URL: https://issues.apache.org/jira/browse/HIVE-9882 Project: Hive Issue Type: Sub-task Components: Hive, spark-branch Affects Versions: spark-branch Reporter: Xiaomin Zhang Assignee: Rui Li Attachments: HIVE-9882.1-spark.patch, HIVE-9882.1.patch It seems the current fix for HIVE-9425 only uploads the jars/files to HDFS; however, they are not accessible to the Driver/Executor. I found the following in the AM log:
{noformat}
15/02/26 15:10:36 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
15/02/26 15:10:36 INFO client.SparkClientUtilities: Added jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/hive-exec-1.2.0-SNAPSHOT.jar] to classpath.
15/02/26 15:10:36 INFO client.SparkClientUtilities: Added jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/opennlp-maxent-3.0.3.jar] to classpath.
15/02/26 15:10:36 INFO client.SparkClientUtilities: Added jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/bigbenchqueriesmr.jar] to classpath.
15/02/26 15:10:36 INFO client.SparkClientUtilities: Added jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/opennlp-tools-1.5.3.jar] to classpath.
15/02/26 15:10:36 INFO client.SparkClientUtilities: Added jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/jcl-over-slf4j-1.7.5.jar] to classpath.
15/02/26 15:10:36 INFO client.RemoteDriver: Failed to run job 6886df05-f430-456c-a0ff-c7621db712d6
org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: de.bankmark.bigbench.queries.q10.SentimentUDF
{noformat}
As the above shows, the file path that was added to the classpath is invalid, so all uploaded jars/files are still unavailable to the Driver/Executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9882) Add jar/file doesn't work with yarn-cluster mode [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350734#comment-14350734 ] Xuefu Zhang commented on HIVE-9882: --- +1. I have a related question on RB. Add jar/file doesn't work with yarn-cluster mode [Spark Branch] --- Key: HIVE-9882 URL: https://issues.apache.org/jira/browse/HIVE-9882 Project: Hive Issue Type: Sub-task Components: Hive, spark-branch Affects Versions: spark-branch Reporter: Xiaomin Zhang Assignee: Rui Li Attachments: HIVE-9882.1-spark.patch, HIVE-9882.1.patch It seems the current fix for HIVE-9425 only uploads the jars/files to HDFS; however, they are not accessible to the Driver/Executor. I found the following in the AM log:
{noformat}
15/02/26 15:10:36 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
15/02/26 15:10:36 INFO client.SparkClientUtilities: Added jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/hive-exec-1.2.0-SNAPSHOT.jar] to classpath.
15/02/26 15:10:36 INFO client.SparkClientUtilities: Added jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/opennlp-maxent-3.0.3.jar] to classpath.
15/02/26 15:10:36 INFO client.SparkClientUtilities: Added jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/bigbenchqueriesmr.jar] to classpath.
15/02/26 15:10:36 INFO client.SparkClientUtilities: Added jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/opennlp-tools-1.5.3.jar] to classpath.
15/02/26 15:10:36 INFO client.SparkClientUtilities: Added jar[file:/data/hadoop-devel/data/nm/usercache/user/appcache/application_1424933948132_0002/container_1424933948132_0002_01_01/hdfs:/localhost:8020/tmp/hive/user/47040bca-1da4-49b6-b2c7-69be9bc92855/jcl-over-slf4j-1.7.5.jar] to classpath.
15/02/26 15:10:36 INFO client.RemoteDriver: Failed to run job 6886df05-f430-456c-a0ff-c7621db712d6
org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: de.bankmark.bigbench.queries.q10.SentimentUDF
{noformat}
As the above shows, the file path that was added to the classpath is invalid, so all uploaded jars/files are still unavailable to the Driver/Executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
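For context on the log above: the executor tried to open an hdfs: URI as if it were a file inside the container's working directory, producing the invalid file:/...container.../hdfs:/... path. A minimal sketch of the localization step that avoids this, using a helper class of our own invention (this is not the committed HIVE-9882 patch):
{code}
import java.io.File;
import java.net.URI;
import java.net.URL;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class ClasspathLocalizer {
  // Classloaders can only read file: URLs, so remote URIs must be copied down first.
  static URL toLocalUrl(String pathString, Configuration conf, File localTmpDir) throws Exception {
    URI uri = new URI(pathString);
    if (uri.getScheme() == null || "file".equals(uri.getScheme())) {
      return new File(uri.getPath()).toURI().toURL(); // already local
    }
    Path remote = new Path(uri);
    File local = new File(localTmpDir, remote.getName());
    FileSystem.get(uri, conf).copyToLocalFile(remote, new Path(local.getAbsolutePath()));
    return local.toURI().toURL();
  }
}
{code}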
[jira] [Commented] (HIVE-9302) Beeline add commands to register local jdbc driver names and jars
[ https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14351657#comment-14351657 ] Xuefu Zhang commented on HIVE-9302: --- [~Ferd], I think I didn't check in the jar files. Could you please specify which jar(s) you need and the locations? Thanks. Beeline add commands to register local jdbc driver names and jars - Key: HIVE-9302 URL: https://issues.apache.org/jira/browse/HIVE-9302 Project: Hive Issue Type: New Feature Reporter: Brock Noland Assignee: Ferdinand Xu Labels: TODOC1.2 Fix For: 1.2.0 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, HIVE-9302.2.patch, HIVE-9302.3.patch, HIVE-9302.3.patch, HIVE-9302.4.patch, HIVE-9302.patch, mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar At present if a beeline user uses {{add jar}} the path they give is actually on the HS2 server. It'd be great to allow beeline users to add local jdbc driver jars and register custom jdbc driver names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9302) Beeline add commands to register local jdbc driver names and jars
[ https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14351774#comment-14351774 ] Xuefu Zhang commented on HIVE-9302: --- These two jar files are added to the trunk. Beeline add commands to register local jdbc driver names and jars - Key: HIVE-9302 URL: https://issues.apache.org/jira/browse/HIVE-9302 Project: Hive Issue Type: New Feature Reporter: Brock Noland Assignee: Ferdinand Xu Labels: TODOC1.2 Fix For: 1.2.0 Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, HIVE-9302.2.patch, HIVE-9302.3.patch, HIVE-9302.3.patch, HIVE-9302.4.patch, HIVE-9302.patch, mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar At present if a beeline user uses {{add jar}} the path they give is actually on the HS2 server. It'd be great to allow beeline users to add local jdbc driver jars and register custom jdbc driver names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
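For anyone trying the feature out, an illustrative Beeline session; the command names below are per this feature's documentation (double-check against the committed patch), the jar matches one of the attachments above, and the connection string is an example:
{noformat}
beeline> !addlocaldriverjar /tmp/postgresql-9.3.jdbc3.jar
beeline> !addlocaldrivername org.postgresql.Driver
beeline> !connect jdbc:postgresql://localhost:5432/testdb
{noformat}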
[jira] [Commented] (HIVE-9961) HookContext for view should return a table type of VIRTUAL_VIEW
[ https://issues.apache.org/jira/browse/HIVE-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360837#comment-14360837 ] Xuefu Zhang commented on HIVE-9961: --- +1 pending on test. HookContext for view should return a table type of VIRTUAL_VIEW --- Key: HIVE-9961 URL: https://issues.apache.org/jira/browse/HIVE-9961 Project: Hive Issue Type: Bug Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-9961.patch Run a 'create view' statement. The view entity (which is in the hook's outputs) has a table with tableType 'MANAGED_TABLE'. It should be of type 'VIRTUAL_VIEW' so that auditing tools can correctly identify it as a view. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
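To make the auditing angle concrete, a minimal sketch of a post-execution hook fragment that reads the table type off the output entities (hook wiring omitted; the class and method names of the fragment are ours, not Hive's):
{code}
import org.apache.hadoop.hive.ql.hooks.Entity;
import org.apache.hadoop.hive.ql.hooks.HookContext;
import org.apache.hadoop.hive.ql.hooks.WriteEntity;

public class TableTypeAudit {
  static void inspect(HookContext hookContext) {
    for (WriteEntity out : hookContext.getOutputs()) {
      if (out.getType() == Entity.Type.TABLE) {
        // Before the fix a CREATE VIEW output reported MANAGED_TABLE here;
        // with the fix it reports VIRTUAL_VIEW.
        System.out.println(out.getTable().getTableName() + " -> "
            + out.getTable().getTableType().name());
      }
    }
  }
}
{code}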
[jira] [Commented] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356155#comment-14356155 ] Xuefu Zhang commented on HIVE-9659: --- HIVE-9918 is resolved. [~lirui], could you reattach the patch to have another test run? Thanks. 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch] --- Key: HIVE-9659 URL: https://issues.apache.org/jira/browse/HIVE-9659 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao Assignee: Rui Li Attachments: HIVE-9659.1-spark.patch, HIVE-9659.2-spark.patch, HIVE-9659.3-spark.patch, HIVE-9659.4-spark.patch We found that 'Error while trying to create table container' occurs during Big-Bench Q12 case execution when hive.optimize.skewjoin is set to 'true'. If hive.optimize.skewjoin is set to 'false', the case passes. How to reproduce: 1. set hive.optimize.skewjoin=true; 2. Run BigBench case Q12 and it will fail. Check the executor log (e.g. /usr/lib/spark/work/app-/2/stderr) and you will find the error 'Error while trying to create table container' in the log, and also a NullPointerException near the end of the log. (a) Detailed error message for 'Error while trying to create table container':
{noformat}
15/02/12 01:29:49 ERROR SparkMapRecordHandler: Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container
  at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:118)
  at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:193)
  at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:219)
  at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
  at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
  at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
  at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
  at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
  at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
  at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47)
  at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
  at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
  at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
  at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217)
  at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
  at org.apache.spark.scheduler.Task.run(Task.scala:56)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container
  at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:158)
  at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:115)
  ... 21 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error, not a directory: hdfs://bhx1:8020/tmp/hive/root/d22ef465-bff5-4edb-a822-0a9f1c25b66c/hive_2015-02-12_01-28-10_008_6897031694580088767-1/-mr-10009/HashTable-Stage-6/MapJoin-mapfile01--.hashtable
  at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:106)
  ... 22 more
15/02/12 01:29:49 INFO SparkRecordHandler: maximum memory = 40939028480
15/02/12 01:29:49 INFO PerfLogger: PERFLOG
{noformat}
[jira] [Commented] (HIVE-9828) Semantic analyzer does not capture view parent entity for tables referred in view with union all
[ https://issues.apache.org/jira/browse/HIVE-9828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355173#comment-14355173 ] Xuefu Zhang commented on HIVE-9828: --- +1 Semantic analyzer does not capture view parent entity for tables referred in view with union all - Key: HIVE-9828 URL: https://issues.apache.org/jira/browse/HIVE-9828 Project: Hive Issue Type: Bug Components: Parser Affects Versions: 1.1.0 Reporter: Prasad Mujumdar Fix For: 1.2.0 Attachments: HIVE-9828.1-npf.patch The Hive compiler adds tables used in a view definition to the input entity list, with the view as the parent entity for each table. In the case of a view with a union all query, this is not being done properly. For example, {noformat} create view view1 as select t.id from (select tab1.id from db.tab1 union all select tab2.id from db.tab2 ) t; {noformat} This query will capture tab1 and tab2 as read entities without view1 as their parent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9918) Spark branch build is failing due to unknown url [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9918: -- Summary: Spark branch build is failing due to unknown url [Spark Branch] (was: Spark branch build is failing due to unknown url) Spark branch build is failing due to unknown url [Spark Branch] --- Key: HIVE-9918 URL: https://issues.apache.org/jira/browse/HIVE-9918 Project: Hive Issue Type: Bug Components: Spark, spark-branch Reporter: Sergio Peña Assignee: Sergio Peña Priority: Blocker Attachments: HIVE-9918.1-spark.patch, HIVE-9918.1.patch The Spark branch is failing due to a URL that does not exist anymore. This URL contains all the Spark jars used in the build. These Spark jar versions are not in the official Maven repository. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9956) use BigDecimal.valueOf instead of new in TestFileDump
[ https://issues.apache.org/jira/browse/HIVE-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360408#comment-14360408 ] Xuefu Zhang commented on HIVE-9956: --- +1 use BigDecimal.valueOf instead of new in TestFileDump - Key: HIVE-9956 URL: https://issues.apache.org/jira/browse/HIVE-9956 Project: Hive Issue Type: Bug Components: File Formats Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: HIVE-9956.1.patch TestFileDump builds a data row where one of the columns is a BigDecimal. The test adds the value 2. There are two ways to create a BigDecimal object: 1. use new; 2. use valueOf. In this particular case: 1. new will create 2.222153; 2. valueOf will use the canonical String representation and the result will be 2. Probably we should use valueOf to create the BigDecimal object. TestTimestampWritable and TestHCatStores use valueOf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
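The difference is standard java.math behavior, demonstrated below with 2.2 as an example value (the exact literal used by the test is garbled in the description above):
{code}
import java.math.BigDecimal;

public class BigDecimalDemo {
  public static void main(String[] args) {
    // new BigDecimal(double) preserves the exact binary value of the double:
    System.out.println(new BigDecimal(2.2));
    // -> 2.20000000000000017763568394002504646778106689453125
    // BigDecimal.valueOf(double) goes through Double.toString(), i.e. the
    // canonical short representation:
    System.out.println(BigDecimal.valueOf(2.2));
    // -> 2.2
  }
}
{code}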
[jira] [Commented] (HIVE-9957) Hive 1.1.0 not compatible with Hadoop 2.4.0
[ https://issues.apache.org/jira/browse/HIVE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360386#comment-14360386 ] Xuefu Zhang commented on HIVE-9957: --- cc: [~spena] Hive 1.1.0 not compatible with Hadoop 2.4.0 --- Key: HIVE-9957 URL: https://issues.apache.org/jira/browse/HIVE-9957 Project: Hive Issue Type: Bug Components: Encryption Reporter: Vivek Shrivastava Getting this exception while accessing data through Hive:
{noformat}
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.DFSClient.getKeyProvider()Lorg/apache/hadoop/crypto/key/KeyProvider;
  at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.init(Hadoop23Shims.java:1152)
  at org.apache.hadoop.hive.shims.Hadoop23Shims.createHdfsEncryptionShim(Hadoop23Shims.java:1279)
  at org.apache.hadoop.hive.ql.session.SessionState.getHdfsEncryptionShim(SessionState.java:392)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1756)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStagingDirectoryPathname(SemanticAnalyzer.java:1875)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1689)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1427)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10132)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10147)
  at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192)
  at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
  at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112)
  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
  at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
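The missing method belongs to the HDFS encryption API that only newer Hadoop releases expose. A hypothetical defensive probe (not the actual Hive shim code) that detects its absence at runtime instead of dying with NoSuchMethodError:
{code}
import org.apache.hadoop.hdfs.DFSClient;

public class EncryptionSupport {
  static boolean hdfsEncryptionApiAvailable() {
    try {
      DFSClient.class.getMethod("getKeyProvider");
      return true;
    } catch (NoSuchMethodException e) {
      return false; // e.g. Hadoop 2.4.0: encryption API not present
    }
  }
}
{code}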
[jira] [Updated] (HIVE-9813) Hive JDBC - DatabaseMetaData.getColumns method cannot find classes added with add jar command
[ https://issues.apache.org/jira/browse/HIVE-9813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9813: -- Labels: TODOC1.2 (was: ) Hive JDBC - DatabaseMetaData.getColumns method cannot find classes added with add jar command --- Key: HIVE-9813 URL: https://issues.apache.org/jira/browse/HIVE-9813 Project: Hive Issue Type: Bug Components: Metastore Reporter: Yongzhi Chen Assignee: Yongzhi Chen Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-9813.1.patch, HIVE-9813.3.patch Execute the following JDBC client program:
{code}
import java.sql.*;

public class TestAddJar {
    private static Connection makeConnection(String connString, String classPath)
            throws ClassNotFoundException, SQLException {
        System.out.println("Current Connection info: " + connString);
        Class.forName(classPath);
        System.out.println("Current driver info: " + classPath);
        return DriverManager.getConnection(connString);
    }

    public static void main(String[] args) {
        if (2 != args.length) {
            System.out.println("Two arguments needed: connection string, path to jar to be added (include jar name)");
            System.out.println("Example: java -jar TestApp.jar jdbc:hive2://192.168.111.111 /tmp/json-serde-1.3-jar-with-dependencies.jar");
            return;
        }
        Connection conn;
        try {
            conn = makeConnection(args[0], "org.apache.hive.jdbc.HiveDriver");
            System.out.println("---");
            System.out.println("DONE");
            System.out.println("---");
            System.out.println("Execute query: add jar " + args[1] + ";");
            Statement stmt = conn.createStatement();
            int c = stmt.executeUpdate("add jar " + args[1]);
            System.out.println("Returned value is: [" + c + "]\n");
            System.out.println("---");
            final String createTableQry = "Create table if not exists json_test(id int, content string) "
                    + "row format serde 'org.openx.data.jsonserde.JsonSerDe'";
            System.out.println("Execute query: " + createTableQry + ";");
            stmt.execute(createTableQry);
            System.out.println("---");
            System.out.println("getColumn() Call---\n");
            DatabaseMetaData md = conn.getMetaData();
            System.out.println("Test get all column in a schema:");
            ResultSet rs = md.getColumns("Hive", "default", "json_test", null);
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
            conn.close();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
{code}
Get an exception, and from the metastore log:
{noformat}
7:41:30.316 PM ERROR hive.log
error in initSerDe: java.lang.ClassNotFoundException Class org.openx.data.jsonserde.JsonSerDe not found
java.lang.ClassNotFoundException: Class org.openx.data.jsonserde.JsonSerDe not found
  at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1803)
  at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:183)
  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_fields(HiveMetaStore.java:2487)
  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_schema(HiveMetaStore.java:2542)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
  at com.sun.proxy.$Proxy5.get_schema(Unknown Source)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema.getResult(ThriftHiveMetastore.java:6425)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema.getResult(ThriftHiveMetastore.java:6409)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
  at
{noformat}
[jira] [Commented] (HIVE-9918) Spark branch build is failing due to unknown url
[ https://issues.apache.org/jira/browse/HIVE-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355839#comment-14355839 ] Xuefu Zhang commented on HIVE-9918: --- +1 pending on test. Spark branch build is failing due to unknown url Key: HIVE-9918 URL: https://issues.apache.org/jira/browse/HIVE-9918 Project: Hive Issue Type: Bug Components: Spark, spark-branch Reporter: Sergio Peña Assignee: Sergio Peña Priority: Blocker Attachments: HIVE-9918.1.patch The Spark branch is failing due to a URL that does not exist anymore. This URL contains all the Spark jars used in the build. These Spark jar versions are not in the official Maven repository. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9924) Add SORT_QUERY_RESULTS to union12.q
[ https://issues.apache.org/jira/browse/HIVE-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9924: -- Attachment: HIVE-9924.1-spark.patch Attached a dummy patch to trigger a clean test run for Spark branch to find out any test failures. Add SORT_QUERY_RESULTS to union12.q --- Key: HIVE-9924 URL: https://issues.apache.org/jira/browse/HIVE-9924 Project: Hive Issue Type: Test Reporter: Rui Li Assignee: Rui Li Priority: Minor Attachments: HIVE-9924.1-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9625) Delegation tokens for HMS are not renewed
[ https://issues.apache.org/jira/browse/HIVE-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358052#comment-14358052 ] Xuefu Zhang commented on HIVE-9625: --- [~brocknoland], [~prasadm], could we move this forward? Delegation tokens for HMS are not renewed - Key: HIVE-9625 URL: https://issues.apache.org/jira/browse/HIVE-9625 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9625.1.patch AFAICT the delegation tokens stored in [HiveSessionImplwithUGI |https://github.com/apache/hive/blob/trunk/service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java#L45] for HMS + Impersonation are never renewed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10087) Beeline's --silent option should suppress query from being echoed when running with -f option
[ https://issues.apache.org/jira/browse/HIVE-10087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14380336#comment-14380336 ] Xuefu Zhang commented on HIVE-10087: Patch looks good. One minor thing: I noticed there is a blank line in the console output for -f when --silent=true. Is there a way to get rid of that? Beeline's --silent option should suppress query from being echoed when running with -f option - Key: HIVE-10087 URL: https://issues.apache.org/jira/browse/HIVE-10087 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 0.13.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Attachments: HIVE-10087.patch The {{-e}} and the {{-f}} options behave differently.
{code}
beeline -u jdbc:hive2://localhost:1/default --showHeader=false --silent=true -f select.sql
0: jdbc:hive2://localhost:1/default select * from sample_07 limit 5;
--
00- All Occupations 134354250 40690
11- Management occupations 6003930 96150
11-1011 Chief executives 299160 151370
11-1021 General and operations managers 1655410 103780
11-1031 Legislators 61110 33880
--

beeline -u jdbc:hive2://localhost:1/default --showHeader=false --silent=true -e select * from sample_07 limit 5;
--
00- All Occupations 134354250 40690
11- Management occupations 6003930 96150
11-1011 Chief executives 299160 151370
11-1021 General and operations managers 1655410 103780
11-1031 Legislators 61110 33880
--
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
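The shape of the requested behavior, as a hypothetical fragment (names are ours, not BeeLine's internals):
{code}
// Echo statements read from a -f script only when --silent is off,
// matching what the -e path already does.
void runScriptLine(String line, boolean silent) {
  if (!silent) {
    System.out.println(line); // echo the statement before executing it
  }
  // ... execute the statement ...
}
{code}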
[jira] [Commented] (HIVE-10087) Beeline's --silent option should suppress query from being echoed when running with -f option
[ https://issues.apache.org/jira/browse/HIVE-10087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14380945#comment-14380945 ] Xuefu Zhang commented on HIVE-10087: +1 Beeline's --silent option should suppress query from being echoed when running with -f option - Key: HIVE-10087 URL: https://issues.apache.org/jira/browse/HIVE-10087 Project: Hive Issue Type: Bug Components: Beeline Affects Versions: 0.13.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Attachments: HIVE-10087.patch The {{-e}} and the {{-f}} options behave differently.
{code}
beeline -u jdbc:hive2://localhost:1/default --showHeader=false --silent=true -f select.sql
0: jdbc:hive2://localhost:1/default select * from sample_07 limit 5;
--
00- All Occupations 134354250 40690
11- Management occupations 6003930 96150
11-1011 Chief executives 299160 151370
11-1021 General and operations managers 1655410 103780
11-1031 Legislators 61110 33880
--

beeline -u jdbc:hive2://localhost:1/default --showHeader=false --silent=true -e select * from sample_07 limit 5;
--
00- All Occupations 134354250 40690
11- Management occupations 6003930 96150
11-1011 Chief executives 299160 151370
11-1021 General and operations managers 1655410 103780
11-1031 Legislators 61110 33880
--
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8858) Visualize generated Spark plan [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387344#comment-14387344 ] Xuefu Zhang commented on HIVE-8858: --- Hi Chinna, thanks for working on this. I haven't checked your patch, but the output looks nice. I have a few suggestions:
1. We need numbering in the Trans. Otherwise, it's hard to visualize the graph.
2. Other information, such as the number of partitions in ShuffleTran, is also important to show.
3. It would be better if we log this graph in one line. The easiest way is to have a toString() method in SparkPlan; then we can just log the string representation of SparkPlan.
4. To avoid long lines, we can show the graph in the same way as we show the work graph. For instance:
{code}
MapTran 1 - MapInput 1 (cache off)
Shuffle1 (cache on) - MapTran 1
Reduce 1 - Shuffle1 (cache on)
Reduce 2 - Shuffle1 (cache on)
{code}
Please note that this may not represent a valid plan. [~jxiang]/[~csun], please feel free to share your thoughts. Visualize generated Spark plan [Spark Branch] - Key: HIVE-8858 URL: https://issues.apache.org/jira/browse/HIVE-8858 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chinna Rao Lalam Attachments: HIVE-8858-spark.patch The Spark plan generated by SparkPlanGenerator contains info which isn't available in Hive's explain plan, such as RDD caching. Also, the graph is slightly different from the original SparkWork. Thus, it would be nice to visualize the plan as is done for SparkWork. Preferably, the visualization can happen as part of Hive explain extended. If not feasible, we can at least log this at info level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
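A self-contained sketch of suggestion 3 above; the child-to-parents map is a stand-in for SparkPlan's internal structures, and the Tran names follow the example graph:
{code}
import java.util.List;
import java.util.Map;

public class SparkPlanRenderer {
  // Render every edge of the Tran graph as "child <- parent" on a single line,
  // so the whole plan is greppable in the driver log.
  static String render(Map<String, List<String>> childToParents) {
    StringBuilder sb = new StringBuilder("SparkPlan: ");
    for (Map.Entry<String, List<String>> e : childToParents.entrySet()) {
      for (String parent : e.getValue()) {
        sb.append(e.getKey()).append(" <- ").append(parent).append("; ");
      }
    }
    return sb.toString();
  }
}
{code}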
[jira] [Commented] (HIVE-10143) HS2 fails to clean up Spark client state on timeout [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387276#comment-14387276 ] Xuefu Zhang commented on HIVE-10143: +1 pending on tests. HS2 fails to clean up Spark client state on timeout [Spark Branch] -- Key: HIVE-10143 URL: https://issues.apache.org/jira/browse/HIVE-10143 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-10143.1-spark.patch When a new client is registered with the Spark client and fails to connect back in time, the code will time out the future and HS2 will give up on that client. But the RSC backend does not clean up all the state, and the client is still allowed to connect back. That can lead to the client staying alive indefinitely and holding on to cluster resources, since HS2 doesn't know it's alive but the connection still exists. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10143) HS2 fails to clean up Spark client state on timeout [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14387437#comment-14387437 ] Xuefu Zhang commented on HIVE-10143: That's correct. These failures are known, captured in HIVE-10134. Please ignore for now. HS2 fails to clean up Spark client state on timeout [Spark Branch] -- Key: HIVE-10143 URL: https://issues.apache.org/jira/browse/HIVE-10143 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Attachments: HIVE-10143.1-spark.patch When a new client is registered with the Spark client and fails to connect back in time, the code will time out the future and HS2 will give up on that client. But the RSC backend does not clean up all the state, and the client is still allowed to connect back. That can lead to the client staying alive indefinitely and holding on to cluster resources, since HS2 doesn't know it's alive but the connection still exists. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383219#comment-14383219 ] Xuefu Zhang commented on HIVE-10073: Okay. Makes sense. Runtime exception when querying HBase with Spark [Spark Branch] --- Key: HIVE-10073 URL: https://issues.apache.org/jira/browse/HIVE-10073 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10073.1-spark.patch, HIVE-10073.2-spark.patch, HIVE-10073.3-spark.patch When querying HBase with Spark, we got {noformat} Caused by: java.lang.IllegalArgumentException: Must specify table name at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:188) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:276) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:266) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:331) {noformat} But it works fine for MapReduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382162#comment-14382162 ] Xuefu Zhang commented on HIVE-10073: Hi [~jxiang] and [~chengxiang li], before we patch this on Hive side, I think it's better to find the root cause. If the problem is due to Spark, we can bring up the problem to that community. So far, I'm not convinced that the problem is on hive side. Runtime exception when querying HBase with Spark [Spark Branch] --- Key: HIVE-10073 URL: https://issues.apache.org/jira/browse/HIVE-10073 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10073.1-spark.patch When querying HBase with Spark, we got {noformat} Caused by: java.lang.IllegalArgumentException: Must specify table name at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:188) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:276) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:266) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:331) {noformat} But it works fine for MapReduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
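Background on the exception itself (HBase API, simplified; the table name is an example): TableOutputFormat.setConf() throws "Must specify table name" unless the output table is present in the Configuration handed to ReflectionUtils.newInstance().
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;

public class HBaseOutputConf {
  static Configuration withOutputTable(Configuration base) {
    Configuration conf = new Configuration(base);
    // Without this property, instantiating TableOutputFormat via
    // ReflectionUtils.newInstance() fails in setConf() as in the stack trace above.
    conf.set(TableOutputFormat.OUTPUT_TABLE, "example_hbase_table");
    return conf;
  }
}
{code}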
[jira] [Commented] (HIVE-9969) Avoid Utilities.getMapRedWork for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389226#comment-14389226 ] Xuefu Zhang commented on HIVE-9969: --- +1 Avoid Utilities.getMapRedWork for spark [Spark Branch] -- Key: HIVE-9969 URL: https://issues.apache.org/jira/browse/HIVE-9969 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Priority: Minor Attachments: HIVE-9969.1-spark.patch The method shouldn't be used for spark mode. Specifically, map work and reduce work have different plan paths in spark. Calling this method will leave lots of errors in executor's log: {noformat} 15/03/16 02:57:23 INFO Utilities: Open file to read in plan: hdfs://node13-1:8020/tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml 15/03/16 02:57:23 INFO Utilities: File not found: File does not exist: /tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
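A hedged sketch of the suggested direction (not the committed patch): look up only the plan that exists for the task at hand, rather than Utilities.getMapRedWork(), which probes MR's reduce.xml path even when no reduce work was serialized:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.ql.exec.Utilities;
import org.apache.hadoop.hive.ql.plan.BaseWork;

public class PlanLookup {
  // In Spark mode, map work and reduce work live under separate plan paths,
  // so fetch exactly the one this task needs.
  static BaseWork lookup(Configuration jobConf, boolean mapSide) {
    return mapSide ? Utilities.getMapWork(jobConf) : Utilities.getReduceWork(jobConf);
  }
}
{code}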
[jira] [Updated] (HIVE-10130) Merge from Spark branch to trunk 03/27/2015 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10130: --- Summary: Merge from Spark branch to trunk 03/27/2015 [Spark Branch] (was: Merge from Spark branch to trunk 03/27/2015) Merge from Spark branch to trunk 03/27/2015 [Spark Branch] -- Key: HIVE-10130 URL: https://issues.apache.org/jira/browse/HIVE-10130 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-10130.1-spark.patch, HIVE-10130.2-spark.patch, HIVE-10130.2-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10058) Log the information of cached RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375896#comment-14375896 ] Xuefu Zhang commented on HIVE-10058: Hi Chinna, do you agree that if we fulfill HIVE-8858 we don't need this one? My concern is that RDD id helps little in understanding Spark plan. Log the information of cached RDD [Spark Branch] Key: HIVE-10058 URL: https://issues.apache.org/jira/browse/HIVE-10058 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: spark-branch Attachments: HIVE-10058.1-spark.patch Log the cached RDD Id's at info level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
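A hypothetical variant of the logging in question, pairing the RDD id with Spark's lineage string so the id carries some context (the logging sink is illustrative):
{code}
import org.apache.spark.api.java.JavaPairRDD;

public class RddCacheLogging {
  static void logBeforeCache(JavaPairRDD<?, ?> rdd) {
    // toDebugString() prints the RDD lineage, which ties the numeric id
    // back to the operations that produced it.
    System.out.println("Caching RDD " + rdd.id() + ":\n" + rdd.toDebugString());
  }
}
{code}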
[jira] [Commented] (HIVE-9990) TestMultiSessionsHS2WithLocalClusterSpark is failing
[ https://issues.apache.org/jira/browse/HIVE-9990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375345#comment-14375345 ] Xuefu Zhang commented on HIVE-9990: --- [~Ferd], thanks for looking into this. My desktop always has problems running Spark tests due to the Snappy native library, so I guess my problem could be different from Jenkins. If you cannot reproduce it, I think it could be just a transient failure. You may close the issue as not reproducible. Thanks. TestMultiSessionsHS2WithLocalClusterSpark is failing Key: HIVE-9990 URL: https://issues.apache.org/jira/browse/HIVE-9990 Project: Hive Issue Type: Bug Components: Spark Affects Versions: 1.2.0 Reporter: Xuefu Zhang Assignee: Ferdinand Xu At least sometimes. I can reproduce it with mvn test -Dtest=TestMultiSessionsHS2WithLocalClusterSpark -Phadoop-2 consistently on my local box (both trunk and spark branch).
{code}
---
 T E S T S
---
Running org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 92.438 sec FAILURE! - in org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark
testSparkQuery(org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark) Time elapsed: 21.514 sec ERROR!
java.util.concurrent.ExecutionException: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
  at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296)
  at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:392)
  at org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.verifyResult(TestMultiSessionsHS2WithLocalClusterSpark.java:244)
  at org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testKvQuery(TestMultiSessionsHS2WithLocalClusterSpark.java:220)
  at org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.access$000(TestMultiSessionsHS2WithLocalClusterSpark.java:53)
{code}
The error was also seen in the HIVE-9934 test run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9793) Remove hard coded paths from cli driver tests
[ https://issues.apache.org/jira/browse/HIVE-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14340258#comment-14340258 ] Xuefu Zhang commented on HIVE-9793: --- +1 Remove hard coded paths from cli driver tests - Key: HIVE-9793 URL: https://issues.apache.org/jira/browse/HIVE-9793 Project: Hive Issue Type: Improvement Components: Tests Affects Versions: 1.2.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9793.patch, HIVE-9793.patch, HIVE-9793.patch At some point a change that generates a hard-coded path into the test files snuck in. Instead we should use the {{HIVE_ROOT}} directory, as this is better for ptest environments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347821#comment-14347821 ] Xuefu Zhang edited comment on HIVE-9863 at 3/5/15 12:18 AM: cc: [~rdblue] [~spena] was (Author: xuefuz): cc: [~rdblue] Querying parquet tables fails with IllegalStateException [Spark Branch] --- Key: HIVE-9863 URL: https://issues.apache.org/jira/browse/HIVE-9863 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang This does not necessarily happen only in the Spark branch; queries such as select count(*) from table_name fail with an error:
{code}
hive> select * from content limit 2;
OK
Failed with exception java.io.IOException:java.lang.IllegalStateException: All the offsets listed in the split should be found in the file. expected: [4, 4] found: [BlockMetaData{69644, 881917418 [ColumnMetaData{GZIP [guid] BINARY [PLAIN, BIT_PACKED], 4}, ColumnMetaData{GZIP [collection_name] BINARY [PLAIN_DICTIONARY, BIT_PACKED], 389571}, ColumnMetaData{GZIP [doc_type] BINARY [PLAIN_DICTIONARY, BIT_PACKED], 389790}, ColumnMetaData{GZIP [stage] INT64 [PLAIN_DICTIONARY, BIT_PACKED], 389887}, ColumnMetaData{GZIP [meta_timestamp] INT64 [RLE, PLAIN_DICTIONARY, BIT_PACKED], 397673}, ColumnMetaData{GZIP [doc_timestamp] INT64 [RLE, PLAIN_DICTIONARY, BIT_PACKED], 422161}, ColumnMetaData{GZIP [meta_size] INT32 [RLE, PLAIN_DICTIONARY, BIT_PACKED], 460215}, ColumnMetaData{GZIP [content_size] INT32 [RLE, PLAIN_DICTIONARY, BIT_PACKED], 521728}, ColumnMetaData{GZIP [source] BINARY [RLE, PLAIN, BIT_PACKED], 683740}, ColumnMetaData{GZIP [delete_flag] BOOLEAN [RLE, PLAIN, BIT_PACKED], 683787}, ColumnMetaData{GZIP [meta] BINARY [RLE, PLAIN, BIT_PACKED], 683834}, ColumnMetaData{GZIP [content] BINARY [RLE, PLAIN, BIT_PACKED], 6992365}]}] out of: [4, 129785482, 260224757] in range 0, 134217728
Time taken: 0.253 seconds
hive>
{code}
I can reproduce the problem with either local or yarn-cluster. It seems to happen with MR also. Thus, I suspect this is a Parquet problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347821#comment-14347821 ] Xuefu Zhang commented on HIVE-9863: --- cc: [~rdblue] Querying parquet tables fails with IllegalStateException [Spark Branch] --- Key: HIVE-9863 URL: https://issues.apache.org/jira/browse/HIVE-9863 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang This doesn't necessarily happen only in the Spark branch; queries such as select count(*) from table_name fail with this error: {code} hive> select * from content limit 2; OK Failed with exception java.io.IOException:java.lang.IllegalStateException: All the offsets listed in the split should be found in the file. expected: [4, 4] found: [BlockMetaData{69644, 881917418 [ColumnMetaData{GZIP [guid] BINARY [PLAIN, BIT_PACKED], 4}, ColumnMetaData{GZIP [collection_name] BINARY [PLAIN_DICTIONARY, BIT_PACKED], 389571}, ColumnMetaData{GZIP [doc_type] BINARY [PLAIN_DICTIONARY, BIT_PACKED], 389790}, ColumnMetaData{GZIP [stage] INT64 [PLAIN_DICTIONARY, BIT_PACKED], 389887}, ColumnMetaData{GZIP [meta_timestamp] INT64 [RLE, PLAIN_DICTIONARY, BIT_PACKED], 397673}, ColumnMetaData{GZIP [doc_timestamp] INT64 [RLE, PLAIN_DICTIONARY, BIT_PACKED], 422161}, ColumnMetaData{GZIP [meta_size] INT32 [RLE, PLAIN_DICTIONARY, BIT_PACKED], 460215}, ColumnMetaData{GZIP [content_size] INT32 [RLE, PLAIN_DICTIONARY, BIT_PACKED], 521728}, ColumnMetaData{GZIP [source] BINARY [RLE, PLAIN, BIT_PACKED], 683740}, ColumnMetaData{GZIP [delete_flag] BOOLEAN [RLE, PLAIN, BIT_PACKED], 683787}, ColumnMetaData{GZIP [meta] BINARY [RLE, PLAIN, BIT_PACKED], 683834}, ColumnMetaData{GZIP [content] BINARY [RLE, PLAIN, BIT_PACKED], 6992365}]}] out of: [4, 129785482, 260224757] in range 0, 134217728 Time taken: 0.253 seconds hive> {code} I can reproduce the problem with either local or yarn-cluster mode. It also seems to happen with MR, so I suspect this is a Parquet problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9869) Trunk doesn't build with hadoop-1
[ https://issues.apache.org/jira/browse/HIVE-9869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang reassigned HIVE-9869: - Assignee: Rui Li Trunk doesn't build with hadoop-1 - Key: HIVE-9869 URL: https://issues.apache.org/jira/browse/HIVE-9869 Project: Hive Issue Type: Bug Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-9869.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9855) Runtime skew join doesn't work when skewed data only exists in big table
[ https://issues.apache.org/jira/browse/HIVE-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346896#comment-14346896 ] Xuefu Zhang commented on HIVE-9855: --- +1 pending tests. Runtime skew join doesn't work when skewed data only exists in big table Key: HIVE-9855 URL: https://issues.apache.org/jira/browse/HIVE-9855 Project: Hive Issue Type: Bug Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-9855.1.patch To reproduce, enable runtime skew join and then join two tables where the skewed data exists in only one of them. The task will fail with the following exception: {noformat} Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: java.io.IOException: Unable to rename output to: hdfs://.. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9302) Beeline add commands to register local jdbc driver names and jars
[ https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348841#comment-14348841 ] Xuefu Zhang commented on HIVE-9302: --- +1 Beeline add commands to register local jdbc driver names and jars - Key: HIVE-9302 URL: https://issues.apache.org/jira/browse/HIVE-9302 Project: Hive Issue Type: New Feature Reporter: Brock Noland Assignee: Ferdinand Xu Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, HIVE-9302.2.patch, HIVE-9302.3.patch, HIVE-9302.3.patch, HIVE-9302.4.patch, HIVE-9302.patch, mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar At present if a beeline user uses {{add jar}} the path they give is actually on the HS2 server. It'd be great to allow beeline users to add local jdbc driver jars and register custom jdbc driver names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
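For context, the feature adds Beeline commands along the lines of the following session sketch; the jar path, driver class, and connection URL below are hypothetical examples, not values from the patch:
{code}
-- hypothetical jar path and driver class, for illustration only
beeline> !addlocaldriverjar /home/user/jdbc/mysql-connector-java-bin.jar
beeline> !addlocaldrivername com.mysql.jdbc.Driver
beeline> !connect jdbc:mysql://localhost:3306/testdb
{code}
The point of the design is that the jar is loaded on the Beeline client side, unlike {{add jar}}, which resolves the path on the HS2 server.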
[jira] [Commented] (HIVE-9793) Remove hard coded paths from cli driver tests
[ https://issues.apache.org/jira/browse/HIVE-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337810#comment-14337810 ] Xuefu Zhang commented on HIVE-9793: --- Looks good to me. What about the result directory, which is also using basedir? Remove hard coded paths from cli driver tests - Key: HIVE-9793 URL: https://issues.apache.org/jira/browse/HIVE-9793 Project: Hive Issue Type: Improvement Components: Tests Affects Versions: 1.2.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-9793.patch At some point a change which generates a hard-coded path into the test files snuck in. Instead, we should use the {{HIVE_ROOT}} directory, as this is better for ptest environments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9086) Add language support to PURGE data while dropping partitions.
[ https://issues.apache.org/jira/browse/HIVE-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337437#comment-14337437 ] Xuefu Zhang commented on HIVE-9086: --- Could we get a summary of the disagreement here? If the syntax for tables is to add PURGE after the table name, we should add PURGE after the partition spec, just to be consistent. Add language support to PURGE data while dropping partitions. - Key: HIVE-9086 URL: https://issues.apache.org/jira/browse/HIVE-9086 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.15.0 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: HIVE-9086.1.patch HIVE-9083 adds metastore-support to skip-trash while dropping partitions. This patch includes language support to do the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
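For illustration, a minimal sketch of the two forms being compared; the table and partition names are hypothetical, and the partition-level statement shows the syntax under discussion rather than something already committed at the time of this comment:
{code}
-- table-level skip-trash form already supported by Hive:
DROP TABLE page_views PURGE;
-- proposed partition-level form, with PURGE after the partition spec:
ALTER TABLE page_views DROP PARTITION (ds='2015-03-01') PURGE;
{code}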
[jira] [Commented] (HIVE-9794) java.lang.NoSuchMethodError occurs during hive query execution which has 'ADD FILE XXXX.jar' sentence
[ https://issues.apache.org/jira/browse/HIVE-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339492#comment-14339492 ] Xuefu Zhang commented on HIVE-9794: --- cc: [~chengxiang li], [~lirui] java.lang.NoSuchMethodError occurs during hive query execution which has 'ADD FILE XXXX.jar' sentence - Key: HIVE-9794 URL: https://issues.apache.org/jira/browse/HIVE-9794 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao We updated our code to the latest revision on the Spark branch (i.e. fd0f638a8d481a9a98b34d3dd08236d6d591812f), rebuilt and deployed Hive in our cluster, and ran the BigBench cases again. Many cases (e.g. Q1, Q2, Q3, Q4, Q8) failed due to a common 'NoSuchMethodError'. The root cause in these queries should be the 'ADD FILE XXXX.jar' statement. Detailed error message: Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.session.SessionState.add_resources(Lorg/apache/hadoop/hive/ql/session/SessionState$ResourceType;Ljava/util/List;)Ljava/util/List; at org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:67) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:262) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:403) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:419) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:708) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others
[ https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369575#comment-14369575 ] Xuefu Zhang commented on HIVE-7018: --- +1, Thanks, Chaoyu! Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others - Key: HIVE-7018 URL: https://issues.apache.org/jira/browse/HIVE-7018 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Yongzhi Chen Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch It appears that at least postgres and oracle do not have the LINK_TARGET_ID column while mysql does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9994) Hive query plan returns sensitive data to external applications
[ https://issues.apache.org/jira/browse/HIVE-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367771#comment-14367771 ] Xuefu Zhang commented on HIVE-9994: --- +1 Hive query plan returns sensitive data to external applications --- Key: HIVE-9994 URL: https://issues.apache.org/jira/browse/HIVE-9994 Project: Hive Issue Type: Bug Affects Versions: 1.0.0 Reporter: Sergio Peña Assignee: Sergio Peña Attachments: HIVE-9994.1.patch, HIVE-9994.2.patch, HIVE-9994.3.patch Some applications are using getQueryString() method from the QueryPlan class to get the query that is being executed by Hive. The query string returned is not redacted, and it is returning sensitive information that is logged in Navigator. We need to return data redacted from the QueryPlan to avoid other applications to log sensitive data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10058) Log the information of cached RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376552#comment-14376552 ] Xuefu Zhang commented on HIVE-10058: [~chinnalalam], Sorry that I wasn't clear. I was thinking more about HIVE-8858, where I'd like to have a visual representation of the SparkPlan. As you can see in the class definition, a SparkPlan consists of a graph of SparkTrans. SparkTran has a few subclasses, and some of them, such as MapInput, have properties such as toCache. What is desirable is to log a SparkPlan in a graphical way, similar to what's shown for the work graph in an explain plan, such as:
{code}
MapInput (cache off) -> Shuffle (cache on) -> Reduce
                                           \-> Reduce
{code}
This will give us some idea about the SparkPlan we are executing. Let me know if you have any questions. Log the information of cached RDD [Spark Branch] Key: HIVE-10058 URL: https://issues.apache.org/jira/browse/HIVE-10058 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: spark-branch Attachments: HIVE-10058.1-spark.patch, HIVE-10058.2-spark.patch Log the cached RDD IDs at INFO level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-10058) Log the information of cached RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376552#comment-14376552 ] Xuefu Zhang edited comment on HIVE-10058 at 3/23/15 8:16 PM: - [~chinnalalam], Sorry that I wasn't clear. I was thinking more about HIVE-8858, where I'd like to have a visual representation of the SparkPlan. As you can see in the class definition, a SparkPlan consists of a graph of SparkTrans. SparkTran has a few subclasses, and some of them, such as MapInput, have properties such as toCache. What is desirable is to log a SparkPlan in a graphical way, similar to what's shown for the work graph in an explain plan, such as:
{code}
MapInput (cache off) -> Shuffle (cache on) -> Reduce
                                           \-> Reduce
{code}
This will give us some idea about the SparkPlan we are executing. Let me know if you have any questions. was (Author: xuefuz): [~chinnalalam], Sorry that I wasn't clear. I was thinking more about HIVE-8858, where I'd like to have a visual representation of the SparkPlan. As you can see in the class definition, a SparkPlan consists of a graph of SparkTrans. SparkTran has a few subclasses, and some of them, such as MapInput, have properties such as toCache. What is desirable is to log a SparkPlan in a graphical way, similar to what's shown for the work graph in an explain plan, such as:
{code}
MapInput (cache off) -> Shuffle (cache on) -> Reduce
                                           \-> Reduce
{code}
This will give us some idea about the SparkPlan we are executing. Let me know if you have any questions. Log the information of cached RDD [Spark Branch] Key: HIVE-10058 URL: https://issues.apache.org/jira/browse/HIVE-10058 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chinna Rao Lalam Assignee: Chinna Rao Lalam Fix For: spark-branch Attachments: HIVE-10058.1-spark.patch, HIVE-10058.2-spark.patch Log the cached RDD IDs at INFO level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10130) Merge from Spark branch to trunk 03/27/2015
[ https://issues.apache.org/jira/browse/HIVE-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10130: --- Attachment: HIVE-10130.1-spark.patch Merge from Spark branch to trunk 03/27/2015 --- Key: HIVE-10130 URL: https://issues.apache.org/jira/browse/HIVE-10130 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-10130.1-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10130) Merge from Spark branch to trunk 03/27/2015
[ https://issues.apache.org/jira/browse/HIVE-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10130: --- Attachment: HIVE-10130.2-spark.patch Merge from Spark branch to trunk 03/27/2015 --- Key: HIVE-10130 URL: https://issues.apache.org/jira/browse/HIVE-10130 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-10130.1-spark.patch, HIVE-10130.2-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10146) Not count session as idle if query is running
[ https://issues.apache.org/jira/browse/HIVE-10146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10146: --- Labels: TODOC1.2 (was: ) Not count session as idle if query is running - Key: HIVE-10146 URL: https://issues.apache.org/jira/browse/HIVE-10146 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-10146.1.patch, HIVE-10146.2.patch Currently, as long as there is no activity, we think the HS2 session is idle. This makes it very hard to set HIVE_SERVER2_IDLE_SESSION_TIMEOUT. If we don't set it long enough, an unattended query could be killed. We should provide an option not to count the session as idle if some query is still running. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
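As a sketch of how the resulting server configuration might look: the first property is the existing idle-session timeout, and the second is assumed to be the switch this patch introduces (treat the name as an assumption until the wiki is updated, per the TODOC1.2 label); values are examples only.
{code}
# in HiveServer2's hive-site.xml; example value: consider a session idle after 1 hour
hive.server2.idle.session.timeout=3600000
# assumed new property from this patch: a session with a running query
# is not counted as idle, so it is not timed out mid-query
hive.server2.idle.session.check.operation=true
{code}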
[jira] [Commented] (HIVE-10385) Optionally disable partition creation to speedup ETL jobs
[ https://issues.apache.org/jira/browse/HIVE-10385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504116#comment-14504116 ] Xuefu Zhang commented on HIVE-10385: Not sure if I understand the request correctly. If we load a table with dynamic partitioning w/o creating these partitions at the end, why do we even bother using dynamic partitioning at all? A use case would help. Optionally disable partition creation to speedup ETL jobs - Key: HIVE-10385 URL: https://issues.apache.org/jira/browse/HIVE-10385 Project: Hive Issue Type: Improvement Components: Hive Reporter: Slava Markeyev Priority: Minor Attachments: HIVE-10385.patch ETL jobs that create dynamic partitions with high cardinality perform the expensive step of metastore partition creation after query completion. Until bulk partition creation can be optimized, there should be a way to optionally skip this step. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10464) How i find the kryo version
[ https://issues.apache.org/jira/browse/HIVE-10464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-10464. Resolution: Invalid How i find the kryo version Key: HIVE-10464 URL: https://issues.apache.org/jira/browse/HIVE-10464 Project: Hive Issue Type: Improvement Reporter: ankush Could you please let me know how I can find the Kryo version that I am using? Please help me on this; we are just running HQL (Hive) queries -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10454) Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified.
[ https://issues.apache.org/jira/browse/HIVE-10454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509403#comment-14509403 ] Xuefu Zhang commented on HIVE-10454: I think the point of strict mode is to prevent a full scan over all partitions of a table. In your case, while rows are filtered, the scanner still has to scan all partitions, which is exactly what strict mode is meant to prevent. Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified. --- Key: HIVE-10454 URL: https://issues.apache.org/jira/browse/HIVE-10454 Project: Hive Issue Type: Bug Reporter: Aihua Xu Assignee: Aihua Xu The following queries fail: {noformat} create table t1 (c1 int) PARTITIONED BY (c2 string); set hive.mapred.mode=strict; select * from t1 where t1.c2 to_date(date_add(from_unixtime( unix_timestamp() ),1)); {noformat} The query failed with No partition predicate found for alias t1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
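To make the behavior concrete, here is a small illustrative session; the comparison operator in the failing query was lost in the original report's formatting, and the literal date below is an example, so treat both as assumptions:
{code}
set hive.mapred.mode=strict;
-- rejected: the predicate's value is computed from unix_timestamp(), which is
-- non-deterministic, so the planner finds no partition predicate it can prune on
select * from t1 where t1.c2 > to_date(date_add(from_unixtime(unix_timestamp()), 1));
-- accepted: a constant predicate on the partition column c2 allows pruning
select * from t1 where t1.c2 = '2015-04-23';
{code}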
[jira] [Commented] (HIVE-5672) Insert with custom separator not supported for non-local directory
[ https://issues.apache.org/jira/browse/HIVE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509440#comment-14509440 ] Xuefu Zhang commented on HIVE-5672: --- Looking at the patch, I'm not sure if I understand the changes correctly. I can see that we modified the grammar to make local optional, and the rest is refactoring. I'm not sure if this is sufficient. Did I miss anything? Also, instead of adding a new grammar rule, we should combine it with the old one. We just need to make KW_LOCAL optional. Insert with custom separator not supported for non-local directory -- Key: HIVE-5672 URL: https://issues.apache.org/jira/browse/HIVE-5672 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 1.0.0 Reporter: Romain Rigaux Assignee: Nemon Lou Attachments: HIVE-5672.1.patch, HIVE-5672.2.patch, HIVE-5672.3.patch, HIVE-5672.4.patch, HIVE-5672.5.patch, HIVE-5672.5.patch.tar.gz https://issues.apache.org/jira/browse/HIVE-3682 is great, but non-local directories don't seem to be supported: {code} insert overwrite directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select description FROM sample_07 {code} {code} Error while compiling statement: FAILED: ParseException line 2:0 cannot recognize input near 'row' 'format' 'delimited' in select clause {code} This works (with 'local'): {code} insert overwrite local directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select code, description FROM sample_07 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10302) Load small tables (for map join) in executor memory only once[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10302: --- Summary: Load small tables (for map join) in executor memory only once[Spark Branch] (was: Cache small tables in memory [Spark Branch]) Load small tables (for map join) in executor memory only once[Spark Branch] --- Key: HIVE-10302 URL: https://issues.apache.org/jira/browse/HIVE-10302 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10302.spark-1.patch If we can cache small tables in executor memory, we could save some time in loading them from HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10302: --- Summary: Load small tables (for map join) in executor memory only once [Spark Branch] (was: Load small tables (for map join) in executor memory only once[Spark Branch]) Load small tables (for map join) in executor memory only once [Spark Branch] Key: HIVE-10302 URL: https://issues.apache.org/jira/browse/HIVE-10302 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10302.spark-1.patch If we can cache small tables in executor memory, we could save some time in loading them from HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-10302: --- Description: Usually there are multiple cores in a Spark executor, and thus it's possible that multiple map-join tasks can be running in the same executor (concurrently or sequentially). Currently, each task will load its own copy of the small tables for map join into memory, ending up with inefficiency. Ideally, we only load the small tables once and share them among the tasks running in that executor. (was: If we can cache small tables in executor memory, we could save some time in loading them from HDFS.) Load small tables (for map join) in executor memory only once [Spark Branch] Key: HIVE-10302 URL: https://issues.apache.org/jira/browse/HIVE-10302 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch Attachments: HIVE-10302.spark-1.patch Usually there are multiple cores in a Spark executor, and thus it's possible that multiple map-join tasks can be running in the same executor (concurrently or sequentially). Currently, each task will load its own copy of the small tables for map join into memory, ending up with inefficiency. Ideally, we only load the small tables once and share them among the tasks running in that executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10477) Provide option to disable Spark tests in Windows OS
[ https://issues.apache.org/jira/browse/HIVE-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511038#comment-14511038 ] Xuefu Zhang commented on HIVE-10477: I'm wondering if it's possible to detect the OS type in pom.xml and skip the Spark tests automatically if the OS is Windows. Provide option to disable Spark tests in Windows OS --- Key: HIVE-10477 URL: https://issues.apache.org/jira/browse/HIVE-10477 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-10477.1.patch In the current master branch, unit tests fail on Windows because of the dependency on the bash executable in itests/hive-unit/pom.xml around these lines:
{code}
<target>
  <exec executable="bash" dir="${basedir}" failonerror="true">
    <arg line="../target/download.sh"/>
  </exec>
</target>
{code}
We should provide an option to disable the Spark tests in OSes like Windows where bash might be absent -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5672) Insert with custom separator not supported for non-local directory
[ https://issues.apache.org/jira/browse/HIVE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511084#comment-14511084 ] Xuefu Zhang commented on HIVE-5672: --- Here is what I have for the combined grammar:
{code}
(local = KW_LOCAL)? KW_DIRECTORY StringLiteral tableRowFormat? tableFileFormat?
  -> ^(TOK_DIR StringLiteral $local? tableRowFormat? tableFileFormat?)
{code}
With this, I'm sure SemanticAnalyzer has the information about whether the directory is local. Insert with custom separator not supported for non-local directory -- Key: HIVE-5672 URL: https://issues.apache.org/jira/browse/HIVE-5672 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 1.0.0 Reporter: Romain Rigaux Assignee: Nemon Lou Attachments: HIVE-5672.1.patch, HIVE-5672.2.patch, HIVE-5672.3.patch, HIVE-5672.4.patch, HIVE-5672.5.patch, HIVE-5672.5.patch.tar.gz, HIVE-5672.6.patch, HIVE-5672.6.patch.tar.gz https://issues.apache.org/jira/browse/HIVE-3682 is great, but non-local directories don't seem to be supported: {code} insert overwrite directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select description FROM sample_07 {code} {code} Error while compiling statement: FAILED: ParseException line 2:0 cannot recognize input near 'row' 'format' 'delimited' in select clause {code} This works (with 'local'): {code} insert overwrite local directory '/tmp/test-02' row format delimited FIELDS TERMINATED BY ':' select code, description FROM sample_07 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
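With KW_LOCAL optional in the combined rule, the non-local variant from the issue description should parse exactly like the local one; for example:
{code}
-- previously a ParseException; with the combined rule this should be accepted
INSERT OVERWRITE DIRECTORY '/tmp/test-02'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ':'
SELECT description FROM sample_07;
{code}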
[jira] [Commented] (HIVE-10312) SASL.QOP in JDBC URL is ignored for Delegation token Authentication
[ https://issues.apache.org/jira/browse/HIVE-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513029#comment-14513029 ] Xuefu Zhang commented on HIVE-10312: [~mkazia], please feel free to update the doc. SASL.QOP in JDBC URL is ignored for Delegation token Authentication --- Key: HIVE-10312 URL: https://issues.apache.org/jira/browse/HIVE-10312 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 1.2.0 Reporter: Mubashir Kazia Assignee: Mubashir Kazia Fix For: 1.2.0 Attachments: HIVE-10312.1.patch, HIVE-10312.1.patch When HS2 is configured for QOP other than auth (auth-int or auth-conf), Kerberos client connection works fine when the JDBC URL specifies the matching QOP, however when this HS2 is accessed through Oozie (Delegation token / Digest authentication), connections fails because the JDBC driver ignores the SASL.QOP parameters in the JDBC URL. SASL.QOP setting should be valid for DIGEST Auth mech. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10487) remove non-ISO restriction that projections in a union have identical column names
[ https://issues.apache.org/jira/browse/HIVE-10487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513028#comment-14513028 ] Xuefu Zhang commented on HIVE-10487: Interesting. If the restriction is lifted, what's the column name of the result schema then? Does ISO say anything? remove non-ISO restriction that projections in a union have identical column names -- Key: HIVE-10487 URL: https://issues.apache.org/jira/browse/HIVE-10487 Project: Hive Issue Type: Improvement Components: SQL Affects Versions: 0.13.1 Reporter: N Campbell Priority: Critical As documented at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Union, an application should be able to perform a union query where the projections are union compatible; union compatibility in ISO-SQL 20xx does not require the projected column names to be identical, which Hive imposes. I.e., rejected: select c1 from t1 union all select c2 from t2 (fails with Schema of both sides of union should match. _u1-subquery2); accepted: select c1 from t1 union all select c2 c1 from t2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10312) SASL.QOP in JDBC URL is ignored for Delegation token Authentication
[ https://issues.apache.org/jira/browse/HIVE-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang resolved HIVE-10312. Resolution: Fixed Committed to master. Thanks, Mubashir! SASL.QOP in JDBC URL is ignored for Delegation token Authentication --- Key: HIVE-10312 URL: https://issues.apache.org/jira/browse/HIVE-10312 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 1.2.0 Reporter: Mubashir Kazia Assignee: Mubashir Kazia Fix For: 1.2.0 Attachments: HIVE-10312.1.patch, HIVE-10312.1.patch When HS2 is configured for QOP other than auth (auth-int or auth-conf), Kerberos client connection works fine when the JDBC URL specifies the matching QOP, however when this HS2 is accessed through Oozie (Delegation token / Digest authentication), connections fails because the JDBC driver ignores the SASL.QOP parameters in the JDBC URL. SASL.QOP setting should be valid for DIGEST Auth mech. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10312) SASL.QOP in JDBC URL is ignored for Delegation token Authentication
[ https://issues.apache.org/jira/browse/HIVE-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508124#comment-14508124 ] Xuefu Zhang commented on HIVE-10312: +1 SASL.QOP in JDBC URL is ignored for Delegation token Authentication --- Key: HIVE-10312 URL: https://issues.apache.org/jira/browse/HIVE-10312 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 1.2.0 Reporter: Mubashir Kazia Assignee: Mubashir Kazia Fix For: 1.2.0 Attachments: HIVE-10312.1.patch, HIVE-10312.1.patch When HS2 is configured for QOP other than auth (auth-int or auth-conf), Kerberos client connection works fine when the JDBC URL specifies the matching QOP, however when this HS2 is accessed through Oozie (Delegation token / Digest authentication), connections fails because the JDBC driver ignores the SASL.QOP parameters in the JDBC URL. SASL.QOP setting should be valid for DIGEST Auth mech. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10454) Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified.
[ https://issues.apache.org/jira/browse/HIVE-10454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509084#comment-14509084 ] Xuefu Zhang commented on HIVE-10454: I'm not sure whether strict mode expects an equality condition on the partition column. Otherwise, the query can still span all or a large number of partitions. Query against partitioned table in strict mode failed with No partition predicate found even if partition predicate is specified. --- Key: HIVE-10454 URL: https://issues.apache.org/jira/browse/HIVE-10454 Project: Hive Issue Type: Bug Reporter: Aihua Xu Assignee: Aihua Xu The following queries fail: {noformat} create table t1 (c1 int) PARTITIONED BY (c2 string); set hive.mapred.mode=strict; select * from t1 where t1.c2 to_date(date_add(from_unixtime( unix_timestamp() ),1)); {noformat} The query failed with No partition predicate found for alias t1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)