[jira] [Updated] (SPARK-23518) Avoid metastore access when users only want to read and store data frames
[ https://issues.apache.org/jira/browse/SPARK-23518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Liu updated SPARK-23518: - Description: https://issues.apache.org/jira/browse/SPARK-21732 added a patch that allows a spark session to be created when the hive metastore server is down. However, it does not allow running any commands with the spark session, so users cannot read or write data frames when the hive metastore server is down. (was: This is to followup https://issues.apache.org/jira/browse/SPARK-21732, which allows a spark session to be created when the hive metastore server is down. However, it does not allow running any commands with the spark session. ) > Avoid metastore access when users only want to read and store data frames > - > > Key: SPARK-23518 > URL: https://issues.apache.org/jira/browse/SPARK-23518 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Feng Liu >Priority: Major > > https://issues.apache.org/jira/browse/SPARK-21732 added a patch that > allows a spark session to be created when the hive metastore server is down. > However, it does not allow running any commands with the spark session, so > users cannot read or write data frames when the hive metastore server > is down. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23518) Avoid metastore access when users only want to read and store data frames
[ https://issues.apache.org/jira/browse/SPARK-23518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Liu updated SPARK-23518: - Summary: Avoid metastore access when users only want to read and store data frames (was: Completely remove metastore access if the query is not using tables) > Avoid metastore access when users only want to read and store data frames > - > > Key: SPARK-23518 > URL: https://issues.apache.org/jira/browse/SPARK-23518 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Feng Liu >Priority: Major > > This is a follow-up to https://issues.apache.org/jira/browse/SPARK-21732, which > allows a spark session to be created when the hive metastore server is down. > However, it does not allow running any commands with the spark session.
[jira] [Updated] (SPARK-23518) Completely remove metastore access if the query is not using tables
[ https://issues.apache.org/jira/browse/SPARK-23518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Liu updated SPARK-23518: - Description: This is a follow-up to https://issues.apache.org/jira/browse/SPARK-21732, which allows a spark session to be created when the hive metastore server is down. However, it does not allow running any commands with the spark session. (was: This is to followup https://issues.apache.org/jira/browse/SPARK-21732, which allows a spark session to be created when the hive metastore server is down. However, it does not allow running any commands with spark sessions. ) > Completely remove metastore access if the query is not using tables > --- > > Key: SPARK-23518 > URL: https://issues.apache.org/jira/browse/SPARK-23518 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Feng Liu >Priority: Major > > This is a follow-up to https://issues.apache.org/jira/browse/SPARK-21732, which > allows a spark session to be created when the hive metastore server is down. > However, it does not allow running any commands with the spark session.
[jira] [Created] (SPARK-23518) Completely remove metastore access if the query is not using tables
Feng Liu created SPARK-23518: Summary: Completely remove metastore access if the query is not using tables Key: SPARK-23518 URL: https://issues.apache.org/jira/browse/SPARK-23518 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.3.0 Reporter: Feng Liu This is a follow-up to https://issues.apache.org/jira/browse/SPARK-21732, which allows a spark session to be created when the hive metastore server is down. However, it does not allow running any commands with the spark session.
[jira] [Created] (SPARK-23379) remove redundant metastore access if the current database name is the same
Feng Liu created SPARK-23379: Summary: remove redundant metastore access if the current database name is the same Key: SPARK-23379 URL: https://issues.apache.org/jira/browse/SPARK-23379 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.3.1 Reporter: Feng Liu We should be able to avoid one metastore access when the target database name is the same as the current database: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L295
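The saving described in this issue can be sketched as a simple guard before the metastore call. Everything below is illustrative, not Spark's actual `HiveClientImpl` code; the object and method names are hypothetical stand-ins, and the counter simulates the RPC so the saving is observable:

```scala
// Hypothetical sketch of the SPARK-23379 idea: skip the metastore RPC when
// the requested database is already current. Names are illustrative only.
object CurrentDatabaseGuard {
  var rpcCount = 0 // counts calls reaching the (fake) metastore

  private def metastoreSetDatabase(db: String): Unit = { rpcCount += 1 }

  private var currentDb: String = "default"

  def setCurrentDatabase(db: String): Unit = {
    // Only touch the metastore when the database actually changes.
    if (db != currentDb) {
      metastoreSetDatabase(db)
      currentDb = db
    }
  }
}
```

Calling `setCurrentDatabase("sales")` twice in a row would then issue a single RPC instead of two.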
[jira] [Created] (SPARK-23378) move setCurrentDatabase from HiveExternalCatalog to HiveClientImpl
Feng Liu created SPARK-23378: Summary: move setCurrentDatabase from HiveExternalCatalog to HiveClientImpl Key: SPARK-23378 URL: https://issues.apache.org/jira/browse/SPARK-23378 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.3.1 Reporter: Feng Liu Conceptually, no method of HiveExternalCatalog other than `setCurrentDatabase` should change the `currentDatabase` in the hive session state. We can enforce this rule by removing the usage of `setCurrentDatabase` from the HiveExternalCatalog.
[jira] [Created] (SPARK-23259) Clean up legacy code around hive external catalog
Feng Liu created SPARK-23259: Summary: Clean up legacy code around hive external catalog Key: SPARK-23259 URL: https://issues.apache.org/jira/browse/SPARK-23259 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Feng Liu Some legacy code around the hive metastore catalog needs to be removed for further code improvement: # in HiveExternalCatalog: the `withClient` wrapper is not necessary for the private method `getRawTable`. # in HiveClientImpl: the `runSqlHive()` statement is not necessary for the `addJar` method, once the jar is added to the single class loader. # in HiveClientImpl: there is some redundant code in both the `tableExists` and `getTableOption` methods.
[jira] [Commented] (SPARK-22891) NullPointerException when use udf
[ https://issues.apache.org/jira/browse/SPARK-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305881#comment-16305881 ] Feng Liu commented on SPARK-22891: -- A side note: if we don't want to merge https://github.com/apache/spark/pull/20029, we should make the creation of the hive client lazy inside the HiveSessionResourceLoader: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala#L123 Hive client creation is expensive, so it does not make sense to materialize a client that is never used. > NullPointerException when use udf > - > > Key: SPARK-22891 > URL: https://issues.apache.org/jira/browse/SPARK-22891 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0, 2.2.1 > Environment: hadoop 2.7.2 >Reporter: gaoyang >Priority: Minor > > In my application,i use multi threads. Each thread has a SparkSession and use > SparkSession.sqlContext.udf.register to register my udf. Sometimes there > throws exception like this: > {code:java} > java.lang.IllegalArgumentException: Error while instantiating > 'org.apache.spark.sql.hive.HiveSessionStateBuilder': > at > org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062) > at > org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137) > at > org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136) > at > org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133) > at org.apache.spark.sql.SparkSession.udf(SparkSession.scala:207) > at org.apache.spark.sql.SQLContext.udf(SQLContext.scala:203) > at > com.game.data.stat.clusterTask.tools.standard.IpConverterRegister$.run(IpConverterRegister.scala:63) > at > ... 
20 more > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264) > at > org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:789) > at > org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:79) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader$lzycompute(HiveSessionStateBuilder.scala:45) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader(HiveSessionStateBuilder.scala:44) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:61) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:35) > at > org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:289) > at > org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1059) > ... 20 more > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException > at > org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:744) > at > org.apache.hadoop.hive.ql.session.SessionState.getAuthenticator(SessionState.java:1391) > at > org.apache.spark.sql.hive.client.HiveClientImpl.(HiveClientImpl.scala:210) > ... 
34 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException > at > org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:769) > at > org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:736) > ... 36 more > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.isCompatibleWith(HiveMetaStoreClient.java:287) > at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156) > at
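The lazy-creation idea from the comment above can be sketched as follows. This is not Spark's actual `HiveSessionResourceLoader`; the class name and the `buildClient` factory are hypothetical stand-ins, with a `String` playing the role of the expensive client:

```scala
// Hypothetical sketch: defer expensive hive client construction until the
// first addJar call, so sessions that never add jars never pay for a client.
class LazyResourceLoader(buildClient: () => String) {
  // `lazy val` runs buildClient() on first access only, then caches it.
  private lazy val client: String = buildClient()

  def addJar(path: String): String = s"$client: added $path"
}
```

With this shape, constructing a `LazyResourceLoader` is free; the factory fires once, on the first `addJar`, and never again.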
[jira] [Comment Edited] (SPARK-22891) NullPointerException when use udf
[ https://issues.apache.org/jira/browse/SPARK-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305862#comment-16305862 ] Feng Liu edited comment on SPARK-22891 at 12/29/17 12:56 AM: - This is definitely caused by the race from https://issues.apache.org/jira/browse/HIVE-11935. In spark 2.1, spark creates the `metadataHive` lazily until `addJar`(https://github.com/apache/spark/blob/branch-2.1/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionState.scala#L40), so this can only be triggered by concurrent `addJar` (can't imagine this will happen in practice) In spark 2.2, the `metadataHive` creation is tied to the `resourceLoader` creation (see the stack trace), so it starts to be triggered by new spark session creation. In https://github.com/apache/spark/pull/20029, I'm trying to make an argument that it is safe to remove the new hive client creation. Besides this change, I think we should also make the hive client creation thread safe: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala#L251 was (Author: liufeng...@gmail.com): This is definitely caused by the race from https://issues.apache.org/jira/browse/HIVE-11935. In spark 2.1, spark creates the `metadataHive` lazily until `addJar`(https://github.com/apache/spark/blob/branch-2.1/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionState.scala#L40), so this can only be triggered by concurrent `addJar` (can't imagine this will happen in practice) In spark 2.2, the `metadataHive` creation is tied to the `resourceLoader` creation (see the stack trace), so it starts to be triggered by new spark session creation. In https://github.com/apache/spark/pull/20029, I'm trying to make an argument that it is safe to remove the new hive client creation. 
Besides this change, I think should also make the hive client creation thread safe: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala#L251 > NullPointerException when use udf > - > > Key: SPARK-22891 > URL: https://issues.apache.org/jira/browse/SPARK-22891 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0, 2.2.1 > Environment: hadoop 2.7.2 >Reporter: gaoyang >Priority: Minor > > In my application,i use multi threads. Each thread has a SparkSession and use > SparkSession.sqlContext.udf.register to register my udf. Sometimes there > throws exception like this: > {code:java} > java.lang.IllegalArgumentException: Error while instantiating > 'org.apache.spark.sql.hive.HiveSessionStateBuilder': > at > org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062) > at > org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137) > at > org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136) > at > org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133) > at org.apache.spark.sql.SparkSession.udf(SparkSession.scala:207) > at org.apache.spark.sql.SQLContext.udf(SQLContext.scala:203) > at > com.game.data.stat.clusterTask.tools.standard.IpConverterRegister$.run(IpConverterRegister.scala:63) > at > ... 
20 more > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264) > at > org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:789) > at > org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:79) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader$lzycompute(HiveSessionStateBuilder.scala:45) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader(HiveSessionStateBuilder.scala:44) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:61) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52) > at >
[jira] [Comment Edited] (SPARK-22891) NullPointerException when use udf
[ https://issues.apache.org/jira/browse/SPARK-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305862#comment-16305862 ] Feng Liu edited comment on SPARK-22891 at 12/29/17 12:49 AM: - This is definitely caused by the race from https://issues.apache.org/jira/browse/HIVE-11935. In spark 2.1, spark creates the `metadataHive` lazily until `addJar`(https://github.com/apache/spark/blob/branch-2.1/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionState.scala#L40), so this can only be triggered by concurrent `addJar` (can't imagine this will happen in practice) In spark 2.2, the `metadataHive` creation is tied to the `resourceLoader` creation (see the stack trace), so it starts to be triggered by new spark session creation. In https://github.com/apache/spark/pull/20029, I'm trying to make an argument that it is safe to remove the new hive client creation. Besides this change, I think should also make the hive client creation thread safe: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala#L251 was (Author: liufeng...@gmail.com): This is definitely caused by the race from https://issues.apache.org/jira/browse/HIVE-11935. In spark 2.1, spark creates the `metadataHive` lazily until `addJar`(https://github.com/apache/spark/blob/branch-2.1/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionState.scala#L40), so this can only be triggered by concurrent `addJar` (can't imagine this will happen in practice) In spark 2.2, the `metadataHive` creation is tied to the `resourceLoader` (see the stack trace), so it starts to be triggered by new spark session creation. In https://github.com/apache/spark/pull/20029, I'm trying to make an argument that it is safe to remove the new hive client creation. 
Besides change, I think should also make the hive client creation thread safe: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala#L251 > NullPointerException when use udf > - > > Key: SPARK-22891 > URL: https://issues.apache.org/jira/browse/SPARK-22891 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0, 2.2.1 > Environment: hadoop 2.7.2 >Reporter: gaoyang >Priority: Minor > > In my application,i use multi threads. Each thread has a SparkSession and use > SparkSession.sqlContext.udf.register to register my udf. Sometimes there > throws exception like this: > {code:java} > java.lang.IllegalArgumentException: Error while instantiating > 'org.apache.spark.sql.hive.HiveSessionStateBuilder': > at > org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062) > at > org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137) > at > org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136) > at > org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133) > at org.apache.spark.sql.SparkSession.udf(SparkSession.scala:207) > at org.apache.spark.sql.SQLContext.udf(SQLContext.scala:203) > at > com.game.data.stat.clusterTask.tools.standard.IpConverterRegister$.run(IpConverterRegister.scala:63) > at > ... 
20 more > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264) > at > org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:789) > at > org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:79) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader$lzycompute(HiveSessionStateBuilder.scala:45) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader(HiveSessionStateBuilder.scala:44) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:61) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52) > at >
[jira] [Commented] (SPARK-22891) NullPointerException when use udf
[ https://issues.apache.org/jira/browse/SPARK-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305862#comment-16305862 ] Feng Liu commented on SPARK-22891: -- This is definitely caused by the race from https://issues.apache.org/jira/browse/HIVE-11935. In spark 2.1, spark creates the `metadataHive` lazily until `addJar`(https://github.com/apache/spark/blob/branch-2.1/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionState.scala#L40), so this can only be triggered by concurrent `addJar` (can't imagine this will happen in practice) In spark 2.2, the `metadataHive` creation is tied to the `resourceLoader` (see the stack trace), so it starts to be triggered by new spark session creation. In https://github.com/apache/spark/pull/20029, I'm trying to make an argument that it is safe to remove the new hive client creation. Besides change, I think should also make the hive client creation thread safe: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala#L251 > NullPointerException when use udf > - > > Key: SPARK-22891 > URL: https://issues.apache.org/jira/browse/SPARK-22891 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0, 2.2.1 > Environment: hadoop 2.7.2 >Reporter: gaoyang >Priority: Minor > > In my application,i use multi threads. Each thread has a SparkSession and use > SparkSession.sqlContext.udf.register to register my udf. 
Sometimes there > throws exception like this: > {code:java} > java.lang.IllegalArgumentException: Error while instantiating > 'org.apache.spark.sql.hive.HiveSessionStateBuilder': > at > org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062) > at > org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137) > at > org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136) > at > org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133) > at org.apache.spark.sql.SparkSession.udf(SparkSession.scala:207) > at org.apache.spark.sql.SQLContext.udf(SQLContext.scala:203) > at > com.game.data.stat.clusterTask.tools.standard.IpConverterRegister$.run(IpConverterRegister.scala:63) > at > ... 20 more > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264) > at > org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:789) > at > org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:79) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader$lzycompute(HiveSessionStateBuilder.scala:45) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader(HiveSessionStateBuilder.scala:44) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:61) > 
at > org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:35) > at > org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:289) > at > org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1059) > ... 20 more > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException > at > org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:744) > at > org.apache.hadoop.hive.ql.session.SessionState.getAuthenticator(SessionState.java:1391) > at > org.apache.spark.sql.hive.client.HiveClientImpl.(HiveClientImpl.scala:210) > ... 34 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException > at > org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:769) > at > org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:736) > ... 36 more > Caused by:
[jira] [Created] (SPARK-22916) shouldn't bias towards build right if user does not specify
Feng Liu created SPARK-22916: Summary: shouldn't bias towards build right if user does not specify Key: SPARK-22916 URL: https://issues.apache.org/jira/browse/SPARK-22916 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.0 Reporter: Feng Liu This is an issue very similar to SPARK-22489. When there are no broadcast hints, the current spark strategies prefer to build right, without considering the sizes of the two sides. To reproduce: {code:java} import org.apache.spark.sql.execution.joins.BroadcastHashJoinExec spark.createDataFrame(Seq((1, "4"), (2, "2"))).toDF("key", "value").createTempView("table1") spark.createDataFrame(Seq((1, "1"), (2, "2"), (3, "3"))).toDF("key", "value").createTempView("table2") val bl = sql(s"SELECT * FROM table1 t1 JOIN table2 t2 ON t1.key = t2.key").queryExecution.executedPlan {code} The plan will broadcast the right side (`t2`), even though it is larger.
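A size-aware choice of build side, as the issue suggests, might look like the sketch below. This is not Spark's actual `JoinSelection` code; the types and function are illustrative, comparing the two sides' estimated sizes instead of defaulting to the right:

```scala
// Hypothetical sketch: with no broadcast hint, build the smaller side
// rather than unconditionally building right (the behavior this issue
// reports). Sizes stand in for a plan's stats.sizeInBytes estimate.
sealed trait BuildSide
case object BuildLeft extends BuildSide
case object BuildRight extends BuildSide

def chooseBuildSide(leftSizeInBytes: Long, rightSizeInBytes: Long): BuildSide =
  if (leftSizeInBytes <= rightSizeInBytes) BuildLeft else BuildRight
```

In the repro above, `table1` (2 rows) would be chosen as the build side over `table2` (3 rows) under this rule.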
[jira] [Created] (SPARK-22254) clean up the implementation of `growToSize` in CompactBuffer
Feng Liu created SPARK-22254: Summary: clean up the implementation of `growToSize` in CompactBuffer Key: SPARK-22254 URL: https://issues.apache.org/jira/browse/SPARK-22254 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.1 Reporter: Feng Liu Two issues: 1. the arrayMax should be `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH`. 2. I believe some `-2` corrections were introduced because `Integer.MAX_VALUE` was used previously. We should make the calculation of newArrayLen concise.
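A concise capacity calculation of the kind this issue asks for can be sketched as doubling clamped to the maximum array length. This is a hypothetical sketch, not the actual `CompactBuffer.growToSize` code; `MaxRoundedArrayLength` mirrors the `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH` constant the issue mentions (`Int.MaxValue - 15` in Spark):

```scala
// Hypothetical sketch of a clean growth computation: double the capacity,
// never below the required size, clamped to the max array length. Doing the
// arithmetic in Long avoids overflow, so no ad-hoc `-2` corrections remain.
val MaxRoundedArrayLength: Int = Int.MaxValue - 15

def newCapacity(currentCapacity: Int, requiredSize: Int): Int = {
  require(requiredSize <= MaxRoundedArrayLength, "requested size too large")
  val doubled = math.max(currentCapacity.toLong * 2, requiredSize.toLong)
  math.min(doubled, MaxRoundedArrayLength.toLong).toInt
}
```

When `currentCapacity * 2` would exceed `Int.MaxValue`, the Long arithmetic keeps the intermediate exact and the clamp lands on the maximum length instead of a negative, overflowed value.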
[jira] [Created] (SPARK-22222) Fix the ARRAY_MAX in BufferHolder and add a test
Feng Liu created SPARK-22222: Summary: Fix the ARRAY_MAX in BufferHolder and add a test Key: SPARK-22222 URL: https://issues.apache.org/jira/browse/SPARK-22222 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.1 Reporter: Feng Liu This is a follow-up to SPARK-22033, which set the `ARRAY_MAX` to `Int.MaxValue - 8`. That is not a valid number because it causes the following line to fail when such a large byte array is allocated: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java#L86 We need to make sure the new length is a multiple of 8, and we need to add a test for the fix. Note that the test should work independently of the heap size of the test JVM.
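The word-alignment constraint behind this issue can be shown with a small sketch. `roundUpToWord` is an illustrative helper, not Spark's `BufferHolder` code; the point is that the cap itself must be a multiple of 8, and `Int.MaxValue - 8` (2147483639) is not, while the `MAX_ROUNDED_ARRAY_LENGTH` style cap of `Int.MaxValue - 15` (2147483632) is:

```scala
// Sketch of the constraint: any grown buffer length must round up to a
// multiple of 8 (one word), so the cap itself must be word-aligned.
def roundUpToWord(numBytes: Int): Int = {
  val remainder = numBytes % 8
  if (remainder == 0) numBytes else numBytes + (8 - remainder)
}

val ArrayMax: Int = Int.MaxValue - 15 // 2147483632, already a multiple of 8
```

A length already at a word-aligned cap survives `roundUpToWord` unchanged, whereas a cap of `Int.MaxValue - 8` would be rounded up past itself.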
[jira] [Created] (SPARK-22003) vectorized reader does not work with UDF when the column is array
Feng Liu created SPARK-22003: Summary: vectorized reader does not work with UDF when the column is array Key: SPARK-22003 URL: https://issues.apache.org/jira/browse/SPARK-22003 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.2.0 Reporter: Feng Liu The UDF needs to deserialize the UnsafeRow. When the column type is Array, the `get` method of the ColumnVector, which is used by the vectorized reader, is called, but this method is not implemented. Code to reproduce the issue: {code:java} val fileName = "testfile" val str = """{ "choices": ["key1", "key2", "key3"] }""" val rdd = sc.parallelize(Seq(str)) val df = spark.read.json(rdd) df.write.mode("overwrite").parquet(s"file:///tmp/$fileName") import org.apache.spark.sql._ import org.apache.spark.sql.functions._ spark.udf.register("acf", (rows: Seq[Row]) => Option[String](null)) spark.read.parquet(s"file:///tmp/$fileName").select(expr("""acf(choices)""")).show {code}
[jira] [Created] (SPARK-21188) releaseAllLocksForTask should synchronize the whole method
Feng Liu created SPARK-21188: Summary: releaseAllLocksForTask should synchronize the whole method Key: SPARK-21188 URL: https://issues.apache.org/jira/browse/SPARK-21188 Project: Spark Issue Type: Bug Components: Block Manager, Spark Core Affects Versions: 2.1.0, 2.2.0 Reporter: Feng Liu Since the objects readLocksByTask, writeLocksByTask and infos are coupled and may be modified by other threads concurrently, all reads and writes of them in the releaseAllLocksForTask method should be protected by a single synchronized block. The fine-grained synchronization in the current code can cause test flakiness.
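The coarse-grained locking this issue asks for can be sketched as below. This is not Spark's actual `BlockInfoManager`; the class and the simplified maps are hypothetical, with field names mirroring the issue text, and one `synchronized` block covering every read and write of the three coupled structures:

```scala
import scala.collection.mutable

// Hypothetical sketch of SPARK-21188's fix: guard all three coupled maps
// with one synchronized block, so no thread can observe a state where
// some locks for a task are released and others are not.
class LockManagerSketch {
  private val readLocksByTask = mutable.Map[Long, mutable.Set[String]]()
  private val writeLocksByTask = mutable.Map[Long, mutable.Set[String]]()
  private val infos = mutable.Map[String, Int]() // blockId -> reader count

  def lockForReading(taskId: Long, blockId: String): Unit = synchronized {
    readLocksByTask.getOrElseUpdate(taskId, mutable.Set.empty) += blockId
    infos(blockId) = infos.getOrElse(blockId, 0) + 1
  }

  def releaseAllLocksForTask(taskId: Long): Seq[String] = synchronized {
    // All reads and writes happen under the same monitor: no window where
    // readLocksByTask is updated but infos still shows the old counts.
    val reads = readLocksByTask.remove(taskId).getOrElse(mutable.Set.empty[String])
    val writes = writeLocksByTask.remove(taskId).getOrElse(mutable.Set.empty[String])
    reads.foreach { b =>
      val remaining = infos.getOrElse(b, 1) - 1
      if (remaining > 0) infos(b) = remaining else infos.remove(b)
    }
    (reads ++ writes).toSeq
  }
}
```

Fine-grained variants that lock each map separately leave windows between the updates, which is exactly the interleaving that produced the flaky tests.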
[jira] [Created] (SPARK-20991) BROADCAST_TIMEOUT conf should be a timeoutConf
Feng Liu created SPARK-20991: Summary: BROADCAST_TIMEOUT conf should be a timeoutConf Key: SPARK-20991 URL: https://issues.apache.org/jira/browse/SPARK-20991 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.2.0, 2.2.1 Reporter: Feng Liu