Unable to connect to Spark thrift JDBC server with pluggable authentication
Hi, if the Spark thrift JDBC server is started in non-secure mode, it works fine. In secure mode with pluggable authentication, I placed the authentication class configuration in conf/hive-site.xml:

<property>
  <name>hive.server2.authentication</name>
  <value>CUSTOM</value>
</property>
<property>
  <name>hive.server2.custom.authentication.class</name>
  <value>org.apache.hive.service.auth.WebConsoleAuthenticationProviderImpl</value>
</property>

and the jar containing the implementation is on the Spark classpath, but I am still getting an exception. It seems to me it couldn't find the authentication class I specified in the configuration:

14/10/17 12:44:33 ERROR server.TThreadPoolServer: Error occurred during processing of message.
java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hive.service.auth.PasswdAuthenticationProvider.<init>()
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
    at org.apache.hive.service.auth.CustomAuthenticationProviderImpl.<init>(CustomAuthenticationProviderImpl.java:38)
    at org.apache.hive.service.auth.AuthenticationProviderFactory.getAuthenticationProvider(AuthenticationProviderFactory.java:57)
    at org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:61)
    at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:127)
    at org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:509)
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:264)
    at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
    at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1176)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
    at java.lang.Thread.run(Thread.java:853)
Caused by: java.lang.NoSuchMethodException: org.apache.hive.service.auth.PasswdAuthenticationProvider.<init>()
    at java.lang.Class.throwNoSuchMethodException(Class.java:367)
    at java.lang.Class.getDeclaredConstructor(Class.java:541)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)

Why is that? Thanks for your help!

Jenny
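P.S. For context, the provider class has essentially this shape (a simplified sketch; the real web-console delegation is omitted). As far as I understand, Hive only requires that the configured class implement PasswdAuthenticationProvider and keep a public no-arg constructor, since it is instantiated reflectively:

package org.apache.hive.service.auth

import javax.security.sasl.AuthenticationException

// Simplified sketch of the custom provider. HiveServer2 creates the
// configured class via reflection (ReflectionUtils), which is why the
// public no-arg constructor matters.
class WebConsoleAuthenticationProviderImpl extends PasswdAuthenticationProvider {
  @throws[AuthenticationException]
  override def Authenticate(user: String, password: String): Unit = {
    // The real implementation delegates to the web console; authentication
    // failures are signalled by throwing AuthenticationException.
    if (user == null || user.isEmpty || password == null) {
      throw new AuthenticationException("Invalid user name or password")
    }
  }
}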
Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database
Hi Yin,

hive-site.xml was copied to spark/conf and is the same as the one under $HIVE_HOME/conf. Through the hive cli I don't see any problem, but for Spark in yarn-cluster mode I am not able to switch to a database other than the default one; in yarn-client mode it works fine.

Thanks!

Jenny

On Tue, Aug 12, 2014 at 12:53 PM, Yin Huai <huaiyin@gmail.com> wrote:

Hi Jenny,

Have you copied hive-site.xml to the spark/conf directory? If not, can you put it in conf/ and try again?

Thanks,

Yin
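One quick diagnostic (an untested sketch; it uses the hiveContext from the reproduction program quoted in the thread below) is to list what the driver-side metastore actually contains. If the database created through the hive shell is missing when running in yarn-cluster mode, the application master is evidently not reading the same hive-site.xml:

// If "ttt" is absent here under yarn-cluster, the in-process metastore
// fell back to defaults instead of the configured DB2 metastore.
hql("show databases").collect().foreach(println)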
Re: Spark sql failed in yarn-cluster mode when connecting to non-default hive database
Thanks Yin!

Here is my hive-site.xml, which I copied from $HIVE_HOME/conf; I didn't experience any problem connecting to the metastore through hive, which uses DB2 as the metastore database.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<configuration>
  <property>
    <name>hive.hwi.listen.port</name>
    <value></value>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>/var/ibm/biginsights/hive/query/${user.name}</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/biginsights/hive/warehouse</value>
  </property>
  <property>
    <name>hive.hwi.war.file</name>
    <value>lib/hive-hwi-0.12.0.war</value>
  </property>
  <property>
    <name>hive.metastore.metrics.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:db2://hdtest022.svl.ibm.com:50001/BIDB</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.ibm.db2.jcc.DB2Driver</value>
  </property>
  <property>
    <name>hive.stats.autogather</name>
    <value>false</value>
  </property>
  <property>
    <name>javax.jdo.mapping.Schema</name>
    <value>HIVE</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>catalog</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>V2pJNWMxbFlVbWhaZHowOQ==</value>
  </property>
  <property>
    <name>hive.metastore.password.encrypt</name>
    <value>true</value>
  </property>
  <property>
    <name>org.jpox.autoCreateSchema</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.thrift.min.worker.threads</name>
    <value>5</value>
  </property>
  <property>
    <name>hive.server2.thrift.max.worker.threads</name>
    <value>100</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>1</value>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>hdtest022.svl.ibm.com</value>
  </property>
  <property>
    <name>hive.server2.authentication</name>
    <value>CUSTOM</value>
  </property>
  <property>
    <name>hive.server2.custom.authentication.class</name>
    <value>org.apache.hive.service.auth.WebConsoleAuthenticationProviderImpl</value>
  </property>
  <property>
    <name>hive.server2.enable.impersonation</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.security.webconsole.url</name>
    <value>http://hdtest022.svl.ibm.com:8080</value>
  </property>
  <property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.security.authorization.createtable.owner.grants</name>
    <value>ALL</value>
  </property>
</configuration>

On Mon, Aug 11, 2014 at 4:29 PM, Yin Huai <huaiyin@gmail.com> wrote:

Hi Jenny,

How's your metastore configured for both Hive and Spark SQL? Which metastore mode are you using (based on https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin)?

Thanks,

Yin

On Mon, Aug 11, 2014 at 6:15 PM, Jenny Zhao <linlin200...@gmail.com> wrote:

You can reproduce this issue with the following steps (assuming you have a Yarn cluster + Hive 12):

1) using the hive shell, create a database, e.g.: create database ttt

2) write a simple spark sql program:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql._
import org.apache.spark.sql.hive.HiveContext

object HiveSpark {
  case class Record(key: Int, value: String)

  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("HiveSpark")
    val sc = new SparkContext(sparkConf)

    // A hive context creates an instance of the Hive Metastore in process
    val hiveContext = new HiveContext(sc)
    import hiveContext._

    hql("use ttt")
    hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
    hql("LOAD DATA INPATH '/user/biadmin/kv1.txt' INTO TABLE src")

    // Queries are expressed in HiveQL
    println("Result of 'SELECT *': ")
    hql("SELECT * FROM src").collect.foreach(println)

    sc.stop()
  }
}

3) run it in yarn-cluster mode (a sketch of the submit command is at the end of this message).

On Mon, Aug 11, 2014 at 9:44 AM, Cheng Lian <lian.cs@gmail.com> wrote:

Since you were using hql(...), it's probably not related to the JDBC driver. But I failed to reproduce this issue locally with a single-node pseudo-distributed YARN cluster. Would you mind elaborating on the steps to reproduce this bug? Thanks

On Sun, Aug 10, 2014 at 9:36 PM, Cheng Lian
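For step 3 above, the submit command has roughly this shape (a sketch; the application jar name and paths are placeholders, and the --files flag is one way to at least ship hive-site.xml into the application master container's working directory so the driver there can see the same metastore config):

# jar/class names and paths are placeholders
bin/spark-submit \
  --master yarn-cluster \
  --files /path/to/spark/conf/hive-site.xml \
  --class HiveSpark \
  hive-spark_2.10-1.0.jar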
Spark sql failed in yarn-cluster mode when connecting to non-default hive database
Hi,

I am able to run my hql query in yarn-cluster mode when connecting to the default hive metastore defined in hive-site.xml. However, if I want to switch to a different database, like:

hql("use other-database")

it only works in yarn-client mode, but fails in yarn-cluster mode with the following stack:

14/08/08 12:09:11 INFO HiveMetaStore: 0: get_database: tt
14/08/08 12:09:11 INFO audit: ugi=biadmin ip=unknown-ip-addr cmd=get_database: tt
14/08/08 12:09:11 ERROR RetryingHMSHandler: NoSuchObjectException(message:There is no database named tt)
    at org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:431)
    at org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:441)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
    at $Proxy15.getDatabase(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:628)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
    at $Proxy17.get_database(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:810)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
    at $Proxy18.getDatabase(Unknown Source)
    at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1139)
    at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1128)
    at org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:3479)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:237)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
    at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:208)
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:182)
    at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:272)
    at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:269)
    at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:86)
    at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:91)
    at org.apache.spark.examples.sql.hive.HiveSpark$.main(HiveSpark.scala:35)
    at org.apache.spark.examples.sql.hive.HiveSpark.main(HiveSpark.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:186)
14/08/08 12:09:11 ERROR DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: Database does not exist: tt
    at org.apache.hadoop.hive.ql.exec.DDLTask.switchDatabase(DDLTask.java:3480)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:237)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
    at
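For concreteness, the two runs differ only in the master setting (a sketch; class and jar names are placeholders). In yarn-client mode the driver runs on the submitting machine, where spark/conf/hive-site.xml is visible; in yarn-cluster mode it runs inside the YARN application master container:

# works: driver runs locally
bin/spark-submit --master yarn-client --class org.apache.spark.examples.sql.hive.HiveSpark app.jar

# fails with the stack above: driver runs in the application master
bin/spark-submit --master yarn-cluster --class org.apache.spark.examples.sql.hive.HiveSpark app.jar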
Spark sql with hive table running on Yarn-cluster mode
Hi,

When running Spark SQL, the datanucleus*.jar files are automatically added to the classpath. This works fine in Spark standalone mode and in yarn-client mode; however, in yarn-cluster mode I have to explicitly pass these jars with the --jars option when submitting the job, otherwise the job fails. Why doesn't it work for yarn-cluster mode?

Thank you for your help!

Jenny
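For reference, this is roughly the workaround (a sketch; the DataNucleus jar versions and the application jar/class are placeholders for whatever sits under lib_managed/jars and in your own project):

bin/spark-submit \
  --master yarn-cluster \
  --jars lib_managed/jars/datanucleus-api-jdo-3.2.1.jar,lib_managed/jars/datanucleus-core-3.2.2.jar,lib_managed/jars/datanucleus-rdbms-3.2.1.jar \
  --class HiveSpark \
  hive-spark.jar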
Re: Spark sql unable to connect to db2 hive metastore
Thanks Michael!

Since I run it using spark-shell, I added both jars through the bin/spark-shell --jars option. I noticed that if I don't pass these jars, it complains it couldn't find the driver; if I pass them through the --jars option, it complains there is no suitable driver.

Regards.

On Tue, Jun 17, 2014 at 2:43 AM, Michael Armbrust <mich...@databricks.com> wrote:

First a clarification: Spark SQL does not talk to HiveServer2, as that JDBC interface is for retrieving results from queries that are executed using Hive. Instead, Spark SQL will execute queries itself by directly accessing your data using Spark.

Spark SQL's Hive module can use JDBC to connect to an external metastore, in your case DB2. This is only used to retrieve the metadata (i.e., column names and types, HDFS locations for data).

Looking at your exception I still see "java.sql.SQLException: No suitable driver", so my guess would be that the DB2 JDBC drivers are not being correctly included. How are you trying to add them to the classpath?

Michael

On Tue, Jun 17, 2014 at 1:29 AM, Jenny Zhao <linlin200...@gmail.com> wrote:

Hi, my hive configuration uses DB2 as its metastore database. I have built spark with the extra step sbt/sbt assembly/assembly to include the dependency jars, and copied HIVE_HOME/conf/hive-site.xml under spark/conf. When I ran hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)") I got the following exception:

Caused by: java.sql.SQLException: Unable to open a test connection to the given database. JDBC url = jdbc:db2://localhost:50001/BIDB, username = catalog. Terminating connection pool.

Original Exception:
java.sql.SQLException: No suitable driver
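In case it helps, an alternative to --jars for a driver-side dependency like a metastore JDBC driver is --driver-class-path, which puts the jars on the driver's own classpath (an untested sketch; paths are placeholders). My understanding is that jars passed with --jars are loaded through a separate classloader that java.sql.DriverManager does not search, which would match the "no suitable driver" symptom:

bin/spark-shell \
  --driver-class-path /path/to/db2jcc-10.5.jar:/path/to/db2jcc_license_cisuz-10.5.jar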
Re: Spark sql unable to connect to db2 hive metastore
Finally got it to work: I mimicked how spark adds the datanucleus jars in compute-classpath.sh and added the db2jcc*.jar files to the classpath the same way; it works now. Thanks!
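Concretely, the change amounts to appending the two jars where bin/compute-classpath.sh assembles the CLASSPATH variable (a sketch; the install paths are placeholders for wherever the jars live):

# appended in bin/compute-classpath.sh, next to where the datanucleus
# jars are picked up (paths are placeholders)
CLASSPATH="$CLASSPATH:/path/to/db2jcc-10.5.jar"
CLASSPATH="$CLASSPATH:/path/to/db2jcc_license_cisuz-10.5.jar"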
Spark sql unable to connect to db2 hive metastore
Hi,

My hive configuration uses DB2 as its metastore database. I have built spark with the extra step sbt/sbt assembly/assembly to include the dependency jars, and copied HIVE_HOME/conf/hive-site.xml under spark/conf. When I ran:

hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")

I got the following exception (a portion of the stack trace is pasted here). Looking at the stack, this made me wonder whether Spark supports a remote metastore configuration; it seems spark doesn't talk to hiveserver2 directly? The driver jars db2jcc-10.5.jar and db2jcc_license_cisuz-10.5.jar are both included in the classpath; otherwise it complains it couldn't find the driver. Appreciate any help to resolve it. Thanks!

Caused by: java.sql.SQLException: Unable to open a test connection to the given database. JDBC url = jdbc:db2://localhost:50001/BIDB, username = catalog. Terminating connection pool.

Original Exception:
java.sql.SQLException: No suitable driver
    at java.sql.DriverManager.getConnection(DriverManager.java:422)
    at java.sql.DriverManager.getConnection(DriverManager.java:374)
    at com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:254)
    at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:305)
    at com.jolbox.bonecp.BoneCPDataSource.maybeInit(BoneCPDataSource.java:150)
    at com.jolbox.bonecp.BoneCPDataSource.getConnection(BoneCPDataSource.java:112)
    at org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:479)
    at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:304)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:56)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:39)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:527)
    at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
    at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
    at org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1069)
    at org.datanucleus.NucleusContext.initialise(NucleusContext.java:359)
    at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:768)
    at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:326)
    at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
    at java.security.AccessController.doPrivileged(AccessController.java:277)
    at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
    at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
    at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
    at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
    at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:275)
    at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:304)
    at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:234)
    at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:209)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.hive.metastore.RetryingRawStore.<init>(RetryingRawStore.java:64)
    at org.apache.hadoop.hive.metastore.RetryingRawStore.getProxy(RetryingRawStore.java:73)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:415)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:402)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:441)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:326)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:286)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
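A quick way to check whether the driver jar is really visible where the connection is opened (a generic JDBC sketch, runnable from spark-shell): "No suitable driver" from DriverManager usually means the driver class was never registered on the classloader in use, and force-loading it makes the failure mode explicit:

// Throws ClassNotFoundException if the db2jcc jar isn't on this classloader;
// a successful call registers the driver with java.sql.DriverManager.
Class.forName("com.ibm.db2.jcc.DB2Driver")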
Re: Invalid Class Exception
We experienced a similar issue in our environment; below is the whole stack trace. It works fine if we run in local mode, but if we run in cluster mode (even with the Master and one worker on the same node), we hit this serialVersionUID issue. We use Spark 1.0.0, compiled with JDK 6.

Here is a link about serialVersionUID and a suggestion on using it for Serializable classes, which suggests defining a serialVersionUID in the serializable class:
http://stackoverflow.com/questions/285793/what-is-a-serialversionuid-and-why-should-i-use-it

14/06/05 09:52:18 WARN scheduler.TaskSetManager: Lost TID 9 (task 1.0:9)
14/06/05 09:52:18 WARN scheduler.TaskSetManager: Loss was due to java.io.InvalidClassException
java.io.InvalidClassException: org.apache.spark.SerializableWritable; local class incompatible: stream classdesc serialVersionUID = 6301214776158303468, local class serialVersionUID = -7785455416944904980
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:630)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1600)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1513)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1749)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:365)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:165)
    at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1039)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1866)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1964)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1888)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1964)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1888)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:365)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1039)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1866)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1964)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1888)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:365)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63)
    at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
    at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1768)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1346)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:365)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:195)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
    at
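Following the suggestion in that link, pinning the UID on one's own classes looks like this in Scala (a minimal sketch with a made-up class name; note it only helps for classes we compile ourselves, whereas the mismatch above is on Spark's own org.apache.spark.SerializableWritable, which suggests the driver and workers are running differently built Spark jars):

// Sketch: fix the serialVersionUID of an application class so that
// recompilation does not change the UID generated for it.
@SerialVersionUID(1L)
class ModelParameters(val weights: Array[Double]) extends Serializable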
configure spark history server for running on Yarn
Hi,

I have installed spark 1.0 from branch-1.0; the build went fine, and I have tried running the example on Yarn in client mode. Here is my command:

/home/hadoop/spark-branch-1.0/bin/spark-submit /home/hadoop/spark-branch-1.0/examples/target/scala-2.10/spark-examples-1.0.0-hadoop2.2.0.jar --master yarn --deploy-mode client --executor-memory 6g --executor-cores 3 --driver-memory 3g --name SparkPi --num-executors 2 --class org.apache.spark.examples.SparkPi yarn-client 5

After the run, I was not able to retrieve the log from Yarn's web UI, although I tried to specify the history server in spark-env.sh:

export SPARK_DAEMON_JAVA_OPTS="-Dspark.yarn.historyServer.address=master:18080"

I also tried to specify it in spark-defaults.conf, which doesn't work either. I would appreciate it if someone could tell me the way of specifying it in either spark-env.sh or spark-defaults.conf, so that this option is applied to any spark application.

Another thing I found is that the usage output for spark-submit is not complete / not in sync with the online documentation; I hope this is addressed with the formal release. And is this the latest documentation for spark 1.0? http://people.csail.mit.edu/matei/spark-unified-docs/running-on-yarn.html

Thank you!
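For what it's worth, the combination I would expect to need in spark-defaults.conf is the following (a sketch; the event log directory and host:port are placeholders, and the history server itself has to be started separately and pointed at the same log directory):

# the application writes event logs here...
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///user/spark/eventlog
# ...and the YARN web UI links to the history server that reads them
spark.yarn.historyServer.address master:18080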
Problem with running LogisticRegression in spark cluster mode
Hi all,

I have been able to run LR in local mode, but I am facing a problem running it in cluster mode. Below are the source script and the stack trace when running it in cluster mode; I used sbt package to build the project, and I am not sure what it is complaining about.

Two other questions I have about LogisticRegression itself:

1) I noticed LogisticRegressionWithSGD doesn't ask for information about the input features, for instance whether a feature is scale, nominal or ordinal; or does MLlib only support scale features?

2) Training error is pretty high even when the number of iterations is set very high. Do we have numbers on the accuracy of the LR model?

Thank you for your help!

import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint

/**
 * Logistic regression
 */
object SparkLogisticRegression {

  def main(args: Array[String]) {
    if (args.length != 3) {
      System.err.println("Usage: SparkLogisticRegression <master> <input file path> <number of iterations>")
      System.exit(1)
    }

    val numIterations = args(2).toInt
    val sc = new SparkContext(args(0), "SparkLogisticRegression",
      System.getenv("SPARK_HOME"), SparkContext.jarOfClass(this.getClass))

    // parse the input data
    val data = sc.textFile(args(1))
    val lpoints = data.map { line =>
      val parts = line.split(',')
      LabeledPoint(parts(0).toDouble, parts.tail.map(x => x.toDouble).toArray)
    }

    // set up LR
    val model = LogisticRegressionWithSGD.train(lpoints, numIterations)

    val labelPred = lpoints.map { p =>
      val pred = model.predict(p.features)
      (p.label, pred)
    }

    val predErr = labelPred.filter(r => r._1 != r._2).count
    println("Training Error: " + predErr.toDouble / lpoints.count + " " + predErr + "/" + lpoints.count)
  }
}

14/04/09 14:50:48 WARN scheduler.TaskSetManager: Lost TID 0 (task 0.0:0)
14/04/09 14:50:48 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassNotFoundException
java.lang.ClassNotFoundException: SparkLinearRegression$$anonfun$2
    at java.lang.Class.forName(Class.java:211)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:37)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1609)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1514)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1768)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1988)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1795)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:364)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63)
    at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
    at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1834)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1793)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:364)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:195)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:906)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:929)
    at java.lang.Thread.run(Thread.java:796)
14/04/09 14:50:48 WARN scheduler.TaskSetManager: Lost TID 1 (task 0.0:1)
14/04/09 14:50:48 INFO scheduler.TaskSetManager: Loss was due to java.lang.ClassNotFoundException: SparkLinearRegression$$anonfun$2 [duplicate 1]
14/04/09 14:50:48 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID 2 on executor 1: hdtest022.svl.ibm.com (NODE_LOCAL)
14/04/09 14:50:48 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as 1696 bytes in 0 ms
14/04/09 14:50:48 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 3 on executor 0: hdtest023.svl.ibm.com (NODE_LOCAL)
14/04/09 14:50:48 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 1696 bytes in 0 ms
14/04/09 14:50:48 WARN scheduler.TaskSetManager: Lost TID 3 (task 0.0:0)
14/04/09 14:50:48 INFO scheduler.TaskSetManager: Loss was due to
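On question 1): as far as I know, MLlib's LogisticRegressionWithSGD only sees plain numeric vectors, so a nominal feature has to be encoded numerically before it goes into LabeledPoint, for example with a hand-rolled one-hot encoding (a toy sketch; the feature levels are made up):

// Toy sketch: turn a nominal feature into 0/1 columns before it goes
// into LabeledPoint; the levels here are made-up examples.
val levels = Array("red", "green", "blue")
def oneHot(value: String): Array[Double] =
  levels.map(l => if (l == value) 1.0 else 0.0)

// oneHot("green") == Array(0.0, 1.0, 0.0)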
Re: Problem with running LogisticRegression in spark cluster mode
Hi Jagat,

Yes, I did specify mllib in build.sbt:

name := "Spark LogisticRegression"

version := "1.0"

scalaVersion := "2.10.3"

libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "0.9.0-incubating"

libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "0.9.0-incubating"

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "1.2.1"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

On Wed, Apr 9, 2014 at 3:23 PM, Jagat Singh <jagatsi...@gmail.com> wrote:

Hi Jenny,

How are you packaging your jar? Can you please confirm whether you have included the MLlib jar inside the fat jar you have created for your code?

libraryDependencies += "org.apache.spark" % "spark-mllib_2.9.3" % "0.8.1-incubating"

Thanks,

Jagat Singh
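Since sbt package bundles only the project's own classes (dependencies such as MLlib stay out of the jar), the fat-jar route Jagat mentions is usually done with sbt-assembly. A sketch, where the plugin version is an assumption from roughly that era rather than something verified against this project:

// project/plugins.sbt -- coordinates/version are an assumption
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

// appended to build.sbt (sbt-assembly 0.11.x style)
import AssemblyKeys._

assemblySettings

Running "sbt assembly" then produces a single jar containing both the application classes and the bundled dependencies, which can be handed to the cluster as one artifact.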