[jira] [Commented] (SPARK-21697) NPE & ExceptionInInitializerError trying to load UTF from HDFS
[ https://issues.apache.org/jira/browse/SPARK-21697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16328928#comment-16328928 ]

Steve Loughran commented on SPARK-21697:
----------------------------------------

No, it's Spark's ability to have hdfs:// URLs on the classpath. The classpath is being scanned for commons-logging.properties, which forces in HDFS, which then NPEs because the logging code is called before commons-logging is fully set up. It's a kind of recursive class-init problem triggered by the scan for commons-logging.properties.

> NPE & ExceptionInInitializerError trying to load UTF from HDFS
> --------------------------------------------------------------
>
>                 Key: SPARK-21697
>                 URL: https://issues.apache.org/jira/browse/SPARK-21697
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.1
>         Environment: Spark Client mode, Hadoop 2.6.0
>            Reporter: Steve Loughran
>            Priority: Minor
>
> Reported on [the PR|https://github.com/apache/spark/pull/17342#issuecomment-321438157] for SPARK-12868: trying to load a UDF from HDFS is triggering an {{ExceptionInInitializerError}}, caused by an NPE which should only happen if the commons-logging {{LOG}} log is null.
> Hypothesis: the commons-logging scan for {{commons-logging.properties}} is happening in the classpath with the HDFS JAR; this is triggering a D/L of the JAR, which needs to force in commons-logging, and, as that's not inited yet, NPEs

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
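
To make the recursive class-init hypothesis above concrete, here is a minimal, self-contained sketch of the same failure shape. It is not the commons-logging or HDFS code: LogFactory and RemoteFs are hypothetical stand-ins, and the "classpath scan" is reduced to a single method call. It only shows how a static LOG field can end up null when class initialization re-enters, and how the resulting NPE surfaces as an ExceptionInInitializerError.

```java
public class ClassInitCycle {

    /** Hypothetical stand-in for a logging factory whose setup scans the classpath. */
    static class LogFactory {
        static final StringBuilder INSTANCE;

        static {
            // The "scan" touches the remote-filesystem class before INSTANCE
            // has been assigned, which starts RemoteFs initialization.
            RemoteFs.touch();
            INSTANCE = new StringBuilder();
        }

        static StringBuilder getLog() {
            return INSTANCE;
        }
    }

    /** Hypothetical stand-in for filesystem client code with a static LOG field. */
    static class RemoteFs {
        // LogFactory is still mid-initialization on this thread, so the JVM
        // lets the call through and getLog() returns the default value: null.
        static final StringBuilder LOG = LogFactory.getLog();

        static {
            LOG.append("RemoteFs loaded"); // NullPointerException here
        }

        static void touch() {
            // Intentionally empty: calling it is enough to trigger class init.
        }
    }

    private static String cached;

    /** Triggers the cycle once and describes what was thrown. */
    static String demo() {
        if (cached == null) {
            try {
                LogFactory.getLog();
                cached = "ok";
            } catch (ExceptionInInitializerError e) {
                cached = e.getClass().getSimpleName() + " caused by "
                        + e.getCause().getClass().getSimpleName();
            }
        }
        return cached;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Running it should report an ExceptionInInitializerError whose cause is a NullPointerException, which matches the shape of the reported trace, though the real trigger in Spark may differ in detail.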
[jira] [Commented] (SPARK-21697) NPE & ExceptionInInitializerError trying to load UTF from HDFS
[ https://issues.apache.org/jira/browse/SPARK-21697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16328777#comment-16328777 ]

Sean Owen commented on SPARK-21697:
-----------------------------------

Isn't this an HDFS problem? What could Spark do about it?
[jira] [Commented] (SPARK-21697) NPE & ExceptionInInitializerError trying to load UTF from HDFS
[ https://issues.apache.org/jira/browse/SPARK-21697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16328304#comment-16328304 ]

h2s commented on SPARK-21697:
-----------------------------

My environment: spark-2.2.0, hadoop-2.10. After deleting spark/jars/commons-logging-1.1.3.jar, it works well.
[jira] [Commented] (SPARK-21697) NPE & ExceptionInInitializerError trying to load UTF from HDFS
[ https://issues.apache.org/jira/browse/SPARK-21697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16207098#comment-16207098 ]

Bang Xiao commented on SPARK-21697:
-----------------------------------

In the case described above, I added "spark jars : file:///xxx.jar" to conf/spark-defaults.conf; then, when I run "add jar" through the SparkSQL CLI in yarn-client mode, the error occurs. If I instead add "spark jars : file:///xxx.jar, hdfs://xxx.jar" to conf/spark-defaults.conf, I can run "add jar hdfs:///.jar" successfully through the SparkSQL CLI in yarn-client mode.
[jira] [Commented] (SPARK-21697) NPE & ExceptionInInitializerError trying to load UTF from HDFS
[ https://issues.apache.org/jira/browse/SPARK-21697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123208#comment-16123208 ]

Steve Loughran commented on SPARK-21697:
----------------------------------------

What would a test to replicate look like?

# Create a MiniDFS cluster (better: find a test with one already spun up at class level)
# Copy a JAR to it (issue: can't rely on the local test suite being in a JAR, due to SBT & IDEs doing things differently/faster than Maven)
# Add the JAR to the CP, or create a CP which *only* has the JAR in it
# Load something from the CP which triggers the download.

If you can assume that some common library (junit.jar?) is always in a JAR, then the JAR could be uploaded by locating its URL, translating that to a local path, and then using FileSystem.copyFromLocalFile() to upload it.

Or: create/find a UDF JAR, copy it to the MiniDFSCluster, and start Spark SQL with the HDFS URL. This would verify the desired codepath and be the best way to make sure it's gone away.
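
As a rough illustration of the "locate its URL, translate to a local path" step in the comment above, the following sketch uses only the JDK. JarLocator and locate() are hypothetical names, not anything in Spark or Hadoop; the MiniDFSCluster setup and the actual upload are deliberately left out, since they need the Hadoop test artifacts on the classpath.

```java
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.CodeSource;
import java.util.Optional;

public class JarLocator {

    /**
     * Returns the local JAR (or classes directory) that a class was loaded
     * from, or empty if it has no file-based location, as is the case for
     * JDK classes.
     */
    public static Optional<Path> locate(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return Optional.empty();
        }
        URL location = source.getLocation();
        if (!"file".equals(location.getProtocol())) {
            return Optional.empty();
        }
        try {
            return Optional.of(Paths.get(location.toURI()));
        } catch (URISyntaxException | IllegalArgumentException e) {
            // The URL could not be mapped onto a local path.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // A test would pass a class from the library it wants to upload;
        // this class is used here only so the sketch has no dependencies.
        System.out.println(locate(JarLocator.class));
        System.out.println(locate(String.class));
    }
}
```

The caveat from step 2 still applies: under SBT or an IDE the returned path may be a classes directory rather than a JAR, so a test would have to check for that before uploading.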
[jira] [Commented] (SPARK-21697) NPE & ExceptionInInitializerError trying to load UTF from HDFS
[ https://issues.apache.org/jira/browse/SPARK-21697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123193#comment-16123193 ]

Steve Loughran commented on SPARK-21697:
----------------------------------------

PS: right now, probably doesn't work at all
[jira] [Commented] (SPARK-21697) NPE & ExceptionInInitializerError trying to load UTF from HDFS
[ https://issues.apache.org/jira/browse/SPARK-21697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16123063#comment-16123063 ]

Steve Loughran commented on SPARK-21697:
----------------------------------------

# I don't see anything which can be done in HDFS here; it's in the libraries below it.
# Recent Hadoop releases => SLF4J, which *may* not have this problem. But as log4j looks for log4j.properties, and as dependent libraries may use commons-logging, there's no guarantee of that.

What to do?

# Classloader games: bring up the log infra, then add the HDFS JARs to the CP. Maybe requires knowledge of what to force in before anything else, e.g. using the new CP, do a stat of every JAR path, then inject them into the CP. Risky, as nobody really understands classpaths.
# Force a D/L of the remote artifact to the local temp FS before execution, as YARN does itself. Do it for HDFS, WASB, S3x, ..., all filesystems known by Hadoop FS. (Side issue: is there a way to enumerate these? Probably not, except by merging the list of service-discovered entries and those with an {{fs.SCHEMA.impl}} entry.)

I think #2 is potentially the simplest and so most viable. It's not quite as elegant as saying "this is a supported URL you can directly use in the CP", but it's the one that is going to avoid these problems.
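
Option #2 above (download first, then build the classpath from local files only) could look roughly like the sketch below. It uses only the JDK; the Opener interface is a hypothetical stand-in for whatever actually reads the remote file (in Spark that would presumably be the Hadoop FileSystem API), and LocalizeBeforeLoad, localize() and loaderFor() are illustrative names, not existing Spark code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LocalizeBeforeLoad {

    /** Hypothetical hook that opens a remote artifact for reading. */
    public interface Opener {
        InputStream open(URI remote) throws IOException;
    }

    /** Copies one remote artifact into tempDir and returns the local path. */
    public static Path localize(URI remote, Path tempDir, Opener opener) throws IOException {
        String path = remote.getPath();
        String name = path == null ? "" : path.substring(path.lastIndexOf('/') + 1);
        if (name.isEmpty()) {
            throw new IllegalArgumentException("No file name in " + remote);
        }
        Path target = tempDir.resolve(name);
        try (InputStream in = opener.open(remote)) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    /**
     * Downloads every remote JAR first, then builds a classloader that only
     * ever sees file: URLs, so no remote filesystem code runs during class
     * or resource lookup.
     */
    public static URLClassLoader loaderFor(List<URI> remoteJars, Opener opener,
                                           ClassLoader parent) throws IOException {
        Path tempDir = Files.createTempDirectory("localized-jars");
        List<URL> urls = new ArrayList<>();
        for (URI jar : remoteJars) {
            urls.add(localize(jar, tempDir, opener).toUri().toURL());
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    public static void main(String[] args) throws IOException {
        // Fake opener: returns a few bytes instead of contacting a cluster.
        Opener fake = uri -> new ByteArrayInputStream(new byte[] {1, 2, 3});
        List<URI> jars = Arrays.asList(URI.create("hdfs://namenode:8020/lib/example.jar"));
        try (URLClassLoader loader = loaderFor(jars, fake, null)) {
            for (URL url : loader.getURLs()) {
                System.out.println(url.getProtocol() + " " + url.getPath());
            }
        }
    }
}
```

The sketch ignores name collisions between JARs from different directories and never cleans up the temp directory; a real implementation would need to handle both.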
[jira] [Commented] (SPARK-21697) NPE & ExceptionInInitializerError trying to load UTF from HDFS
[ https://issues.apache.org/jira/browse/SPARK-21697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122234#comment-16122234 ]

Weiqing Yang commented on SPARK-21697:
--------------------------------------

Thanks for filing this issue!
[jira] [Commented] (SPARK-21697) NPE & ExceptionInInitializerError trying to load UTF from HDFS
[ https://issues.apache.org/jira/browse/SPARK-21697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122195#comment-16122195 ]

Steve Loughran commented on SPARK-21697:
----------------------------------------

{code}
Have u tried it in yarn-client mode? i add this path in v2.1.1 + Hadoop 2.6.0, when i run "add jar" through SparkSQL CLI , it comes out this error:
ERROR thriftserver.SparkSQLDriver: Failed in [add jar hdfs://SunshineNameNode3:8020/lib/clouddata-common-lib/chardet-0.0.1.jar]
java.lang.ExceptionInInitializerError
	at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:662)
	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:889)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:947)
	at java.io.DataInputStream.read(DataInputStream.java:100)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:369)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:341)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292)
	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2107)
	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2076)
	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2052)
	at org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1274)
	at org.apache.hadoop.hive.ql.session.SessionState.resolveAndDownload(SessionState.java:1242)
	at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1163)
	at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1149)
	at org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:67)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:632)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:601)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:278)
	at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:225)
	at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:224)
	at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:267)
	at org.apache.spark.sql.hive.client.HiveClientImpl.runHive(HiveClientImpl.scala:601)
	at org.apache.spark.sql.hive.client.HiveClientImpl.runSqlHive(HiveClientImpl.scala:591)
	at org.apache.spark.sql.hive.client.HiveClientImpl.addJar(HiveClientImpl.scala:738)
	at org.apache.spark.sql.hive.HiveSessionState.addJar(HiveSessionState.scala:105)
	at org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:62)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:335)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:247)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
	at