Re: org.apache.hadoop.fs.FileSystem: Provider tachyon.hadoop.TFS could not be instantiated

2017-05-07 Thread Rohit Karlupia
Last time I checked, this happens only with Spark < 2.0.0. The reason is
that Hadoop uses java.util.ServiceLoader to load every FileSystem
implementation found on the classpath. In Spark versions before 2.0.0,
tachyon.hadoop.TFS was packaged with the Spark distribution, so it gets
loaded whether or not it is actually used. Moving to Spark 2.0.0+ will be
the simplest way out.
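
For context, here is a minimal sketch of that loading mechanism (not
Hadoop's actual code, which adds caching and error handling around the
same idea; the class name below is just illustrative):

    import java.util.ServiceLoader;
    import org.apache.hadoop.fs.FileSystem;

    public class ListFileSystemProviders {
        public static void main(String[] args) {
            // ServiceLoader instantiates each provider registered under
            // META-INF/services/org.apache.hadoop.fs.FileSystem as we
            // iterate. A provider whose class fails to initialize (here,
            // tachyon.hadoop.TFS) throws ServiceConfigurationError at this
            // point, even if the job never touches that filesystem.
            for (FileSystem fs : ServiceLoader.load(FileSystem.class)) {
                System.out.println(fs.getClass().getName());
            }
        }
    }

Running something like this with the executor's classpath may reproduce
the failure outside of Spark.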

The NoClassDefFoundError is something of a distraction here; the real
problem is that this class cannot be instantiated. That happens because of
calls made during initialization to find the local IP. Check whether DNS
is working properly on the failing nodes. This error is usually not easily
reproducible.
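
A quick check for that hypothesis, assuming the failing lookup really is
local-host resolution (class name again illustrative):

    import java.net.InetAddress;
    import java.net.UnknownHostException;

    public class LocalHostCheck {
        public static void main(String[] args) {
            try {
                // Resolve this node's own hostname; TFS's initialization
                // depends on the same kind of local-address lookup.
                InetAddress local = InetAddress.getLocalHost();
                System.out.println(local.getHostName() + " -> "
                        + local.getHostAddress());
            } catch (UnknownHostException e) {
                // A failure here points at DNS or /etc/hosts, not Spark.
                e.printStackTrace();
            }
        }
    }

If this throws or prints an unexpected address on some nodes, fix the DNS
or /etc/hosts configuration there first.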

thanks,
rohitk


On Fri, May 5, 2017 at 2:23 PM, Jone Zhang  wrote:

> *When I use Spark SQL, I get the following error:*
>
> 17/05/05 15:58:44 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 20.0
> (TID 4080, 10.196.143.233): java.util.ServiceConfigurationError:
> org.apache.hadoop.fs.FileSystem: Provider tachyon.hadoop.TFS could not be
> instantiated
> ...
> Caused by: java.lang.NoClassDefFoundError: Could not initialize class
> tachyon.hadoop.TFS
> ...

org.apache.hadoop.fs.FileSystem: Provider tachyon.hadoop.TFS could not be instantiated

2017-05-05 Thread Jone Zhang
*When I use Spark SQL, I get the following error:*

17/05/05 15:58:44 WARN scheduler.TaskSetManager: Lost task 0.0 in
stage 20.0 (TID 4080, 10.196.143.233):
java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem:
Provider tachyon.hadoop.TFS could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:224)
at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)
at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2558)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2569)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2586)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:365)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:654)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:436)
at org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:321)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:276)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:276)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:212)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class
tachyon.hadoop.TFS
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at java.lang.Class.newInstance(Class.java:379)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
... 45 more

17/05/05 15:58:44 INFO cluster.YarnClusterScheduler: Removed TaskSet 20.0, whose tasks have all completed, from pool


I do not use Tachyon directly.


*Grateful for any ideas!*