Looks like your Spark is not able to pick up the JetS3t jar from HADOOP_CONF; the NoClassDefFoundError on org/jets3t/service/ServiceException means that library is missing from the classpath. To fix this, you can add jets3t-0.9.0.jar to the classpath, e.g.
sc.addJar("/path/to/jets3t-0.9.0.jar").
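
For completeness, a minimal sketch of the two usual ways to get the jar onto the classpath (the jar path, class name and application jar below are placeholders for your own setup):

// Option 1: ship the jar at submit time so both driver and executors see it
//   spark-submit --jars /path/to/jets3t-0.9.0.jar --class com.example.CsvReader my-app.jar
// Option 2: add it programmatically once the SparkContext is up
sc.addJar("/path/to/jets3t-0.9.0.jar")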

Thanks
Best Regards

On Thu, Jun 11, 2015 at 6:44 PM, shahab <shahab.mok...@gmail.com> wrote:

> Hi,
>
> I tried to read a CSV file from Amazon S3, but I get the following
> exception, which I have no clue how to solve. I tried both Spark 1.3.1
> and 1.2.1, with no success. Any idea how to solve this would be appreciated.
>
>
> best,
> /Shahab
>
> the code:
>
> val hadoopConf = sc.hadoopConfiguration
>
> hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>
> hadoopConf.set("fs.s3.awsAccessKeyId", aws_access_key_id)
>
> hadoopConf.set("fs.s3.awsSecretAccessKey", aws_secret_access_key)
>
> val csv = sc.textFile("s3n://mybucket/info.csv")  // original file
>
> val data = csv.map(line => line.split(",").map(elem => elem.trim)) // lines
> in rows
>
>
> Here is the exception I faced:
>
> Exception in thread "main" java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException
>     at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:280)
>     at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:270)
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
>     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>     at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
>     at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1512)
>     at org.apache.spark.rdd.RDD.count(RDD.scala:1006)
>
