Re: reading file from S3

2016-03-16 Thread Chris Miller
+1 for Sab's thoughtful answer...

Yasemin: As Gourav said, using IAM roles is considered best practice and
generally will give you fewer headaches in the end... but you may have a
reason for doing it the way you are, and certainly the way you posted
should be supported and not cause the error you described.

--
Chris Miller


Re: reading file from S3

2016-03-16 Thread Yasemin Kaya
Hi,

Thanks a lot, everyone. I understand now that my problem came from the
Hadoop version: I moved to the Spark 1.6.0 build for Hadoop 2.4 and the
problem is gone.

Best,
yasemin


-- 
hiç ender hiç


Re: reading file from S3

2016-03-15 Thread Gourav Sengupta
Once again, please use roles; there is no situation in which you have to
specify the access keys in the URI. Please read the Amazon documentation and
it will say the same. The only situation in which you would use the access
keys in the URI is when you have not read the Amazon documentation :)

Regards,
Gourav


Re: reading file from S3

2016-03-15 Thread Sabarish Sasidharan
There are many solutions to a problem.

Also understand that sometimes your situation demands it. For example, what
if you are accessing S3 from a Spark job running on your continuous
integration server sitting in your data center, or maybe on a box under your
desk? And sometimes you are just trying something out.

Also understand that sometimes you want an answer that solves the problem at
hand, not a redirection to something else. What you suggested is an
appropriate way of doing it, and I have proposed it myself before, but it
doesn't solve the OP's problem at hand.

Regards
Sab


Re: reading file from S3

2016-03-15 Thread Gourav Sengupta
Oh!!! What the hell

Please never use the URI

s3n://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY

That is a major cause of pain, security issues, and code maintenance issues,
and of course something that Amazon strongly suggests we do not use. Please
use roles and you will not have to worry about security.

Regards,
Gourav Sengupta


Re: reading file from S3

2016-03-15 Thread Sabarish Sasidharan
You have a slash between the @ and the bucket name. The bucket name should
come directly after the @.
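
For example (a sketch with placeholder credentials):

  // incorrect: slash between '@' and the bucket name
  sc.textFile("s3n://ACCESS_KEY:SECRET_KEY@/yasemindeneme/deneme.txt")

  // correct: bucket name directly after '@'
  sc.textFile("s3n://ACCESS_KEY:SECRET_KEY@yasemindeneme/deneme.txt")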

Regards
Sab


Re: reading file from S3

2016-03-15 Thread Gourav Sengupta
Hi,

Try starting your clusters with IAM roles, and you will not have to
configure or hard-code anything at all.
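
As a rough sketch, assuming a setup (for example on EMR) where the S3
connector picks up the role credentials automatically, the read then needs
no keys in the URI, the environment, or the Hadoop configuration:

  // credentials come from the instance role, not from the code
  val data = sc.textFile("s3://yasemindeneme/deneme.txt")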

Let me know in case you need any help with this.


Regards,
Gourav Sengupta


Re: reading file from S3

2016-03-15 Thread Yasemin Kaya
Hi Safak,

I changed the keys, but there is no change.

Best,
yasemin


-- 
hiç ender hiç


Re: reading file from S3

2016-03-15 Thread Şafak Serdar Kapçı
Hello Yasemin,
Maybe your key ID or secret access key has special characters like a
backslash or something similar. You need to change it.
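
As a sketch of a possible workaround, assuming the offending characters are
in the secret key, you could URL-encode the key before embedding it in the
URI (the variable names below are illustrative):

  // hypothetical workaround: URL-encode a secret containing '/', '+', etc.
  import java.net.URLEncoder
  val encodedSecret = URLEncoder.encode(awsSecretAccessKey, "UTF-8")
  sc.textFile(s"s3n://$awsAccessKeyId:$encodedSecret@yasemindeneme/deneme.txt")
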
Best Regards,
Safak.


reading file from S3

2016-03-15 Thread Yasemin Kaya
Hi,

I am using Spark 1.6.0 standalone, and I want to read a txt file from an S3
bucket named yasemindeneme; my file name is deneme.txt. But I am getting
this error. Here is the simple code

Exception in thread "main" java.lang.IllegalArgumentException: Invalid
hostname in URI
s3n://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@/yasemindeneme/deneme.txt
  at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:45)
  at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:55)


I tried two options: sc.hadoopConfiguration() and
sc.textFile("s3n://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@/yasemindeneme/deneme.txt/");
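
(The sc.hadoopConfiguration() attempt looked roughly like this, a sketch
with placeholder values; the property names assume the s3n connector:)

  val conf = sc.hadoopConfiguration
  conf.set("fs.s3n.awsAccessKeyId", "AWS_ACCESS_KEY_ID")
  conf.set("fs.s3n.awsSecretAccessKey", "AWS_SECRET_ACCESS_KEY")
  val data = sc.textFile("s3n://yasemindeneme/deneme.txt")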

Also I did export AWS_ACCESS_KEY_ID= and export AWS_SECRET_ACCESS_KEY= but
there is no change in the error.

Could you please help me with this issue?


-- 
hiç ender hiç


Re: Reading file from S3, facing java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException

2015-06-15 Thread shahab
Thanks Akhil, it solved the problem.

best
/Shahab




Re: Reading file from S3, facing java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException

2015-06-12 Thread Akhil Das
Looks like your Spark is not able to pick up the HADOOP_CONF. To fix this,
you can add jets3t-0.9.0.jar to the classpath:
sc.addJar("/path/to/jets3t-0.9.0.jar").
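
A sketch of the same fix applied when building the context (the app name and
jar path are illustrative):

  import org.apache.spark.{SparkConf, SparkContext}

  // ship the jets3t jar to the executors at context-creation time
  val conf = new SparkConf()
    .setAppName("s3-read")
    .setJars(Seq("/path/to/jets3t-0.9.0.jar"))
  val sc = new SparkContext(conf)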

Thanks
Best Regards


Reading file from S3, facing java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException

2015-06-11 Thread shahab
Hi,

I tried to read a CSV file from Amazon S3, but I get the following
exception, which I have no clue how to solve. I tried both Spark 1.3.1 and
1.2.1, with no success. Any idea how to solve this is appreciated.


best,
/Shahab

the code:

val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3.awsAccessKeyId", aws_access_key_id)
hadoopConf.set("fs.s3.awsSecretAccessKey", aws_secret_access_key)
val csv = sc.textFile("s3n://mybucket/info.csv")  // original file
val data = csv.map(line => line.split(",").map(elem => elem.trim))  // lines in rows


Here is the exception I faced:

Exception in thread "main" java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException
  at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:280)
  at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:270)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
  at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
  at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1512)
  at org.apache.spark.rdd.RDD.count(RDD.scala:1006)