Re: Which OutputCommitter to use for S3?

2015-03-25 Thread Pei-Lun Lee
>>> >> By setting --hadoop-major-version=2 when using the ec2 scripts,
>>> >> everything worked fine.
>>> >>
>>> >> Darin.
>>> >>
>>> >> - Original Message -

Re: Which OutputCommitter to use for S3?

2015-03-16 Thread Pei-Lun Lee
>> >> Just to close the loop in case anyone runs into the same problem I had.
>> >>
>> >> By setting --hadoop-major-version=2 when using the ec2 scripts,
>> >> everything worked fine.
>> >>
>> >> Darin.

Re: Which OutputCommitter to use for S3?

2015-03-05 Thread Aaron Davidson
one runs into the same problem I had.
> >>
> >> By setting --hadoop-major-version=2 when using the ec2 scripts,
> >> everything worked fine.
> >>
> >> Darin.
> >>
> >> - Original Message -
> >> From: Darin McBeath

Re: Which OutputCommitter to use for S3?

2015-03-05 Thread Pei-Lun Lee
>> >at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:940)
>> >at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:902)
>> >at org.apache.spark.a

Re: Which OutputCommitter to use for S3?

2015-02-26 Thread Thomas Demoor
ajor-version=2 when using the ec2 scripts, everything
> worked fine.
>
> Darin.
>
> - Original Message -
> From: Darin McBeath
> To: Mingyu Kim; Aaron Davidson
> Cc: "user@spark.apache.org"
> Sent: Monday, February 23, 2015 3:16 PM
> Subject

Re: Which OutputCommitter to use for S3?

2015-02-23 Thread Darin McBeath
nt: Monday, February 23, 2015 3:16 PM
Subject: Re: Which OutputCommitter to use for S3?
Thanks. I think my problem might actually be the other way around. I'm compiling with Hadoop 2, but when I start up Spark using the ec2 scripts, I don't specify a --hadoop-major-version and the

Re: Which OutputCommitter to use for S3?

2015-02-23 Thread Darin McBeath
I'll try it and post a response.
- Original Message -
From: Mingyu Kim
To: Darin McBeath; Aaron Davidson
Cc: "user@spark.apache.org"
Sent: Monday, February 23, 2015 3:06 PM
Subject: Re: Which OutputCommitter to use for S3?
Cool, we will start from there. Thanks Aaron

Re: Which OutputCommitter to use for S3?

2015-02-23 Thread Mingyu Kim
p.mapred.JobContext.
>
>Is there something obvious that I might be doing wrong (or messed up in
>the translation from Scala to Java) or something I should look into? I'm
>using Spark 1.2 with Hadoop 2.4.
>
>Thanks.
>
>Darin.

Re: Which OutputCommitter to use for S3?

2015-02-23 Thread Darin McBeath
From: Aaron Davidson
To: Andrew Ash
Cc: Josh Rosen; Mingyu Kim; "user@spark.apache.org"; Aaron Davidson
Sent: Saturday, February 21, 2015 7:01 PM
Subject: Re: Which OutputCommitter to use for S3?
Here is the class: https://gist.github.com/aarond

Re: Which OutputCommitter to use for S3?

2015-02-21 Thread Aaron Davidson
one using
>>> a special OutputCommitter for S3 can comment on this!
>>>
>>> Thanks,
>>> Mingyu
>>>
>>> From: Mingyu Kim
>>> Date: Monday, February 16, 2015 at 1:15 AM
>>> To: "user@spark.apache.org"
>>> Subject: W

Re: Which OutputCommitter to use for S3?

2015-02-21 Thread Andrew Ash
nyone using a
>> special OutputCommitter for S3 can comment on this!
>>
>> Thanks,
>> Mingyu
>>
>> From: Mingyu Kim
>> Date: Monday, February 16, 2015 at 1:15 AM
>> To: "user@spark.apache.org"
>> Subject: Which OutputCommitter

Re: Which OutputCommitter to use for S3?

2015-02-20 Thread Josh Rosen
any response. It’d be really appreciated if anyone using a
> special OutputCommitter for S3 can comment on this!
>
> Thanks,
> Mingyu
>
> From: Mingyu Kim
> Date: Monday, February 16, 2015 at 1:15 AM
> To: "user@spark.apache.org"
> Subject: Which OutputCommitt

Re: Which OutputCommitter to use for S3?

2015-02-20 Thread Mingyu Kim
ark.apache.org"
Subject: Which OutputCommitter to use for S3?
Hi all,
The default OutputCommitter used by RDD, which is FileOutputCommitter, seems to require moving files at the commit step, which is not a constant-time operation in S3, as discussed in http://mail-

Which OutputCommitter to use for S3?

2015-02-16 Thread Mingyu Kim
Hi all,
The default OutputCommitter used by RDD, which is FileOutputCommitter, seems to require moving files at the commit step, which is not a constant-time operation in S3, as discussed in http://mail-archives.apache.org/mod_mbox/spark-user/201410.mbox/%3c543e33fa.2000...@entropy.be%3E. People se
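
For anyone skimming the thread, the concern in the original question can be illustrated with a toy model. This is only a sketch under stated assumptions: it assumes an object store with no native rename, so "moving" a file means copying it and deleting the source, and it compares a commit that renames out of a temporary prefix against one that writes straight to the final location. `ToyObjectStore`, `commit_via_rename` and `commit_direct` are hypothetical names made up for illustration; none of this is Hadoop or Spark code.

```python
# Toy model of an object store without a native rename operation.
# Assumption: a "move" is implemented as copy + delete per object.
# All names here are hypothetical, not part of Hadoop or Spark.


class ToyObjectStore:
    def __init__(self):
        self.objects = {}      # key -> bytes
        self.bytes_copied = 0  # total bytes copied by rename_prefix calls

    def put(self, key, data):
        self.objects[key] = data

    def rename_prefix(self, src, dst):
        # No real rename: copy every object under src to dst, then delete it.
        for key in [k for k in self.objects if k.startswith(src)]:
            data = self.objects.pop(key)
            self.objects[dst + key[len(src):]] = data
            self.bytes_copied += len(data)


def commit_via_rename(store, parts):
    # Write under a temporary prefix, then move everything at commit time.
    for name, data in parts.items():
        store.put("out/_temporary/" + name, data)
    store.rename_prefix("out/_temporary/", "out/")


def commit_direct(store, parts):
    # Write straight to the final location; the commit step moves nothing.
    for name, data in parts.items():
        store.put("out/" + name, data)


parts = {"part-00000": b"a" * 1000, "part-00001": b"b" * 2000}

renamed = ToyObjectStore()
commit_via_rename(renamed, parts)

direct = ToyObjectStore()
commit_direct(direct, parts)
```

In this model both stores end up with the same objects, but the rename-based commit copies every output byte a second time, so its cost grows with the output size. The direct approach avoids that copy, at the price of losing the isolation the temporary prefix gives: output from a failed or duplicated task is written to the final location as well.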