Re: Writing output of key-value Pair RDD
Answering my own question. I filtered the keys out of the output file by overriding MultipleOutputFormat.generateActualKey to return the empty string.

--
Nick

    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

    // Type parameters are needed so the String-typed overrides below compile.
    class RDDMultipleTextOutputFormat extends MultipleTextOutputFormat<String, String> {

        // Use the pair's key as the output file name.
        @Override
        protected String generateFileNameForKeyValue(String key, String value, String name) {
            return key;
        }

        // Replace the key written to the file with the empty string.
        @Override
        protected String generateActualKey(String key, String value) {
            return "";
        }
    }

From: Afshartous, Nick
Sent: Thursday, May 5, 2016 3:35:17 PM
To: Nicholas Chammas; user@spark.apache.org
Subject: Re: Writing output of key-value Pair RDD

> Thanks, I got the example below working, though it writes both the keys
> and the values to the output file. Is there any way to write just the values?
>
> --
> Nick
>
>     String[] strings = { "Abcd", "Azlksd", "whhd", "wasc", "aDxa" };
>
>     sc.parallelize(Arrays.asList(strings))
>       .mapToPair(pairFunction)
>       .saveAsHadoopFile("s3://...", String.class, String.class,
>                         RDDMultipleTextOutputFormat.class);
>
> From: Nicholas Chammas
> Sent: Wednesday, May 4, 2016 4:21:12 PM
> To: Afshartous, Nick; user@spark.apache.org
> Subject: Re: Writing output of key-value Pair RDD
>
>> You're looking for this discussion:
>> http://stackoverflow.com/q/23995040/877069
>>
>> Also, a simpler alternative with DataFrames:
>> https://github.com/apache/spark/pull/8375#issuecomment-202458325
>>
>> On Wed, May 4, 2016 at 4:09 PM Afshartous, Nick wrote:
>>
>>> Hi,
>>>
>>> Is there any way to write out to S3 the values of a key-value Pair RDD?
>>>
>>> I'd like each value of a pair to be written to its own file where the
>>> file name corresponds to the key name.
>>>
>>> Thanks,
>>> --
>>> Nick
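A note on the generateActualKey override in the answer above: as far as I understand Hadoop's text record writer, it skips the key and the key/value separator only when the key is null (or a NullWritable), not when it is an empty string. If that holds, returning "" may leave a leading tab on each line, and returning null instead may be worth trying. The snippet below is only a simplified model of that rule in plain Java, not Hadoop's actual code; the class name LineWriterModel and the method formatLine are made up for illustration.

```java
// Simplified, hypothetical model of a text record writer's line layout.
// Assumption: the key and separator are omitted only for a null key.
final class LineWriterModel {
    // Tab is assumed here as the key/value separator.
    static final String SEPARATOR = "\t";

    // Builds the text of one output line (without the trailing newline).
    static String formatLine(String key, String value) {
        boolean nullKey = (key == null);
        boolean nullValue = (value == null);
        StringBuilder line = new StringBuilder();
        if (!nullKey) {
            line.append(key);            // an empty key appends nothing visible
        }
        if (!nullKey && !nullValue) {
            line.append(SEPARATOR);      // still emitted for an empty, non-null key
        }
        if (!nullValue) {
            line.append(value);
        }
        return line.toString();
    }
}
```

Under this model, an empty-string key yields the value preceded by the separator, while a null key yields the value alone. Whether the real output shows the same difference is worth checking against one of the files written to S3.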
Re: Writing output of key-value Pair RDD
Thanks, I got the example below working, though it writes both the keys and the values to the output file. Is there any way to write just the values?

--
Nick

    String[] strings = { "Abcd", "Azlksd", "whhd", "wasc", "aDxa" };

    // pairFunction (not shown) turns each string into a (key, value) pair.
    sc.parallelize(Arrays.asList(strings))
      .mapToPair(pairFunction)
      .saveAsHadoopFile("s3://...", String.class, String.class,
                        RDDMultipleTextOutputFormat.class);

From: Nicholas Chammas
Sent: Wednesday, May 4, 2016 4:21:12 PM
To: Afshartous, Nick; user@spark.apache.org
Subject: Re: Writing output of key-value Pair RDD

> You're looking for this discussion:
> http://stackoverflow.com/q/23995040/877069
>
> Also, a simpler alternative with DataFrames:
> https://github.com/apache/spark/pull/8375#issuecomment-202458325
>
> On Wed, May 4, 2016 at 4:09 PM Afshartous, Nick wrote:
>
>> Hi,
>>
>> Is there any way to write out to S3 the values of a key-value Pair RDD?
>>
>> I'd like each value of a pair to be written to its own file where the
>> file name corresponds to the key name.
>>
>> Thanks,
>> --
>> Nick
Re: Writing output of key-value Pair RDD
You're looking for this discussion:
http://stackoverflow.com/q/23995040/877069

Also, a simpler alternative with DataFrames:
https://github.com/apache/spark/pull/8375#issuecomment-202458325

On Wed, May 4, 2016 at 4:09 PM Afshartous, Nick wrote:

> Hi,
>
> Is there any way to write out to S3 the values of a key-value Pair RDD?
>
> I'd like each value of a pair to be written to its own file where the
> file name corresponds to the key name.
>
> Thanks,
> --
> Nick
Writing output of key-value Pair RDD
Hi,

Is there any way to write out to S3 the values of a key-value Pair RDD?

I'd like each value of a pair to be written to its own file where the file name corresponds to the key name.

Thanks,
--
Nick