Re: Creating RDD with key and Subkey

2015-08-19 Thread Silas Davis
This should be sent to the user mailing list, I think. It depends on what you want to do with the RDD, so yes, you could throw around (String, HashMap>) tuples, or perhaps you'd like to be able to groupByKey or reduceByKey on the key and sub-key as a composite, in which case JavaPairRDD, List> might be mo

Re: Writing to multiple outputs in Spark

2015-08-17 Thread Silas Davis
> feature. >> >> We would like to write data to folders with the structure >> `//` but have had to hold off on that because of the lack >> of support for MultipleOutputs. >> >> On Fri, Aug 14, 2015 at 10:56 AM, Silas Davis >> wrote: >> >>> W

Re: Writing to multiple outputs in Spark

2015-08-14 Thread Silas Davis
Would it be right to assume that the silence on this topic implies others don't really have this issue/desire? On Sat, 18 Jul 2015 at 17:24 Silas Davis wrote: > *tl;dr hadoop and cascading* *provide ways of writing tuples to multiple > output files based on key, but the plain RD

Writing to multiple outputs in Spark

2015-07-18 Thread Silas Davis
*tl;dr Hadoop and Cascading provide ways of writing tuples to multiple output files based on key, but the plain RDD interface doesn't seem to, and it should.* I have been looking into ways to write to multiple outputs in Spark. It seems like a feature that is somewhat missing from Spark. The ide