@Reynold Xin: not really: it only works for Parquet (see partitionBy:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameWriter),
and it requires you to have a DataFrame in the first place (for my use case the
Spark SQL interface to Avro records is more of a hindrance).
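For illustration, partitionBy writes Hive-style key=value directories, one per distinct combination of partition-column values. A minimal plain-Python sketch of that layout (no Spark required; the column names year/month and the record shape are hypothetical):

```python
import os
import tempfile
from collections import defaultdict

# Toy records with hypothetical partition columns "year" and "month".
records = [
    {"year": 2015, "month": 8, "value": "a"},
    {"year": 2015, "month": 7, "value": "b"},
    {"year": 2014, "month": 8, "value": "c"},
]

def write_partitioned(records, root, partition_cols):
    """Group records by their partition-column values and write each group
    under a Hive-style <col>=<val>/ directory, as DataFrameWriter.partitionBy
    does. Returns the sorted list of files written."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple((c, rec[c]) for c in partition_cols)].append(rec)
    paths = []
    for key, recs in groups.items():
        subdir = os.path.join(root, *(f"{c}={v}" for c, v in key))
        os.makedirs(subdir, exist_ok=True)
        path = os.path.join(subdir, "part-00000")
        with open(path, "w") as f:
            for rec in recs:
                # Partition columns are encoded in the path, not the row payload.
                row = [str(v) for c, v in rec.items() if c not in partition_cols]
                f.write(",".join(row) + "\n")
        paths.append(path)
    return sorted(paths)

root = tempfile.mkdtemp()
print(write_partitioned(records, root, ["year", "month"]))
```

The three records above land in three files, e.g. under year=2015/month=8/, year=2015/month=7/, and year=2014/month=8/.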
This is already supported with the new partitioned data sources in
DataFrame/SQL right?
On Fri, Aug 14, 2015 at 8:04 AM, Alex Angelini
wrote:
See: https://issues.apache.org/jira/browse/SPARK-3533
Feel free to comment there and make a case if you think the issue should be
reopened.
Nick
On Fri, Aug 14, 2015 at 11:11 AM Abhishek R. Singh <
abhis...@tetrationanalytics.com> wrote:
A workaround would be to have multiple passes on the RDD and have each pass
write its own output?
Or in a foreachPartition do it in a single pass (open up multiple files per
partition to write out)?
-Abhishek-
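The second workaround above can be sketched in plain Python: this is the logic you would run inside foreachPartition, opening one file handle per key on demand and routing each record in a single pass. The (key, value) record shape and the out_dir parameter are assumptions for the sketch.

```python
import os
import tempfile

def demux_partition(records, out_dir):
    """Single pass over one partition's records: lazily open one file handle
    per distinct key and append each value to that key's own output file."""
    handles = {}
    try:
        for key, value in records:
            if key not in handles:
                handles[key] = open(os.path.join(out_dir, f"{key}.txt"), "a")
            handles[key].write(value + "\n")
    finally:
        # Close every handle even if a write fails partway through.
        for h in handles.values():
            h.close()

out = tempfile.mkdtemp()
demux_partition([("a", "1"), ("b", "2"), ("a", "3")], out)
print(sorted(os.listdir(out)))  # ['a.txt', 'b.txt']
```

The trade-off versus the multiple-pass approach is one scan of the data against potentially many open file handles per partition.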
On Aug 14, 2015, at 7:56 AM, Silas Davis wrote:
Speaking about Shopify's deployment, this would be a really nice-to-have
feature.
We would like to write data to folders with the structure
`//` but have had to hold off on that because of the lack
of support for MultipleOutputs.
On Fri, Aug 14, 2015 at 10:56 AM, Silas Davis wrote:
Would it be right to assume that the silence on this topic implies others
don't really have this issue/desire?
On Sat, 18 Jul 2015 at 17:24 Silas Davis wrote:
*tl;dr: hadoop and cascading provide ways of writing tuples to multiple
output files based on key, but the plain RDD interface doesn't seem to, and
it should.*
I have been looking into ways to write to multiple outputs in Spark; it
seems like a feature that is somewhat missing.
The ide
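For reference, the Hadoop mechanism the tl;dr alludes to (MultipleTextOutputFormat) works by deriving the output path from each record's key via generateFileNameForKeyValue. A plain-Python sketch of that routing rule, with a hypothetical demux driver standing in for the Hadoop job:

```python
import os
import tempfile
from collections import defaultdict

def generate_file_name_for_key_value(key, leaf="part-00000"):
    # Mirrors the idea behind Hadoop's
    # MultipleTextOutputFormat.generateFileNameForKeyValue:
    # the output path is a pure function of the record's key.
    return os.path.join(str(key), leaf)

def demux(pairs, root):
    """Hypothetical driver: bucket values by the path the routing rule
    assigns, write each bucket to its own file, and return the sorted
    relative paths written."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[generate_file_name_for_key_value(key)].append(value)
    for rel, values in grouped.items():
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write("\n".join(values) + "\n")
    return sorted(grouped)

root = tempfile.mkdtemp()
print(demux([("fruit", "apple"), ("veg", "leek"), ("fruit", "pear")], root))
```

With an RDD, the equivalent today would be either one filtered pass per key or a foreachPartition that opens a file per key, as discussed earlier in the thread.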