Re: Force inner join to shuffle the smallest table

2015-06-24 Thread Stephen Carman
Have you tried shuffle compression (spark.shuffle.compress, true|false)? If you have a capable filesystem, I’ve also noticed that file consolidation helps disk usage a bit (spark.shuffle.consolidateFiles, true|false).
Steve
On Jun 24, 2015, at 3:27 PM, Ulanov, Alexander
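Here is a minimal sketch of passing the two shuffle settings above at submit time. It assumes a Spark 1.x deployment launched with `spark-submit`, which accepts repeated `--conf key=value` flags. The helper `spark_submit_args` and the application file `my_job.py` are hypothetical placeholders. The script only builds and prints the command; it does not run it.

```python
import shlex

# Shuffle settings from the reply above. Spark reads boolean
# properties as the strings "true" / "false".
SHUFFLE_CONF = {
    "spark.shuffle.compress": "true",
    "spark.shuffle.consolidateFiles": "true",
}


def spark_submit_args(app, conf):
    """Build a spark-submit argument list with one --conf flag per property.

    Hypothetical helper for illustration; it does not launch anything.
    """
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)  # the application file or jar goes last
    return args


if __name__ == "__main__":
    # "my_job.py" is a placeholder application, not from the thread.
    print(shlex.join(spark_submit_args("my_job.py", SHUFFLE_CONF)))
```

This prints `spark-submit --conf spark.shuffle.compress=true --conf spark.shuffle.consolidateFiles=true my_job.py`. Note that `spark.shuffle.compress` already defaults to `true`. Later Spark releases no longer honor `spark.shuffle.consolidateFiles`, so check the configuration page for the version you actually run.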

s3 vfs on Mesos Slaves

2015-05-12 Thread Stephen Carman
We have a small Mesos cluster, and these slaves need a VFS set up on them so that they can pull down the data they need from S3 when Spark runs. There doesn’t seem to be any obvious guidance online on how to do this, or how to accomplish it easily. Does anyone have some best practices or

Re: Tungsten + Flink

2015-05-01 Thread Stephen Carman
I think that as long as the two frameworks follow the same paradigm for how their interfaces work, it’s fine to have two competing frameworks. This way, each framework has some motivation to be the best at what it does, rather than being the only choice whether you like it or not. They also seem to have