Hi Cheng,
Are you saying that by setting up the lineage

schemaRdd.keyBy(_.getString(1)).partitionBy(new HashPartitioner(n)).values.applySchema(schema)

then Spark SQL will know that an SQL “group by” on Customer Code will not have to shuffle?
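For reference, here is that lineage written out as a sketch. It assumes a Spark 1.2-era API, an existing sqlContext, a schemaRdd whose column 1 holds the customer code, a StructType value named schema, and a partition count n — all hypothetical names carried over from the one-liner above:

```scala
import org.apache.spark.HashPartitioner

// Key each Row by column 1 (the customer code), hash-partition on that key
// so rows with the same customer code land in the same partition, then drop
// the keys to get back to an RDD[Row].
val partitioned = schemaRdd
  .keyBy(_.getString(1))                // (customerCode, Row) pairs
  .partitionBy(new HashPartitioner(n))  // shuffle once, by customer code
  .values                               // back to RDD[Row]

// Re-attach the schema so Spark SQL can query the repartitioned data
// (applySchema was the pre-1.3 way to turn an RDD[Row] into a SchemaRDD).
val prepared = sqlContext.applySchema(partitioned, schema)
```

Whether the planner can actually exploit this pre-partitioning for a later “group by” is exactly the question here; this sketch only shows the shape of the lineage, not a guarantee about the optimizer.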
But the prepared RDD will have already shuffled, so we … to discuss?
Thanks
Mick
On 30 Dec 2014, at 17:40, Michael Davies michael.belldav...@gmail.com wrote:
Hi Michael,
I’ve looked through the example and the test cases, and I think I understand
what we need to do — so I’ll give it a go.
I think what I’d like to try is to allow files to be added at any time, so
perhaps I can cache partition info; also, what may be useful for us would be
to