Yes, the resulting matrix would be sparse. Thanks for the suggestion. Will
explore ways of doing this using an agg and UDF.
On Fri, Oct 30, 2020 at 6:26 AM Patrick McCarthy
wrote:
> That's a very large vector. Is it sparse? Perhaps you'd have better luck
> performing an aggregate instead of a pi
Spark distributes load to executors, and the executors are usually
pre-configured with a fixed number of cores. You may want to check with
your Spark admin how many executors (or slaves) your Spark cluster is
configured with and how many cores are pre-configured per executor.
The debugging too
That's a very large vector. Is it sparse? Perhaps you'd have better luck
performing an aggregate instead of a pivot, and assembling the vector using
a UDF.
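
For illustration, here is a rough sketch of what the aggregate-plus-UDF approach might look like in PySpark. It is not taken from the thread: the column names (`id`, `feature_idx`, `value`) and the helper names are hypothetical, and it assumes the feature index is already an integer in the range [0, size). Duplicate indices for the same id are summed, which may or may not be what you want.

```python
def to_sparse_parts(pairs):
    """Merge (index, value) pairs into sorted, de-duplicated indices and values.

    Values sharing an index are summed; zero entries are dropped.
    """
    merged = {}
    for idx, val in pairs:
        merged[idx] = merged.get(idx, 0.0) + float(val)
    indices = sorted(i for i, v in merged.items() if v != 0.0)
    return indices, [merged[i] for i in indices]


def assemble_vectors(df, size, id_col="id", idx_col="feature_idx", val_col="value"):
    """Group a long-format DataFrame by id and build one sparse vector per id.

    Column names are hypothetical defaults; idx_col is assumed to hold
    integer positions in [0, size).
    """
    # PySpark is imported lazily so to_sparse_parts stays usable without Spark.
    from pyspark.sql import functions as F
    from pyspark.ml.linalg import Vectors, VectorUDT

    @F.udf(returnType=VectorUDT())
    def to_vector(pairs):
        # Each element is a Row(idx, value) produced by the struct below.
        indices, values = to_sparse_parts((p[0], p[1]) for p in pairs)
        return Vectors.sparse(size, indices, values)

    # Aggregate instead of pivot: collect the (index, value) pairs per id.
    grouped = df.groupBy(id_col).agg(
        F.collect_list(F.struct(idx_col, val_col)).alias("pairs")
    )
    return grouped.select(id_col, to_vector("pairs").alias("features"))
```

The idea is that `collect_list` keeps only the non-zero entries for each id, so the wide, mostly empty columns that a pivot would create never get materialized. Something like `assemble_vectors(long_df, size=n_features)` would then return one row per id with a `features` column.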
On Thu, Oct 29, 2020 at 10:19 PM Daniel Chalef
wrote:
> Hello,
>
> I have a very large long-format dataframe (several billion rows) that I'd