I'd like to echo a question that was asked earlier this year:
If we do a global sort of a DataFrame (with two columns: col_1, col_2) by
(col_1, col_2 descending) and then call dropDuplicates on col_1, will it retain
the first row of each sorted group? That is, will it return the row with the
greatest value of co

Hi guys,
I'm having a problem: when a failed executor is respawned during a job that
reads/writes Parquet on S3, subsequent tasks fail because of missing AWS
keys.
Setup:
I'm using Spark 1.5.2 with Hadoop 2.7 and running experiments on a simple
standalone cluster:
1 master
2 workers
My