I have a Spark job running on a 10-node cluster, and the Python
process on every node is pegged at 100% CPU.
I was wondering which parts of a Spark script run in the Python process
and which get passed to the Java processes. Is there any documentation on
this?
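For example, in a sketch like the following (made up for illustration; the
paths and app name are just placeholders), I'd like to understand which
lines execute in Python and which in the JVM:

    from pyspark import SparkContext

    sc = SparkContext(appName="example")  # driver talks to the JVM via Py4J

    rdd = sc.textFile("hdfs:///some/input")         # reading handled by the JVM
    words = rdd.flatMap(lambda line: line.split())  # the lambda is pickled and
                                                    # run in Python worker processes
    counts = words.map(lambda w: (w, 1)) \
                  .reduceByKey(lambda a, b: a + b)  # the shuffle itself happens in
                                                    # the JVM, but the merge function
                                                    # runs in the Python workers
    counts.saveAsTextFile("hdfs:///some/output")    # writing handled by the JVM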
Thanks,
Justin
Hi all,
I'm running Spark on AWS EMR and I'm having some issues getting the correct
permissions on the output files written by
rdd.saveAsTextFile('file_dir_name'). In Hive, I would add a line at the
beginning of the script:

    set fs.s3.canned.acl=BucketOwnerFullControl

and that would set the ACL on the output files.
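Is there an equivalent way to do this in Spark? The closest thing I've come
up with (untested, and it goes through PySpark's internal _jsc handle, so I
don't know if it's the supported route) is to set the same property on the
underlying Hadoop configuration before writing:

    # assumption: the EMR S3 filesystem honors fs.s3.canned.acl the same
    # way it does under Hive; _jsc is PySpark's internal JavaSparkContext
    sc._jsc.hadoopConfiguration().set("fs.s3.canned.acl",
                                      "BucketOwnerFullControl")
    rdd.saveAsTextFile('file_dir_name')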