Hi, I have a fairly complex Crunch pipeline with many joins and multiple inputs and outputs. I've been running it successfully with MRPipeline on AWS EMR/Hadoop, but I was curious to try out the SparkPipeline.
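For reference, here's roughly how I switched over, as a minimal sketch; the master URL, app name, and paths below are placeholders rather than my actual values, and the single read/write stands in for the real pipeline's joins and multiple inputs/outputs:

    import org.apache.crunch.PCollection;
    import org.apache.crunch.Pipeline;
    import org.apache.crunch.PipelineResult;
    import org.apache.crunch.impl.spark.SparkPipeline;

    public class MyCrunchJob {
      public static void main(String[] args) throws Exception {
        // Before: Pipeline pipeline = new MRPipeline(MyCrunchJob.class, conf);
        // After: same planning logic, targeted at Spark. "yarn-client" and
        // "my-crunch-job" are placeholders for the real master URL and name.
        Pipeline pipeline = new SparkPipeline("yarn-client", "my-crunch-job");

        // Stand-in for the actual joins and DoFns.
        PCollection<String> lines = pipeline.readTextFile("/path/to/input");
        pipeline.writeTextFile(lines, "/path/to/output");

        // done() should block until all jobs finish, then shut down.
        PipelineResult result = pipeline.done();
        System.exit(result.succeeded() ? 0 : 1);
      }
    }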
I'm using Crunch 0.12.0 and tried Spark 1.4.0 with 25 core instances. Spark successfully ran one small part of the pipeline but then stalled: it showed that all submitted jobs had succeeded, yet only 16 jobs had ever been submitted. The pipeline never terminated, and all the workers appeared idle. Has anyone seen something like this before? Is there a configuration parameter that controls how many jobs Crunch will submit to Spark?

Thanks!
- Everett
