I was reading about the Spark scheduler [1], and this line caught my attention:

*Inside a given Spark application (SparkContext instance), multiple parallel jobs can run simultaneously if they were submitted from separate threads. By “job”, in this section, we mean a Spark action (e.g. save, collect) and any tasks that need to run to evaluate that action. Spark’s scheduler is fully thread-safe and supports this use case to enable applications that serve multiple requests (e.g. queries for multiple users).*
If I understood the above statement correctly, it is possible to have multiple jobs running in parallel on a single Spark application, as long as the *actions* are triggered from separate threads. I tried to test this with my Crunch Spark application (yarn-client), which reads two independent HDFS sources and performs *PCollection#getLength()* on each source. The Spark WebUI shows Job1 as submitted; only after Job1 completes is Job2 submitted and finished.

I would like to get some thoughts on whether it would be possible for Crunch to identify independent sources/targets and create separate threads that can interact with the Spark scheduler. That way, I think we could have some independent jobs running in parallel. Here is the example that I used: https://gist.github.com/nasokan/7a0820411656f618f182 (a minimal sketch of the threading pattern follows below).

[1] https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
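For reference, here is a minimal sketch of the threading pattern the Spark docs describe, written against the plain Spark Java API rather than Crunch; the class name, the in-memory datasets, and the printed labels are just placeholders standing in for the two HDFS sources in my gist:

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ParallelJobsSketch {
  public static void main(String[] args) throws InterruptedException {
    SparkConf conf = new SparkConf().setAppName("parallel-jobs-sketch");
    JavaSparkContext jsc = new JavaSparkContext(conf);

    // Two independent datasets; in the real case these would be the
    // two independent HDFS sources.
    JavaRDD<Integer> first = jsc.parallelize(Arrays.asList(1, 2, 3, 4, 5));
    JavaRDD<Integer> second = jsc.parallelize(Arrays.asList(6, 7, 8, 9, 10));

    // Each action (count) is submitted from its own thread, so the
    // scheduler is free to run both jobs concurrently instead of
    // waiting for Job1 to finish before submitting Job2.
    Thread t1 = new Thread(() -> System.out.println("first count = " + first.count()));
    Thread t2 = new Thread(() -> System.out.println("second count = " + second.count()));
    t1.start();
    t2.start();
    t1.join();
    t2.join();

    jsc.stop();
  }
}
```

If Crunch could detect that two targets have no shared dependencies, it could presumably trigger their actions from separate threads in a similar way.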
