Hi Juan,

Thanks for your response. Dynamic allocation is already enabled. The issue is
not resource allocation but how Spark schedules the jobs and stages.
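
To make concrete what I mean, here is a sketch of one workaround I'm considering
(not verified, just an idea): since all directories are applied to the same
pipeline, I saw their Spark jobs run one after another, so instead I could build
a separate pipeline per directory and run each from its own driver thread, with
spark.scheduler.mode=FAIR set on the cluster, so the jobs could at least be
scheduled concurrently. BatchElements, Process and CombineResult are the
transforms from my snippet quoted below.

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TFRecordIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Combine;
import org.apache.beam.sdk.transforms.ParDo;

public class PerDirectoryPipelines {
  // One pipeline per directory, each submitted from its own driver thread.
  // args is expected to carry the usual options (e.g. --runner=SparkRunner).
  public static void runAll(List<String> listOfDirs, String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(listOfDirs.size());
    for (String dir : listOfDirs) {
      pool.submit(() -> {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline p = Pipeline.create(options);
        p.apply(TFRecordIO.read().from(dir))
         .apply(ParDo.of(new BatchElements()))
         .apply(ParDo.of(new Process()))
         .apply(Combine.globally(new CombineResult()))
         .apply(TextIO.write().to(dir));
        p.run().waitUntilFinish();  // block this thread until this directory is done
      });
    }
    pool.shutdown();
  }
}

Whether the SparkRunner actually schedules these concurrently I haven't
confirmed, so I'd still like to know the recommended way to do this.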

On Tue, Nov 13, 2018 at 10:19 PM Juan Carlos Garcia <[email protected]>
wrote:

> I suggest playing around with some Spark configuration settings, like the
> dynamic allocation parameters:
>
> https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation
>
>
>
> On Wed, Nov 14, 2018, 02:28 Jiayue Zhang (Bravo) <
> [email protected]> wrote:
>
>> Hi,
>>
>> I'm writing Java Beam code to run with both Dataflow and Spark. The input
>> files are in TFRecord format and come from multiple directories. The Java
>> TFRecordIO doesn't have a readAll from a list of files, so what I'm doing is:
>>
>> for (String dir : listOfDirs) {
>>     p.apply(TFRecordIO.read().from(dir))
>>      .apply(ParDo.of(new BatchElements()))
>>      .apply(ParDo.of(new Process()))
>>      .apply(Combine.globally(new CombineResult()))
>>      .apply(TextIO.write().to(dir));
>> }
>>
>> These directories are fairly independent and I only need the result of each
>> directory. When running on Dataflow, the directories are processed
>> concurrently. But when running with Spark, I saw that the Spark jobs and
>> stages are sequential: all steps for one directory have to finish before the
>> next one starts. What's the way to make multiple transforms run concurrently
>> with the SparkRunner?
>>
>
