Re: best practice for parallelizing model training

2017-01-24 Thread Jacek Laskowski
Hi Shiyuan, Re 1) Yes, but it has (almost) nothing to do with Spark: model1 = pipeline1.fit(df) is a blocking operation, so the following line is executed only after this one has finished. Re 2) Use a concurrency library like Java's https://docs.oracle.com/javase/8/docs/api/j
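A minimal, Spark-free sketch of both points, using Python's concurrent.futures as the analogue of the Java concurrency library mentioned above. The function fake_fit is a hypothetical stand-in for a blocking call such as pipeline1.fit(df); it just sleeps, the way a real fit blocks the calling driver thread until its jobs finish. The sequential version shows why #B waits for #A; the thread-pool version submits each fit from its own thread so the two calls overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-in for pipeline.fit(df): blocks the calling thread,
# then returns a "model". Not a Spark API.
def fake_fit(name, seconds=0.5):
    time.sleep(seconds)
    return f"model-{name}"


def fit_sequentially():
    start = time.perf_counter()
    model1 = fake_fit("A")  # #A
    model2 = fake_fit("B")  # #B starts only after #A has returned
    return (model1, model2), time.perf_counter() - start


def fit_concurrently():
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Each blocking fit runs on its own thread, so the calls overlap.
        future1 = pool.submit(fake_fit, "A")
        future2 = pool.submit(fake_fit, "B")
        # result() waits for each future and re-raises any exception it hit.
        models = (future1.result(), future2.result())
    return models, time.perf_counter() - start


if __name__ == "__main__":
    print(fit_sequentially())   # roughly 1.0 s elapsed
    print(fit_concurrently())   # roughly 0.5 s elapsed
```

With a real SparkSession you would submit pipeline1.fit(df) and pipeline2.fit(df) instead of fake_fit. Spark's job-scheduling documentation states that multiple jobs within one application can run simultaneously if they are submitted from separate threads; how much they actually overlap then depends on available cluster resources and the scheduler mode (FIFO by default).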

best practice for parallelizing model training

2017-01-24 Thread Shiyuan
Hi Spark users, I am looking for a way to parallelize #A and #B in the code below. Since DataFrames in Spark are immutable, #A and #B are completely separate operations. My questions are: 1) As of Spark 2.1, #B only starts when #A is completed. Is that right? 2) What's the best way to paralleli