Re: Flink Iterations vs. While loop

2016-09-07 Thread Till Rohrmann
Hi Dan, first a general remark: I fear that your L-BFGS implementation is not well suited for large-scale problems. You might want to take a look at [1]. In the case of the while loop solution, you're actually executing n jobs, with n being the number of iterations. Thus, you have to add the

Re: Flink Iterations vs. While loop

2016-09-06 Thread Dan Drewes
Hi, I am not broadcasting the data but the model, i.e. the weight vector contained in the "State". You are right, it would be better for the implementation with the while loop to have the data on HDFS. But that's exactly the point of my question: Why are the Flink Iterations not faster if you

Re: Flink Iterations vs. While loop

2016-09-05 Thread Theodore Vasiloudis
Hello Dan, are you broadcasting the 85GB of data then? I don't get why you wouldn't store that file on HDFS so it's accessible by your workers. If you have the full code available somewhere we might be able to help better. For L-BFGS you should only be broadcasting the model (i.e. the weight

Re: Flink Iterations vs. While loop

2016-09-02 Thread Greg Hogan
Hi Dan, where are you reading the 200 GB "data" from? How much memory is there per node? If the DataSet is read from a distributed filesystem and Flink must spill to disk when using iterations, then I wouldn't expect much difference. About how many iterations are run in the 30 minutes? I don't know that this