Hi Dan,
First, a general remark: I fear that your L-BFGS implementation is not well
suited for large-scale problems. You might want to take a look at [1].
In the case of the while loop solution you're actually executing n jobs
with n being the number of iterations. Thus, you have to add the
Hi,
I am not broadcasting the data but the model, i.e. the weight vector
contained in the "State".
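To make the distinction concrete, here is a minimal sketch in plain Python (not Flink code, and plain gradient descent rather than L-BFGS) of the pattern being described: the data stays split into partitions, and only the small weight vector is handed to every partition in each iteration. All names here (`partial_gradient`, `train`, `partitions`) and the toy data are hypothetical and only for illustration.

```python
# Hypothetical illustration in plain Python; this is not the Flink API.
from typing import List, Tuple

Point = Tuple[List[float], float]  # (features, label)


def partial_gradient(partition: List[Point], weights: List[float]) -> List[float]:
    """Sum the squared-error gradients over one partition for the given weights."""
    grad = [0.0] * len(weights)
    for features, label in partition:
        error = sum(w * x for w, x in zip(weights, features)) - label
        for i, x in enumerate(features):
            grad[i] += error * x
    return grad


def train(partitions: List[List[Point]], dim: int,
          iterations: int, step: float) -> List[float]:
    """Gradient descent where only the weight vector is shared per iteration."""
    weights = [0.0] * dim
    n = sum(len(p) for p in partitions)
    for _ in range(iterations):
        # Only `weights` (the model) goes to every partition;
        # the data points never leave their partition.
        partials = [partial_gradient(p, weights) for p in partitions]
        total = [sum(g[i] for g in partials) for i in range(dim)]
        weights = [w - step * g / n for w, g in zip(weights, total)]
    return weights


# Toy data generated from y = 2 + 3 * x, split across two "workers".
partitions = [
    [([1.0, 0.0], 2.0), ([1.0, 1.0], 5.0)],
    [([1.0, 2.0], 8.0), ([1.0, 3.0], 11.0)],
]
weights = train(partitions, dim=2, iterations=5000, step=0.1)
```

The per-iteration traffic in this sketch is one weight vector out and one gradient vector back per partition, independent of how large the data is.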
You are right, it would be better for the implementation with the while
loop to have the data on HDFS. But that's exactly the point of my
question: Why are the Flink Iterations not faster if you
Hello Dan,
Are you broadcasting the 85 GB of data then? I don't get why you wouldn't
store that file on HDFS so it's accessible by your workers.
If you have the full code available somewhere we might be able to help
better.
For L-BFGS you should only be broadcasting the model (i.e. the weight
Hi Dan,
Where are you reading the 200 GB "data" from? How much memory is there per
node? If the DataSet is read from a distributed filesystem, and if Flink must
spill to disk when using iterations, then I wouldn't expect much difference.
Roughly how many iterations are run in the 30 minutes? I don't know that this