Re: flink-ml algorithms

2022-06-06 Thread Natia Chachkhiani
Hi, I have another question. Is the implementation of kmeans in flink-ml the same as Spark's StreamingKmeans? Should the accuracy/results from the same dataset be comparable between the two? On Sun, Jun 5, 2022 at 8:14 PM Natia Chachkhiani < natia.chachkhia...@gmail.com> wrote: > Thanks for the reply

Re: flink-ml algorithms

2022-06-05 Thread Natia Chachkhiani
Thanks for the reply, Zhipeng and Jing. Running OnlineKmeans with a fixed initial model removed the randomness! On Sun, Jun 5, 2022 at 6:19 PM Zhipeng Zhang wrote: > Hi Natia, > > As I understand, the processing order of onlineKmeans is the same as the > input data. > > Are you running OnlineKm

Re: flink-ml algorithms

2022-06-05 Thread Zhipeng Zhang
Hi Natia, As I understand it, the processing order of onlineKmeans is the same as the order of the input data. Are you running OnlineKmeans on one data point at a time with a random initial KmeansModel? Could you use a fixed initial model following [1] and try it out? [1] https://github.com/apache/flink-ml/blob/239788f2b
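To illustrate why the initial model matters here, below is a minimal sketch in plain Python. It is not Flink ML's code: `online_kmeans`, `nearest`, and `random_init` are hypothetical helpers, and the update rule (a weighted running mean with a decay factor, batch size 1) is an assumption chosen for illustration. It shows that with points processed in input order, the cluster assignments are fully determined by the initial centroids, so a random initial model changes the results while a fixed one makes them repeatable.

```python
import random


def nearest(centroids, point):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((c - p) ** 2 for c, p in zip(centroids[i], point)),
    )


def online_kmeans(points, init_centroids, decay=1.0):
    """Process points one at a time (batch size 1), strictly in input order.

    Returns the final centroids and the cluster each point was assigned to
    at the moment it was seen. Illustrative update rule, not Flink ML's code.
    """
    centroids = [list(c) for c in init_centroids]
    weights = [0.0] * len(centroids)  # initial model carries no weight here
    assignments = []
    for point in points:
        k = nearest(centroids, point)
        assignments.append(k)
        # Forget a fraction of the history, then fold in the new point.
        weights = [w * decay for w in weights]
        new_weight = weights[k] + 1.0
        centroids[k] = [
            (c * weights[k] + p) / new_weight for c, p in zip(centroids[k], point)
        ]
        weights[k] = new_weight
    return centroids, assignments


def random_init(k, dim, seed):
    """Hypothetical helper: k random centroids in [0, 10)^dim from a seed."""
    rng = random.Random(seed)
    return [[rng.uniform(0.0, 10.0) for _ in range(dim)] for _ in range(k)]


points = [(0.0, 0.0), (0.1, 0.2), (9.0, 9.0), (9.2, 8.8), (0.2, 0.1), (8.9, 9.1)]

# Same fixed initial model twice: identical assignments on every run.
fixed = [(0.0, 0.0), (10.0, 10.0)]
_, run_a = online_kmeans(points, fixed)
_, run_b = online_kmeans(points, fixed)

# A different initial model (here just the two centroids swapped) relabels
# every point, which is what an unseeded random model can do between runs.
_, run_swapped = online_kmeans(points, [(10.0, 10.0), (0.0, 0.0)])

# A seeded random model is repeatable as long as the seed is kept fixed.
_, run_seeded_1 = online_kmeans(points, random_init(2, 2, seed=42))
_, run_seeded_2 = online_kmeans(points, random_init(2, 2, seed=42))

print(run_a, run_b, run_swapped)
```

In this sketch the only source of variation between runs is the initial model, which matches the observation in the thread that fixing it removed the randomness.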

Re: flink-ml algorithms

2022-06-03 Thread Jing Ge
Hi, It seems like an evaluation with a small dataset. In this case, would you like to share your data sample and code? In addition, have you tried KMeans with the same dataset and got inconsistent results too? Best regards, Jing On Fri, Jun 3, 2022 at 4:29 AM Natia Chachkhiani < natia.chachkhia.

flink-ml algorithms

2022-06-02 Thread Natia Chachkhiani
Hi, I am running OnlineKmeans from the flink-ml repo on a small dataset. I've noticed that I don't get consistent results (cluster assignments) across different runs. I have set both parallelism and globalBatchSize to 1. I am doing a simple fit and transform on each data point ingested. Is the order