Re: [Dev] [ML] Spark K-means clustering on KDD cup 99 dataset

2015-08-25 Thread Ashen Weerathunga
@Nirmal, okay, I'll arrange it today. @Mahesan, thanks for the suggestion. Yes, 100 might be too high for some cases. I thought that within 100 iterations it would most probably converge to stable clusters; that's why I set it to 100. And yes, in cases like k = 100 it might not be enough. Thanks, and I'll try with
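The trade-off above, a fixed cap of 100 iterations versus stopping once the clusters are stable, can be illustrated with a minimal pure-Python sketch of Lloyd's k-means. This is not the project's Spark code: the function name `kmeans`, the `max_iterations` and `tolerance` parameters, and the naive "first k points" initialisation are assumptions chosen for illustration. Spark MLlib's `KMeans` exposes comparable controls through `setMaxIterations` and `setEpsilon`.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int, max_iterations: int = 100,
           tolerance: float = 1e-4) -> Tuple[List[Point], int]:
    """Lloyd's k-means that stops early once no centroid moves more than `tolerance`.

    Returns the final centroids and the number of iterations actually run.
    Seeding with the first k points is a hypothetical choice for reproducibility;
    random or k-means|| seeding is more usual in practice.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    for iteration in range(1, max_iterations + 1):
        # Assignment step: group each point with its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: _dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        shift = max(_dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tolerance:
            # Converged before reaching the iteration cap.
            return centroids, iteration
    # Cap reached: the centroids may still be moving.
    return centroids, max_iterations
```

Returning the iteration count makes it visible whether a run converged or simply hit the cap; with a large k on a dataset the size of KDD Cup 99, the cap may be reached before the centroids settle.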

Re: [Dev] [ML] Spark K-means clustering on KDD cup 99 dataset

2015-08-25 Thread Nirmal Fernando
@Ashen, let's have a code review today if possible. @Srinath, I forgot to mention that I've already given Ashen some feedback on how he could use Spark transformations effectively in his code. On Tue, Aug 25, 2015 at 4:33 PM, Ashen Weerathunga as...@wso2.com wrote: Okay, sure. On Tue,

Re: [Dev] [ML] Spark K-means clustering on KDD cup 99 dataset

2015-08-25 Thread Ashen Weerathunga
Thanks all for the suggestions. There are a few assumptions I have made: - Clusters are uniform - Fraud data will always be outliers to the normal clusters - Clusters do not intersect with each other - I have set the number of iterations to 100, so I assume that 100 iterations
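Under the assumption that fraud records fall outside the normal clusters, one common way to turn k-means into a detector is to threshold each point's distance to its nearest centroid. The plain-Python sketch below illustrates that idea; it is not the project's code, and the helper names (`nearest`, `cluster_thresholds`, `is_anomaly`), the per-cluster thresholds and the 95th-percentile cut-off are all hypothetical choices. Giving each cluster its own radius is one way to soften the "uniform clusters" assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point: Sequence[float], centroids: List[Point]) -> Tuple[int, float]:
    """Index of the nearest centroid and the distance to it."""
    return min(((i, _dist(point, c)) for i, c in enumerate(centroids)),
               key=lambda pair: pair[1])


def cluster_thresholds(normal_points: List[Point], centroids: List[Point],
                       percentile: float = 0.95) -> Dict[int, float]:
    """Per-cluster distance threshold at `percentile` of the normal points' distances."""
    distances: Dict[int, List[float]] = {}
    for p in normal_points:
        i, d = nearest(p, centroids)
        distances.setdefault(i, []).append(d)
    thresholds: Dict[int, float] = {}
    for i, ds in distances.items():
        ds.sort()
        # Nearest-rank percentile: smallest distance with at least
        # `percentile` of the cluster's distances at or below it.
        rank = max(1, math.ceil(percentile * len(ds)))
        thresholds[i] = ds[rank - 1]
    return thresholds


def is_anomaly(point: Sequence[float], centroids: List[Point],
               thresholds: Dict[int, float]) -> bool:
    """Flag a point whose distance to its nearest centroid exceeds that cluster's threshold."""
    i, d = nearest(point, centroids)
    if i not in thresholds:
        # No normal point landed in this cluster, so there is no baseline radius.
        return True
    return d > thresholds[i]
```

If fraud records do overlap the normal clusters (the "clusters do not intersect" assumption failing), a distance threshold like this would miss them, which is one reason to state these assumptions explicitly.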

Re: [Dev] [ML] Spark K-means clustering on KDD cup 99 dataset

2015-08-25 Thread Srinath Perera
Nirmal, Seshika, shall we do a code review? This code should go into ML after the UI part is done. Thanks Srinath On Tue, Aug 25, 2015 at 2:20 PM, Ashen Weerathunga as...@wso2.com wrote: Hi all, This is the source code of the project. https://github.com/ashensw/Spark-KMeans-fraud-detection

Re: [Dev] [ML] Spark K-means clustering on KDD cup 99 dataset

2015-08-25 Thread Nirmal Fernando
Sure. @Ashen, can you please arrange one? On Tue, Aug 25, 2015 at 2:35 PM, Srinath Perera srin...@wso2.com wrote: Nirmal, Seshika, shall we do a code review? This code should go into ML after the UI part is done. Thanks Srinath On Tue, Aug 25, 2015 at 2:20 PM, Ashen Weerathunga as...@wso2.com

Re: [Dev] [ML] Spark K-means clustering on KDD cup 99 dataset

2015-08-25 Thread CD Athuraliya
Hi Ashen, It would be better if you could add the assumptions you make in this process (uniform clusters, etc.). It will make the process clearer IMO. Regards, CD On Tue, Aug 25, 2015 at 11:39 AM, Nirmal Fernando nir...@wso2.com wrote: Can we see the code too? On Tue, Aug 25, 2015 at 11:36

Re: [Dev] [ML] Spark K-means clustering on KDD cup 99 dataset

2015-08-25 Thread Maheshakya Wijewardena
Is there any particular reason why you are putting aside 65% of the anomalous data in the evaluation? Since there is an obvious imbalance between the numbers of normal and abnormal cases, you will get higher accuracy in the evaluation because a model tends to produce more accurate
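The imbalance point can be made concrete: when normal records vastly outnumber anomalous ones, accuracy alone can look good even for a useless detector. A minimal plain-Python sketch comparing accuracy with precision and recall; the function name `confusion_metrics` is hypothetical, and the 95/5 split in the note below is an illustrative example, not a figure from the KDD Cup 99 evaluation.

```python
from typing import Dict, Sequence


def confusion_metrics(actual: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, precision and recall for binary labels where True means 'anomalous'."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one example")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # anomalies correctly flagged
    tn = sum(1 for a, p in pairs if not a and not p)  # normal records correctly passed
    fp = sum(1 for a, p in pairs if not a and p)      # normal records wrongly flagged
    fn = sum(1 for a, p in pairs if a and not p)      # anomalies missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

For example, on 95 normal and 5 anomalous records, a detector that flags nothing scores 0.95 accuracy but 0.0 recall, which is why recall (and precision) on the anomalous class is worth reporting alongside accuracy.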

Re: [Dev] [ML] Spark K-means clustering on KDD cup 99 dataset

2015-08-25 Thread Nirmal Fernando
Can we see the code too? On Tue, Aug 25, 2015 at 11:36 AM, Ashen Weerathunga as...@wso2.com wrote: Hi all, I am currently working on a fraud detection project. I was able to cluster the KDD Cup 99 network anomaly detection dataset using the Apache Spark k-means algorithm. So far I was able to

Re: [Dev] [ML] Spark K-means clustering on KDD cup 99 dataset

2015-08-25 Thread Ashen Weerathunga
Hi all, This is the source code of the project. https://github.com/ashensw/Spark-KMeans-fraud-detection Best Regards, Ashen On Tue, Aug 25, 2015 at 2:00 PM, Ashen Weerathunga as...@wso2.com wrote: Thanks all for the suggestions. There are a few assumptions I have made: - Clusters are

Re: [Dev] [ML] Spark K-means clustering on KDD cup 99 dataset

2015-08-25 Thread Ashen Weerathunga
Okay, sure. On Tue, Aug 25, 2015 at 3:55 PM, Nirmal Fernando nir...@wso2.com wrote: Sure. @Ashen, can you please arrange one? On Tue, Aug 25, 2015 at 2:35 PM, Srinath Perera srin...@wso2.com wrote: Nirmal, Seshika, shall we do a code review? This code should go into ML after the UI part is