Vipul,

Thanks for your feedback. As far as I understand, you mean RDD[(Double, Double)] (note the parentheses), where each of these Double values is supposed to contain one coordinate of a point. That limits us to 2-dimensional space, which is not suitable for many tasks; I want the algorithm to be able to work in multidimensional space. There is actually a class org.alitouka.spark.dbscan.spatial.Point in my code which represents a point with an arbitrary number of coordinates.
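To make the multidimensional-point idea concrete, here is a minimal plain-Scala sketch (independent of the actual org.alitouka.spark.dbscan.spatial.Point implementation, whose fields and methods may differ) of a point with an arbitrary number of coordinates and a Euclidean distance between two such points:

```scala
// Sketch only: the real Point class in org.alitouka.spark.dbscan.spatial
// may have a different constructor and API.
case class NDPoint(coordinates: Array[Double]) {

  // Euclidean distance to another point of the same dimensionality
  def distanceTo(other: NDPoint): Double = {
    require(coordinates.length == other.coordinates.length,
      "points must have the same number of dimensions")
    math.sqrt(
      coordinates.zip(other.coordinates)
        .map { case (a, b) => (a - b) * (a - b) }
        .sum)
  }
}

// 3-dimensional example: not limited to (Double, Double) pairs
val a = NDPoint(Array(0.0, 0.0, 0.0))
val b = NDPoint(Array(1.0, 2.0, 2.0))
println(a.distanceTo(b)) // sqrt(1 + 4 + 4) = 3.0
```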
IOHelper.readDataset is just a convenience method which reads a CSV file and returns an RDD of Points (more precisely, it returns a value of type RawDataset, which is just an alias for RDD[Point]). If your data is stored in a format other than CSV, you will have to write your own code to convert your data to a RawDataset. I can add support for other data formats in future versions.

As for other distance measures - it is a high-priority issue on my list ;)

On Thu, Jun 12, 2014 at 6:02 PM, Vipul Pandey <vipan...@gmail.com> wrote:
> Great! I was going to implement one of my own - but I may not need to do
> that any more :)
> I haven't had a chance to look deep into your code but I would recommend
> accepting an RDD[Double,Double] as well, instead of just a file.
>
> val data = IOHelper.readDataset(sc, "/path/to/my/data.csv")
>
> And other distance measures ofcourse.
>
> Thanks,
> Vipul
>
> On Jun 12, 2014, at 2:31 PM, Aliaksei Litouka <aliaksei.lito...@gmail.com>
> wrote:
>
> Hi.
> I'm not sure if messages like this are appropriate in this list; I just
> want to share with you an application I am working on. This is my personal
> project which I started to learn more about Spark and Scala, and, if it
> succeeds, to contribute it to the Spark community.
>
> Maybe someone will find it useful. Or maybe someone will want to join
> development.
>
> The application is available at https://github.com/alitouka/spark_dbscan
>
> Any questions, comments, suggestions, as well as criticism are welcome :)
>
> Best regards,
> Aliaksei Litouka
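P.S. To illustrate the "convert your data to RawDataset" step mentioned above: the essential work is parsing each record of your custom format into a coordinate array. The following is a hedged plain-Scala sketch of just that per-record parsing (parseRecord and its tab separator are hypothetical choices, not part of the library); in a real Spark job you would apply it with something like sc.textFile(...).map(...) and wrap each coordinate array in the library's Point class to obtain an RDD[Point]:

```scala
// Hypothetical converter from a custom record format (one record per line,
// coordinates separated by a delimiter) to an array of coordinates.
def parseRecord(line: String, separator: String = "\t"): Array[Double] =
  line.split(separator).map(_.trim.toDouble)

// Example record with three coordinates
val record = "1.5\t-2.0\t3.25"
println(parseRecord(record).mkString(", ")) // 1.5, -2.0, 3.25
```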