Re: Density based Clustering in Mahout

2017-07-06 Thread Dmitriy Lyubimov
PS: Maybe we should say that if you can provide Kryo serialization, it can be assumed platform-agnostic, and provide an API for embedding that further. In practice all backends (except, I guess, H2O, which is going extinct if it hasn't already) currently support Kryo, and potential new ones could easily add it to

Re: Density based Clustering in Mahout

2017-07-06 Thread Dmitriy Lyubimov
On Thu, Jul 6, 2017 at 9:45 AM, Trevor Grant wrote:
> To Dmitriy's point (2)- I think it is acceptable to create an R-Tree structure, that will exist only within the algorithm for doing in-core operations, (or maybe it lives slightly outside of the algorithm so we don't need to recreate tre

Re: Density based Clustering in Mahout

2017-07-06 Thread Trevor Grant
To Dmitriy's point (2): I think it is acceptable to create an R-Tree structure that will exist only within the algorithm for doing in-core operations (or maybe it lives slightly outside of the algorithm so we don't need to recreate trees for DBSCAN, Random Forests, and other tree-based algorithms- e

Re: Density based Clustering in Mahout

2017-07-05 Thread Dmitriy Lyubimov
PS: I read a few papers, including, I believe, one from Google, on partitioning the DBSCAN problem for parallelization. They did not fit my purposes, though, as they inherently assumed that every clustering problem had enough centroids to be efficiently partitioned. In effect it amounted to si

Re: Density based Clustering in Mahout

2017-07-05 Thread Dmitriy Lyubimov
(1) I abandoned any attempts at DBSCAN and implemented a different density algorithm (can't say which; it is subject to patent restrictions). The reason is that I couldn't immediately figure out how to parallelize it efficiently: aside from the data structure discussions, the base algorithm is inherently iter

Re: Density based Clustering in Mahout

2017-07-05 Thread Aditya
*Important: do read.* Hello everyone, Trevor and I have been discussing how to effectively represent an R-Tree in Mahout. It turns out there is a method to represent a Binary Search Tree (BST) in the form of an ancestor matrix. This
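For readers unfamiliar with the idea, here is a minimal sketch of an ancestor matrix for a BST. It is plain Python rather than Mahout code and is not the representation proposed in this thread. The names `Node`, `insert` and `ancestor_matrix` are hypothetical, keys are assumed distinct, and the convention that entry [i][j] is 1 when key i is a proper ancestor of key j is an assumption chosen for illustration.

```python
from typing import Dict, List, Optional


class Node:
    """A BST node (hypothetical helper, for illustration only)."""

    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Insert a key into the BST and return the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def ancestor_matrix(root: Optional[Node], keys: List[int]) -> List[List[int]]:
    """Return m where m[i][j] == 1 iff key i is a proper ancestor of key j.

    Rows and columns are indexed by the rank of each key in sorted order
    (an assumed convention). Keys are assumed to be distinct.
    """
    index: Dict[int, int] = {k: i for i, k in enumerate(sorted(keys))}
    n = len(index)
    m = [[0] * n for _ in range(n)]

    def walk(node: Optional[Node], ancestors: List[int]) -> None:
        if node is None:
            return
        j = index[node.key]
        # Every key on the path from the root to this node is an ancestor.
        for a in ancestors:
            m[index[a]][j] = 1
        ancestors.append(node.key)
        walk(node.left, ancestors)
        walk(node.right, ancestors)
        ancestors.pop()

    walk(root, [])
    return m


keys = [2, 1, 3]
root: Optional[Node] = None
for k in keys:
    root = insert(root, k)

for row in ancestor_matrix(root, keys):
    print(row)
```

With keys inserted in the order 2, 1, 3, only the row for key 2 has non-zero entries, since 2 is the root and the other two keys are its children. A matrix in this form is dense (n x n), so whether it suits large trees would depend on using a sparse backing.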

Re: Density based Clustering in Mahout

2017-06-23 Thread Trevor Grant
What if you had Arrays of Matrices, or Arrays of Arrays of Matrices (e.g. 3D and 4D tensors)? I implemented these for the MLPs (still WIP): https://github.com/apache/mahout/pull/323/files#diff-cd8a7c5e2cf42b91b5aa47c96daf19c0R25 But those functions were specifically meant to overcome the challenges y