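As a caveat, a lot of clustering algorithms assume that the distances form a proper metric (non-negative, symmetric, zero only between identical points, and satisfying the triangle inequality). Cosine distance, for example, does not satisfy the triangle inequality. If your distance is not a proper metric, the results might be meaningless (the narrative docs do a good job of discussing this).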
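If you want to check a precomputed matrix before clustering, you can test the axioms directly. The sketch below is only an illustration: `metric_violations` is a hypothetical helper (not part of scikit-learn or SciPy) that checks non-negativity, zero self-distance, symmetry and the triangle inequality with NumPy. The cosine distances are computed by hand as 1 minus the cosine similarity. The triangle check builds an n x n x n array, so it is meant for a small sample of rows, not a full 14000 x 14000 matrix.

```python
import numpy as np


def metric_violations(D, tol=1e-9):
    """Hypothetical helper: list the metric axioms a square distance matrix breaks."""
    D = np.asarray(D, dtype=float)
    problems = []
    if np.any(D < -tol):
        problems.append("negative distances")
    if not np.allclose(np.diag(D), 0.0, atol=tol):
        problems.append("non-zero diagonal")
    if not np.allclose(D, D.T, atol=tol):
        problems.append("not symmetric")
    # two_hop[i, j] = min over k of D[i, k] + D[k, j] (shortest path via one point k).
    # Broadcasting builds an (n, n, n) array, so keep n small.
    two_hop = (D[:, :, None] + D[None, :, :]).min(axis=1)
    if np.any(D > two_hop + tol):
        problems.append("triangle inequality violated")
    return problems


# Three 2-D vectors: x-axis, the 45-degree diagonal, and y-axis.
X = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-length rows
D = 1.0 - Xn @ Xn.T                                # cosine distance = 1 - cosine similarity
np.fill_diagonal(D, 0.0)                           # remove tiny rounding errors on the diagonal

print(metric_violations(D))  # ['triangle inequality violated']
```

Here d(x-axis, y-axis) = 1, but going through the diagonal costs only about 0.29 + 0.29 = 0.59, which is exactly the kind of violation the caveat refers to.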
On Mon, 12 Feb 2018 at 13:30 prince gosavi <princegosav...@gmail.com> wrote:

> Hi,
> Thanks for those tips, Sebastian. That just saved my day.
>
> Regards,
> Rajkumar
>
> On Tue, Feb 13, 2018 at 12:44 AM, Sebastian Raschka <se.rasc...@gmail.com>
> wrote:
>
>> Hi,
>>
>> by default, the clustering classes from sklearn (e.g., DBSCAN) take an
>> [num_examples, num_features] array as input, but you can also provide the
>> distance matrix directly, e.g., by instantiating it with
>> metric='precomputed':
>>
>>     my_dbscan = DBSCAN(..., metric='precomputed')
>>     my_dbscan.fit(my_distance_matrix)
>>
>> Not sure if it helps in that particular case (it depends on how many zero
>> elements you have), but you can also use a sparse matrix in CSR format (
>> https://docs.scipy.org/doc/scipy-1.0.0/reference/generated/scipy.sparse.csr_matrix.html
>> ).
>>
>> Also, you don't need to for-loop through the rows if you want to compute
>> the pair-wise distances; you can simply do that on the complete array. E.g.,
>>
>>     from sklearn.metrics.pairwise import cosine_distances
>>     from scipy import sparse
>>
>>     distance_matrix = cosine_distances(sparse.csr_matrix(X))
>>
>> where X is your "[num_examples, num_features]" array.
>>
>> Best,
>> Sebastian
>>
>>
>> > On Feb 12, 2018, at 1:10 PM, prince gosavi <princegosav...@gmail.com>
>> wrote:
>> >
>> > I have generated a cosine distance matrix and would like to apply a
>> > clustering algorithm to the given matrix.
>> > np.shape(distance_matrix)==(14000,14000)
>> >
>> > I would like to know which clustering algorithm suits it better, and
>> > whether there is any need to process the data further to get it into a
>> > form to which a model can be applied.
>> > Also, any performance tips? The matrix takes around 3-4 hrs of
>> > processing.
>> > You can find my code here:
>> > https://github.com/maxyodedara5/BE_Project/blob/master/main.ipynb
>> > Code for READ ONLY PURPOSE.
>> > --
>> > Regards
>> > _______________________________________________
>> > scikit-learn mailing list
>> > scikit-learn@python.org
>> > https://mail.python.org/mailman/listinfo/scikit-learn
>
> --
> Regards
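Putting Sebastian's suggestions together, here is a minimal end-to-end sketch, assuming scikit-learn, SciPy and NumPy are installed. The toy data is made up for illustration (two groups of vectors pointing in different directions), and `eps=0.1` / `min_samples=5` are illustrative values, not tuned recommendations. It computes all pairwise cosine distances in one vectorized call and then passes the matrix to DBSCAN with `metric="precomputed"`.

```python
import numpy as np
from scipy import sparse
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import cosine_distances

# Hypothetical stand-in for the [num_examples, num_features] array X:
# 20 vectors near the x-axis and 20 near the z-axis.
rng = np.random.default_rng(0)
group_a = np.array([1.0, 0.0, 0.0]) + 0.05 * rng.random((20, 3))
group_b = np.array([0.0, 0.0, 1.0]) + 0.05 * rng.random((20, 3))
X = np.vstack([group_a, group_b])

# One call computes every pairwise distance, with no Python loop over rows.
# cosine_distances accepts a CSR matrix as input but returns a dense ndarray.
distance_matrix = cosine_distances(sparse.csr_matrix(X))

# metric="precomputed" tells DBSCAN its input is already a distance matrix.
# eps and min_samples are illustrative values only.
labels = DBSCAN(eps=0.1, min_samples=5, metric="precomputed").fit_predict(distance_matrix)
print(labels)
```

Keep memory in mind at the scale from the thread: a dense 14000 x 14000 float64 matrix takes 14000 * 14000 * 8 bytes, roughly 1.57 GB.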