Hi,

by default, the clustering classes in sklearn (e.g., DBSCAN) take a [num_examples, num_features] array as input, but you can also provide the distance matrix directly, e.g., by instantiating the estimator with metric='precomputed':

    my_dbscan = DBSCAN(..., metric='precomputed')
    my_dbscan.fit(my_distance_matrix)

I'm not sure whether it helps in this particular case (it depends on how many zero elements you have), but you can also pass a sparse matrix in CSR format (https://docs.scipy.org/doc/scipy-1.0.0/reference/generated/scipy.sparse.csr_matrix.html).

Also, you don't need to for-loop through the rows if you want to compute the pairwise distances; you can simply do that on the complete array. E.g.,

    from sklearn.metrics.pairwise import cosine_distances
    from scipy import sparse

    distance_matrix = cosine_distances(sparse.csr_matrix(X))

where X is your [num_examples, num_features] array. Note that cosine_distances accepts sparse input but always returns a dense array; the dense_output option belongs to cosine_similarity, not to cosine_distances.

Best,
Sebastian

> On Feb 12, 2018, at 1:10 PM, prince gosavi <princegosav...@gmail.com> wrote:
>
> I have generated a cosine distance matrix and would like to apply clustering
> algorithm to the given matrix.
> np.shape(distance_matrix)==(14000,14000)
>
> I would like to know which clustering suits better and is there any need to
> process the data further to get it in the form so that a model can be applied.
> Also any performance tip as the matrix takes around 3-4 hrs of processing.
> You can find my code here
> https://github.com/maxyodedara5/BE_Project/blob/master/main.ipynb
> Code for READ ONLY PURPOSE.
> --
> Regards
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
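
P.S. For reference, the two suggestions in the reply (computing all pairwise cosine distances in one call, then clustering with metric='precomputed') can be combined into a small end-to-end sketch. The toy data and the eps / min_samples values below are made up for illustration only; they are assumptions, not recommendations, and would need tuning for the real 14000 x 14000 matrix.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import cosine_distances

# Toy data (hypothetical): two groups of vectors pointing in different directions.
rng = np.random.RandomState(0)
group_a = np.array([1.0, 0.0, 0.0]) + 0.01 * rng.rand(5, 3)
group_b = np.array([0.0, 1.0, 0.0]) + 0.01 * rng.rand(5, 3)
X = np.vstack([group_a, group_b])  # shape [num_examples, num_features]

# All pairwise cosine distances in one call, no Python loop over the rows.
distance_matrix = cosine_distances(X)  # dense array, shape (10, 10)

# eps and min_samples are illustrative values chosen for this toy data.
my_dbscan = DBSCAN(eps=0.1, min_samples=3, metric='precomputed')
labels = my_dbscan.fit_predict(distance_matrix)

print(labels)  # one cluster label per row of X; -1 would mark noise points
```

With cosine distances, eps is a threshold on 1 - cosine similarity, so small values only group vectors that point in nearly the same direction.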