Will look into it. However, I am having trouble generating the clusters: my data is a 14000x14000 distance_matrix and I get a "Memory Error". I have 6 GB of RAM. Any insight into this error is welcome.
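For a rough sense of scale, here is a back-of-the-envelope sketch of how much memory a dense 14000x14000 matrix needs. It assumes the matrix is stored as float64 (8 bytes per element, NumPy's default) and compares that with float32 and with a condensed upper-triangle layout. The helper names are made up for illustration and are not part of any library.

```python
# Back-of-the-envelope memory estimate for a dense square distance matrix.
# Assumptions (not stated in the thread): float64 = 8 bytes, float32 = 4 bytes.

def dense_matrix_bytes(n, bytes_per_element):
    """Bytes needed to hold a dense n x n matrix."""
    return n * n * bytes_per_element


def condensed_bytes(n, bytes_per_element):
    """Bytes for the upper triangle only: n*(n-1)/2 pairwise distances."""
    return n * (n - 1) // 2 * bytes_per_element


n = 14000
GIB = 1024 ** 3

full_f64 = dense_matrix_bytes(n, 8)   # full matrix, float64
full_f32 = dense_matrix_bytes(n, 4)   # full matrix, float32
cond_f64 = condensed_bytes(n, 8)      # upper triangle only, float64

for label, size in [
    ("full float64", full_f64),
    ("full float32", full_f32),
    ("condensed float64", cond_f64),
]:
    print(f"{label:>18}: {size / GIB:.2f} GiB")
```

Under these assumptions a single float64 copy is about 1.46 GiB, which fits in 6 GB on its own. A step that makes one or two temporary copies of the matrix, or a 32-bit Python build, could still plausibly hit the limit. I have not checked what the notebook's clustering code actually allocates.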
Regards

On Tue, Feb 13, 2018 at 3:19 AM, federico vaggi <vaggi.feder...@gmail.com> wrote:

> As a caveat, a lot of clustering algorithms assume that the distance
> matrix is a proper metric.  If your distance is not a proper metric then
> the results might be meaningless (the narrative docs do a good job of
> discussing this).
>
> On Mon, 12 Feb 2018 at 13:30 prince gosavi <princegosav...@gmail.com>
> wrote:
>
>> Hi,
>> Thanks for those tips Sebastian. That just saved my day.
>>
>> Regards,
>> Rajkumar
>>
>> On Tue, Feb 13, 2018 at 12:44 AM, Sebastian Raschka <se.rasc...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> by default, the clustering classes from sklearn, (e.g., DBSCAN), take an
>>> [num_examples, num_features] array as input, but you can also provide the
>>> distance matrix directly, e.g., by instantiating it with
>>> metric='precomputed'
>>>
>>> my_dbscan = DBSCAN(..., metric='precomputed')
>>> my_dbscan.fit(my_distance_matrix)
>>>
>>> Not sure if it helps in that particular case (depending on how many zero
>>> elements you have), you can also use a sparse matrix in CSR format (
>>> https://docs.scipy.org/doc/scipy-1.0.0/reference/generated/scipy.sparse.csr_matrix.html).
>>>
>>> Also, you don't need to for-loop through the rows if you want to compute
>>> the pair-wise distances, you can simply do that on the complete array. E.g.,
>>>
>>> from sklearn.metrics.pairwise import cosine_distances
>>> from scipy import sparse
>>>
>>> distance_matrix = cosine_distances(sparse.csr_matrix(X),
>>> dense_output=False)
>>>
>>> where X is your "[num_examples, num_features]" array.
>>>
>>> Best,
>>> Sebastian
>>>
>>>
>>> > On Feb 12, 2018, at 1:10 PM, prince gosavi <princegosav...@gmail.com>
>>> wrote:
>>> >
>>> > I have generated a cosine distance matrix and would like to apply
>>> clustering algorithm to the given matrix.
>>> > np.shape(distance_matrix)==(14000,14000)
>>> >
>>> > I would like to know which clustering suits better and is there any
>>> need to process the data further to get it in the form so that a model can
>>> be applied.
>>> > Also any performance tip as the matrix takes around 3-4 hrs of
>>> processing.
>>> > You can find my code here
>>> > https://github.com/maxyodedara5/BE_Project/blob/master/main.ipynb
>>> > Code for READ ONLY PURPOSE.
>>> > --
>>> > Regards

--
Regards
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn