Will look into it. However, I have a problem generating the clusters, as my data is a
14000x14000 distance matrix and it fails with a "MemoryError".
I have 6 GB of RAM. Any insight into this error is welcome.
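For a sense of scale: a dense 14000x14000 matrix of 64-bit floats (NumPy's default dtype) needs n * n * 8 bytes on its own. The sketch below only does that arithmetic. Whether 6 GB is enough in practice also depends on how many temporary n x n arrays the chosen algorithm creates along the way, which is something to check for your own pipeline. Storing the matrix as float32 halves the footprint. `dense_matrix_gib` is a hypothetical helper written for this illustration.

```python
import numpy as np


def dense_matrix_gib(n, dtype):
    """Memory (in GiB) for one dense n x n array of the given dtype (hypothetical helper)."""
    return n * n * np.dtype(dtype).itemsize / 2**30


for dtype in (np.float64, np.float32):
    print(f"{np.dtype(dtype).name}: {dense_matrix_gib(14000, dtype):.2f} GiB")
# float64: 1.46 GiB
# float32: 0.73 GiB
```

So a few simultaneous copies of the matrix (for example the original, a converted copy, and an intermediate result inside the clustering code) can already come close to the 6 GB available.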

Regards

On Tue, Feb 13, 2018 at 3:19 AM, federico vaggi <vaggi.feder...@gmail.com>
wrote:

> As a caveat, a lot of clustering algorithms assume that the distances in the
> matrix come from a proper metric. If your distance is not a proper metric,
> the results might be meaningless (the narrative docs do a good job of
> discussing this).
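Cosine distance (1 minus cosine similarity) is a case in point: it does not satisfy the triangle inequality, so it is not a proper metric. A minimal check with three hand-picked 2-D vectors, chosen purely for illustration; `cosine_distance` is a small helper written here, not a library function.

```python
import numpy as np


def cosine_distance(u, v):
    """1 - cosine similarity of two vectors (illustrative helper)."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))


a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])
c = np.array([0.0, 1.0])

d_ac = cosine_distance(a, c)  # 1.0 (orthogonal vectors)
d_ab = cosine_distance(a, b)  # about 0.293 (45 degrees apart)
d_bc = cosine_distance(b, c)  # about 0.293

# A metric would require d(a, c) <= d(a, b) + d(b, c); here 1.0 > 0.586.
print(d_ac <= d_ab + d_bc)  # False
```

If a true metric is needed, one option is to L2-normalise the rows and use Euclidean distance: for unit vectors, ||u - v||^2 = 2 * (cosine distance), so the neighbour ordering is the same while the triangle inequality holds.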
>
> On Mon, 12 Feb 2018 at 13:30 prince gosavi <princegosav...@gmail.com>
> wrote:
>
>> Hi,
>> Thanks for those tips, Sebastian. That just saved my day.
>>
>> Regards,
>> Rajkumar
>>
>> On Tue, Feb 13, 2018 at 12:44 AM, Sebastian Raschka <se.rasc...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> By default, the clustering classes from sklearn (e.g., DBSCAN) take an
>>> [num_examples, num_features] array as input, but you can also provide the
>>> distance matrix directly, e.g., by instantiating the estimator with
>>> metric='precomputed':
>>>
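>>> from sklearn.cluster import DBSCAN
>>> # '...' stands for your other parameters (e.g., eps, min_samples)
>>> my_dbscan = DBSCAN(..., metric='precomputed')
>>> my_dbscan.fit(my_distance_matrix)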
>>>
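A small end-to-end sketch of this pattern on toy data. The six 2-D points and the values eps=0.05 and min_samples=2 are illustrative assumptions, not tuned recommendations. With metric='precomputed', eps is interpreted in the units of your distance matrix (here, cosine distance).

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import cosine_distances

# Toy data (assumed): two groups of vectors pointing in clearly different directions.
X = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.05],
              [0.0, 1.0], [0.1, 0.9], [0.05, 1.0]])

# Full (dense) n x n cosine distance matrix.
D = cosine_distances(X)

# eps is compared directly against the entries of D.
db = DBSCAN(eps=0.05, min_samples=2, metric='precomputed')
labels = db.fit_predict(D)
print(labels)  # [0 0 0 1 1 1]
```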
>>> Depending on how many zero elements you have (not sure if it helps in
>>> your particular case), you can also use a sparse matrix in CSR format
>>> (https://docs.scipy.org/doc/scipy-1.0.0/reference/generated/scipy.sparse.csr_matrix.html).
>>>
>>> Also, you don't need a for-loop over the rows to compute the pairwise
>>> distances; you can compute them on the complete array at once, e.g.,
>>>
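>>> from sklearn.metrics.pairwise import cosine_distances
>>> from scipy import sparse
>>>
>>> # cosine_distances accepts sparse input but returns a dense ndarray;
>>> # it has no dense_output argument (unlike cosine_similarity)
>>> distance_matrix = cosine_distances(sparse.csr_matrix(X))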
>>>
>>> where X is your "[num_examples, num_features]" array.
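To check that the vectorised call gives the same result as a row-by-row loop, here is a small comparison on random data. The loop version, built on scipy.spatial.distance.cosine, is a hypothetical stand-in for a per-pair loop (it is not the notebook's actual code). On 14000 rows such a loop makes on the order of 10^8 Python-level calls, which the single vectorised call avoids.

```python
import numpy as np
from scipy.spatial.distance import cosine
from sklearn.metrics.pairwise import cosine_distances

rng = np.random.default_rng(0)
X = rng.random((50, 20))  # small random stand-in for the real data


def loop_cosine_distances(X):
    """Hypothetical per-pair loop, for comparison only (slow for large X)."""
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = cosine(X[i], X[j])
    return D


D_loop = loop_cosine_distances(X)
D_vec = cosine_distances(X)  # one vectorised call on the whole array
print(np.allclose(D_loop, D_vec))  # True
```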
>>>
>>> Best,
>>> Sebastian
>>>
>>>
>>> > On Feb 12, 2018, at 1:10 PM, prince gosavi <princegosav...@gmail.com>
>>> > wrote:
>>> >
>>> > I have generated a cosine distance matrix and would like to apply a
>>> > clustering algorithm to it.
>>> > np.shape(distance_matrix) == (14000, 14000)
>>> >
>>> > I would like to know which clustering algorithm suits this best, and
>>> > whether the data needs any further processing before a model can be
>>> > applied.
>>> > Also, any performance tips are welcome, as computing the matrix takes
>>> > around 3-4 hours.
>>> > You can find my code here:
>>> > https://github.com/maxyodedara5/BE_Project/blob/master/main.ipynb
>>> > (The code is shared for read-only purposes.)
>>> > --
>>> > Regards
>>> > _______________________________________________
>>> > scikit-learn mailing list
>>> > scikit-learn@python.org
>>> > https://mail.python.org/mailman/listinfo/scikit-learn
>>>
>>>
>>>
>>
>>
>>
>
>
>

