Hi,

I have recently used the DBSCAN implementation of scikit-learn, and I have a 
"quick" question.

Currently, noise points are labelled -1 in the numpy array of labels.

From my point of view, cluster labels can be used, for example, as indices into
a sequence. However, in Python -1 is still a valid index, so label -1 can
silently be mixed up with the last label N when accessing the elements of a
list: for a list with N+1 entries, lst[-1] returns the same element as lst[N].
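A minimal sketch of the pitfall, assuming labels shaped like the ones DBSCAN
returns (non-negative integers for clusters, -1 for noise). The label values
and the cluster_names list are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical DBSCAN-style output: clusters 0 and 1, plus one noise point (-1).
labels = np.array([0, 1, -1, 1])

# Hypothetical per-cluster data, indexed by cluster label.
cluster_names = ["cluster A", "cluster B"]

# Look up each point's cluster by its label. The noise point (-1) does not
# raise an error: Python negative indexing silently returns the last entry.
assigned = [cluster_names[label] for label in labels]
print(assigned)  # the third point (noise) is reported as "cluster B"
```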

If a list were used instead of a numpy array, the label for noise points could
be None, and an exception would then be raised whenever this label was used as
an index.
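A small sketch of the behaviour described above, assuming a plain Python list
of labels with None marking noise (a hypothetical representation, not what
scikit-learn returns). Indexing a list with None raises TypeError instead of
silently returning another cluster's entry:

```python
# Hypothetical labels stored in a plain list, with None marking noise.
labels = [0, 1, None, 1]
cluster_names = ["cluster A", "cluster B"]

caught = None
try:
    # Fails on the third label: list indices must be integers or slices.
    assigned = [cluster_names[label] for label in labels]
except TypeError as exc:
    caught = exc
    print("noise label rejected:", exc)
```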

In both cases, the user has to check the validity of a label before doing
anything with it.
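For the -1 convention, one way to do this check is a boolean mask over the
label array. This is a sketch under the same hypothetical labels as above;
the variable names are illustrative only:

```python
import numpy as np

labels = np.array([0, 1, -1, 1])
cluster_names = ["cluster A", "cluster B"]

# True only for points that were assigned to a cluster (i.e. not noise).
clustered = labels != -1

# Only valid labels are used as indices.
assigned = [cluster_names[label] for label in labels[clustered]]

# Positions of the noise points, handled separately.
noise_idx = np.flatnonzero(labels == -1)
print(assigned, noise_idx)
```

With a None-based list, the equivalent check would be a filter such as
"if label is not None" inside the comprehension.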
However, in my humble opinion, None would be more explicit than -1 and would
limit the risk of mixing noise points with correctly labelled ones.

I presume there are valid reasons for using a numpy array and representing
noise points as -1. Could someone enlighten me on that matter? Are the
implications limited to DBSCAN?
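One plausible reason, offered here as an assumption rather than the
scikit-learn developers' stated rationale: a numpy array cannot mix integers
and None without falling back to the generic object dtype, which gives up the
compact integer storage and fast vectorized operations. A minimal sketch:

```python
import numpy as np

# Integer labels with -1 for noise keep an integer dtype
# (the exact width, e.g. int64, depends on the platform).
int_labels = np.array([0, 1, -1, 1])
print(int_labels.dtype)

# Using None for noise forces numpy to build an object array.
obj_labels = np.array([0, 1, None, 1])
print(obj_labels.dtype)  # object

# Vectorized comparisons stay cheap on the integer array.
n_noise = int(np.sum(int_labels == -1))
print(n_noise)
```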

Regards,
Félix-Antoine Fortin
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
