Hi, I have recently used the DBSCAN implementation of scikit-learn, and I have a "quick" question.
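To make the question concrete, here is a minimal sketch of the pitfall I have in mind. The labels are written by hand in the form DBSCAN returns (illustrative values, not real clustering output), and `cluster_names` is a hypothetical per-cluster list that I made up for the example.

```python
# Hand-written labels in the form DBSCAN returns: -1 marks a noise point.
# (Illustrative values, not real clustering output.)
labels = [0, 1, -1, 0]

# Hypothetical per-cluster data, one entry per cluster label.
cluster_names = ["cluster A", "cluster B"]

# -1 is a valid Python index (the last element), so the noise point
# silently picks up the data of the last cluster.
looked_up = [cluster_names[label] for label in labels]
print(looked_up)

# With None as the noise marker, the same lookup fails loudly,
# because a list cannot be indexed with None.
labels_with_none = [None if label == -1 else label for label in labels]
try:
    [cluster_names[label] for label in labels_with_none]
    raised = False
except TypeError:
    raised = True
print(raised)
```

In the first lookup the noise point is indistinguishable from a member of the last cluster; in the second, the mistake surfaces as an exception.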
Currently, noise points are labelled as -1 in a numpy array. From my point of view, cluster labels can be used, for example, as indices into a sequence. However, in Python -1 is still a valid index value, so it is possible to confuse label -1 with label N when accessing elements of a list.

If a list were used instead of a numpy array, the label for noise points could be None, and an exception would be raised when using this label as an index. In both cases, the user has to check the validity of the label before doing anything with it. However, in my humble opinion, None would be more explicit than -1 and would limit the risk of mixing noise points with correctly labelled ones.

I presume there are valid reasons for using a numpy array and representing noise points as -1. Could someone enlighten me on that matter? Are the implications limited to DBSCAN?

Regards,
Félix-Antoine Fortin

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
