Re: [scikit-learn] clustering on big dataset

2018-01-04 Thread Joel Nothman
Yes, use an approximate nearest neighbors approach. None is included in scikit-learn, but there are numerous implementations with Python interfaces.
On 5 January 2018 at 12:51, Shiheng Duan wrote:
> Thanks, Joel,
> I am working on a KD-tree to find the nearest neighbors.
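For illustration only, here is a toy approximate nearest-neighbor index that uses Euclidean locality-sensitive hashing with NumPy. Each point is hashed by random Gaussian projections quantized into buckets, and candidates that share a bucket are re-ranked by exact distance. The class name `EuclideanLSH` and its parameters (`n_tables`, `n_hashes`, `width`) are hypothetical. It is not part of scikit-learn or of any of the libraries alluded to above. It assumes dense float data and is a sketch, not a substitute for a tuned ANN library.

```python
import numpy as np
from collections import defaultdict


class EuclideanLSH:
    """Toy LSH index for approximate Euclidean nearest neighbors (hypothetical)."""

    def __init__(self, data, n_tables=8, n_hashes=4, width=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.data = np.asarray(data, dtype=float)
        d = self.data.shape[1]
        self.width = width
        # Per table: n_hashes Gaussian projection vectors and uniform offsets
        self.proj = rng.normal(size=(n_tables, n_hashes, d))
        self.offset = rng.uniform(0.0, width, size=(n_tables, n_hashes))
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        keys = self._keys(self.data)  # shape (n_tables, n_points, n_hashes)
        for t, table in enumerate(self.tables):
            for idx, key in enumerate(keys[t]):
                table[tuple(key)].append(idx)

    def _keys(self, X):
        # h(x) = floor((a . x + b) / w) for every table and hash function
        proj = np.einsum("thd,nd->tnh", self.proj, X)
        return np.floor((proj + self.offset[:, None, :]) / self.width).astype(int)

    def query(self, x):
        """Index of the closest candidate point, or None if no bucket matched."""
        x = np.asarray(x, dtype=float)
        keys = self._keys(x[None, :])[:, 0, :]  # shape (n_tables, n_hashes)
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(tuple(keys[t]), []))
        if not candidates:
            return None
        cand = np.fromiter(candidates, dtype=int)
        # Exact re-ranking, restricted to the (small) candidate set
        dists = np.linalg.norm(self.data[cand] - x, axis=1)
        return int(cand[np.argmin(dists)])
```

The returned neighbor is always a real data point, but it is not guaranteed to be the true nearest one. More tables tend to raise recall, and more hashes per table make buckets smaller and queries cheaper.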

Re: [scikit-learn] clustering on big dataset

2018-01-04 Thread Shiheng Duan
Thanks, Joel, I am working on a KD-tree to find the nearest neighbors. Basically, I find the nearest neighbor of each point and then merge pairs of points that are each other's nearest neighbors. The problem is that after each iteration we have a new set of points, where new clusters are
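The merging step described here can be sketched with scikit-learn's `KDTree`. The sketch assumes distinct points (no exact duplicates). As a simplification, it replaces each mutual-nearest-neighbor pair by its unweighted midpoint and ignores cluster sizes. The function names `merge_mutual_nn` and `agglomerate` are hypothetical. Because the merged points are new, the tree is rebuilt on every round.

```python
import numpy as np
from sklearn.neighbors import KDTree


def merge_mutual_nn(points):
    """One round: replace each mutual-nearest-neighbor pair by its midpoint.

    Assumes at least two points and no exact duplicates, so that the first
    neighbor returned for every query point is the point itself.
    """
    points = np.asarray(points, dtype=float)
    tree = KDTree(points)
    # k=2: column 0 is the point itself, column 1 its nearest other point
    _, ind = tree.query(points, k=2)
    nn = ind[:, 1]

    merged = []
    used = np.zeros(len(points), dtype=bool)
    for i, j in enumerate(nn):
        if used[i]:
            continue
        if nn[j] == i:
            # i and j are each other's nearest neighbor: merge them
            merged.append((points[i] + points[j]) / 2.0)
            used[i] = used[j] = True
        else:
            # Not a mutual pair: keep the point unchanged for the next round
            merged.append(points[i])
    return np.array(merged)


def agglomerate(points, n_rounds):
    """Repeat the merge, rebuilding the KD-tree on the new points each round."""
    points = np.asarray(points, dtype=float)
    for _ in range(n_rounds):
        if len(points) < 2:
            break
        points = merge_mutual_nn(points)
    return points
```

When all pairwise distances are distinct, the globally closest pair is always a mutual pair. So each round removes at least one point.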

[scikit-learn] Position at BIDS (UC Berkeley) to work on NumPy

2018-01-04 Thread Stefan van der Walt
Hi everyone, The Berkeley Institute for Data Science (BIDS) is hiring scientific Python developers to contribute to NumPy. You can read more about the new positions here: https://bids.berkeley.edu/news/bids-receives-sloan-foundation-grant-contribute-numpy-development If you enjoy

Re: [scikit-learn] A necessary feature for Decision trees

2018-01-04 Thread Andreas Mueller
Your contribution would be very welcome; I think the current work has stalled.
On 01/04/2018 10:02 AM, Julio Antonio Soto de Vicente wrote:
> Hi Yang Li,
> I have to agree with you. Bitset and/or one-hot encoding are just hacks which should not be necessary for decision tree learners. There

Re: [scikit-learn] A necessary feature for Decision trees

2018-01-04 Thread Julio Antonio Soto de Vicente
Hi Yang Li, I have to agree with you. Bitset and/or one-hot encoding are just hacks which should not be necessary for decision tree learners. There is some WIP on an implementation for natural handling of categorical features in trees: please take a look at
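As one illustration of what natural handling of a categorical feature can mean, a split can send an arbitrary subset of categories to the left child, which is the set a bitset would encode. This is not necessarily what the WIP implementation does. For a regression target with squared-error loss, a classic result used in CART makes this cheap. It is enough to sort the categories by their mean target and check only the cut points along that ordering, instead of all 2^(k-1) - 1 subsets of k levels. The function name `best_categorical_split` is hypothetical.

```python
from collections import defaultdict


def best_categorical_split(categories, y):
    """Return the set of levels sent to the left child, or None if there is
    only one level. Minimizes the summed squared error of the two children."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, target in zip(categories, y):
        sums[c] += target
        counts[c] += 1

    # Sort levels by mean target; the optimal partition is a prefix of this order
    levels = sorted(sums, key=lambda c: sums[c] / counts[c])
    total_sum = sum(sums.values())
    total_n = sum(counts.values())

    best_score, best_left = float("inf"), None
    left_sum, left_n = 0.0, 0
    for k, c in enumerate(levels[:-1]):
        left_sum += sums[c]
        left_n += counts[c]
        right_sum, right_n = total_sum - left_sum, total_n - left_n
        # SSE = sum(y^2) - left_sum^2/left_n - right_sum^2/right_n;
        # sum(y^2) is the same for every split, so only the rest is compared
        score = -(left_sum ** 2 / left_n + right_sum ** 2 / right_n)
        if score < best_score:
            best_score, best_left = score, frozenset(levels[: k + 1])
    return best_left
```

One-hot encoding can only isolate one category per split, whereas this search picks a multi-category subset in a single split.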