2011/10/4 Conrad Lee <[email protected]>:
> On Tue, Oct 4, 2011 at 10:36 PM, Gael Varoquaux
> <[email protected]> wrote:
>>
>> On Tue, Oct 04, 2011 at 10:27:01PM +0200, Lars Buitinck wrote:
>> > You must mean from Θ(n² lg n) to Θ(n²). A generic Θ(n² lg n) algo is
>> > listed in many textbooks [1], I sure hope we don't have the naive algo
>> > scikit-learn?
>
> @Lars:
> I haven't gone through the source code and figured out what the actual
> complexity is.  But if you look at the table on this page, the author of
> that paper claims that scipy.cluster.hierarchy (as well as all of the other
> implementations) are O(n^3).

I've checked the source, turns out we have the priority queue-based
scheme, which should be n² lg n. The author of the arXiv paper lists a
similar algo on p.9.

> In the third paragraph of his paper, he mentions:
>
> Even when software packages do not use the inefficient primitive [stepwise,
> procedural] algorithm (as SciPy (Eads,2007) and the R (R Development Core
> Team, 2011) methods hclust and agnes do), the
> author found that implementations largely use suboptimal algorithms rather
> than improved

It would be interesting to experiment

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

------------------------------------------------------------------------------
All the data continuously generated in your IT infrastructure contains a
definitive record of customers, application performance, security
threats, fraudulent activity and more. Splunk takes this data and makes
sense of it. Business sense. IT sense. Common sense.
http://p.sf.net/sfu/splunk-d2dcopy1
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Reply via email to