Re: [Haskell-cafe] ANNOUNCE: hierarchical-clustering and gsc-weighting

2010-08-04 Thread Felipe Lessa
On Tue, Aug 3, 2010 at 8:23 AM, Felipe Lessa wrote:
> On Tue, Aug 3, 2010 at 8:01 AM, Ivan Lazar Miljenovic wrote:
>> Felipe Lessa writes:
>>> 'hierarchical-clustering' provides a function to create a dendrogram
>>> from a list of items and a distance function between them.  The most
>>> common linkage types are available: single linkage, complete linkage
>>> and UPGMA.  An item can be anything, for example a DNA sequence, so
>>> this may be used to create a phylogenetic tree.
>>
>> What actual clustering algorithm are you using here?
>
> A naïve O(n^2) algorithm using a distance matrix.  This can be
> improved without changing the API, however.

What a blunder!  I mean, an O(n^3) algorithm -- each step takes
O(n^2), and you need 'n' steps to create the whole dendrogram.
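
For illustration, the cost estimate above can be sketched as follows.
This is a minimal, self-contained sketch and not the package's code:
the Dendrogram type, singleLink and naiveDendrogram are hypothetical
names, only single linkage is shown, and distances are recomputed from
the items instead of being read from a distance matrix.  Each pass of
'go' examines every pair of clusters (O(n^2) distance evaluations) and
there are n - 1 merges, which is where the O(n^3) comes from.

```haskell
import Data.List (minimumBy, tails)
import Data.Ord (comparing)

-- Hypothetical dendrogram type for this sketch (not the package's own).
data Dendrogram a
  = Leaf a
  | Branch Double (Dendrogram a) (Dendrogram a)
  deriving (Show, Eq)

-- A cluster is its tree plus a cached list of the items it contains.
type Cluster a = (Dendrogram a, [a])

-- Single linkage: the smallest distance between any two members.
singleLink :: (a -> a -> Double) -> Cluster a -> Cluster a -> Double
singleLink dist (_, xs) (_, ys) = minimum [dist x y | x <- xs, y <- ys]

-- Repeatedly merge the two closest clusters until one remains.
-- Returns Nothing for an empty list of items.
naiveDendrogram :: (a -> a -> Double) -> [a] -> Maybe (Dendrogram a)
naiveDendrogram dist items = go [(Leaf x, [x]) | x <- items]
  where
    go []       = Nothing
    go [(t, _)] = Just t
    go cs       = go (merged : rest)
      where
        indexed = zip [0 :: Int ..] cs
        -- Every unordered pair of clusters, with its linkage distance.
        candidates =
          [ (singleLink dist a b, (p, a), (q, b))
          | (p, a) : others <- tails indexed
          , (q, b) <- others
          ]
        -- The closest pair found in this step.
        (d, (i, (ta, xs)), (j, (tb, ys))) =
          minimumBy (comparing (\(x, _, _) -> x)) candidates
        merged = (Branch d ta tb, xs ++ ys)
        rest   = [c | (k, c) <- indexed, k /= i, k /= j]
```

With the distance \x y -> abs (x - y) and the items [1, 2, 10], the
first step merges 1 and 2 at distance 1, and the second merges that
cluster with 10 at distance 8.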

I'll fix the documentation on the next release.

Cheers! =)

-- 
Felipe.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANNOUNCE: hierarchical-clustering and gsc-weighting

2010-08-03 Thread Felipe Lessa
On Tue, Aug 3, 2010 at 8:01 AM, Ivan Lazar Miljenovic wrote:
> Felipe Lessa writes:
>> 'hierarchical-clustering' provides a function to create a dendrogram
>> from a list of items and a distance function between them.  The most
>> common linkage types are available: single linkage, complete linkage
>> and UPGMA.  An item can be anything, for example a DNA sequence, so
>> this may be used to create a phylogenetic tree.
>
> What actual clustering algorithm are you using here?

A naïve O(n^2) algorithm using a distance matrix.  This can be
improved without changing the API, however.

> Also, would it be possible to have some more documentation there in
> general?  At the very least, in your next release explain what a
> dendrogram is and why someone would want to use your package (I had to do
> some quick Wikipedia reading to refresh my memory on what dendrograms,
> etc. are to understand what it does).

Documentation is always good, but I didn't want to take the time to
explain everything from the beginning.  I guess most people coming to
this package will already know that they want a dendrogram.  But if
they don't, a quick googling is very effective.  Hmm, I guess some
diagrams would be nice.

I took the time only to explain why there is an "UPGMA" and a
"FakeAverageLinkage", because that distinction isn't easy to find on
the web.  Actually, I still haven't found someone talking about it,
just people using either with the same name "average linkage". =)
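
One common way such a distinction shows up is in how the distance from
a freshly merged cluster to every other cluster is updated.  The sketch
below assumes (the text above does not spell it out) that UPGMA weights
by cluster size, while the variant often also called "average linkage"
takes the plain mean of the two old distances.  The function names are
hypothetical and are not the package's API.

```haskell
-- Distance from the merged cluster (a and b joined) to a third cluster c,
-- given the sizes of a and b and their previous distances to c.

-- UPGMA: size-weighted mean.  This equals the mean distance over all
-- pairs of items, one taken from the merged cluster and one from c.
upgmaUpdate :: Int -> Int -> Double -> Double -> Double
upgmaUpdate sizeA sizeB dAC dBC =
  (fromIntegral sizeA * dAC + fromIntegral sizeB * dBC)
    / fromIntegral (sizeA + sizeB)

-- Plain mean of the two previous distances, ignoring cluster sizes.
-- (Assumed here to be what a "fake" average linkage computes.)
plainMeanUpdate :: Double -> Double -> Double
plainMeanUpdate dAC dBC = (dAC + dBC) / 2
```

The two agree whenever the merged clusters have the same size, for
example upgmaUpdate 2 2 2 6 and plainMeanUpdate 2 6 are both 4, and
differ otherwise: upgmaUpdate 3 1 2 6 is 3.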

Cheers,

-- 
Felipe.


Re: [Haskell-cafe] ANNOUNCE: hierarchical-clustering and gsc-weighting

2010-08-03 Thread Ivan Lazar Miljenovic
Felipe Lessa writes:

> Hello!
>
> I'm pleased to announce the release of two new packages:
>
> http://hackage.haskell.org/package/hierarchical-clustering
> http://hackage.haskell.org/package/gsc-weighting
>
> 'hierarchical-clustering' provides a function to create a dendrogram
> from a list of items and a distance function between them.  The most
> common linkage types are available: single linkage, complete linkage
> and UPGMA.  An item can be anything, for example a DNA sequence, so
> this may be used to create a phylogenetic tree.

What actual clustering algorithm are you using here?

Also, would it be possible to have some more documentation there in
general?  At the very least, in your next release explain what a
dendrogram is and why someone would want to use your package (I had to do
some quick Wikipedia reading to refresh my memory on what dendrograms,
etc. are to understand what it does).

-- 
Ivan Lazar Miljenovic
ivan.miljeno...@gmail.com
IvanMiljenovic.wordpress.com


[Haskell-cafe] ANNOUNCE: hierarchical-clustering and gsc-weighting

2010-08-03 Thread Felipe Lessa
Hello!

I'm pleased to announce the release of two new packages:

http://hackage.haskell.org/package/hierarchical-clustering
http://hackage.haskell.org/package/gsc-weighting

'hierarchical-clustering' provides a function to create a dendrogram
from a list of items and a distance function between them.  The most
common linkage types are available: single linkage, complete linkage
and UPGMA.  An item can be anything, for example a DNA sequence, so
this may be used to create a phylogenetic tree.

Or it may be used with the 'gsc-weighting' package to assign weights
to the items.  Weights are assigned such that close items get smaller
weights than distant items, meaning that the weights try to avoid the
over-representation of some closely related items.  The package name
comes from the authors of the algorithm, Gerstein, Sonnhammer and
Chothia.  Again, this may be used for DNA or protein sequences.
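
To give a feel for this kind of weighting, here is a minimal sketch of a
bottom-up scheme in that spirit.  It is not the package's code, and it
rests on assumptions: the Dendrogram type, 'height', 'addEdge' and
'gscWeights' are hypothetical names, leaves sit at height 0, a branch
sits at its merge distance, and the length of the edge above a subtree
is the difference between the two heights.  Each edge length is shared
among the leaves below it in proportion to the weights they already
have (equally if those are all zero).

```haskell
-- Hypothetical dendrogram type for this sketch (not the package's own).
data Dendrogram a
  = Leaf a
  | Branch Double (Dendrogram a) (Dendrogram a)

-- Height of a subtree: 0 for a leaf, the merge distance for a branch.
-- (This convention is an assumption of the sketch.)
height :: Dendrogram a -> Double
height (Leaf _)       = 0
height (Branch d _ _) = d

-- Share an edge of the given length among the leaves below it, in
-- proportion to their current weights (equally if all are zero).
addEdge :: Double -> [(a, Double)] -> [(a, Double)]
addEdge len ws
  | total > 0 = [(x, w + len * w / total) | (x, w) <- ws]
  | otherwise = [(x, w + len / fromIntegral (length ws)) | (x, w) <- ws]
  where
    total = sum (map snd ws)

-- Weights for every leaf, computed from the leaves up to the root.
gscWeights :: Dendrogram a -> [(a, Double)]
gscWeights (Leaf x)       = [(x, 0)]
gscWeights (Branch d l r) = below l ++ below r
  where
    below child = addEdge (d - height child) (gscWeights child)
```

For the tree Branch 8 (Branch 1 (Leaf 1) (Leaf 2)) (Leaf 10), the two
close items 1 and 2 each get weight 4.5, while the distant item 10 gets
weight 8.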

Cheers!

-- 
Felipe.