[scikit-learn] Agglomerative clustering
I have some data and also the pairwise distance matrix of these data points. I want to cluster them using agglomerative clustering. I read that in sklearn we can pass 'precomputed' as the affinity, and I expect that means the distance matrix. But I could not find any example that uses a precomputed affinity with a custom distance matrix. Any help will be highly appreciated.
Best,
-Noushin
___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
[scikit-learn] Agglomerative Clustering without knowing number of clusters
I want to perform agglomerative clustering, but I have no idea of the number of clusters beforehand. However, I want every cluster to have at least 40 data points in it. How can I apply this with sklearn's agglomerative clustering? Should I use a dendrogram and cut it somehow? I have no idea how to relate the dendrogram to this or how to cut it. Any help will be appreciated!
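One way to read "cut the dendrogram so every cluster has at least 40 points" is to take the finest cut of the tree whose smallest cluster still meets the size limit. As far as I know, neither scikit-learn nor SciPy offers this as a single option. The sketch below is therefore a hypothetical helper, `finest_cut_with_min_size`, built on SciPy's `linkage` and `fcluster`. It assumes you already have a symmetric N x N distance matrix. It is a heuristic: a single small, far-away group can force the cut up to very few clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def finest_cut_with_min_size(D, min_size, method="average"):
    """Hypothetical helper: flat labels for the finest cut of the dendrogram
    in which every cluster has at least `min_size` members.

    D must be a symmetric N x N distance matrix with a zero diagonal.
    """
    n = D.shape[0]
    # scipy's linkage() expects the condensed (upper-triangle) distance vector.
    Z = linkage(squareform(D, checks=False), method=method)
    best = np.ones(n, dtype=int)  # fallback: everything in a single cluster
    for k in range(2, n + 1):
        # "maxclust" cuts the tree so that there are at most k flat clusters.
        labels = fcluster(Z, t=k, criterion="maxclust")
        sizes = np.unique(labels, return_counts=True)[1]
        # Cuts of a monotonic dendrogram (e.g. average linkage) are nested, so
        # the smallest cluster can only shrink as k grows: stop at the first
        # cut that violates the size constraint.
        if sizes.min() < min_size:
            break
        best = labels
    return best


# Usage on toy data: three blobs of 50 points each (illustrative only).
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in (0.0, 5.0, 10.0)])
D = squareform(pdist(points))
labels = finest_cut_with_min_size(D, min_size=40)
print(np.unique(labels, return_counts=True))
```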
Re: [scikit-learn] Agglomerative Clustering without knowing number of clusters
Dear Shane,
Thanks for your time. But I have to implement it with agglomerative clustering and cut it when each cluster has at least 40 data points. I am not sure how to cut it. I was guessing maybe it can be done by cutting the dendrogram? Is that correct? If so, I do not know how to apply it. Could you give me a pointer?
Best,
Ariani

On Thu, Jul 6, 2017 at 12:32 PM, Shane Grigsby wrote:
> This sounds like it may be a problem more amenable to either DBSCAN or
> OPTICS. Both algorithms don't require a priori knowledge of the number of
> clusters, and both let you specify a minimum point membership threshold for
> cluster membership. The OPTICS algorithm will also produce a dendrogram
> that you can cut for sub-clusters if need be.
>
> DBSCAN is part of the stable release and has been for some time; OPTICS is
> pending as a pull request, but it's stable and you can try it if you like:
>
> https://github.com/scikit-learn/scikit-learn/pull/1984
>
> Cheers,
> Shane
>
> --
> PhD candidate & Research Assistant
> Cooperative Institute for Research in Environmental Sciences (CIRES)
> University of Colorado at Boulder
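OPTICS is available as `sklearn.cluster.OPTICS` in current scikit-learn releases. Here is a minimal sketch, assuming a recent release, that runs it on a precomputed distance matrix with a 40-point minimum cluster size. The toy data and `min_samples=5` are illustrative assumptions. Unlike a dendrogram cut, OPTICS may leave some points unassigned (label -1). The fitted `reachability_` and `ordering_` attributes give the reachability plot that plays the dendrogram-like role Shane mentions.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.metrics import pairwise_distances

# Illustrative data: two blobs of 60 points each.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(c, 0.3, size=(60, 2)) for c in (0.0, 5.0)])
D = pairwise_distances(points)  # Euclidean N x N distance matrix

# min_cluster_size=40: the default "xi" extraction drops clusters smaller
# than 40 points. min_samples controls the density estimate.
optics = OPTICS(min_samples=5, min_cluster_size=40, metric="precomputed")
labels = optics.fit_predict(D)

# Label -1 marks points OPTICS left unassigned (noise).
for label in np.unique(labels):
    print(label, np.sum(labels == label))
```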
[scikit-learn] Help with NLP
Dear all,
I urgently need help with NLP. Do you happen to know anyone who knows NLTK or other NLP modules? Has anybody here read the paper "Template-Based Information Extraction without the Templates"? I am looking forward to hearing from you soon!
Best,
-Ariani
Re: [scikit-learn] Help with NLP
Yes, it is.
Regards

On Fri, Jul 7, 2017 at 12:23 PM, Carlton Banks wrote:
> NLP as in natural language processing?
Re: [scikit-learn] Help with NLP
Dear Jacob,
I know, but I am just asking for help! @Carlton, I want to do text processing. Can I email you directly so that the others are not bothered?
Best,
-Ariani

On Fri, Jul 7, 2017 at 12:52 PM, Jacob Schreiber wrote:
> The scikit-learn mailing list is probably not the best place to be asking
> for help with another module.
[scikit-learn] Agglomerative clustering problem
Hi all,
I want to perform agglomerative clustering, but I have no idea of the number of clusters beforehand. However, I want every cluster to have at least 40 data points in it. How can I apply this with sklearn's agglomerative clustering? Should I use a dendrogram and cut it somehow? I have no idea how to relate the dendrogram to this or how to cut it. Any help will be appreciated! I have to use agglomerative clustering!
Thanks,
-Ariani
Re: [scikit-learn] Agglomerative clustering problem
Dear Uri,
Thanks. I only have a pairwise distance matrix, and I want each cluster to have at least 40 data points (with agglomerative clustering). Does this work for that?
Thanks,
-Ariani

On Tue, Jul 11, 2017 at 1:54 PM, Uri Goren wrote:
> Take a look at scipy's fcluster function.
> If M is a matrix of all of your feature vectors, this code snippet should
> work.
>
> You need to figure out what metric and algorithm work for you
>
> from scipy.cluster import hierarchy
> from scipy.spatial.distance import squareform
> from sklearn.metrics import pairwise_distances
> X = pairwise_distances(M, metric=metric)  # N x N distance matrix
> # linkage() expects the condensed (upper-triangle) form of the matrix
> Z = hierarchy.linkage(squareform(X, checks=False), method=algo)
> # flat clusters: cut the tree at height `threshold`
> C = hierarchy.fcluster(Z, threshold, criterion="distance")
>
> Best,
> Uri Goren
>
> --
> Uri Goren, Software innovator
> Phone: +972-507-649-650
> EMail: u...@goren4u.com
> Linkedin: il.linkedin.com/in/ugoren/
Re: [scikit-learn] Agglomerative Clustering without knowing number of clusters
Dear Shane,
Thanks for your answer. Does DBSCAN work with a distance matrix? I have a distance matrix (a symmetric matrix that contains the pairwise distances). Can you help me? I did not find the DBSCAN code at that link.
Best,
-Ariani
Re: [scikit-learn] Agglomerative Clustering without knowing number of clusters
Dear Shane,
Thanks for your prompt answer. Do you mean that for DBSCAN there is no need to pass other parameters? Do I just call the function, or do I have to modify the code? P.S. I was not able to find the DBSCAN code on GitHub.
Looking forward to hearing from you.
Best,
-Noushin

On Thu, Jul 13, 2017 at 5:38 PM, Shane Grigsby wrote:
> Hi Ariani,
> Yes, you can use a distance matrix -- I think that what you want is
> metric='precomputed', and then X would be your N by N distance matrix.
> Hope that helps,
> ~Shane
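Here is a minimal sketch of Shane's suggestion applied to DBSCAN, assuming a recent scikit-learn: pass `metric="precomputed"` and give `fit` the N x N distance matrix. On the follow-up question, `eps` and `min_samples` still have to be chosen. The values below are guesses suited only to the toy data. `min_samples` is a density threshold (a core point needs at least that many points within `eps`, counting itself), not a hard minimum cluster size. Points labelled -1 are noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

# Illustrative data: two blobs of 50 points each.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in (0.0, 5.0)])
D = pairwise_distances(points)  # N x N Euclidean distances

# eps and min_samples must still be set; these values are assumptions
# tuned to the toy data above, not defaults to rely on.
db = DBSCAN(eps=1.0, min_samples=5, metric="precomputed")
labels = db.fit_predict(D)  # -1 marks noise points
print(np.unique(labels, return_counts=True))
```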
Re: [scikit-learn] Agglomerative Clustering without knowing number of clusters
Dear Shane,
Sorry to bother you! Are the "precomputed" option and the "distance matrix" you are talking about meant for DBSCAN?
Thanks,
Best.
[scikit-learn] No module named crluster.hierarchical
Dear all,
I am writing this import:
from sklearn.crluster.hierarchical import (_hc_cut, _TREE_BUILDERS, linkage_tree)
But it gives this error:
ImportError: No module named crluster.hierarchical
Any clue?
Best regards,
-Noushin
Re: [scikit-learn] No module named crluster.hierarchical
Thank you so much!

On Sun, Aug 13, 2017 at 12:20 PM, Vlad Niculae wrote:
> Looks like you're misspelling the word "cluster".
>
> Yours,
> Vlad