2013/2/6 Vinay B, vybe3...@gmail.com:
Hi
Almost there (I hope), but not quite:
I put my code up at https://gist.github.com/balamuru/4726232 for
readability. It reads a directory of text files in chunks of 5 and returns
each chunk as a dictionary (key = filename, value = text contents).
I wanted
So I tried your recommendations. The `partial_fit` calls seem to work up to
a point. Then BOOM! It looks very similar to the example in
http://scikit-learn.org/dev/auto_examples/document_clustering.html#example-document-clustering-py
.
Wonder what I'm doing wrong this time?
.
Relevant code
I updated scikit-learn to the latest version.
The bug I reported earlier no longer occurs, and the mini-batch k-means now
runs to completion. However, I now get an error when printing out the docs
per cluster.
Complete code at https://gist.github.com/balamuru/4734765
Thanks in advance
Output
.
.
## counts: (10, 1)
##
There is probably an issue when accessing the `labels_` attribute if
you call `partial_fit` instead of `fit`. Rather than using `labels_`, you
should probably do another pass over the data and call `predict`
to compute the cluster membership info for each sample.
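A minimal sketch of that two-pass idea, not taken from the gist: the `chunks` list and file names are made up for illustration, and `HashingVectorizer` is only an assumption here, chosen because it needs no fitting and so maps every chunk into the same feature space. The point is that `predict` receives the vectorized documents, in the same representation used for `partial_fit`.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.text import HashingVectorizer

# Hypothetical stand-in for the directory reader: each chunk is a
# dictionary mapping filename -> text contents.
chunks = [
    {"a.txt": "cats and dogs are pets",
     "b.txt": "dogs chase cats in the yard",
     "c.txt": "stocks and bonds are investments"},
    {"d.txt": "bond markets and stock prices",
     "e.txt": "my cat sleeps all day",
     "f.txt": "investors buy stocks"},
]

# Stateless vectorizer, so no vocabulary has to be learned up front.
vectorizer = HashingVectorizer(n_features=2 ** 10)
km = MiniBatchKMeans(n_clusters=2, random_state=0)

# First pass: update the cluster centers one chunk at a time.
for chunk in chunks:
    X = vectorizer.transform(list(chunk.values()))
    km.partial_fit(X)

# Second pass: ask the fitted model for each document's cluster,
# instead of relying on labels_.
cluster_of = {}
for chunk in chunks:
    names = list(chunk.keys())
    X = vectorizer.transform(list(chunk.values()))
    for name, label in zip(names, km.predict(X)):
        cluster_of[name] = int(label)

for name in sorted(cluster_of):
    print(name, cluster_of[name])
```

Each chunk must contain at least `n_clusters` documents for the first `partial_fit` call to be able to initialize the centers.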
Hi,
I tried again. I feel I'm still doing something wrong in my code so
far. In any case, the print loop I added was:
    doc_idx = 0
    for cluster_doc_filename in file_names:
        #predicted_cluster = km.predict(cluster_doc_filename)
        predicted_cluster = km.predict(doc_idx)
Fellow sklearners,
I am working on a classification problem with an unbalanced data set and
have been successful using SVM classifiers with the class_weight option.
I have also tried Random Forests and am getting a decent ROC performance
but I am hoping to get a performance improvement by using
Hello,
You might achieve what you want by using sample weights when fitting
your forest (See the 'sample_weight' parameter). There is also a
'balance_weights' method from the preprocessing module that basically
generates sample weights for you, such that classes become balanced.
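As a rough illustration of what such weights could look like, here is a hand-rolled sketch. It is not the `balance_weights` implementation; the helper name `balanced_sample_weights` and the toy labels are made up. Each sample gets the weight n_samples / (n_classes * count of its class), so every class ends up with the same total weight.

```python
from collections import Counter


def balanced_sample_weights(y):
    """Return one weight per sample so that all classes carry equal total weight.

    Hypothetical helper for illustration only.
    """
    counts = Counter(y)          # number of samples per class
    n_samples = len(y)
    n_classes = len(counts)
    return [n_samples / (n_classes * counts[label]) for label in y]


# Toy unbalanced labels: three samples of class 0, one of class 1.
y = [0, 0, 0, 1]
weights = balanced_sample_weights(y)
print(weights)
```

The resulting list can then be passed when fitting the forest, along the lines of `forest.fit(X, y, sample_weight=weights)`.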
Thanks Gilles. This definitely helps. I am glad I asked. :-)
-Manish
On Feb 7, 2013, at 11:33 PM, Gilles Louppe g.lou...@gmail.com wrote:
Hello,
You might achieve what you want by using sample weights when fitting
your forest (See the 'sample_weight' parameter). There is also a