I will add my patch within 3 to 4 days. I am done with everything
except that I need to write some test classes.
Thanks
Pallavi
Robin Anil (JIRA) wrote:
[
[
https://issues.apache.org/jira/browse/MAHOUT-283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen updated MAHOUT-283:
-
Resolution: Fixed
Fix Version/s: (was: 0.3)
0.4
Assignee: Drew
[
https://issues.apache.org/jira/browse/MAHOUT-280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen resolved MAHOUT-280.
--
Resolution: Won't Fix
Fix Version/s: (was: 0.3)
0.4
This may be
[
https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pallavi Palleti updated MAHOUT-153:
---
Attachment: Mahout-153.patch
Here is the patch for selecting initial clusters for a
[
https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831391#action_12831391
]
Pallavi Palleti commented on MAHOUT-153:
Forgot to mention. The above patch doesn't
In Fuzzy Kmeans, when the distance between centroid and the given point is
zero, then it should belong to that cluster with probability 1 and rest with
probability zero
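The zero-distance rule described above can be sketched as follows. This is a minimal illustration, not the code from the attached patch: the class name FuzzyMembership, the method memberships, and the even split among several coinciding centroids are all assumptions made for the example. The non-zero branch uses the standard fuzzy k-means membership formula with fuzziness exponent m > 1.

```java
import java.util.Arrays;

public class FuzzyMembership {

    /**
     * Returns the membership of one point in each cluster, given the point's
     * distance to every centroid. The parameter m is the fuzziness exponent
     * and must be greater than 1. (Hypothetical helper, not a Mahout API.)
     */
    static double[] memberships(double[] distances, double m) {
        int k = distances.length;
        double[] u = new double[k];

        // Count centroids that coincide exactly with the point.
        int zeros = 0;
        for (double d : distances) {
            if (d == 0.0) {
                zeros++;
            }
        }

        if (zeros > 0) {
            // Zero-distance case: all probability mass goes to the coinciding
            // centroid(s), zero to the rest. Splitting evenly when several
            // centroids coincide is an assumption of this sketch.
            for (int i = 0; i < k; i++) {
                u[i] = distances[i] == 0.0 ? 1.0 / zeros : 0.0;
            }
            return u;
        }

        // General case: u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)).
        // Safe here because every distance is non-zero.
        double exponent = 2.0 / (m - 1.0);
        for (int i = 0; i < k; i++) {
            double sum = 0.0;
            for (int j = 0; j < k; j++) {
                sum += Math.pow(distances[i] / distances[j], exponent);
            }
            u[i] = 1.0 / sum;
        }
        return u;
    }

    public static void main(String[] args) {
        // The first centroid coincides with the point.
        System.out.println(Arrays.toString(memberships(new double[] {0.0, 2.0, 5.0}, 2.0)));
        // No centroid coincides; memberships sum to 1.
        System.out.println(Arrays.toString(memberships(new double[] {1.0, 3.0}, 2.0)));
    }
}
```

Handling the zero case before the general formula avoids the division by zero that the ratio d_i / d_j would otherwise produce.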
[
https://issues.apache.org/jira/browse/MAHOUT-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pallavi Palleti updated MAHOUT-284:
---
Attachment: Mahout-284.patch
This patch fixes the issue
In Fuzzy Kmeans, when the distance
[
https://issues.apache.org/jira/browse/MAHOUT-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831396#action_12831396
]
Jake Mannix commented on MAHOUT-237:
{code}
RandomAccessSparseVector vector =
[
https://issues.apache.org/jira/browse/MAHOUT-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831410#action_12831410
]
Sean Owen commented on MAHOUT-237:
--
I dunno, I think of it as exactly that flag, doesn't
[
https://issues.apache.org/jira/browse/MAHOUT-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831413#action_12831413
]
Jake Mannix commented on MAHOUT-237:
I think of it as that flag as well, but when doing
[
https://issues.apache.org/jira/browse/MAHOUT-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831420#action_12831420
]
Jake Mannix commented on MAHOUT-237:
I do notice that recently added to this set of
[
https://issues.apache.org/jira/browse/MAHOUT-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831428#action_12831428
]
Robin Anil commented on MAHOUT-237:
---
You just needed the count? You could always
I don't, but can offer alternatives --
Just have the user download the data set. I don't think this is a big burden.
Download the data set automatically.
These are free of legal and tarball-size problems.
On Tue, Feb 9, 2010 at 2:11 PM, Robin Anil robin.a...@gmail.com wrote:
I feel a need to
Make the Maven test phase download this dataset once for all tests? Is that
possible
On Tue, Feb 9, 2010 at 7:43 PM, Sean sro...@gmail.com wrote:
I don't, but can offer alternatives --
Just have the user download the data set. I don't think this is a big burden.
Download the data set
Wrap up collocation and dictionary vectorizer integration
-
Key: MAHOUT-285
URL: https://issues.apache.org/jira/browse/MAHOUT-285
Project: Mahout
Issue Type: Improvement
Affects
Sure, how about a bunch of Apache project websites? The project name is the
category, i.e. Lucene, Tomcat, Hadoop, etc.
On Feb 9, 2010, at 9:11 AM, Robin Anil wrote:
I feel a need to check in a set of text documents to Mahout, maybe 3-4
categories of documents, 10 each.
can be used in
Yeah, that sounds OK. Do we have the pure content without HTML?
Robin
On Tue, Feb 9, 2010 at 8:24 PM, Grant Ingersoll gsing...@apache.org wrote:
Sure, how about a bunch of Apache project websites? The project name is
the category, i.e. Lucene, Tomcat, Hadoop, etc.
On Feb 9, 2010, at 9:11
On Feb 9, 2010, at 9:56 AM, Robin Anil wrote:
Yeah, that sounds OK. Do we have the pure content without HTML?
No, but I was just thinking yesterday that a really nice enhancement to the
Doc. Vectorizer would be to hook in Tika, such that one could M/R binary files
into Mahout vectors.
Yeah! Tika looks great! I bet Drew's patch to create a structured document
format via Avro should essentially go into Tika. Then we could really use
the Tika library to the full.
I should really spend time to explore Apache projects. I think we could
reuse a whole lot.
Robin
On Tue, Feb 9,
On Feb 9, 2010, at 10:24 AM, Robin Anil wrote:
Yeah! Tika looks great! I bet Drew's patch to create a structured document
format via Avro should essentially go into Tika. Then we could really use
the Tika library to the full.
Solr has code here that would be pretty simple to grab, but it's
[
https://issues.apache.org/jira/browse/MAHOUT-242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robin Anil resolved MAHOUT-242.
---
Resolution: Fixed
Committed and resolved.
LLR Collocation Identifier
--
Need to be able to run classifiers from non-text input (such as ARFF data)
--
Key: MAHOUT-286
URL: https://issues.apache.org/jira/browse/MAHOUT-286
Project: Mahout
[
https://issues.apache.org/jira/browse/MAHOUT-286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ted Dunning updated MAHOUT-286:
---
Attachment: weka.log
mahout.log
Here are the original attachments Martin sent.
That was my first thought as well.
But I think a better answer is to mark the vector as stretchy so that it
reports the high water size as the actual size, but if you insert a non-zero
above that size, it will report the new high water mark thereafter.
This makes the code simple and clear. The
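The "stretchy" behavior described above could look roughly like the sketch below. It is only an illustration under stated assumptions: StretchyVector is a hypothetical class, not an existing Mahout type, and it stores values in a plain HashMap rather than in any of Mahout's vector implementations. The reported size is the high-water mark, which grows only when a non-zero value is set beyond it.

```java
import java.util.HashMap;
import java.util.Map;

public class StretchyVector {

    // Sparse storage: only non-zero entries are kept.
    private final Map<Integer, Double> values = new HashMap<>();

    // One past the largest index that has ever held a non-zero value.
    private int highWater = 0;

    /** Sets a value; a non-zero above the current size raises the high-water mark. */
    public void set(int index, double value) {
        if (index < 0) {
            throw new IndexOutOfBoundsException("negative index: " + index);
        }
        if (value == 0.0) {
            // Writing a zero never stretches the vector, and the mark never shrinks.
            values.remove(index);
            return;
        }
        values.put(index, value);
        if (index + 1 > highWater) {
            highWater = index + 1;
        }
    }

    /** Returns the stored value, or 0.0 for any index that holds no entry. */
    public double get(int index) {
        Double v = values.get(index);
        return v == null ? 0.0 : v;
    }

    /** Reports the high-water mark as the vector's size. */
    public int size() {
        return highWater;
    }

    public static void main(String[] args) {
        StretchyVector v = new StretchyVector();
        v.set(4, 1.5);
        System.out.println(v.size());
        v.set(9, 2.0);
        System.out.println(v.size());
    }
}
```

Keeping the mark monotonic matches the "thereafter" in the description: once the vector has stretched, later zero writes do not shrink the reported size.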
[
https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831622#action_12831622
]
Ted Dunning commented on MAHOUT-153:
I have been thinking about this problem a bit,
We (I) have had some problems with dependencies in the past. Some code
seemed very util, but some other things that seemed pretty core depended on
them.
I think that the real issue for me is that we have two meanings of utils.
One is generally useful stuff in core and the other is things that
[
https://issues.apache.org/jira/browse/MAHOUT-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831648#action_12831648
]
Ted Dunning commented on MAHOUT-227:
Is this going to be complete this week or next?
On Tue, Feb 9, 2010 at 12:20 PM, Ted Dunning ted.dunn...@gmail.com wrote:
I think that the real issue for me is that we have two meanings of utils.
One is generally useful stuff in core and the other is things that use
mahout to do cool things.
This is my problem too: *examples* is things
[
https://issues.apache.org/jira/browse/MAHOUT-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jake Mannix updated MAHOUT-180:
---
Attachment: MAHOUT-180.patch
Ok, ugly, dirty patch which needs to be cleaned up, but it does work,
On Feb 9, 2010, at 3:31 PM, Jake Mannix wrote:
On Tue, Feb 9, 2010 at 12:20 PM, Ted Dunning ted.dunn...@gmail.com wrote:
I think that the real issue for me is that we have two meanings of utils.
One is generally useful stuff in core and the other is things that use
mahout to do cool
Ahh, ok this makes sense.
Also as others pointed out, within 'core' are some 'small' utilities
used by core that are undeserving of their own module, e.g:
HadoopUtil. These generally go under the org.apache.mahout.common
package.
On Tue, Feb 9, 2010 at 3:43 PM, Grant Ingersoll
[
https://issues.apache.org/jira/browse/MAHOUT-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831678#action_12831678
]
Jeff Eastman commented on MAHOUT-270:
-
r908235 commits the Printable interface and
[
https://issues.apache.org/jira/browse/MAHOUT-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831680#action_12831680
]
Sean Owen commented on MAHOUT-237:
--
PS I think Ted's suggestion that we need 'stretchable'
[
https://issues.apache.org/jira/browse/MAHOUT-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831678#action_12831678
]
Jeff Eastman edited comment on MAHOUT-270 at 2/9/10 9:39 PM:
-
[
https://issues.apache.org/jira/browse/MAHOUT-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831754#action_12831754
]
Sean Owen commented on MAHOUT-279:
--
Bah, it doesn't actually work in Hadoop, for reasons I
[
https://issues.apache.org/jira/browse/MAHOUT-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831837#action_12831837
]
zhao zhendong commented on MAHOUT-227:
--
So far, I didn't work on this parallel Binary
[
https://issues.apache.org/jira/browse/MAHOUT-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12831870#action_12831870
]
Robin Anil commented on MAHOUT-285:
---
In the Colloc driver why not run DocumentProcessor
There are some libraries in Mahout that are used only in very special places,
by only a few classes. Can't we do without them? All these stats are courtesy
of this wonderful Eclipse plugin, STAN:
http://stan4j.com/dependencies/dependency-analysis.html
Only 3 classes used for the EDU.oswego library.
The lovely named EDU.oswego.* stuff from Doug Lea's concurrent lib I
had tried really hard to figure out how to pull out when I first brought
colt into the fold, but it turns out that these are parts of concurrent which
didn't make it into java.util.concurrent, and so actually aren't available in