[
https://issues.apache.org/jira/browse/MAHOUT-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen updated MAHOUT-239:
-
Resolution: Fixed
Assignee: Benson Margulies
Status: Resolved (was: Patch Available)
[
https://issues.apache.org/jira/browse/MAHOUT-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Isabel Drost resolved MAHOUT-85.
Resolution: Fixed
Finally committed.
Perceptron/Winnow Trainer
-
Parallel version of Perceptron
--
Key: MAHOUT-240
URL: https://issues.apache.org/jira/browse/MAHOUT-240
Project: Mahout
Issue Type: Improvement
Components: Classification
Affects Versions: 0.3
Example for perceptron
--
Key: MAHOUT-241
URL: https://issues.apache.org/jira/browse/MAHOUT-241
Project: Mahout
Issue Type: Improvement
Components: Classification
Affects Versions: 0.3
Reporter:
I have been testing out the DictionaryVectorizer on the 20news dataset. It's
writing out 2GB vector files for the 38MB dataset.
This is what I am doing. Tell me where I am going wrong.
First I create an infinite dimensional vector of size 10,
SparseVector vector = new SparseVector(key.toString(),
https://issues.apache.org/jira/secure/attachment/12429846/DictionaryVectorizer.patch
Reduce = PartialVectorGenerator Class
[
https://issues.apache.org/jira/browse/MAHOUT-237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robin Anil updated MAHOUT-237:
--
Attachment: DictionaryVectorizer.patch
Some tidying up. The large output bug still remains.
[
https://issues.apache.org/jira/browse/MAHOUT-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798489#action_12798489
]
Grant Ingersoll commented on MAHOUT-85:
---
Why is PerceptronTrainingMapper empty? Are
Any clue why this is happening? I am running it over a small sample. I will
try to pinpoint the issue.
On Sun, Jan 10, 2010 at 5:30 PM, Robin Anil robin.a...@gmail.com wrote:
https://issues.apache.org/jira/secure/attachment/12429846/DictionaryVectorizer.patch
Reduce =
A lot of zeros are being printed in the JSON string. Is that normal for an
infinite cardinality vector?
http://pastebin.com/m6ff5f0ef
The same is true if I typecast to a Vector.
On Sun, Jan 10, 2010 at 8:08 PM, Grant Ingersoll gsing...@apache.org wrote:
Have you dumped out the file? What's in it?
On Jan 10, 2010, at 9:43 AM, Robin Anil wrote:
A lot of zeros are being printed in the JSON string. Is that normal for an
infinite cardinality vector?
It shouldn't print them if you are using a SparseVector, but my guess is there
is something odd going on here when writing it out such that it is
I've noticed the same thing when looking at SparseVectors contained
within the results of ClusterDumper -- I didn't explore very far into why,
but it seems that the json representation of the SparseVector doesn't use a
map but instead uses parallel arrays of certain sizes. I'm not certain how
the
[
https://issues.apache.org/jira/browse/MAHOUT-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798499#action_12798499
]
Benson Margulies commented on MAHOUT-239:
-
Sean, I don't see the deletes.
[
https://issues.apache.org/jira/browse/MAHOUT-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benson Margulies reopened MAHOUT-239:
-
Assignee: Sean Owen (was: Benson Margulies)
These files still need to be deleted:
[
https://issues.apache.org/jira/browse/MAHOUT-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798503#action_12798503
]
Sean Owen commented on MAHOUT-239:
--
I don't see these files and see no diff from svn
[
https://issues.apache.org/jira/browse/MAHOUT-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benson Margulies updated MAHOUT-239:
Attachment: MAHOUT-239.diff
The previous one wasn't all there.
Complete set of open hash
[
https://issues.apache.org/jira/browse/MAHOUT-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798538#action_12798538
]
Benson Margulies commented on MAHOUT-239:
-
When I went back to my tree today, to my
The hash map classes define 'pairsSortedByValue'. In the case of a map
that delivers objects, this requires the value type to implement
Comparable, which, of course, not all classes will. It seems insane to
only support these maps (e.g. OpenIntObjectHashMap<T>) for 'T extends
Comparable'. So I plan
I weakly vote for chucking it out.
On Sun, Jan 10, 2010 at 8:17 PM, Benson Margulies bimargul...@gmail.com wrote:
Colt brought us 'ObjectArrayList'. You might ask, what advantage does
it have over ArrayList<T>?
Agree with this as well.
On Sun, Jan 10, 2010 at 8:22 PM, Benson Margulies bimargul...@gmail.com wrote:
The hash map classes define 'pairsSortedByValue'. In the case of a map
that delivers objects, this requires the value type to implement
Comparable, which, of course, not all classes will. It
[
https://issues.apache.org/jira/browse/MAHOUT-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798540#action_12798540
]
Sean Owen commented on MAHOUT-239:
--
Committed, but the deletes still appear to delete
[
https://issues.apache.org/jira/browse/MAHOUT-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen resolved MAHOUT-239.
--
Resolution: Fixed
Assignee: Benson Margulies (was: Sean Owen)
Complete set of open hash maps
On Sun, Jan 10, 2010 at 6:43 AM, Robin Anil robin.a...@gmail.com wrote:
A lot of zeros are being printed in the JSON string. Is that normal for an
infinite cardinality vector?
http://pastebin.com/m6ff5f0ef
The same is true if I typecast to a Vector.
Where did this JSON output come from, Robin? That
I take it that the point of this code is to allow filling an ArrayList with
null values efficiently. This might sometimes be useful, I suppose.
It sounds like you are saying that the virtue of the ObjectArrayList is that
we own it and can make this resizing method efficient. I don't see any
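For reference, the "fill with nulls" case described above can be handled on a plain java.util.ArrayList without a custom list class. This is only a sketch: the class and method names (NullFillSketch, growWithNulls) are made up for illustration and are not part of Colt or Mahout.

```java
import java.util.ArrayList;
import java.util.Collections;

public class NullFillSketch {

    /**
     * Grows the list to newSize by appending nulls.
     * Does nothing if the list is already at least that long.
     * Hypothetical helper, not an existing Colt or Mahout method.
     */
    public static <T> void growWithNulls(ArrayList<T> list, int newSize) {
        int missing = newSize - list.size();
        if (missing > 0) {
            // Reserve the backing array once instead of growing it repeatedly.
            list.ensureCapacity(newSize);
            // nCopies returns an immutable view of `missing` null references.
            list.addAll(Collections.<T>nCopies(missing, null));
        }
    }

    public static void main(String[] args) {
        ArrayList<String> list = new ArrayList<>();
        list.add("a");
        growWithNulls(list, 4);
        System.out.println(list.size() + " " + list);
    }
}
```

Whether this is fast enough for the callers of the resize method is something only profiling would show.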
Totally agree with this. IllegalArgumentException seems made for this.
On Sun, Jan 10, 2010 at 12:22 PM, Benson Margulies bimargul...@gmail.com wrote:
So I plan to make it throw if the type happens not to
implement Comparable.
--
Ted Dunning, CTO
DeepDyve
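A rough sketch of the behaviour discussed in this exchange: sort the values, and throw IllegalArgumentException if a value does not implement Comparable. The class and method names here (SortedValuesSketch, sortedValues) are hypothetical, and the sketch works on a plain List rather than the actual hash map classes, so it is not the real 'pairsSortedByValue' signature.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedValuesSketch {

    /**
     * Returns a sorted copy of the values.
     * Throws IllegalArgumentException if any value is not Comparable.
     * Hypothetical helper for illustration only.
     */
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static <T> List<T> sortedValues(List<T> values) {
        for (T value : values) {
            // null is not an instance of anything, so it is rejected too.
            if (!(value instanceof Comparable)) {
                throw new IllegalArgumentException(
                    "Value does not implement Comparable: "
                        + (value == null ? "null" : value.getClass().getName()));
            }
        }
        List<T> copy = new ArrayList<>(values);
        // The check above guarantees each element is Comparable; values of
        // mutually incomparable types would still fail with ClassCastException.
        Collections.sort((List) copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sortedValues(List.of("pear", "apple", "fig")));
        try {
            sortedValues(List.of(new Object()));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The check runs at call time, so maps whose value type is not Comparable stay usable as long as nobody asks for the sorted view.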
The only major cases that I know that we have are matrices of small
integers. The most impressive case is where the sparse matrix can only
contain 1 for non-zero values since you don't have to store the value at
all, just the index. If the indexes are sorted, then delta-PFOR coding or
some such
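To make the index-only idea concrete, here is a small sketch that stores just the sorted column indexes of a 0/1 sparse row as gaps between neighbours. It uses plain variable-byte coding of the gaps, which is a simpler stand-in for the delta-PFOR coding mentioned above, not an implementation of it. The class and method names are invented for this example, and indexes are assumed to be non-negative and sorted ascending.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class DeltaIndexSketch {

    /** Encodes sorted, non-negative indexes as variable-byte gaps. */
    public static byte[] encode(int[] sortedIndexes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int index : sortedIndexes) {
            int gap = index - previous; // the first gap is the index itself
            previous = index;
            // Emit 7 bits per byte, low bits first; the high bit means "more follows".
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);
        }
        return out.toByteArray();
    }

    /** Reverses encode(): rebuilds the indexes by summing the gaps. */
    public static int[] decode(byte[] bytes) {
        List<Integer> result = new ArrayList<>();
        int previous = 0;
        int gap = 0;
        int shift = 0;
        for (byte b : bytes) {
            gap |= (b & 0x7F) << shift;
            if ((b & 0x80) != 0) {
                shift += 7;          // continuation byte, keep accumulating
            } else {
                previous += gap;     // last byte of this gap
                result.add(previous);
                gap = 0;
                shift = 0;
            }
        }
        int[] indexes = new int[result.size()];
        for (int i = 0; i < indexes.length; i++) {
            indexes[i] = result.get(i);
        }
        return indexes;
    }
}
```

No values are stored at all, since every present index implies the value 1; small gaps cost one byte each instead of four.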
I don't know if it helps, but I have a sparse vector file that is based off a
1.8 MB Lucene index and it takes up 143 KB. Earlier, I had a Lucene index that
was several megs (20+) and the vectors only took 1 MB.
Have you tried debugging? If I can finish up my chapter tonight, I will try to
Ted,
I had much the same thoughts while driving to the grocery store after
sending that message.
Death to the class, and I'll clean up the implementation of the users
of the resize business.
--benson
On Sun, Jan 10, 2010 at 4:38 PM, Ted Dunning ted.dunn...@gmail.com wrote:
I take it that the
p.s.
.sdrawkcab si od I gnihtyreve os cibaraA tuoba gnikniht neeb evah I.
On Sun, Jan 10, 2010 at 4:38 PM, Ted Dunning ted.dunn...@gmail.com wrote:
I take it that the point of this code is to allow filling an ArrayList with
null values efficiently. This might sometimes be useful, I suppose.
Colt code is inconsistent in dealing with the following case:
IntSomethingHashMap.keyOf(someValue)
Some code we got from them returns 0 if there is no such value, other
code returns MIN_VALUE. In floating-point land, it returns NaN.
Personally, I'd be inclined to nuke the entire API. It's
Nuke it if you don't use it.
On Sun, Jan 10, 2010 at 7:36 PM, Benson Margulies bimargul...@gmail.com wrote:
Personally, I'd be inclined to nuke the entire API. It's implemented
as the obvious iteration, and the caller can iterate for themselves
without creating a shoot-yourself opportunity for
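As an illustration of the "caller can iterate for themselves" point, the sketch below does the reverse lookup by plain iteration and reports a missing value explicitly instead of through a sentinel such as 0, MIN_VALUE or NaN. It uses java.util.Map as a stand-in for the primitive hash maps, and the class and method names are hypothetical; this is not the Colt API and not something the thread settled on.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalInt;

public class KeyOfSketch {

    /**
     * Returns the first key mapped to the given value, or an empty
     * OptionalInt if no entry has that value. Assumes no null keys or values.
     */
    public static OptionalInt keyOf(Map<Integer, Integer> map, int value) {
        for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
            if (entry.getValue().intValue() == value) {
                return OptionalInt.of(entry.getKey());
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        Map<Integer, Integer> map = new LinkedHashMap<>();
        map.put(0, 7);
        map.put(5, 0);
        // Key 0 is a legitimate answer here, so a sentinel of 0 would be ambiguous.
        System.out.println(keyOf(map, 7).isPresent());
        System.out.println(keyOf(map, 99).isPresent());
    }
}
```

With an explicit "absent" result, a key of 0 can no longer be confused with "no such value".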
LLR Collocation Identifier
--
Key: MAHOUT-242
URL: https://issues.apache.org/jira/browse/MAHOUT-242
Project: Mahout
Issue Type: New Feature
Affects Versions: 0.3
Reporter: Drew Farris
[
https://issues.apache.org/jira/browse/MAHOUT-242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Drew Farris updated MAHOUT-242:
---
Attachment: mahout-colloc.tar.gz
LLR Collocation Identifier
--
[
https://issues.apache.org/jira/browse/MAHOUT-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798575#action_12798575
]
Robin Anil commented on MAHOUT-242:
---
* Try to stick to SequenceFile<Text,Text> docid = for