[jira] Updated: (MAHOUT-239) Complete set of open hash maps with primitive types as both key and value

2010-01-10 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-239: - Resolution: Fixed Assignee: Benson Margulies Status: Resolved (was: Patch Available)

[jira] Resolved: (MAHOUT-85) Perceptron/Winnow Trainer

2010-01-10 Thread Isabel Drost (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Isabel Drost resolved MAHOUT-85. Resolution: Fixed Finally committed. Perceptron/Winnow Trainer -

[jira] Created: (MAHOUT-240) Parallel version of Perceptron

2010-01-10 Thread Isabel Drost (JIRA)
Parallel version of Perceptron -- Key: MAHOUT-240 URL: https://issues.apache.org/jira/browse/MAHOUT-240 Project: Mahout Issue Type: Improvement Components: Classification Affects Versions: 0.3

[jira] Created: (MAHOUT-241) Example for perceptron

2010-01-10 Thread Isabel Drost (JIRA)
Example for perceptron -- Key: MAHOUT-241 URL: https://issues.apache.org/jira/browse/MAHOUT-241 Project: Mahout Issue Type: Improvement Components: Classification Affects Versions: 0.3 Reporter:

SparseVectors writing out a lot of data

2010-01-10 Thread Robin Anil
I have been testing out the DictionaryVectorizer on the 20news dataset. It's writing out 2GB vector files for the 38MB dataset. This is what I am doing. Tell me where I am going wrong. First I create an infinite dimensional vector of size 10, SparseVector vector = new SparseVector(key.toString(),

Re: SparseVectors writing out a lot of data

2010-01-10 Thread Robin Anil
https://issues.apache.org/jira/secure/attachment/12429846/DictionaryVectorizer.patch Reduce = PartialVectorGenerator Class

[jira] Updated: (MAHOUT-237) Map/Reduce Implementation of Document Vectorizer

2010-01-10 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil updated MAHOUT-237: -- Attachment: DictionaryVectorizer.patch Some tidying up. Still the large output bug remains

[jira] Commented: (MAHOUT-85) Perceptron/Winnow Trainer

2010-01-10 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798489#action_12798489 ] Grant Ingersoll commented on MAHOUT-85: --- Why is PerceptronTrainingMapper empty? Are

Re: SparseVectors writing out a lot of data

2010-01-10 Thread Robin Anil
Any clue why this is happening? I am running it over a small sample. I will try and pinpoint the issue. On Sun, Jan 10, 2010 at 5:30 PM, Robin Anil robin.a...@gmail.com wrote: https://issues.apache.org/jira/secure/attachment/12429846/DictionaryVectorizer.patch Reduce =

Re: SparseVectors writing out a lot of data

2010-01-10 Thread Robin Anil
Lot of zeros being printed in the JSON string. Is that normal for an infinite cardinality vector? http://pastebin.com/m6ff5f0ef Same is true if I type cast to a Vector. On Sun, Jan 10, 2010 at 8:08 PM, Grant Ingersoll gsing...@apache.org wrote: Have you dumped out the file? What's in it?

Re: SparseVectors writing out a lot of data

2010-01-10 Thread Grant Ingersoll
On Jan 10, 2010, at 9:43 AM, Robin Anil wrote: Lot of zeros being printed in the Json string. Is that normal for an infinite cardinality vector? It shouldn't print them if you are using a SparseVector, but my guess is there is something odd going on here when writing it out such that it is

Re: SparseVectors writing out a lot of data

2010-01-10 Thread Drew Farris
I've noticed the same thing when looking at SparseVectors contained within the results of ClusterDumper -- I didn't explore very far into why, but it seems that the JSON representation of the SparseVector doesn't use a map but instead uses parallel arrays of certain sizes. I'm not certain how the

[jira] Commented: (MAHOUT-239) Complete set of open hash maps with primitive types as both key and value

2010-01-10 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798499#action_12798499 ] Benson Margulies commented on MAHOUT-239: - Sean, I don't see the deletes.

[jira] Reopened: (MAHOUT-239) Complete set of open hash maps with primitive types as both key and value

2010-01-10 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Margulies reopened MAHOUT-239: - Assignee: Sean Owen (was: Benson Margulies) These files still need to be deleted:

[jira] Commented: (MAHOUT-239) Complete set of open hash maps with primitive types as both key and value

2010-01-10 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798503#action_12798503 ] Sean Owen commented on MAHOUT-239: -- I don't see these files and see no diff from svn

[jira] Updated: (MAHOUT-239) Complete set of open hash maps with primitive types as both key and value

2010-01-10 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Margulies updated MAHOUT-239: Attachment: MAHOUT-239.diff The previous one wasn't all there. Complete set of open hash

[jira] Commented: (MAHOUT-239) Complete set of open hash maps with primitive types as both key and value

2010-01-10 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798538#action_12798538 ] Benson Margulies commented on MAHOUT-239: - When I went back to my tree today, to my

[math] Another taste question about collection

2010-01-10 Thread Benson Margulies
The hash map classes define 'pairsSortedByValue'. In the case of a map that delivers objects, this requires the value type to implement Comparable, which, of course, not all classes will. It seems insane to only support these maps (e.g. OpenIntObjectHashMap<T>) for 'T extends Comparable'. So I plan

Re: [math] Question of taste: 'ObjectArrayList'

2010-01-10 Thread Sean Owen
I weakly vote for chucking it out. On Sun, Jan 10, 2010 at 8:17 PM, Benson Margulies bimargul...@gmail.com wrote: Colt brought us 'ObjectArrayList'. You might ask, what advantage does it have over ArrayList<T>?

Re: [math] Another taste question about collection

2010-01-10 Thread Sean Owen
Agree with this as well. On Sun, Jan 10, 2010 at 8:22 PM, Benson Margulies bimargul...@gmail.com wrote: The hash map classes define 'pairsSortedByValue'. In the case of a map that delivers objects, this requires the value type to implement comparable, which, of course, not all classes will. It

[jira] Commented: (MAHOUT-239) Complete set of open hash maps with primitive types as both key and value

2010-01-10 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798540#action_12798540 ] Sean Owen commented on MAHOUT-239: -- Committed, but the deletes still appear to delete

[jira] Resolved: (MAHOUT-239) Complete set of open hash maps with primitive types as both key and value

2010-01-10 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved MAHOUT-239. -- Resolution: Fixed Assignee: Benson Margulies (was: Sean Owen) Complete set of open hash maps

Re: SparseVectors writing out a lot of data

2010-01-10 Thread Jake Mannix
On Sun, Jan 10, 2010 at 6:43 AM, Robin Anil robin.a...@gmail.com wrote: Lot of zeros being printed in the Json string. Is that normal for an infinite cardinality vector? http://pastebin.com/m6ff5f0ef Same is true if I type cast to a Vector Where did this JSON output come from, Robin? That

Re: [math] Question of taste: 'ObjectArrayList'

2010-01-10 Thread Ted Dunning
I take it that the point of this code is to allow filling an ArrayList with null values efficiently. This might sometimes be useful, I suppose. It sounds like you are saying that the virtue of the ObjectArrayList is that we own it and can make this resizing method efficient. I don't see any

Re: [math] Another taste question about collection

2010-01-10 Thread Ted Dunning
Totally agree with this. IllegalArgumentException seems made for this. On Sun, Jan 10, 2010 at 12:22 PM, Benson Margulies bimargul...@gmail.com wrote: So I plan to make it throw if the type happens not to implement Comparable. -- Ted Dunning, CTO DeepDyve
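The behaviour agreed on in this thread (sort by value, but throw if the value type turns out not to implement Comparable) can be sketched roughly as follows. This is only an illustration over a plain java.util.List; the class and method names (SortedValuesSketch, sortedValues) are hypothetical and are not the Mahout or Colt collections API.

```java
import java.util.ArrayList;
import java.util.List;

public class SortedValuesSketch {
    /**
     * Returns a sorted copy of the values in natural order.
     * Throws IllegalArgumentException if any value does not
     * implement Comparable (hypothetical helper, not Mahout code).
     */
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static <T> List<T> sortedValues(List<T> values) {
        // Check at run time, since T is not declared 'extends Comparable'.
        for (T v : values) {
            if (!(v instanceof Comparable)) {
                throw new IllegalArgumentException(
                    "Value does not implement Comparable: "
                        + (v == null ? "null" : v.getClass().getName()));
            }
        }
        List<T> copy = new ArrayList<>(values);
        // The cast is safe here because every element passed the check above.
        copy.sort((a, b) -> ((Comparable) a).compareTo(b));
        return copy;
    }
}
```

The trade-off is that the failure moves from compile time to run time, in exchange for not restricting the whole map class to 'T extends Comparable'.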

Re: SparseVectors writing out a lot of data

2010-01-10 Thread Ted Dunning
The only major cases that I know that we have are matrices of small integers. The most impressive case is where the sparse matrix can only contain 1 for non-zero values since you don't have to store the value at all, just the index. If the indexes are sorted, then delta-PFOR coding or some such
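As a rough illustration of the point about binary sparse data, the sketch below stores only the sorted non-zero indexes and replaces each one with its gap from the previous index. It shows plain delta (gap) coding only; the PFOR-style bit packing mentioned above is not implemented, and the class and method names are hypothetical.

```java
public class DeltaIndexSketch {
    /** Converts sorted, ascending indexes into gaps from the previous index. */
    public static int[] encode(int[] sortedIndexes) {
        int[] gaps = new int[sortedIndexes.length];
        int previous = 0;
        for (int i = 0; i < sortedIndexes.length; i++) {
            gaps[i] = sortedIndexes[i] - previous;
            previous = sortedIndexes[i];
        }
        return gaps;
    }

    /** Rebuilds the original indexes by taking a running sum of the gaps. */
    public static int[] decode(int[] gaps) {
        int[] indexes = new int[gaps.length];
        int running = 0;
        for (int i = 0; i < gaps.length; i++) {
            running += gaps[i];
            indexes[i] = running;
        }
        return indexes;
    }
}
```

Because the gaps are typically much smaller than the absolute indexes, a later variable-length or bit-packed encoding has smaller numbers to work with.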

Re: SparseVectors writing out a lot of data

2010-01-10 Thread Grant Ingersoll
I don't know if it helps, but I have a sparse vector file that is based off a 1.8 MB Lucene index and it takes up 143 KB. Earlier, I had a Lucene index that was several megs (20+) and the vectors only took 1 MB. Have you tried debugging? If I can finish up my chapter tonight, I will try to

Re: [math] Question of taste: 'ObjectArrayList'

2010-01-10 Thread Benson Margulies
Ted, I had much the same thoughts while driving to the grocery store after sending that message. Death to the class, and I'll clean up the implementation of the users of the resize business. --benson On Sun, Jan 10, 2010 at 4:38 PM, Ted Dunning ted.dunn...@gmail.com wrote: I take it that the
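For callers that relied on the resize behaviour discussed in this thread, one possible replacement on top of a standard java.util.List is sketched below. It is an assumption about what such a cleanup could look like, not the change that was actually committed; ResizeSketch and setSize are hypothetical names.

```java
import java.util.Collections;
import java.util.List;

public class ResizeSketch {
    /**
     * Grows the list with null entries or truncates it so that
     * list.size() == newSize afterwards (hypothetical helper).
     */
    public static <T> void setSize(List<T> list, int newSize) {
        if (newSize < 0) {
            throw new IllegalArgumentException("newSize must be >= 0: " + newSize);
        }
        int size = list.size();
        if (newSize < size) {
            // Drop the tail in one bulk operation.
            list.subList(newSize, size).clear();
        } else if (newSize > size) {
            // nCopies is an immutable view, so no temporary array of nulls is built.
            list.addAll(Collections.<T>nCopies(newSize - size, null));
        }
    }
}
```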

Re: [math] Question of taste: 'ObjectArrayList'

2010-01-10 Thread Benson Margulies
p.s. .sdrawkcab si od I gnihtyreve os cibaraA tuoba gnikniht neeb evah I. On Sun, Jan 10, 2010 at 4:38 PM, Ted Dunning ted.dunn...@gmail.com wrote: I take it that the point of this code is to allow filling an ArrayList with null values efficiently.  This might sometimes be useful, I suppose.

[math] no-such-integer value

2010-01-10 Thread Benson Margulies
Colt code is inconsistent in dealing with the following case: iIntSomethingHashMap.keyOf(someValue) Some code we got from them returns 0 if there is no such value, other code returns MIN_VALUE. In floating-point land, it returns NaN. Personally, I'd be inclined to nuke the entire API. It's

Re: [math] no-such-integer value

2010-01-10 Thread Ted Dunning
Nuke it if you don't use it. On Sun, Jan 10, 2010 at 7:36 PM, Benson Margulies bimargul...@gmail.com wrote: Personally, I'd be inclined to nuke the entire API. It's implemented as the obvious iteration, and the caller can iterate for themselves without creating a shoot-yourself opportunity for
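To make the sentinel problem concrete, here is a minimal sketch of the "obvious iteration" written on the caller's side, using java.util.Map and OptionalInt instead of a magic return value such as 0 or MIN_VALUE. The names are hypothetical and this is not the Colt or Mahout API.

```java
import java.util.Map;
import java.util.OptionalInt;

public class KeyOfSketch {
    /**
     * Returns some key mapped to the given value, or an empty
     * OptionalInt if no entry has that value (hypothetical helper).
     */
    public static OptionalInt keyOf(Map<Integer, Integer> map, int value) {
        for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
            Integer v = entry.getValue();
            if (v != null && v == value) {
                return OptionalInt.of(entry.getKey());
            }
        }
        // No sentinel: 0 and Integer.MIN_VALUE remain usable as real keys.
        return OptionalInt.empty();
    }
}
```

With a map containing the entry 0 -> 5, a sentinel-based keyOf(5) returning 0 would be indistinguishable from "not found"; the optional result keeps the two cases apart.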

[jira] Created: (MAHOUT-242) LLR Collocation Identifier

2010-01-10 Thread Drew Farris (JIRA)
LLR Collocation Identifier -- Key: MAHOUT-242 URL: https://issues.apache.org/jira/browse/MAHOUT-242 Project: Mahout Issue Type: New Feature Affects Versions: 0.3 Reporter: Drew Farris

[jira] Updated: (MAHOUT-242) LLR Collocation Identifier

2010-01-10 Thread Drew Farris (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Drew Farris updated MAHOUT-242: --- Attachment: mahout-colloc.tar.gz LLR Collocation Identifier --

[jira] Commented: (MAHOUT-242) LLR Collocation Identifier

2010-01-10 Thread Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798575#action_12798575 ] Robin Anil commented on MAHOUT-242: --- * Try to stick to SequenceFile<Text,Text> docid = for