issues found

2009-08-31 Ted Dunning
I just did an exercise of implementing a faster sparse vector. In the process, I uncovered a bunch of warts. I will be filing some Jiras and patches as soon as I can get to them, but here is a preview: a) most serialization code for vectors would write vectors with null names out as if they had
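The snippet stops before saying what null-named vectors were written out as, so the exact defect is not recoverable here. One common way to make a nullable name round-trip unambiguously is to write a presence flag before the name. The Java sketch below assumes `java.io.DataOutput`/`DataInput`-style serialization and uses a hypothetical helper class, `NullableName`, that is not part of Mahout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper (not Mahout's serialization code): writes an optional
// vector name behind an explicit presence flag, so a null name and an empty
// name ("") come back as distinct values after a round trip.
final class NullableName {
  private NullableName() {}

  static void writeName(DataOutput out, String name) throws IOException {
    out.writeBoolean(name != null);   // presence flag, one byte
    if (name != null) {
      out.writeUTF(name);             // payload only when a name exists
    }
  }

  static String readName(DataInput in) throws IOException {
    return in.readBoolean() ? in.readUTF() : null;
  }

  // Serializes and deserializes a name through in-memory streams.
  static String roundTrip(String name) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (DataOutputStream out = new DataOutputStream(bytes)) {
        writeName(out, name);
      }
      try (DataInputStream in =
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return readName(in);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);  // in-memory streams should not fail
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip(null));     // prints "null"
    System.out.println(roundTrip("row-7"));  // prints "row-7"
  }
}
```

The flag costs one byte per vector, which is small next to the vector payload, and it keeps `null` distinct from the empty string.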

[jira] Updated: (MAHOUT-145) PartialData mapreduce Random Forests

2009-08-31 Deneche A. Hakim (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deneche A. Hakim updated MAHOUT-145:
Attachment: partial_August_31.patch
* Corrected some bugs in the new code when testing in

Re: issues found

2009-08-31 Grant Ingersoll
On Aug 31, 2009, at 2:55 AM, Ted Dunning wrote: I just did an exercise of implementing a faster sparse vector. In the process, I uncovered a bunch of warts. I will be filing some Jiras and patches as soon as I can get to them, but here is a preview: a) most serialization code for

[jira] Updated: (MAHOUT-157) Frequent Pattern Mining using Parallel FP-Growth

2009-08-31 Robin Anil (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robin Anil updated MAHOUT-157:
Attachment: MAHOUT-157-August-31.patch
Performance Improvements in sequential version Frequent

Re: Request for brainstorming: how to represent item-item diffs efficiently

2009-08-31 Ted Dunning
On Mon, Aug 31, 2009 at 8:29 AM, Sean Owen sro...@gmail.com wrote: And I think it is possible to do away with the two-level data structure organized by row, then column. I think. Again thinking of efficient ways to access by row without perhaps the overhead of storing by row. If this is for

Re: About Java Code Performance Profiling

2009-08-31 Thilo Goetz
Robin Anil wrote: Please suggest some good performance profilers for Java. There is the Eclipse Test and Performance Platform, which for some reason is screwed up with Eclipse Galileo. Then there is JProbe, which has great reviews, but it's not free. YourKit also has great reviews; there is an option

Re: Request for brainstorming: how to represent item-item diffs efficiently

2009-08-31 Ted Dunning
Not really. With off-line use, you can amortize the cost of sorting across all items. This allows very tight encodings in some cases. In on-line (real-time, actually) use, you have the problem that you can't amortize anything because you have a worst-case constraint. Essentially, this is the
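To make the amortization point concrete: once ids are sorted off-line, a row can store the gaps between consecutive ids instead of the ids themselves, and those gaps are usually small enough to pack tightly. The sketch below is an illustration under that assumption (non-negative `int` ids); `GapCoding`, `toGaps`, and `fromGaps` are hypothetical names, not code from the thread.

```java
import java.util.Arrays;

// Illustrative sketch: sort ids once (the off-line, amortized cost), then
// store differences between neighbours, which are small numbers that an
// integer compressor can encode in few bits.
final class GapCoding {
  private GapCoding() {}

  static int[] toGaps(int[] ids) {
    int[] sorted = ids.clone();
    Arrays.sort(sorted);                // one-time sorting cost
    int[] gaps = new int[sorted.length];
    int previous = 0;
    for (int i = 0; i < sorted.length; i++) {
      gaps[i] = sorted[i] - previous;   // >= 0 because sorted (assumes ids >= 0)
      previous = sorted[i];
    }
    return gaps;
  }

  static int[] fromGaps(int[] gaps) {
    int[] ids = new int[gaps.length];
    int running = 0;
    for (int i = 0; i < gaps.length; i++) {
      running += gaps[i];               // prefix sum restores the sorted ids
      ids[i] = running;
    }
    return ids;
  }

  public static void main(String[] args) {
    int[] gaps = toGaps(new int[] {1_000_017, 1_000_003, 1_000_040, 1_000_004});
    System.out.println(Arrays.toString(gaps));           // [1000003, 1, 13, 23]
    System.out.println(Arrays.toString(fromGaps(gaps))); // [1000003, 1000004, 1000017, 1000040]
  }
}
```

In an on-line setting, inserting a new id into the middle of a row splits one gap into two, and with a variable-length encoding the rest of the row may have to be shifted or re-encoded; that is one way the worst-case constraint bites.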

[jira] Created: (MAHOUT-168) Need integer compression routines

2009-08-31 Ted Dunning (JIRA)
Need integer compression routines
Key: MAHOUT-168
URL: https://issues.apache.org/jira/browse/MAHOUT-168
Project: Mahout
Issue Type: Improvement
Components: Matrix
Reporter: Ted Dunning

Re: Request for brainstorming: how to represent item-item diffs efficiently

2009-08-31 Ted Dunning
Ignoring the diff for the moment, you will probably win with some coding for the actual counts. My vision would be a row-coded complex matrix. Each row would have a list of ids, a list of counts, and a list of diffs. If you have your ids coded so that the most commonly rated items come first,
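A minimal Java sketch of one such row, assuming ids are kept sorted within the row so that a lookup is a binary search over parallel arrays. `DiffRow` is a hypothetical class; it stores counts and diffs as plain `int`/`float` arrays and does not attempt the count coding or the popularity-ordered id assignment discussed above.

```java
import java.util.Arrays;

// Hypothetical row of an item-item diff matrix: three parallel arrays
// (column ids, co-rating counts, average diffs) instead of a map per row.
final class DiffRow {
  private final int[] ids;      // column (item) ids, strictly increasing
  private final int[] counts;   // number of users who rated both items
  private final float[] diffs;  // average rating difference for the pair

  DiffRow(int[] ids, int[] counts, float[] diffs) {
    if (ids.length != counts.length || ids.length != diffs.length) {
      throw new IllegalArgumentException("parallel arrays must have equal length");
    }
    for (int i = 1; i < ids.length; i++) {
      if (ids[i] <= ids[i - 1]) {
        throw new IllegalArgumentException("ids must be strictly increasing");
      }
    }
    this.ids = ids.clone();
    this.counts = counts.clone();
    this.diffs = diffs.clone();
  }

  /** Returns the stored diff for column {@code id}, or NaN if the pair is absent. */
  float diff(int id) {
    int k = Arrays.binarySearch(ids, id);
    return k >= 0 ? diffs[k] : Float.NaN;
  }

  /** Returns the co-rating count for column {@code id}, or 0 if the pair is absent. */
  int count(int id) {
    int k = Arrays.binarySearch(ids, id);
    return k >= 0 ? counts[k] : 0;
  }

  public static void main(String[] args) {
    DiffRow row = new DiffRow(new int[] {2, 5, 9}, new int[] {40, 3, 17},
        new float[] {0.5f, -1.25f, 0.0f});
    System.out.println(row.diff(5));   // -1.25
    System.out.println(row.count(9));  // 17
    System.out.println(row.diff(4));   // NaN
  }
}
```

Parallel primitive arrays avoid per-entry object overhead, which is the usual reason to drop a map-of-maps layout; the trade-off is that adding a new column to a row means copying its arrays.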

Re: About Java Code Performance Profiling

2009-08-31 Sean Owen
I use JProfiler 5 just because I am used to it and ended up buying a license. Works well. I imagine free solutions work fine too. On Aug 31, 2009 4:39 PM, Robin Anil robin.a...@gmail.com wrote: Please suggest some good performance profilers for Java. There is the Eclipse Test and Performance

Re: About Java Code Performance Profiling

2009-08-31 Grant Ingersoll
YourKit has an open-source license option that asks you to mail them and tell them what project you work on. I've used both YourKit and JProfiler, and both are quite good. I've also used the NetBeans one, which can now plug into IntelliJ too, but I don't think it is as good as YourKit.

[jira] Commented: (MAHOUT-168) Need integer compression routines

2009-08-31 Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749703#action_12749703 ] Grant Ingersoll commented on MAHOUT-168: Can we leverage some of Lucene's
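The comment does not say which Lucene routines it means. One likely candidate is the variable-byte format behind Lucene's `writeVInt`/`readVInt`: seven payload bits per byte, low-order group first, with the high bit set on every byte except the last. The sketch below is a standalone re-implementation of that byte layout for illustration only; it does not call Lucene, and `VarByte` and its helpers are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Standalone sketch of a Lucene-style VInt layout: 7 payload bits per byte,
// least-significant group first, high bit = "more bytes follow".
final class VarByte {
  private VarByte() {}

  static void writeVInt(ByteArrayOutputStream out, int value) {
    int v = value;
    while ((v & ~0x7F) != 0) {        // more than 7 significant bits remain
      out.write((v & 0x7F) | 0x80);   // low 7 bits plus continuation flag
      v >>>= 7;                       // unsigned shift so negatives terminate
    }
    out.write(v);                     // final byte, high bit clear
  }

  /** Decodes one value starting at pos[0] and advances pos[0] past it. */
  static int readVInt(byte[] in, int[] pos) {
    int b = in[pos[0]++] & 0xFF;
    int value = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = in[pos[0]++] & 0xFF;
      value |= (b & 0x7F) << shift;
    }
    return value;
  }

  static byte[] encode(int... values) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int v : values) {
      writeVInt(out, v);
    }
    return out.toByteArray();
  }

  static int[] decode(byte[] bytes, int count) {
    int[] values = new int[count];
    int[] pos = {0};
    for (int i = 0; i < count; i++) {
      values[i] = readVInt(bytes, pos);
    }
    return values;
  }

  public static void main(String[] args) {
    byte[] bytes = encode(1, 127, 128, 300);
    System.out.println(bytes.length);                       // 6
    System.out.println(Arrays.toString(decode(bytes, 4)));  // [1, 127, 128, 300]
  }
}
```

Values below 128 take a single byte, which is why this kind of code pairs well with small numbers such as gaps between sorted ids.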

[jira] Assigned: (MAHOUT-168) Need integer compression routines

2009-08-31 Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Dunning reassigned MAHOUT-168:
Assignee: Ted Dunning
Need integer compression routines

[jira] Updated: (MAHOUT-168) Need integer compression routines

2009-08-31 Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Dunning updated MAHOUT-168:
Status: Patch Available (was: Open)
First version. 100% test coverage except for BitInputStream at
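The snippet does not say which codes this first version implements. Because it mentions a `BitInputStream`, a bit-level code is one possibility. As an illustration only, the sketch below implements Elias gamma coding, a standard bit-level integer code swapped in here, over hypothetical `BitWriter`/`BitReader` classes backed by a `List<Boolean>` rather than a real packed stream.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative Elias gamma coder; BitWriter/BitReader are hypothetical
// in-memory stand-ins for bit-level output/input streams.
final class EliasGamma {
  private EliasGamma() {}

  static final class BitWriter {
    private final List<Boolean> bits = new ArrayList<>();
    void write(boolean bit) { bits.add(bit); }
    int size() { return bits.size(); }
    BitReader reader() { return new BitReader(bits); }
  }

  static final class BitReader {
    private final List<Boolean> bits;
    private int pos = 0;
    BitReader(List<Boolean> bits) { this.bits = bits; }
    boolean read() { return bits.get(pos++); }
  }

  /** Writes n >= 1 as floor(log2 n) zero bits, then n in binary, MSB first. */
  static void encode(BitWriter out, int n) {
    if (n < 1) {
      throw new IllegalArgumentException("Elias gamma needs n >= 1");
    }
    int width = 31 - Integer.numberOfLeadingZeros(n);  // floor(log2 n)
    for (int i = 0; i < width; i++) {
      out.write(false);
    }
    for (int i = width; i >= 0; i--) {
      out.write(((n >>> i) & 1) != 0);
    }
  }

  static int decode(BitReader in) {
    int width = 0;
    while (!in.read()) {
      width++;                        // count zeros; loop consumes the leading 1
    }
    int n = 1;
    for (int i = 0; i < width; i++) {
      n = (n << 1) | (in.read() ? 1 : 0);
    }
    return n;
  }

  public static void main(String[] args) {
    BitWriter w = new BitWriter();
    int[] values = {1, 2, 5, 17};
    for (int v : values) {
      encode(w, v);
    }
    System.out.println(w.size());     // 18 bits: 1 + 3 + 5 + 9
    BitReader r = w.reader();
    for (int v : values) {
      System.out.print(decode(r) + " ");
    }
    System.out.println();
  }
}
```

Gamma spends 2*floor(log2 n) + 1 bits on n, so it favors very small numbers; a practical version would pack bits into a `long` or `byte[]` instead of boxing each one.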

[jira] Commented: (MAHOUT-168) Need integer compression routines

2009-08-31 Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749721#action_12749721 ] Ted Dunning commented on MAHOUT-168: I considered that, but it seemed easier to write