I just did an exercise of implementing a faster sparse vector. In the
process, I uncovered a bunch of warts. I will be filing some Jiras and
patches as soon as I can get to them, but here is a preview:
a) most serialization code for vectors would write vectors with null names
out as if they had
[ https://issues.apache.org/jira/browse/MAHOUT-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Deneche A. Hakim updated MAHOUT-145:
Attachment: partial_August_31.patch
* Corrected some bugs in the new code when testing in
On Aug 31, 2009, at 2:55 AM, Ted Dunning wrote:
I just did an exercise of implementing a faster sparse vector. In the
process, I uncovered a bunch of warts. I will be filing some Jiras and
patches as soon as I can get to them, but here is a preview:
a) most serialization code for
[ https://issues.apache.org/jira/browse/MAHOUT-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robin Anil updated MAHOUT-157:
Attachment: MAHOUT-157-August-31.patch
Performance Improvements in sequential version
Frequent
On Mon, Aug 31, 2009 at 8:29 AM, Sean Owen sro...@gmail.com wrote:
And I think it is possible to do away with the two-level data structure
organized by row, then column. I think. Again thinking of efficient ways to
access by row without perhaps the overhead of storing by row.
If this is for
Robin Anil wrote:
Please suggest some good performance profilers for Java.
There is the Eclipse Test and Performance Platform, which for some reason is
screwed up with Eclipse Galileo.
Then there is JProbe, which has great reviews, but it's not free.
YourKit also has great reviews; there is an option
Not really. With off-line use, you can amortize the cost of sorting across
all items. This allows very tight encodings in some cases. In on-line
(real-time actually) use you have the problem that you can't amortize
anything because you have a worst case constraint.
Essentially, this is the
Need integer compression routines
Key: MAHOUT-168
URL: https://issues.apache.org/jira/browse/MAHOUT-168
Project: Mahout
Issue Type: Improvement
Components: Matrix
Reporter: Ted Dunning
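To make the request concrete, here is a minimal sketch of one common kind of integer compression: variable-byte coding of the gaps between sorted ids. It is only an illustration, not the code attached to MAHOUT-168; the class name VarIntDeltaCodec and its encode/decode methods are hypothetical, and the sketch assumes the ids are non-negative and sorted ascending.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; not taken from the MAHOUT-168 patch.
final class VarIntDeltaCodec {

    private VarIntDeltaCodec() {
    }

    /** Encodes non-negative ids, sorted ascending, as variable-byte gaps. */
    static byte[] encode(int[] sortedIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int id : sortedIds) {
            int delta = id - previous;
            if (delta < 0) {
                throw new IllegalArgumentException("ids must be non-negative and sorted ascending");
            }
            previous = id;
            // Emit 7 bits per byte, low bits first; a set high bit means "more bytes follow".
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        return out.toByteArray();
    }

    /** Decodes count ids from a buffer produced by encode. */
    static int[] decode(byte[] packed, int count) {
        int[] ids = new int[count];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < count; i++) {
            int delta = 0;
            int shift = 0;
            int b;
            do {
                b = packed[pos++] & 0xFF;
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            // Each stored value is a gap, so add it to the running id.
            previous += delta;
            ids[i] = previous;
        }
        return ids;
    }
}
```

For the ids 3, 7, 8, 300, 100000 the gaps are 3, 4, 1, 292 and 99700, which this sketch packs into 8 bytes rather than the 20 bytes of five raw ints. Small gaps are what make the coding pay off, which is why sorting the ids first matters.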
Ignoring the diff for the moment, you will probably win with some coding for
the actual counts. My vision would be a row-coded complex matrix. Each row
would have a list of ids, a list of counts, and a list of diffs.
If you have your ids coded so that the most commonly rated items come first,
I use JProfiler 5 just because I am used to it and ended up buying a
license. Works well. I imagine free solutions work fine too.
On Aug 31, 2009 4:39 PM, Robin Anil robin.a...@gmail.com wrote:
Please suggest some good performance profilers for Java.
There is eclipse Test and Performance
YourKit has an open source license option that asks you to mail them
and tell them what project you work on. I've used both YourKit and
JProfiler and both are quite good. I've also used the NetBeans one,
which can now plug into IntelliJ too, but I don't think it is as
good as YourKit.
[ https://issues.apache.org/jira/browse/MAHOUT-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749703#action_12749703 ]
Grant Ingersoll commented on MAHOUT-168:
Can we leverage some of Lucene's
[ https://issues.apache.org/jira/browse/MAHOUT-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Dunning reassigned MAHOUT-168:
Assignee: Ted Dunning
Need integer compression routines
[ https://issues.apache.org/jira/browse/MAHOUT-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Dunning updated MAHOUT-168:
Status: Patch Available (was: Open)
First version. 100% test coverage except for BitInputStream at
[ https://issues.apache.org/jira/browse/MAHOUT-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749721#action_12749721 ]
Ted Dunning commented on MAHOUT-168:
I considered that, but it seemed easier to write