Re: [VOTE] Release 14.1, RC7

2020-09-26 Thread Jake Mannix
Howdy all. Andrew asked me to take a little time and verify the release so we could get another PMC +1, so I tried to dust off my Mahout skills and help out... but frankly, the "getting started" from a binary distribution docs are pretty hard for _me_ to follow. I start on the main page and

Re: Proposal: scala DSL module for Mahout linear algebra.

2013-07-27 Thread Jake Mannix
On Fri, Jul 26, 2013 at 11:56 PM, Nick Pentreath nick.pentre...@gmail.com wrote: Thanks for the update on that PR I will definitely take a look. I wonder if they will run into the exact same Colt issues as mahout did?! Yeah, that's pretty strange, Colt is totally abandoned, and had lots of

Re: Proposal: scala DSL module for Mahout linear algebra.

2013-07-27 Thread Jake Mannix
I think my main concern is one of readability and hidden information: I really _don't_ like having to know _anything_ about associativity rules, and I'm not sure that catering to R users (*or* matlab users) is what we want to do. Maybe I'm thinking in a different direction with my scala

Re: Proposal: scala DSL module for Mahout linear algebra.

2013-07-27 Thread Jake Mannix
*by scalars*, but Hadamard products on matrices? I guess it _happens_, but I'm not sure I've ever done it, or if I have, it's pretty darn rare. On Sat, Jul 27, 2013 at 8:00 AM, Jake Mannix jake.man...@gmail.com wrote: I think my main concern is one of readability and hidden information: I

Re: Proposal: scala DSL module for Mahout linear algebra.

2013-07-27 Thread Jake Mannix
On Sat, Jul 27, 2013 at 1:53 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Can you show me some examples of where I'd *want* to do the wrong thing from an associativity standpoint? 5 - x where x is a vector, is kinda weird. But maybe you're subtracting off a mean or something, but then

Re: Proposal: scala DSL module for Mahout linear algebra.

2013-07-26 Thread Jake Mannix
Woohoo! Awesome, I've forked you, and I'll start digging in soon. At a high level, this looks great. Not so sure about so many operators - I don't know that we really need to have such a weighty syntax (a %*% b), java devs are going to be much more familiar with simply doing a.times(b), and I

Re: Proposal: scala DSL module for Mahout linear algebra.

2013-07-26 Thread Jake Mannix
On Fri, Jul 26, 2013 at 5:07 AM, Ted Dunning ted.dunn...@gmail.com wrote: This sounds great in principle. I haven't seen any details yet (haven't had time to look). Is there a strong reason to go with the R syntax for multiplication instead of the matlab convention that a*b means

Re: Proposal: scala DSL module for Mahout linear algebra.

2013-07-26 Thread Jake Mannix
I'm on your branch (dev-0.9.x-scala) but only doing a mvn install inside of the new module - maybe I need to do it from the top level? On Fri, Jul 26, 2013 at 7:23 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Jul 26, 2013 12:57 AM, Jake Mannix jake.man...@gmail.com wrote: Woohoo

Re: Proposal: scala DSL module for Mahout linear algebra.

2013-07-26 Thread Jake Mannix
- in mahout.math.MatrixOpsTest Running mahout.math.VectorOpsTest Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 sec - in mahout.math.VectorOpsTest Results : Tests run: 0, Failures: 0, Errors: 0, Skipped: 0 On Fri, Jul 26, 2013 at 8:35 AM, Jake Mannix jake.man...@gmail.com wrote: I'm

Re: Proposal: scala DSL module for Mahout linear algebra.

2013-07-26 Thread Jake Mannix
pushed on your branch to github? On Fri, Jul 26, 2013 at 12:16 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Fri, Jul 26, 2013 at 8:40 AM, Jake Mannix jake.man...@gmail.com wrote: Yep, that fixed it. Are there any real tests

Re: Proposal: scala DSL module for Mahout linear algebra.

2013-07-26 Thread Jake Mannix
awesome, working now, test results popping up! On Fri, Jul 26, 2013 at 12:47 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: yes On Fri, Jul 26, 2013 at 12:39 PM, Jake Mannix jake.man...@gmail.com wrote: pushed on your branch to github? On Fri, Jul 26, 2013 at 12:16 PM, Dmitriy

Re: [VOTE] Release Mahout 0.8

2013-07-19 Thread Jake Mannix
+1 from me, I used the jars to run some LDA (on a couple hundred million documents) on the work cluster (1.0.something small), and it worked fine. Other clustering example (with reuters) also worked as expected. On Thu, Jul 18, 2013 at 11:27 AM, Suneel Marthi suneel_mar...@yahoo.com wrote: +1

Re: Mahout release process

2013-07-10 Thread Jake Mannix
So quick question: is an intentional side-effect of the current release process that when we build on trunk now, we build artifacts named e.g. mahout-examples-0.9-SNAPSHOT-job.jar ? On Wed, Jul 10, 2013 at 2:33 AM, Sean Owen sro...@gmail.com wrote: Yes you can do all of this in a branch, which

Re: Mahout release process

2013-07-10 Thread Jake Mannix
at 10:54 AM, Jake Mannix jake.man...@gmail.com wrote: So quick question: is an intentional side-effect of the current release process that when we build on trunk now, we build artifacts named e.g. mahout-examples-0.9-SNAPSHOT-job.jar ? On Wed, Jul 10, 2013 at 2:33 AM, Sean Owen sro

--libjars deployment?

2013-07-05 Thread Jake Mannix
I forget, I know our default deployment is via the shaded monojar, but do we also have an option somewhere to allow running with --libjars instead? Much more rsync-friendly for rapid prototyping (esp. when on slow remote connections). -- -jake

Re: Mahout vectors/matrices/solvers on spark

2013-07-05 Thread Jake Mannix
On Fri, Jul 5, 2013 at 1:15 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: For anyone good at scala DSLs, the following is the puzzle i can't seem to figure at the moment. I mentioned before that I implemented assignment notations to a row or a block, e.g. for a row vector : A(5,::) :=

Re: --libjars deployment?

2013-07-05 Thread Jake Mannix
into why. It just seems unreasonably efficient to be otherwise. On Fri, Jul 5, 2013 at 1:23 AM, Jake Mannix jake.man...@gmail.com wrote: I forget, I know our default deployment is via the shaded monojar, but do we also have an option somewhere to allow running with --libjars instead? Much

Re: --libjars deployment?

2013-07-05 Thread Jake Mannix
: On Fri, Jul 5, 2013 at 7:19 AM, Jake Mannix jake.man...@gmail.com wrote: But also: Monster Jars Considered Harmful, so I should dredge up a deploy flag or something which allows us to run seamlessly with small jar + libjars instead (so people [incl. me] can tack on their own jars

Re: Code Freeze for 0.8

2013-07-05 Thread Jake Mannix
+1 On Fri, Jul 5, 2013 at 8:47 AM, Ted Dunning ted.dunn...@gmail.com wrote: +1 On Fri, Jul 5, 2013 at 7:43 AM, Suneel Marthi suneel_mar...@yahoo.com wrote: +1 From: Grant Ingersoll gsing...@apache.org To: dev@mahout.apache.org

Re: 0.8 progress

2013-06-28 Thread Jake Mannix
I can run LDA on Twitter's cluster, on both reuters and some real data, as well as LR/SGD. On Fri, Jun 28, 2013 at 11:51 AM, Grant Ingersoll gsing...@apache.org wrote: We really should setup a VM that we can run a couple of nodes (perhaps at ASF?) on that we can share w/ everyone that makes it

[jira] [Commented] (MAHOUT-1268) Wrong output directory for CVB

2013-06-25 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13692848#comment-13692848 ] Jake Mannix commented on MAHOUT-1268: - +1 Wrong output directory

Re: Mahout vectors/matrices/solvers on spark

2013-06-24 Thread Jake Mannix
if it is neither a row nor a column? How can i tell what exactly it is i am iterating over? On Jun 19, 2013 12:21 AM, Ted Dunning ted.dunn...@gmail.com wrote: On Wed, Jun 19, 2013 at 5:29 AM, Jake Mannix jake.man...@gmail.com wrote: Question #2: which in-core solvers

[jira] [Commented] (MAHOUT-1268) Wrong output directory for CVB

2013-06-24 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13692745#comment-13692745 ] Jake Mannix commented on MAHOUT-1268: - has this been tested with cluster_reuters.sh

Re: Vectors with 64bit indices?

2013-06-19 Thread Jake Mannix
long keys are super useful for rows in a matrix (ids for documents), and basically free in terms of memory (only one per document), but then for symmetry we really do need them in the columns (keying on e.g. termId), which is a not-insubstantial cost, but possibly worth it. Our vectors would be

Re: Vectors with 64bit indices?

2013-06-19 Thread Jake Mannix
not. On Wed, Jun 19, 2013 at 9:22 PM, Jake Mannix jake.man...@gmail.com wrote: long keys are super useful for rows in a matrix (ids for documents), and basically free in terms of memory (only one per document), but then for symmetry we really do need them in the columns (keying on e.g. termId

Re: Mahout vectors/matrices/solvers on spark

2013-06-18 Thread Jake Mannix
On Tue, Jun 18, 2013 at 6:14 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Hello, so i finally got around to actually do it. I want to get Mahout sparse vectors and matrices (DRMs) and rebuild some solvers using spark and Bagel /scala. I also want to use in-core solvers that run directly

[jira] [Commented] (MAHOUT-1266) Two minor problems in DistributedRowMatrix using MatrixMultiplication

2013-06-18 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687584#comment-13687584 ] Jake Mannix commented on MAHOUT-1266: - As mentioned in the javadocs for the method

Re: (Bi-)Weekly/Monthly Dev Sessions

2013-06-12 Thread Jake Mannix
Awesome idea. Biweekly is great. I'm normally PST, but I'll be working from UTC+1:00 from June 22-Aug 29, so I'm listing my availability for the summer given the French timezone. On Wed, Jun 12, 2013 at 6:23 AM, Grant Ingersoll gsing...@apache.org wrote: On Jun 12, 2013, at 8:41 AM, Shannon

Re: (Bi-)Weekly/Monthly Dev Sessions

2013-06-12 Thread Jake Mannix
Wow, a lot of Seattleites, I should organize a Mahout MeetUp / Hackathon when I get back from Europe at the end of the summer! On Wed, Jun 12, 2013 at 10:44 AM, Andrew Musselman andrew.mussel...@gmail.com wrote: Bi-weekly is good for me; I'm in Seattle and just filled out the poll. Great

[jira] [Commented] (MAHOUT-1147) CVB Bug in CVB0Driver causes doc/topic distributions to be trained on random matrix

2013-06-10 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13679777#comment-13679777 ] Jake Mannix commented on MAHOUT-1147: - So I'm running cluster-reuters.sh

[jira] [Commented] (MAHOUT-1147) CVB Bug in CVB0Driver causes doc/topic distributions to be trained on random matrix

2013-06-10 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13679829#comment-13679829 ] Jake Mannix commented on MAHOUT-1147: - Totally fresh checkout, HADOOP_HOME is set (I

[jira] [Commented] (MAHOUT-1147) CVB Bug in CVB0Driver causes doc/topic distributions to be trained on random matrix

2013-06-10 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13679843#comment-13679843 ] Jake Mannix commented on MAHOUT-1147: - Hmmm: 13/06/10 12:58:44 INFO cvb.CVB0Driver

Re: [jira] [Created] (MAHOUT-1147) CVB Bug in CVB0Driver causes doc/topic distributions to be trained on random matrix

2013-06-10 Thread Jake Mannix
Assignee: Jake Mannix Labels: bug, cvb, fix, suggestion Fix For: 0.8 Attachments: MAHOUT-1147.patch, MAHOUT-1147.patch Original Estimate: 24h Remaining Estimate: 24h Problem: When training doc/topic model no paths for the term/topic model

[jira] [Commented] (MAHOUT-1147) CVB Bug in CVB0Driver causes doc/topic distributions to be trained on random matrix

2013-06-10 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680218#comment-13680218 ] Jake Mannix commented on MAHOUT-1147: - So it looks like I've got the bits you mention

[jira] [Updated] (MAHOUT-1147) CVB Bug in CVB0Driver causes doc/topic distributions to be trained on random matrix

2013-06-10 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix updated MAHOUT-1147: Resolution: Fixed Status: Resolved (was: Patch Available) Committed revision 1491694

[jira] [Commented] (MAHOUT-1147) CVB Bug in CVB0Driver causes doc/topic distributions to be trained on random matrix

2013-06-10 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680230#comment-13680230 ] Jake Mannix commented on MAHOUT-1147: - Ok, this bug is totally reproducible, but also

Re: Suggested 0.8 Code Freeze Date

2013-06-03 Thread Jake Mannix
+1 Although does anyone else want to take a crack at the release, so that more of us get some experience with that? On Mon, Jun 3, 2013 at 2:14 AM, Dan Filimon dangeorge.fili...@gmail.com wrote: +1 On Jun 3, 2013, at 0:26, Grant Ingersoll gsing...@apache.org wrote: I'd like to suggest a

[jira] [Commented] (MAHOUT-1147) CVB Bug in CVB0Driver causes doc/topic distributions to be trained on random matrix

2013-06-02 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13672604#comment-13672604 ] Jake Mannix commented on MAHOUT-1147: - Excellent, I'll look this over later tonight

[jira] [Commented] (MAHOUT-874) Extract Writables into a separate module to allow smaller dependencies

2013-06-01 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13672149#comment-13672149 ] Jake Mannix commented on MAHOUT-874: So marking hadoop as provided is nice, a smaller

[jira] [Commented] (MAHOUT-1236) Need a cleaned up serialized format for Vectors to handle names and all other kinds of things

2013-06-01 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13672188#comment-13672188 ] Jake Mannix commented on MAHOUT-1236: - Why protobufs? Why not thrift or avro? Maybe

[jira] [Commented] (MAHOUT-1236) Need a cleaned up serialized format for Vectors to handle names and all other kinds of things

2013-06-01 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13672192#comment-13672192 ] Jake Mannix commented on MAHOUT-1236: - Thrift leaves off optional fields pretty well

[jira] [Resolved] (MAHOUT-684) Topics regularization for LDA

2013-06-01 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix resolved MAHOUT-684. Resolution: Won't Fix This patch applies to the original LDA we had in Mahout 0.5 or so

[jira] [Commented] (MAHOUT-684) Topics regularization for LDA

2013-06-01 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13672252#comment-13672252 ] Jake Mannix commented on MAHOUT-684: This code is based on the old LDA impl we had

[jira] [Commented] (MAHOUT-1225) Sets and maps incorrectly clear() their state arrays (potential endless loops)

2013-06-01 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13672318#comment-13672318 ] Jake Mannix commented on MAHOUT-1225: - What exactly did you end up submitting Robin

[jira] [Commented] (MAHOUT-1026) Add LDA (CVB implementation) to the cluster_reuters.sh example script

2013-05-31 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671461#comment-13671461 ] Jake Mannix commented on MAHOUT-1026: - Hey Suneel - have you tested it, does it yield

[jira] [Commented] (MAHOUT-1026) Add LDA (CVB implementation) to the cluster_reuters.sh example script

2013-05-31 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671462#comment-13671462 ] Jake Mannix commented on MAHOUT-1026: - also, I think we can call it lda in the option

[jira] [Commented] (MAHOUT-1227) Vector.iterateNonZero() is super-clumsy to use: add Iterable<Element> allNonZero()

2013-05-24 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13666421#comment-13666421 ] Jake Mannix commented on MAHOUT-1227: - Committing this in about an hour unless I hear

[jira] [Resolved] (MAHOUT-1227) Vector.iterateNonZero() is super-clumsy to use: add Iterable<Element> allNonZero()

2013-05-24 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix resolved MAHOUT-1227. - Resolution: Fixed Committed revision 1486122. Vector.iterateNonZero

Re: Comments on MAHOUT-1227 ?

2013-05-24 Thread Jake Mannix
This has been submitted. I suggest everyone who's got changes checked out update sometime soon, to minimize merge conflicts. On Fri, May 24, 2013 at 2:17 AM, Shannon Quinn squ...@gatech.edu wrote: LGTM! On 5/23/13 10:06 PM, Jake Mannix wrote: It's done, patch passes tests

[jira] [Commented] (MAHOUT-1225) Sets and maps incorrectly clear() their state arrays (potential endless loops)

2013-05-23 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665174#comment-13665174 ] Jake Mannix commented on MAHOUT-1225: - Wait, was this not _exactly_ the bug in https

[jira] [Commented] (MAHOUT-1225) Sets and maps incorrectly clear() their state arrays (potential endless loops)

2013-05-23 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665188#comment-13665188 ] Jake Mannix commented on MAHOUT-1225: - Ah yes, we merged collections back into math

[jira] [Commented] (MAHOUT-1225) Sets and maps incorrectly clear() their state arrays (potential endless loops)

2013-05-23 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665335#comment-13665335 ] Jake Mannix commented on MAHOUT-1225: - To build from trunk (which is what we all do

[jira] [Commented] (MAHOUT-1225) Sets and maps incorrectly clear() their state arrays (potential endless loops)

2013-05-23 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665361#comment-13665361 ] Jake Mannix commented on MAHOUT-1225: - I'm not sure everyone's hadoop cluster

[jira] [Updated] (MAHOUT-1227) Vector.iterateNonZero() is super-clumsy to use: add Iterable<Element> allNonZero()

2013-05-23 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix updated MAHOUT-1227: Description: Currently, our codebase is littered with the following: {code} Iterator<Element>

[jira] [Created] (MAHOUT-1227) Vector.iterateNonZero() is super-clumsy to use: add Iterable<Element> allNonZero()

2013-05-23 Thread Jake Mannix (JIRA)
Jake Mannix created MAHOUT-1227: --- Summary: Vector.iterateNonZero() is super-clumsy to use: add Iterable<Element> allNonZero() Key: MAHOUT-1227 URL: https://issues.apache.org/jira/browse/MAHOUT-1227

[jira] [Updated] (MAHOUT-1227) Vector.iterateNonZero() is super-clumsy to use: add Iterable<Element> allNonZero()

2013-05-23 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix updated MAHOUT-1227: Attachment: MAHOUT-1227.diff initial, non-invasive additional methods

[jira] [Commented] (MAHOUT-1227) Vector.iterateNonZero() is super-clumsy to use: add Iterable<Element> allNonZero()

2013-05-23 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665667#comment-13665667 ] Jake Mannix commented on MAHOUT-1227: - You like: {code} for (Element e : vector

[jira] [Commented] (MAHOUT-1227) Vector.iterateNonZero() is super-clumsy to use: add Iterable<Element> allNonZero()

2013-05-23 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665671#comment-13665671 ] Jake Mannix commented on MAHOUT-1227: - because if so, we currently allow

[jira] [Commented] (MAHOUT-1227) Vector.iterateNonZero() is super-clumsy to use: add Iterable<Element> allNonZero()

2013-05-23 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665680#comment-13665680 ] Jake Mannix commented on MAHOUT-1227: - in fact, as I dig through all the cases, I

[jira] [Commented] (MAHOUT-1227) Vector.iterateNonZero() is super-clumsy to use: add Iterable<Element> allNonZero()

2013-05-23 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665684#comment-13665684 ] Jake Mannix commented on MAHOUT-1227: - Another case I'm not sure about is in your

[jira] [Commented] (MAHOUT-1227) Vector.iterateNonZero() is super-clumsy to use: add Iterable<Element> allNonZero()

2013-05-23 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665712#comment-13665712 ] Jake Mannix commented on MAHOUT-1227: - ah good to know. It'll get fixed

[jira] [Commented] (MAHOUT-1227) Vector.iterateNonZero() is super-clumsy to use: add Iterable<Element> allNonZero()

2013-05-23 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665811#comment-13665811 ] Jake Mannix commented on MAHOUT-1227: - egads, we Matrix (which extends VectorIterable

[jira] [Commented] (MAHOUT-1227) Vector.iterateNonZero() is super-clumsy to use: add Iterable<Element> allNonZero()

2013-05-23 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665945#comment-13665945 ] Jake Mannix commented on MAHOUT-1227: - Tests pass for diff at https

Re: ASF Board Report for May 2013 is now due

2013-05-06 Thread Jake Mannix
Hey Mahout-devs, Looks like it's time for a board report again, and since I missed last month, we've got two months to report on, so if you've got things you want to add to the report (talks, important features of development we've completed recently, etc), feel free to edit the wiki (Isabel

[jira] [Resolved] (MAHOUT-1197) AbstractVector#cross is only appropriately efficient for dense vectors

2013-04-26 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix resolved MAHOUT-1197. - Resolution: Fixed AbstractVector#cross is only appropriately efficient for dense vectors

[jira] [Commented] (MAHOUT-1047) CVB hangs after completion

2013-04-24 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13640832#comment-13640832 ] Jake Mannix commented on MAHOUT-1047: - So in general, I think this is the right

[jira] [Created] (MAHOUT-1197) AbstractVector#cross is only appropriately efficient for dense vectors

2013-04-24 Thread Jake Mannix (JIRA)
Jake Mannix created MAHOUT-1197: --- Summary: AbstractVector#cross is only appropriately efficient for dense vectors Key: MAHOUT-1197 URL: https://issues.apache.org/jira/browse/MAHOUT-1197 Project: Mahout

[jira] [Commented] (MAHOUT-1047) CVB hangs after completion

2013-04-24 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641097#comment-13641097 ] Jake Mannix commented on MAHOUT-1047: - Ah yes. So the critical new lines are: [code

[jira] [Commented] (MAHOUT-1047) CVB hangs after completion

2013-04-24 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641140#comment-13641140 ] Jake Mannix commented on MAHOUT-1047: - Well, the iterations are only when

[jira] [Commented] (MAHOUT-1197) AbstractVector#cross is only appropriately efficient for dense vectors

2013-04-24 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641313#comment-13641313 ] Jake Mannix commented on MAHOUT-1197: - The big issue is the loop over row from 0

[jira] [Updated] (MAHOUT-1197) AbstractVector#cross is only appropriately efficient for dense vectors

2013-04-24 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix updated MAHOUT-1197: Attachment: MAHOUT-1197.diff simple fix which should work for both dense and sparse subclasses

[jira] [Commented] (MAHOUT-1160) Add performant iterators to primitive collections

2013-04-22 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13638193#comment-13638193 ] Jake Mannix commented on MAHOUT-1160: - Yeah, looks like this is closeable, thanks

[jira] [Closed] (MAHOUT-1160) Add performant iterators to primitive collections

2013-04-22 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix closed MAHOUT-1160. --- fixed with MAHOUT-1190 Add performant iterators to primitive collections

Re: Vector and Matrices - The Next Gen

2013-04-19 Thread Jake Mannix
They're very unsafe, and it gets really complicated to make them both highly performant and thread safe, and like Ted says: just synchronize at a higher level. You're never dealing with one Vector and want to max out all 8 cores on that one vector, you're looking at millions of vectors - give

Re: Odd vector iteration behavior

2013-04-15 Thread Jake Mannix
It should be pretty easy to check via a new unit test if this iteration / changing values interleaved operation works. It's hard to tell if indexOfInsertion() is implemented completely safely by inspection. On Mon, Apr 15, 2013 at 10:50 AM, Robin Anil robin.a...@gmail.com wrote: On second

Re: Odd vector iteration behavior

2013-04-15 Thread Jake Mannix
Ah, this was the one corner case I was worried about - we do special-case setting to 0, as meaning remove from the hashmap, yes. What's the TL;DR of what you did to work around this? Should we allow this? Even if it's through the Vector.Element instance, should it be ok? If so, how to handle?

Re: Odd vector iteration behavior

2013-04-15 Thread Jake Mannix
of space the vector was taking up. But I can see the argument that it really should return what it says it returns, if that is relied upon. Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc. On Mon, Apr 15, 2013 at 1:50 PM, Jake Mannix jake.man...@gmail.com wrote: Ah

Re: Odd vector iteration behavior

2013-04-15 Thread Jake Mannix
if the element is nonzero. Killing iteration would be really really bad, from a usability standpoint. In fact, I've been moving in the other direction: https://reviews.apache.org/r/9867/ adds iterators to the basic collection interface! On Mon, Apr 15, 2013 at 2:08 PM, Jake Mannix

[jira] [Commented] (MAHOUT-1191) Cleanup Vector Benchmarks make it less variable

2013-04-15 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13632343#comment-13632343 ] Jake Mannix commented on MAHOUT-1191: - as you can see... It looks like SASV is still

[jira] [Commented] (MAHOUT-1191) Cleanup Vector Benchmarks make it less variable

2013-04-15 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13632356#comment-13632356 ] Jake Mannix commented on MAHOUT-1191: - Ah, yes, nevermind, comparing 2nd to 3rd

[jira] [Commented] (MAHOUT-1191) Cleanup Vector Benchmarks make it less variable

2013-04-15 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13632363#comment-13632363 ] Jake Mannix commented on MAHOUT-1191: - Ok, so I'm trying to wrap my head around _how_

[jira] [Commented] (MAHOUT-1047) CVB hangs after completion

2013-04-13 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631091#comment-13631091 ] Jake Mannix commented on MAHOUT-1047: - So this should be in either ModelTrainer

Re: Odd vector iteration behavior

2013-04-12 Thread Jake Mannix
This looks very wrong. The iterators for SASV extend guava's AbstractIterator, but they do reuse the NonDefaultElement instance internally. It *looks* like we're correctly satisfying the AbstractIterator#computeNext() contract, but we must not be if we're mutating on multiple hasNext() calls...

Re: Odd vector iteration behavior

2013-04-12 Thread Jake Mannix
I think requiring the caller to know to copy/clone the element to be allowed to call hasNext() multiple times is extremely non-intuitive. Having the caller know that it's dangerous / not allowed to hang onto an element without copying while continuing to iterate (e.g. when looking for the largest
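
The hazard discussed in this thread (an iterator that reuses one element instance, so a caller who hangs onto an element while continuing to iterate sees it change underneath them) can be illustrated with a toy sketch. Nothing below is Mahout or Guava code: `ReusedElementDemo`, `Element`, and `reusingIterator` are hypothetical stand-ins, assumed only for illustration.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Toy illustration only; these classes are hypothetical and are not Mahout's.
final class ReusedElementDemo {
    /** Mutable (index, value) holder standing in for a vector element. */
    static final class Element {
        int index;
        double value;
    }

    /** Iterator that returns the SAME Element instance from every next() call. */
    static Iterator<Element> reusingIterator(final double[] values) {
        return new Iterator<Element>() {
            private final Element shared = new Element();
            private int cursor = 0;

            @Override public boolean hasNext() {
                return cursor < values.length; // no side effects, safe to call repeatedly
            }

            @Override public Element next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.index = cursor;
                shared.value = values[cursor];
                cursor++;
                return shared;
            }
        };
    }

    /** Buggy: "best" aliases the shared element, so it silently tracks the current one. */
    static int argMaxHoldingReference(double[] values) {
        Iterator<Element> it = reusingIterator(values);
        Element best = null;
        while (it.hasNext()) {
            Element e = it.next();
            if (best == null || e.value > best.value) best = e;
        }
        return best == null ? -1 : best.index;
    }

    /** Safe: copy the fields out before advancing the iterator. */
    static int argMaxCopying(double[] values) {
        Iterator<Element> it = reusingIterator(values);
        int bestIndex = -1;
        double bestValue = Double.NEGATIVE_INFINITY;
        while (it.hasNext()) {
            Element e = it.next();
            if (e.value > bestValue) {
                bestValue = e.value;
                bestIndex = e.index;
            }
        }
        return bestIndex;
    }

    public static void main(String[] args) {
        double[] v = {1.0, 9.0, 3.0};
        System.out.println("holding reference: " + argMaxHoldingReference(v));
        System.out.println("copying fields:    " + argMaxCopying(v));
    }
}
```

In this sketch the reference-holding version ends up reporting the last index visited rather than the index of the largest value, which is the kind of surprise the thread is worried about.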

[jira] [Commented] (MAHOUT-1190) SequentialAccessSparseVector function assignment is very slow

2013-04-11 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629031#comment-13629031 ] Jake Mannix commented on MAHOUT-1190: - Sequential access is a slow format

Re: Horrible performance with SequentialAccessSparseVector

2013-04-10 Thread Jake Mannix
SequentialAccessSparseVector should really *never* be used in a mutating way. You should use RandomAccessSparseVector if you're going to mutate, and then *freeze* the results in a SASV when you're done mutating it and you expect to be using it for only dot() and other read-only operations which
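
The "mutate in a random-access structure, then freeze into a sequential one for read-only work" pattern described above can be sketched without any Mahout dependency. The classes below (`MutableSparse`, `FrozenSparse`) are hypothetical toys assumed for illustration; they are not the actual RandomAccessSparseVector or SequentialAccessSparseVector implementations.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy illustration only; these classes are hypothetical and are not Mahout's.
final class FreezeSparseVectorDemo {
    /** Mutation phase: hash-backed, so a write at an arbitrary index is cheap. */
    static final class MutableSparse {
        private final Map<Integer, Double> entries = new HashMap<>();

        void add(int index, double delta) {
            double v = entries.getOrDefault(index, 0.0) + delta;
            if (v == 0.0) entries.remove(index); else entries.put(index, v);
        }

        /** Freeze into index-sorted parallel arrays for fast sequential reads. */
        FrozenSparse freeze() {
            int[] idx = new int[entries.size()];
            int n = 0;
            for (int i : entries.keySet()) idx[n++] = i;
            Arrays.sort(idx);
            double[] vals = new double[idx.length];
            for (int k = 0; k < idx.length; k++) vals[k] = entries.get(idx[k]);
            return new FrozenSparse(idx, vals);
        }
    }

    /** Read-only phase: sorted indices allow a linear merge for dot(). */
    static final class FrozenSparse {
        private final int[] indices;
        private final double[] values;

        FrozenSparse(int[] indices, double[] values) {
            this.indices = indices;
            this.values = values;
        }

        double dot(FrozenSparse other) {
            double sum = 0.0;
            int a = 0;
            int b = 0;
            while (a < indices.length && b < other.indices.length) {
                if (indices[a] == other.indices[b]) {
                    sum += values[a++] * other.values[b++];
                } else if (indices[a] < other.indices[b]) {
                    a++;
                } else {
                    b++;
                }
            }
            return sum;
        }
    }

    /** Builds two small vectors, freezes both, and returns their dot product. */
    static double demo() {
        MutableSparse x = new MutableSparse();
        x.add(7, 2.0);
        x.add(2, 1.0);
        x.add(7, 1.0);      // x is now {2: 1.0, 7: 3.0}
        MutableSparse y = new MutableSparse();
        y.add(7, 4.0);
        y.add(100, 5.0);    // y is {7: 4.0, 100: 5.0}
        return x.freeze().dot(y.freeze());
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The design point the sketch tries to capture is that inserting into sorted arrays costs a shift per write, while merging two sorted arrays for a dot product is a single linear pass, so each representation suits one phase.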

Re: Horrible performance with SequentialAccessSparseVector

2013-04-10 Thread Jake Mannix
In the existing code, assign() comes from AbstractVector and if the function is not PLUS or PLUS_ABS, it does this: for (int i = 0; i < size; i++) { setQuick(i, function.apply(getQuick(i), other.getQuick(i))); } Yeah, this has been a nasty nasty fact forever, and I should read your patch
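For contrast with the loop over every index quoted above, here is a minimal sketch of one alternative: merging two index-sorted sparse vectors and applying the function only where at least one side is non-zero. It assumes f(0, 0) == 0, otherwise skipping the shared zeros would be wrong. `SparseAssignDemo` and `Sparse` are hypothetical names, and this is not the patch discussed in the thread.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

/** Hypothetical sketch of a merge-based assign over sorted sparse vectors. */
class SparseAssignDemo {

  /** Sparse vector as parallel arrays, sorted by index, storing non-zeros only. */
  static final class Sparse {
    final int[] indices;
    final double[] values;

    Sparse(int[] indices, double[] values) {
      this.indices = indices;
      this.values = values;
    }
  }

  /**
   * Computes f(a[i], b[i]) over the union of stored indices in one linear pass.
   * Precondition (assumed, not checked): f(0, 0) == 0.
   */
  static Sparse assign(Sparse a, Sparse b, DoubleBinaryOperator f) {
    int[] idx = new int[a.indices.length + b.indices.length];
    double[] vals = new double[idx.length];
    int i = 0;
    int j = 0;
    int n = 0;
    while (i < a.indices.length || j < b.indices.length) {
      boolean hasA = i < a.indices.length;
      boolean hasB = j < b.indices.length;
      int ai = hasA ? a.indices[i] : Integer.MAX_VALUE;
      int bj = hasB ? b.indices[j] : Integer.MAX_VALUE;
      int index = Math.min(ai, bj);
      double x = 0.0;
      double y = 0.0;
      if (hasA && ai == index) {
        x = a.values[i++];
      }
      if (hasB && bj == index) {
        y = b.values[j++];
      }
      double r = f.applyAsDouble(x, y);
      if (r != 0.0) { // keep the result sparse
        idx[n] = index;
        vals[n] = r;
        n++;
      }
    }
    return new Sparse(Arrays.copyOf(idx, n), Arrays.copyOf(vals, n));
  }
}
```

The work is proportional to the number of stored entries in the two vectors rather than to the full cardinality, and the output is appended in index order so nothing is ever shifted.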

[jira] [Created] (MAHOUT-1186) OpenKeyTypeObjectHashMap#clear() has been broken forever.

2013-04-05 Thread Jake Mannix (JIRA)
Jake Mannix created MAHOUT-1186: --- Summary: OpenKeyTypeObjectHashMap#clear() has been broken forever. Key: MAHOUT-1186 URL: https://issues.apache.org/jira/browse/MAHOUT-1186 Project: Mahout

[jira] [Updated] (MAHOUT-1186) OpenKeyTypeObjectHashMap#clear() has been broken forever.

2013-04-05 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix updated MAHOUT-1186: Attachment: MAHOUT-1186.diff Unit test in this patch *fails* on trunk. Passes with the fix

[jira] [Commented] (MAHOUT-1186) OpenKeyTypeObjectHashMap#clear() has been broken forever.

2013-04-05 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623885#comment-13623885 ] Jake Mannix commented on MAHOUT-1186: - Thanks for catching this, Andy. Slipped

[jira] [Resolved] (MAHOUT-1186) OpenKeyTypeObjectHashMap#clear() has been broken forever.

2013-04-05 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix resolved MAHOUT-1186. - Resolution: Fixed OpenKeyTypeObjectHashMap#clear() has been broken forever

Re: Cloudera ML: New Open Source Libraries and Tools for Data Scientists

2013-03-29 Thread Jake Mannix
Josh posted to the Crunch list about this: the idea was to intentionally *not* make Crunch depend on Mahout, nor Mahout depend on Crunch, but have a new project which depended on both. On Fri, Mar 29, 2013 at 5:49 AM, Ted Dunning ted.dunn...@gmail.com wrote: Pity that they don't bother to

[REPORT] Apache Mahout

2013-03-16 Thread Jake Mannix
=== Apache Mahout Status Report: March 2013 === Apache Mahout provides implementations of machine learning algorithms (collaborative filtering, clustering, classification, and more) for large-scale data, mostly via Hadoop-based implementations. Issues: Sean Owen wishes to leave the Mahout PMC

Review Request: Basic Iterable for OpenKeyTypeValueTypeHashMap

2013-03-12 Thread Jake Mannix
/9867/diff/ Testing --- mvn test in math module Thanks, Jake Mannix

Re: Review Request: Basic Iterable for OpenKeyTypeValueTypeHashMap

2013-03-12 Thread Jake Mannix
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/9867/#review17714 --- - Jake Mannix On March 12, 2013, 4:40 a.m., Jake Mannix wrote

[jira] [Commented] (MAHOUT-1160) Add performant iterators to primitive collections

2013-03-12 Thread Jake Mannix (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600215#comment-13600215 ] Jake Mannix commented on MAHOUT-1160: - RB: https://reviews.apache.org/r/9867

Re: Review Request: Basic Iterable for OpenKeyTypeValueTypeHashMap

2013-03-12 Thread Jake Mannix
1455393 trunk/math/src/test/java-templates/org/apache/mahout/math/map/OpenKeyTypeValueTypeHashMapTest.java.t 1455393 Diff: https://reviews.apache.org/r/9867/diff/ Testing --- mvn test in math module Thanks, Jake Mannix

Re: mahout collections updates

2013-03-12 Thread Jake Mannix
Why would you say fastutil more than hppc? Currently all we use in Mahout is lists and hashmaps, and we don't even currently have proper iteration over the latter, so we certainly don't depend on Collections compatibility... On Tue, Mar 12, 2013 at 12:03 PM, Dawid Weiss

Re: mahout collections updates

2013-03-12 Thread Jake Mannix
On Tue, Mar 12, 2013 at 12:52 PM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote: Why would you say fastutil more than hppc? Oh, I like HPPC very much -- although I wrote it so I may not be completely objective here :) And seriously I recommended fastutil because Mahout is primarily

Re: mahout collections updates

2013-03-12 Thread Jake Mannix
But then where does it slow down? It just wraps a double[] On Tuesday, March 12, 2013, Sebastian Schelter wrote: I looked into DenseVector and it doesn't use any primitive collections, so ignore my last mail :) On 12.03.2013 22:16, Sebastian Schelter wrote: As a sidenote: I was kinda
