Howdy all. Andrew asked me to take a little time and verify the release so
we could get another PMC +1, so I tried to dust off my Mahout skills and
help out... but frankly, the "getting started" from a binary distribution
docs are pretty hard for _me_ to follow.
I start on the main page and
On Fri, Jul 26, 2013 at 11:56 PM, Nick Pentreath
nick.pentre...@gmail.com wrote:
Thanks for the update on that PR I will definitely take a look.
I wonder if they will run into the exact same Colt issues as mahout did?!
Yeah, that's pretty strange, Colt is totally abandoned, and had lots of
I think my main concern is one of readability and hidden information: I
really _don't_ like having to know _anything_ about associativity rules,
and I'm not sure that catering to R users (*or* matlab users) is what we
want to do. Maybe I'm thinking in a different direction with my scala
*by scalars*, but Hadamard products on matrices? I guess it _happens_,
but I'm not sure I've ever done it, or if I have, it's pretty darn rare.
On Sat, Jul 27, 2013 at 8:00 AM, Jake Mannix jake.man...@gmail.com
wrote:
I think my main concern is one of readability and hidden information: I
On Sat, Jul 27, 2013 at 1:53 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
Can you show me some examples of where I'd *want* to do the wrong thing
from an associativity standpoint? 5 - x where x is a vector, is kinda
weird.
But maybe you're subtracting off a mean or something, but then
Woohoo! Awesome, I've forked you, and I'll start digging in soon. At a
high level, this looks great. Not so sure about so many operators - I
don't know that we really need to have such a weighty syntax (a %*% b),
java devs are going to be much more familiar with simply doing a.times(b),
and I
On Fri, Jul 26, 2013 at 5:07 AM, Ted Dunning ted.dunn...@gmail.com wrote:
This sounds great in principle. I haven't seen any details yet (haven't
had time to look).
Is there a strong reason to go with the R syntax for multiplication instead
of the matlab convention that a*b means
I'm on your branch (dev-0.9.x-scala) but only doing a mvn install inside
of the new module - maybe I need to do it from the top level?
On Fri, Jul 26, 2013 at 7:23 AM, Dmitriy Lyubimov dlie...@gmail.com wrote:
On Jul 26, 2013 12:57 AM, Jake Mannix jake.man...@gmail.com wrote:
Woohoo
-
in mahout.math.MatrixOpsTest
Running mahout.math.VectorOpsTest
Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 sec -
in mahout.math.VectorOpsTest
Results :
Tests run: 0, Failures: 0, Errors: 0, Skipped: 0
On Fri, Jul 26, 2013 at 8:35 AM, Jake Mannix jake.man...@gmail.com wrote:
I'm
pushed on your branch to github?
On Fri, Jul 26, 2013 at 12:16 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
On Fri, Jul 26, 2013 at 8:40 AM, Jake Mannix jake.man...@gmail.com
wrote:
Yep, that fixed it. Are there any real tests
awesome, working now, test results popping up!
On Fri, Jul 26, 2013 at 12:47 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
yes
On Fri, Jul 26, 2013 at 12:39 PM, Jake Mannix jake.man...@gmail.com
wrote:
pushed on your branch to github?
On Fri, Jul 26, 2013 at 12:16 PM, Dmitriy
+1 from me, I used the jars to run some LDA (on a couple hundred million
documents) on the work cluster (1.0.something small), and it worked fine.
Other clustering example (with reuters) also worked as expected.
On Thu, Jul 18, 2013 at 11:27 AM, Suneel Marthi suneel_mar...@yahoo.com wrote:
+1
So quick question: is an intentional side-effect of the current release
process that when we build on trunk now, we build artifacts named e.g.
mahout-examples-0.9-SNAPSHOT-job.jar ?
On Wed, Jul 10, 2013 at 2:33 AM, Sean Owen sro...@gmail.com wrote:
Yes you can do all of this in a branch, which
at 10:54 AM, Jake Mannix jake.man...@gmail.com
wrote:
So quick question: is an intentional side-effect of the current release
process that when we build on trunk now, we build artifacts named e.g.
mahout-examples-0.9-SNAPSHOT-job.jar ?
On Wed, Jul 10, 2013 at 2:33 AM, Sean Owen sro
I forget, I know our default deployment is via the shaded monojar, but do
we also have an option somewhere to allow running with --libjars instead?
Much more rsync-friendly for rapid prototyping (esp. when on slow
remote connections).
--
-jake
On Fri, Jul 5, 2013 at 1:15 AM, Dmitriy Lyubimov dlie...@gmail.com wrote:
For anyone good at Scala DSLs, the following is the puzzle I can't seem to
figure out at the moment.
I mentioned before that I implemented assignment notations to a row or a
block, e.g. for a row vector : A(5,::) :=
into why. It just seems unreasonably efficient to be
otherwise.
On Fri, Jul 5, 2013 at 1:23 AM, Jake Mannix jake.man...@gmail.com wrote:
I forget, I know our default deployment is via the shaded monojar, but
do
we also have an option somewhere to allow running with --libjars instead?
Much
:
On Fri, Jul 5, 2013 at 7:19 AM, Jake Mannix jake.man...@gmail.com wrote:
But also: Monster Jars Considered Harmful, so I should dredge up a deploy
flag or something which allows us to run seamlessly with small jar +
libjars instead (so people [incl. me] can tack on their own jars
+1
On Fri, Jul 5, 2013 at 8:47 AM, Ted Dunning ted.dunn...@gmail.com wrote:
+1
On Fri, Jul 5, 2013 at 7:43 AM, Suneel Marthi suneel_mar...@yahoo.com
wrote:
+1
From: Grant Ingersoll gsing...@apache.org
To: dev@mahout.apache.org
I can run LDA on Twitter's cluster, on both reuters and some real data,
as well as LR/SGD.
On Fri, Jun 28, 2013 at 11:51 AM, Grant Ingersoll gsing...@apache.org wrote:
We really should setup a VM that we can run a couple of nodes (perhaps at
ASF?) on that we can share w/ everyone that makes it
[
https://issues.apache.org/jira/browse/MAHOUT-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13692848#comment-13692848
]
Jake Mannix commented on MAHOUT-1268:
-
+1
Wrong output directory
if it is
neither
a row nor a column? How can I tell what exactly it is I am iterating
over?
On Jun 19, 2013 12:21 AM, Ted Dunning ted.dunn...@gmail.com
wrote:
On Wed, Jun 19, 2013 at 5:29 AM, Jake Mannix
jake.man...@gmail.com
wrote:
Question #2: which in-core solvers
[
https://issues.apache.org/jira/browse/MAHOUT-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13692745#comment-13692745
]
Jake Mannix commented on MAHOUT-1268:
-
has this been tested with cluster_reuters.sh
long keys are super useful for rows in a matrix (ids for documents), and
basically free in terms of memory (only one per document), but then for
symmetry we really do need them in the columns (keying on e.g. termId),
which is a not-insubstantial cost, but possibly worth it.
Our vectors would be
not.
On Wed, Jun 19, 2013 at 9:22 PM, Jake Mannix jake.man...@gmail.com
wrote:
long keys are super useful for rows in a matrix (ids for documents), and
basically free in terms of memory (only one per document), but then for
symmetry we really do need them in the columns (keying on e.g. termId
On Tue, Jun 18, 2013 at 6:14 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
Hello,
So I finally got around to actually doing it.
I want to get Mahout sparse vectors and matrices (DRMs) and rebuild some
solvers using Spark and Bagel/Scala.
I also want to use in-core solvers that run directly
[
https://issues.apache.org/jira/browse/MAHOUT-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687584#comment-13687584
]
Jake Mannix commented on MAHOUT-1266:
-
As mentioned in the javadocs for the method
Awesome idea.
Biweekly is great. I'm normally PST, but I'll be working from UTC+1:00
from June 22-Aug 29, so I'm listing my availability for the summer given
the French timezone.
On Wed, Jun 12, 2013 at 6:23 AM, Grant Ingersoll gsing...@apache.org wrote:
On Jun 12, 2013, at 8:41 AM, Shannon
Wow, a lot of Seattleites, I should organize a Mahout MeetUp / Hackathon
when I get back from Europe at the end of the summer!
On Wed, Jun 12, 2013 at 10:44 AM, Andrew Musselman
andrew.mussel...@gmail.com wrote:
Bi-weekly is good for me; I'm in Seattle and just filled out the poll.
Great
[
https://issues.apache.org/jira/browse/MAHOUT-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13679777#comment-13679777
]
Jake Mannix commented on MAHOUT-1147:
-
So I'm running cluster-reuters.sh
[
https://issues.apache.org/jira/browse/MAHOUT-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13679829#comment-13679829
]
Jake Mannix commented on MAHOUT-1147:
-
Totally fresh checkout, HADOOP_HOME is set (I
[
https://issues.apache.org/jira/browse/MAHOUT-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13679843#comment-13679843
]
Jake Mannix commented on MAHOUT-1147:
-
Hmmm:
13/06/10 12:58:44 INFO cvb.CVB0Driver
Assignee: Jake Mannix
Labels: bug, cvb, fix, suggestion
Fix For: 0.8
Attachments: MAHOUT-1147.patch, MAHOUT-1147.patch
Original Estimate: 24h
Remaining Estimate: 24h
Problem:
When training doc/topic model no paths for the term/topic model
[
https://issues.apache.org/jira/browse/MAHOUT-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680218#comment-13680218
]
Jake Mannix commented on MAHOUT-1147:
-
So it looks like I've got the bits you mention
[
https://issues.apache.org/jira/browse/MAHOUT-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jake Mannix updated MAHOUT-1147:
Resolution: Fixed
Status: Resolved (was: Patch Available)
Committed revision 1491694
[
https://issues.apache.org/jira/browse/MAHOUT-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680230#comment-13680230
]
Jake Mannix commented on MAHOUT-1147:
-
Ok, this bug is totally reproducible, but also
+1
Although does anyone else want to take a crack at the release, so that more
of us get some experience with that?
On Mon, Jun 3, 2013 at 2:14 AM, Dan Filimon dangeorge.fili...@gmail.com wrote:
+1
On Jun 3, 2013, at 0:26, Grant Ingersoll gsing...@apache.org wrote:
I'd like to suggest a
[
https://issues.apache.org/jira/browse/MAHOUT-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13672604#comment-13672604
]
Jake Mannix commented on MAHOUT-1147:
-
Excellent, I'll look this over later tonight
[
https://issues.apache.org/jira/browse/MAHOUT-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13672149#comment-13672149
]
Jake Mannix commented on MAHOUT-874:
So marking hadoop as provided is nice, a smaller
[
https://issues.apache.org/jira/browse/MAHOUT-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13672188#comment-13672188
]
Jake Mannix commented on MAHOUT-1236:
-
Why protobufs? Why not thrift or avro? Maybe
[
https://issues.apache.org/jira/browse/MAHOUT-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13672192#comment-13672192
]
Jake Mannix commented on MAHOUT-1236:
-
Thrift leaves off optional fields pretty well
[
https://issues.apache.org/jira/browse/MAHOUT-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jake Mannix resolved MAHOUT-684.
Resolution: Won't Fix
This patch applies to the original LDA we had in Mahout 0.5 or so
[
https://issues.apache.org/jira/browse/MAHOUT-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13672252#comment-13672252
]
Jake Mannix commented on MAHOUT-684:
This code is based on the old LDA impl we had
[
https://issues.apache.org/jira/browse/MAHOUT-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13672318#comment-13672318
]
Jake Mannix commented on MAHOUT-1225:
-
What exactly did you end up submitting Robin
[
https://issues.apache.org/jira/browse/MAHOUT-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671461#comment-13671461
]
Jake Mannix commented on MAHOUT-1026:
-
Hey Suneel - have you tested it, does it yield
[
https://issues.apache.org/jira/browse/MAHOUT-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13671462#comment-13671462
]
Jake Mannix commented on MAHOUT-1026:
-
also, I think we can call it lda in the option
[
https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13666421#comment-13666421
]
Jake Mannix commented on MAHOUT-1227:
-
Committing this in about an hour unless I hear
[
https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jake Mannix resolved MAHOUT-1227.
-
Resolution: Fixed
Committed revision 1486122.
Vector.iterateNonZero
This has been submitted.
I suggest everyone who's got changes checked out update sometime soon, to
minimize merge conflicts.
On Fri, May 24, 2013 at 2:17 AM, Shannon Quinn squ...@gatech.edu wrote:
LGTM!
On 5/23/13 10:06 PM, Jake Mannix wrote:
It's done, patch passes tests
[
https://issues.apache.org/jira/browse/MAHOUT-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665174#comment-13665174
]
Jake Mannix commented on MAHOUT-1225:
-
Wait, was this not _exactly_ the bug in
https
[
https://issues.apache.org/jira/browse/MAHOUT-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665188#comment-13665188
]
Jake Mannix commented on MAHOUT-1225:
-
Ah yes, we merged collections back into math
[
https://issues.apache.org/jira/browse/MAHOUT-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665335#comment-13665335
]
Jake Mannix commented on MAHOUT-1225:
-
To build from trunk (which is what we all do
[
https://issues.apache.org/jira/browse/MAHOUT-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665361#comment-13665361
]
Jake Mannix commented on MAHOUT-1225:
-
I'm not sure everyone's hadoop cluster
[
https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jake Mannix updated MAHOUT-1227:
Description:
Currently, our codebase is littered with the following:
{code}
Iterator<Element>
Jake Mannix created MAHOUT-1227:
---
Summary: Vector.iterateNonZero() is super-clumsy to use: add
Iterable<Element> allNonZero()
Key: MAHOUT-1227
URL: https://issues.apache.org/jira/browse/MAHOUT-1227
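The clumsiness named in the summary comes from Java's for-each loop accepting an Iterable but not a bare Iterator. A minimal sketch of the difference follows, assuming a hypothetical TinyVector class that iterates plain Double values; it is not Mahout's Vector or Vector.Element, and only the method names iterateNonZero and allNonZero are taken from the issue title.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a vector; not Mahout's Vector interface.
final class TinyVector {
    private final double[] values;

    TinyVector(double... values) {
        this.values = values.clone();
    }

    // Iterator-style access: callers must drive hasNext()/next() by hand.
    Iterator<Double> iterateNonZero() {
        List<Double> nonZero = new ArrayList<>();
        for (double v : values) {
            if (v != 0.0) {
                nonZero.add(v);
            }
        }
        return nonZero.iterator();
    }

    // Iterable-style access: Iterable has a single abstract method, so a
    // method reference is enough, and each for-each loop gets a fresh iterator.
    Iterable<Double> allNonZero() {
        return this::iterateNonZero;
    }

    // The verbose form the issue complains about.
    static double sumWithIterator(TinyVector v) {
        double sum = 0.0;
        Iterator<Double> it = v.iterateNonZero();
        while (it.hasNext()) {
            sum += it.next();
        }
        return sum;
    }

    // The same loop once an Iterable is available.
    static double sumWithForEach(TinyVector v) {
        double sum = 0.0;
        for (double x : v.allNonZero()) {
            sum += x;
        }
        return sum;
    }
}
```

Both helpers compute the same sum; the second simply lets the compiler generate the hasNext()/next() calls.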
[
https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jake Mannix updated MAHOUT-1227:
Attachment: MAHOUT-1227.diff
initial, non-invasive additional methods
[
https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665667#comment-13665667
]
Jake Mannix commented on MAHOUT-1227:
-
You like:
{code}
for (Element e : vector
[
https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665671#comment-13665671
]
Jake Mannix commented on MAHOUT-1227:
-
because if so, we currently allow
[
https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665680#comment-13665680
]
Jake Mannix commented on MAHOUT-1227:
-
in fact, as I dig through all the cases, I
[
https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665684#comment-13665684
]
Jake Mannix commented on MAHOUT-1227:
-
Another case I'm not sure about is in your
[
https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665712#comment-13665712
]
Jake Mannix commented on MAHOUT-1227:
-
ah good to know. It'll get fixed
[
https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665811#comment-13665811
]
Jake Mannix commented on MAHOUT-1227:
-
egads, we Matrix (which extends VectorIterable
[
https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13665945#comment-13665945
]
Jake Mannix commented on MAHOUT-1227:
-
Tests pass for diff at https
Hey Mahout-devs,
Looks like it's time for a board report again, and since I missed last
month, we've got two months to report on, so if you've got things you want
to add to the report (talks, important features of development we've
completed recently, etc), feel free to edit the wiki (Isabel
[
https://issues.apache.org/jira/browse/MAHOUT-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jake Mannix resolved MAHOUT-1197.
-
Resolution: Fixed
AbstractVector#cross is only appropriately efficient for dense vectors
[
https://issues.apache.org/jira/browse/MAHOUT-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13640832#comment-13640832
]
Jake Mannix commented on MAHOUT-1047:
-
So in general, I think this is the right
Jake Mannix created MAHOUT-1197:
---
Summary: AbstractVector#cross is only appropriately efficient for
dense vectors
Key: MAHOUT-1197
URL: https://issues.apache.org/jira/browse/MAHOUT-1197
Project: Mahout
[
https://issues.apache.org/jira/browse/MAHOUT-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641097#comment-13641097
]
Jake Mannix commented on MAHOUT-1047:
-
Ah yes. So the critical new lines are:
[code
[
https://issues.apache.org/jira/browse/MAHOUT-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641140#comment-13641140
]
Jake Mannix commented on MAHOUT-1047:
-
Well, the iterations are only when
[
https://issues.apache.org/jira/browse/MAHOUT-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641313#comment-13641313
]
Jake Mannix commented on MAHOUT-1197:
-
The big issue is the loop over row from 0
[
https://issues.apache.org/jira/browse/MAHOUT-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jake Mannix updated MAHOUT-1197:
Attachment: MAHOUT-1197.diff
simple fix which should work for both dense and sparse subclasses
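As an illustration of why a cross (outer) product that loops over every row index is costly for sparse inputs, here is a sketch that only visits non-zero entries. It is not the attached patch: the class SparseCrossSketch and the representation of sparse vectors as index-to-value maps are assumptions made for this example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not Mahout's AbstractVector#cross.
final class SparseCrossSketch {
    // Outer product x * y^T over sparse inputs given as index -> value maps.
    // Only rows for non-zero entries of x and columns for non-zero entries
    // of y are created, so the cost is nnz(x) * nnz(y) rather than
    // (number of rows) * (number of columns).
    static Map<Integer, Map<Integer, Double>> cross(Map<Integer, Double> x,
                                                    Map<Integer, Double> y) {
        Map<Integer, Map<Integer, Double>> result = new TreeMap<>();
        for (Map.Entry<Integer, Double> xe : x.entrySet()) {
            Map<Integer, Double> row = new TreeMap<>();
            for (Map.Entry<Integer, Double> ye : y.entrySet()) {
                row.put(ye.getKey(), xe.getValue() * ye.getValue());
            }
            result.put(xe.getKey(), row);
        }
        return result;
    }
}
```

With one non-zero in x and two in y, the result holds a single row with two entries, regardless of how large the nominal dimensions are.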
[
https://issues.apache.org/jira/browse/MAHOUT-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13638193#comment-13638193
]
Jake Mannix commented on MAHOUT-1160:
-
Yeah, looks like this is closeable, thanks
[
https://issues.apache.org/jira/browse/MAHOUT-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jake Mannix closed MAHOUT-1160.
---
fixed with MAHOUT-1190
Add performant iterators to primitive collections
They're very unsafe, and it gets really complicated to make them
both highly performant and thread safe, and like Ted says: just synchronize
at a higher level. You're never dealing with one Vector and want to max out
all 8 cores on that one vector, you're looking at millions of vectors - give
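The "synchronize at a higher level" advice can be sketched as follows: each vector is processed by a single thread with no locking, and parallelism is applied across the collection of vectors. This is an illustration only; the class PerVectorParallelism is hypothetical, plain double[] arrays stand in for vectors, and parallelStream needs Java 8 or later, which is newer than this thread.

```java
import java.util.List;

// Hypothetical illustration; plain arrays stand in for Mahout vectors.
final class PerVectorParallelism {
    // Single-threaded work on one vector: no locks are needed because no
    // other thread touches this array while the sum is computed.
    static double sumOfSquares(double[] vector) {
        double sum = 0.0;
        for (double v : vector) {
            sum += v * v;
        }
        return sum;
    }

    // Parallelism is applied across vectors; each vector is handled by
    // exactly one thread, and results keep the order of the input list.
    static double[] sumOfSquaresForAll(List<double[]> vectors) {
        return vectors.parallelStream()
                .mapToDouble(PerVectorParallelism::sumOfSquares)
                .toArray();
    }
}
```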
It should be pretty easy to check via a new unit test if this iteration /
changing
values interleaved operation works. It's hard to tell
if indexOfInsertion() is
implemented completely safely by inspection.
On Mon, Apr 15, 2013 at 10:50 AM, Robin Anil robin.a...@gmail.com wrote:
On second
Ah, this was the one corner case I was worried about - we do special-case
setting to 0,
as meaning remove from the hashmap, yes.
What's the TL;DR of what you did to work around this? Should we allow
this? Even
if it's through the Vector.Element instance, should it be ok? If so, how
to handle?
of space the vector was taking
up. But I can see the argument that it really should return what it says it
returns, if that is relied upon.
Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
On Mon, Apr 15, 2013 at 1:50 PM, Jake Mannix jake.man...@gmail.com
wrote:
Ah
if the
element is nonzero.
Killing iteration would be really really bad, from a usability standpoint.
In fact,
I've been moving in the other direction: https://reviews.apache.org/r/9867/
adds iterators to the basic collection interface!
On Mon, Apr 15, 2013 at 2:08 PM, Jake Mannix
[
https://issues.apache.org/jira/browse/MAHOUT-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13632343#comment-13632343
]
Jake Mannix commented on MAHOUT-1191:
-
as you can see...
It looks like SASV is still
[
https://issues.apache.org/jira/browse/MAHOUT-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13632356#comment-13632356
]
Jake Mannix commented on MAHOUT-1191:
-
Ah, yes, never mind, comparing 2nd to 3rd
[
https://issues.apache.org/jira/browse/MAHOUT-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13632363#comment-13632363
]
Jake Mannix commented on MAHOUT-1191:
-
Ok, so I'm trying to wrap my head around _how_
[
https://issues.apache.org/jira/browse/MAHOUT-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631091#comment-13631091
]
Jake Mannix commented on MAHOUT-1047:
-
So this should be in either ModelTrainer
This looks very wrong. The iterators for SASV extend guava's
AbstractIterator, but they do reuse the NonDefaultElement instance
internally. It *looks* like we're correctly satisfying the
AbstractIterator#computeNext() contract, but we must not be if we're
mutating on multiple hasNext() calls...
I think requiring the caller to know to copy/clone the element to be allowed
to call hasNext() multiple times is extremely non-intuitive. Having the
caller
know that it's dangerous / not allowed to hang onto an element without
copying while continuing to iterate (e.g. when looking for the largest
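The second hazard described above, hanging onto an element while continuing to iterate, can be reproduced in a few lines. The sketch below uses hypothetical classes (ReusedElementDemo and its Element), not Mahout's SASV iterators or Guava's AbstractIterator, and its hasNext() deliberately does not mutate anything, so it shows only the reused-instance problem.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical classes for illustration; not Mahout's or Guava's types.
final class ReusedElementDemo {
    // A mutable cursor that the iterator hands out again and again.
    static final class Element {
        int index;
        double value;
    }

    static Iterator<Element> reusingIterator(double[] values) {
        return new Iterator<Element>() {
            private final Element shared = new Element();
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < values.length; // no mutation here
            }

            @Override
            public Element next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.index = next;
                shared.value = values[next];
                next++;
                return shared; // same object on every call
            }
        };
    }

    // Buggy: "best" and "e" are always the same object, so the answer is
    // whatever the iterator wrote last, i.e. the final index.
    static int indexOfMaxHoldingReference(double[] values) {
        Iterator<Element> it = reusingIterator(values);
        Element best = null;
        while (it.hasNext()) {
            Element e = it.next();
            if (best == null || e.value > best.value) {
                best = e;
            }
        }
        return best == null ? -1 : best.index;
    }

    // Safe: copy the fields out of the element before advancing.
    static int indexOfMaxCopyingFields(double[] values) {
        Iterator<Element> it = reusingIterator(values);
        int bestIndex = -1;
        double bestValue = Double.NEGATIVE_INFINITY;
        while (it.hasNext()) {
            Element e = it.next();
            if (e.value > bestValue) {
                bestIndex = e.index;
                bestValue = e.value;
            }
        }
        return bestIndex;
    }
}
```

For the input {1.0, 9.0, 3.0} the first method reports index 2 and the second reports index 1, which is the kind of silent wrong answer that makes the copy requirement so non-intuitive for callers.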
[
https://issues.apache.org/jira/browse/MAHOUT-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629031#comment-13629031
]
Jake Mannix commented on MAHOUT-1190:
-
Sequential access is a slow format
SequentialAccessSparseVector should really *never* be used in a
mutating way. You should use RandomAccessSparseVector if you're
going to mutate, and then *freeze* the results in a SASV when you're
done mutating it and you expect to be using it for only dot() and other
read-only operations which
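The "mutate in a random-access structure, then freeze" pattern can be sketched without Mahout on the classpath. In this sketch a java.util.Map plays the role of the mutable vector and the hypothetical FrozenSparseVector class plays the role of the frozen, sequential-access one; neither is Mahout's actual RandomAccessSparseVector or SequentialAccessSparseVector.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of "mutate in a hash map, then freeze".
final class FrozenSparseVector {
    private final int[] indices;   // strictly increasing
    private final double[] values; // values[i] belongs to indices[i]

    private FrozenSparseVector(int[] indices, double[] values) {
        this.indices = indices;
        this.values = values;
    }

    // Freeze a mutable index -> value map into sorted parallel arrays.
    static FrozenSparseVector freeze(Map<Integer, Double> mutable) {
        TreeMap<Integer, Double> sorted = new TreeMap<>(mutable);
        int[] idx = new int[sorted.size()];
        double[] val = new double[sorted.size()];
        int i = 0;
        for (Map.Entry<Integer, Double> e : sorted.entrySet()) {
            idx[i] = e.getKey();
            val[i] = e.getValue();
            i++;
        }
        return new FrozenSparseVector(idx, val);
    }

    // Dot product as a linear merge over the two sorted index arrays;
    // this is the read-only access pattern a sequential layout is good at.
    double dot(FrozenSparseVector other) {
        double sum = 0.0;
        int a = 0;
        int b = 0;
        while (a < indices.length && b < other.indices.length) {
            if (indices[a] == other.indices[b]) {
                sum += values[a] * other.values[b];
                a++;
                b++;
            } else if (indices[a] < other.indices[b]) {
                a++;
            } else {
                b++;
            }
        }
        return sum;
    }
}
```

Inserting into sorted arrays would mean shifting elements on every write, which is why all mutation happens in the map before freeze() is called.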
In the existing code, assign() comes from AbstractVector and if the
function is not PLUS or PLUS_ABS, it does this:
for (int i = 0; i < size; i++) {
setQuick(i, function.apply(getQuick(i), other.getQuick(i)));
}
Yeah, this has been a nasty nasty fact forever, and I should read your
patch
Jake Mannix created MAHOUT-1186:
---
Summary: OpenKeyTypeObjectHashMap#clear() has been broken forever.
Key: MAHOUT-1186
URL: https://issues.apache.org/jira/browse/MAHOUT-1186
Project: Mahout
[
https://issues.apache.org/jira/browse/MAHOUT-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jake Mannix updated MAHOUT-1186:
Attachment: MAHOUT-1186.diff
Unit test in this patch *fails* on trunk. Passes with the fix
[
https://issues.apache.org/jira/browse/MAHOUT-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13623885#comment-13623885
]
Jake Mannix commented on MAHOUT-1186:
-
Thanks for catching this, Andy. Slipped
[
https://issues.apache.org/jira/browse/MAHOUT-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jake Mannix resolved MAHOUT-1186.
-
Resolution: Fixed
OpenKeyTypeObjectHashMap#clear() has been broken forever
Josh posted to the Crunch list about this: the idea was to intentionally
*not* make Crunch depend on Mahout, nor Mahout depend on Crunch, but have a
new project which depended on both.
On Fri, Mar 29, 2013 at 5:49 AM, Ted Dunning ted.dunn...@gmail.com wrote:
Pity that they don't bother to
=== Apache Mahout Status Report: March 2013 ===
Apache Mahout provides implementations of machine learning algorithms
(collaborative filtering, clustering, classification, and
more) for large-scale data, mostly via Hadoop-based
implementations.
Issues:
Sean Owen wishes to leave the Mahout PMC
/9867/diff/
Testing
---
mvn test in math module
Thanks,
Jake Mannix
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9867/#review17714
---
- Jake Mannix
On March 12, 2013, 4:40 a.m., Jake Mannix wrote
[
https://issues.apache.org/jira/browse/MAHOUT-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13600215#comment-13600215
]
Jake Mannix commented on MAHOUT-1160:
-
RB: https://reviews.apache.org/r/9867
1455393
trunk/math/src/test/java-templates/org/apache/mahout/math/map/OpenKeyTypeValueTypeHashMapTest.java.t
1455393
Diff: https://reviews.apache.org/r/9867/diff/
Testing
---
mvn test in math module
Thanks,
Jake Mannix
Why would you say fastutil more than hppc?
Currently all we use in Mahout is lists and hashmaps, and we don't
even currently have proper iteration over the latter, so we certainly
don't depend on Collections compatibility...
On Tue, Mar 12, 2013 at 12:03 PM, Dawid Weiss
On Tue, Mar 12, 2013 at 12:52 PM, Dawid Weiss
dawid.we...@cs.put.poznan.pl wrote:
Why would you say fastutil more than hppc?
Oh, I like HPPC very much -- although I wrote it so I may not be
completely objective here :)
And seriously I recommended fastutil because Mahout is primarily
But then where does it slow down? It just wraps a double[]
On Tuesday, March 12, 2013, Sebastian Schelter wrote:
I looked into DenseVector and it doesn't use any primitive collections,
so ignore my last mail :)
On 12.03.2013 22:16, Sebastian Schelter wrote:
As a sidenote: I was kinda