[
https://issues.apache.org/jira/browse/MAHOUT-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213483#comment-13213483
]
Frank Scholten commented on MAHOUT-944:
---
Saving to Lucene indexes is a different use
Welcome aboard!
On Feb 21, 2012, at 6:11 PM, Jeff Eastman wrote:
The Project Management Committee (PMC) for Apache Mahout has asked Tom
Pierce to become a committer and we are pleased to announce that he has
accepted. Being a committer enables easier contribution to the project since
there
On recent threads on the dev@ list, and discussions off-list, it's pretty
clear that we need to have cleanup be a priority for the next release.
How about this for a formal proposal:
- The 0.7 release will have issues (both new and on JIRA) be primarily
focused on bugfixes / cleanup /
Aye say I.
Sent from my iPhone
On Feb 22, 2012, at 4:24 AM, Jake Mannix jake.man...@gmail.com wrote:
If we're able to wrap this release up cleanly and get quickly moving on to
new features again, maybe we can try this on a more regular basis, with
even releases being feature-work, and odd
We've collected a fair bit of Hadoop utils over the years. I am finding them
generally useful in other projects. Would it make sense to either split them
out to a standalone jar and/or donate them upstream to Hadoop itself?
I'm thinking the things like:
Seq File iterators and potentially the
I say aye.
iPhone'd
On Feb 22, 2012, at 9:35, Ted Dunning ted.dunn...@gmail.com wrote:
Aye say I.
Sent from my iPhone
On Feb 22, 2012, at 4:24 AM, Jake Mannix jake.man...@gmail.com wrote:
If we're able to wrap this release up cleanly and get quickly moving on to
new features
I think it's fine to let them live in integration here rather than a new
module. The iterators could be useful upstream yes and maybe a few more
bits. The AbstractJob might still be a little too app specific.
On Feb 22, 2012 2:37 PM, Grant Ingersoll gsing...@apache.org wrote:
We've collected a
Thanks everyone!
-tom
On Wed, Feb 22, 2012 at 6:07 AM, Grant Ingersoll gsing...@apache.org wrote:
Welcome aboard!
On Feb 21, 2012, at 6:11 PM, Jeff Eastman wrote:
The Project Management Committee (PMC) for Apache Mahout has asked Tom
Pierce to become a committer and we are pleased to
I'll go for aye aye maties, shall we aim for end of May?
On 2/22/12 7:41 AM, Shannon Quinn wrote:
I say aye.
iPhone'd
On Feb 22, 2012, at 9:35, Ted Dunning ted.dunn...@gmail.com wrote:
Aye say I.
Sent from my iPhone
On Feb 22, 2012, at 4:24 AM, Jake Mannix jake.man...@gmail.com wrote:
Hi Saikat,
I agree with Paritosh, that a great place to begin would be to write
some unit tests. This will familiarize you with the code base and help
us a lot with our 0.7 housekeeping release. The new clustering
classification components are going to unify many - but not all - of the
But is integration published as a separate jar?
Sent from my iPhone
On Feb 22, 2012, at 6:52 AM, Sean Owen sro...@gmail.com wrote:
I think it's fine to let them live in integration here rather than a new
module. The iterators could be useful upstream yes and maybe a few more
bits. The
Separate jar does mean separate maven artifact but the dependency mechanism
should handle that and the new artifacts should be very stable.
Sent from my iPhone
On Feb 22, 2012, at 6:54 AM, Jake Mannix jake.man...@gmail.com wrote:
On Wed, Feb 22, 2012 at 6:37 AM, Grant Ingersoll
So I haven't looked super-carefully at the clustering refactoring work, can
someone give a little overview of what
the plan is?
The NewLDA stuff is technically in clustering and generally works by
taking in SeqFile<IW,VW> documents as the training corpus, and spits out
two things: SeqFile<IW,VW> of a
Jeff, I'm pretty excited to help out with this. As a starter, can you point me
to where I should begin reading the code? I haven't looked too closely,
but are there certain classes in the clustering area where this refactoring
effort is centered?
Regards
Date: Wed, 22 Feb 2012
On Wed, Feb 22, 2012 at 10:32 AM, Jeff Eastman
j...@windwardsolutions.com wrote:
This refactoring is focused on some of the iterative clustering algorithms
which, in each iteration, load a prior set of clusters (e.g. clusters-0)
and process each input vector against them to produce a posterior
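The loop described above (load the prior clusters, process each input vector against them, produce the posterior) can be sketched roughly as follows. This is only an illustration under assumptions: it uses a plain k-means update on in-memory lists, and the names `iterate_once` and `cluster` are hypothetical, not Mahout's ClusterIterator API.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def iterate_once(prior: List[List[float]], vectors: List[Vector]) -> List[List[float]]:
    """One pass: prior cluster centers in, posterior cluster centers out."""
    dim = len(prior[0])
    sums = [[0.0] * dim for _ in prior]
    counts = [0] * len(prior)
    for v in vectors:
        # Assign the vector to its nearest prior center.
        nearest = min(range(len(prior)), key=lambda k: squared_distance(prior[k], v))
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += v[d]
    posterior = []
    for k, center in enumerate(prior):
        if counts[k] == 0:
            # Assumption: an empty cluster keeps its prior center.
            posterior.append(list(center))
        else:
            posterior.append([s / counts[k] for s in sums[k]])
    return posterior


def cluster(prior: List[List[float]], vectors: List[Vector], max_iterations: int = 10) -> List[List[float]]:
    """Repeat the pass, feeding each posterior back in as the next prior."""
    clusters = [list(c) for c in prior]
    for _ in range(max_iterations):
        updated = iterate_once(clusters, vectors)
        if updated == clusters:  # centers stopped moving
            break
        clusters = updated
    return clusters
```

In a MapReduce setting the per-vector assignment would presumably run in mappers and the averaging in reducers; the sketch keeps both in one process for brevity.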
[
https://issues.apache.org/jira/browse/MAHOUT-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmitriy Lyubimov updated MAHOUT-817:
Attachment: MAHOUT-817-RC1.patch
refreshing the attached patch (called RC1) to correspond
On 2/22/12 11:58 AM, Jake Mannix wrote:
On Wed, Feb 22, 2012 at 10:32 AM, Jeff Eastman
j...@windwardsolutions.com wrote:
This refactoring is focused on some of the iterative clustering algorithms
which, in each iteration, load a prior set of clusters (e.g. clusters-0)
and process each input
I would also like to see if we can put an all reduce implementation into this
effort. The idea is that we can use a map only job that does all iteration
internally. I think that this could result in more than an order of magnitude
speed up for our clustering codes. It could also provide
[
https://issues.apache.org/jira/browse/MAHOUT-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214091#comment-13214091
]
Jeff Eastman commented on MAHOUT-929:
-
Hey Paritosh, why don't you take over this
Hi Saikat,
Glad you're excited. Paritosh offered one suggestion below. You could
look at TestKmeansClustering for patterns you could use to test the
ClusterClassificationMapper and Driver in MR mode. That should be
straightforward, but please coordinate with Paritosh so you don't
duplicate
Yes, perfect. I'll look at those, begin reading there, and figure out next
steps. Thanks again for your help in starting this effort.
Date: Wed, 22 Feb 2012 16:25:27 -0700
From: j...@windwardsolutions.com
To: dev@mahout.apache.org
Subject: Re: Helping out with the .7 release
Hi Saikat,
Hey Ted,
Could you elaborate on this approach? I don't grok how an all reduce
implementation can be done with a map-only job, or how a mapper could
do all iteration[s] internally.
I've just gotten the ClusterIterator working in MR mode and it does what
I thought we'd been talking about
See https://builds.apache.org/job/Mahout-Quality/changes
Hi,
working on the PCA section in SSVD usage.
Just to confirm, if we run an SVD over input with the mean subtracted,
then the U matrix presents the original data points converted to PCA space,
right?
thanks.
-d
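On the question above, a short derivation may help. It is only a sketch under assumed notation that does not come from the thread: A is the m x n input with one data point per row, \mu is the column-wise mean as a row vector, \mathbf{1} is the all-ones column vector, and the SVD is taken as exact rather than approximate.

```latex
\tilde{A} = A - \mathbf{1}\mu = U \Sigma V^{T}
% principal directions are the columns of V,
% i.e. the eigenvectors of \tilde{A}^{T}\tilde{A} = V \Sigma^{2} V^{T}
\tilde{A} V = U \Sigma V^{T} V = U \Sigma
% hence
U = \tilde{A} V \Sigma^{-1}
```

Under these assumptions the rows of U \Sigma are the data points in PCA coordinates, and the rows of U are the same coordinates with component i divided by \sigma_i. So U does represent the points in PCA space, up to that per-component scaling.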
[
https://issues.apache.org/jira/browse/MAHOUT-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214175#comment-13214175
]
Hudson commented on MAHOUT-933:
---
Integrated in Mahout-Quality #1362 (See
Hi Dmitriy,
Just a few comments:
-- the computed factors are approximate: A \approx U \Sigma V^{T}
-- the projection steps seemed transposed to me but they are consistent
throughout, i.e.
(2) \tilde{u} = \tilde{c}_{r} V \Sigma^{-1}
p. 3: transpose \xi to emphasize row vector
- 'mean of all
All reduce is a non map reduce primitive stolen from mpi. It is used, for
example, in vw to accumulate gradient information without additional Map reduce
iterations.
The all reduce operation works by building a tree of all tasks. A state is sent
up the tree from the leaves. Each internal node
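The tree-based operation described above can be simulated in a few lines. This is a single-process sketch under assumptions, not code from vw or Mahout: the function name `tree_all_reduce` is hypothetical, tasks are laid out as an implicit binary tree (task i has children 2i+1 and 2i+2), and the combine-then-broadcast behaviour follows the general all-reduce definition from MPI rather than the truncated message.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def tree_all_reduce(states: List[T], combine: Callable[[T, T], T]) -> List[T]:
    """Return, for every task, the combination of all tasks' states.

    Reduce phase: states flow from the leaves towards task 0 (the root).
    Broadcast phase: the root's total is handed back to every task.
    """
    n = len(states)
    if n == 0:
        return []
    partial = list(states)
    # Walk from the last task to the first so that each task has already
    # absorbed its own children before it is folded into its parent.
    for i in range(n - 1, 0, -1):
        parent = (i - 1) // 2
        partial[parent] = combine(partial[parent], partial[i])
    total = partial[0]
    # Every task ends up holding the same reduced value (shared reference).
    return [total for _ in range(n)]


def add_vectors(a: List[float], b: List[float]) -> List[float]:
    """Element-wise sum, e.g. for accumulating per-task gradients."""
    return [x + y for x, y in zip(a, b)]
```

For example, calling `tree_all_reduce(gradients, add_vectors)` with one gradient list per task gives every task the summed gradient, which is the kind of accumulation the message mentions for vw. In a real cluster each step would be a network exchange between tasks instead of a list update.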
Got any code that does this I could look at?
On 2/22/12 9:23 PM, Ted Dunning wrote:
All reduce is a non map reduce primitive stolen from mpi. It is used, for
example, in vw to accumulate gradient information without additional Map reduce
iterations.
The all reduce operation works by building
Only vw itself.
Sent from my iPhone
On Feb 22, 2012, at 9:01 PM, Jeff Eastman j...@windwardsolutions.com wrote:
Got any code that does this I could look at?
On 2/22/12 9:23 PM, Ted Dunning wrote:
All reduce is a non map reduce primitive stolen from mpi. It is used, for
example, in vw to
See https://builds.apache.org/job/Mahout-Quality/1363/changes
[
https://issues.apache.org/jira/browse/MAHOUT-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214318#comment-13214318
]
Hudson commented on MAHOUT-933:
---
Integrated in Mahout-Quality #1363 (See
[
https://issues.apache.org/jira/browse/MAHOUT-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paritosh Ranjan reassigned MAHOUT-929:
--
Assignee: Paritosh Ranjan (was: Jeff Eastman)
Refactor Clustering (Vector
[
https://issues.apache.org/jira/browse/MAHOUT-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214329#comment-13214329
]
Paritosh Ranjan commented on MAHOUT-929:
Assigned to myself.
I think cluster
[
https://issues.apache.org/jira/browse/MAHOUT-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214329#comment-13214329
]
Paritosh Ranjan edited comment on MAHOUT-929 at 2/23/12 6:33 AM:
Refactor KMeans Clustering into a separate post process with outlier pruning
Key: MAHOUT-981
URL: https://issues.apache.org/jira/browse/MAHOUT-981
Project: Mahout
Refactor Canopy Clustering into a separate post process with outlier pruning
Key: MAHOUT-982
URL: https://issues.apache.org/jira/browse/MAHOUT-982
Project: Mahout