Re: Renumbering new releases

2019-04-30 Thread Dmitriy Lyubimov
I am ok with 1.0 On Thu, Apr 18, 2019 at 6:10 PM Andrew Musselman wrote: > We've been discussing moving to a 1.0 release for a few years now. This > past quarter we had a comment on our board report about whether we would > consider getting out of the 0.x releases. > > I think it makes sense

Re: Hangouts

2018-07-31 Thread Dmitriy Lyubimov
I am on vacation this week fyi On Tue, Jul 31, 2018 at 11:36 AM, Andrew Musselman < andrew.mussel...@gmail.com> wrote: > Cool, I'll shoot for something on Friday early Pacific time and put an > invite in here; looking forward to it! > > On Sat, Jul 28, 2018 at 9:26 AM Shannon Quinn wrote: > > >

Re: Congrats Palumbo and Holden

2018-05-02 Thread Dmitriy Lyubimov
Congrats! On Wed, May 2, 2018 at 1:25 PM, Trevor Grant wrote: > Both were just elected new ASF members!! > > https://s.apache.org/D6iz >

Re: Board Report

2018-04-24 Thread Dmitriy Lyubimov
LGTM -d On Tue, Apr 24, 2018 at 9:48 AM, Andrew Palumbo wrote: > Hello all, > The Mahout PMC would like to involve the community more in filling out > board reports. This will hopefully help us to learn some of the needs of > Mahout devs and users. > > >

Re: Backlog - Reordered Top->Down

2018-03-06 Thread Dmitriy Lyubimov
Thank you. On Sat, Mar 3, 2018 at 1:45 PM, André Santi Oliveira wrote: > Things which were in the backlog were organized (top->down) using criteria: > > - Fix Version(s) > - Priority > - Type > > > If you don't agree with the order of some things which are there, feel

Re: Updating Wikipedia

2018-02-19 Thread Dmitriy Lyubimov
I think Suneel was modifying it... On Sun, Feb 18, 2018 at 7:02 AM, Trevor Grant wrote: > Is anyone good at Wikipedia? > > We're still listed as being primarily running on Hadoop there. > > https://en.wikipedia.org/wiki/Apache_Mahout > > If anyone has some skills/time-

Re: MathJax not renedering on Website

2017-09-12 Thread Dmitriy Lyubimov
PS http://docs.mathjax.org/en/latest/start.html "We retired our self-hosted CDN at cdn.mathjax.org in April, 2017. We recommend using cdnjs.com which uses the same provider. [...]" On Tue, Sep 12, 2017 at 9:52 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > mathjax we use

Re: MathJax not renedering on Website

2017-09-12 Thread Dmitriy Lyubimov
mathjax we use runs on cdn. We can either re-host it ourselves to avoid the dependency (and also shield from incompatibilities introduced by version update), or follow whatever they use. their currently advertised cdn location seems to be (which indeed looks different to me from before)

Re: [DISCUSS} New feature - DRM and in-core matrix sort and required test suites for modules.

2017-09-05 Thread Dmitriy Lyubimov
ote that also > on the tree nodes, but that only makes sense if we have "consumer" > operators that care about sortedness, of which we have none at the moment > (it possible that we will, perhaps). I am just saying this problem may > benefit from some more broad thinking of the issue i

Re: [DISCUSS} New feature - DRM and in-core matrix sort and required test suites for modules.

2017-09-05 Thread Dmitriy Lyubimov
preserve/mess it up etc. On Tue, Sep 5, 2017 at 3:01 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > In general, +1, don't see why not. > > Q -- is it something that you have encountered while doing algebra? I.e., > do you need the sorted DRM to continue algebraic operat

Re: [DISCUSS} New feature - DRM and in-core matrix sort and required test suites for modules.

2017-09-05 Thread Dmitriy Lyubimov
In general, +1, don't see why not. Q -- is it something that you have encountered while doing algebra? I.e., do you need the sorted DRM to continue algebraic operations between optimizer barriers, or you just need an RDD as the outcome of all this? if it is just an RDD, then you could just do a

Re: Looking for help with a talk

2017-08-10 Thread Dmitriy Lyubimov
yes sure, as before, i can review On Fri, Aug 4, 2017 at 1:12 AM, Isabel Drost-Fromm wrote: > Hi, > > I have a first draft of a narrative and slide deck. If anyone has time it > would be lovely to bounce some ideas back and forth, have the draft of the > deck reviewed. > > >

Re: Proposal for changing Mahout's Git branching rules

2017-07-20 Thread Dmitriy Lyubimov
the dev branch that she will then be asked to merge to. Which WILL catch the unsuspecting contributor unawares. They will find they'd have a significant divergence to overcome in order to attain the mergeability of their work. On Thu, Jul 20, 2017 at 9:06 AM, Dmitriy Lyubimov <dlie...@gmail.

Re: Proposal for changing Mahout's Git branching rules

2017-07-20 Thread Dmitriy Lyubimov
On Fri, Jun 23, 2017 at 8:23 AM, Pat Ferrel wrote: > I don’t know where to start here. Git flow does not address the merge > conflict problems you talk about. They have nothing to do with the process > and are made no easier or harder by following it. > I thought i did

Re: [DISCUSS] Naming convention for multiple spark/scala combos

2017-07-07 Thread Dmitriy Lyubimov
it would seem 2nd option is preferable if doable. Any option that has most desirable combinations prebuilt, is preferable i guess. Spark itself also releases tons of hadoop profile binary variations. so i don't have to build one myself. On Fri, Jul 7, 2017 at 8:57 AM, Trevor Grant

Re: Density based Clustering in Mahout

2017-07-06 Thread Dmitriy Lyubimov
it too (after all it is just a bunch of bytes after serialization, can't get any more basic than that). On Thu, Jul 6, 2017 at 11:21 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > > > On Thu, Jul 6, 2017 at 9:45 AM, Trevor Grant <trevor.d.gr...@gmail.com> > wrote: > &

Re: Density based Clustering in Mahout

2017-07-06 Thread Dmitriy Lyubimov
On Thu, Jul 6, 2017 at 9:45 AM, Trevor Grant wrote: > To Dmitriy's point (2)- I think it is acceptable to create an R-Tree > structure, that will exist only within the algorithm for doing in-core > operations, (or maybe it lives slightly outside of the algorithm so we >

Re: Density based Clustering in Mahout

2017-07-05 Thread Dmitriy Lyubimov
to similar effects as the probabilistic sketches techniques mentioned in the book, but with much more headache for the bang. I eventually turned to solving problems pre-sketched one way or another (including for density clustering problems). On Wed, Jul 5, 2017 at 8:59 AM, Dmitriy Lyubimov <d

Re: Proposal for changing Mahout's Git branching rules

2017-06-22 Thread Dmitriy Lyubimov
... when working on github, any deviation from github commonly accepted PR flows imo would be a fatal wound to the process. On Thu, Jun 22, 2017 at 4:13 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > should read > > And then you will face the dilemma whether to ask people t

Re: Proposal for changing Mahout's Git branching rules

2017-06-22 Thread Dmitriy Lyubimov
errors. On Thu, Jun 22, 2017 at 2:48 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > > > On Wed, Jun 21, 2017 at 3:00 PM, Pat Ferrel <p...@occamsmachete.com> wrote: > >> Which is an option part of git flow but maybe take a look at a better >> explanation than

Re: Proposal for changing Mahout's Git branching rules

2017-06-22 Thread Dmitriy Lyubimov
On Wed, Jun 21, 2017 at 3:00 PM, Pat Ferrel wrote: > Which is an option part of git flow but maybe take a look at a better > explanation than mine: http://nvie.com/posts/a-successful-git-branching- > model/ > > I

Re: Proposal for changing Mahout's Git branching rules

2017-06-21 Thread Dmitriy Lyubimov
On Wed, Jun 21, 2017 at 2:17 PM, Pat Ferrel wrote: Since merges are done by committers, it’s easy to retarget a contributor’s > PRs but committers would PR against develop, IMO it is anything but easy to resolve conflicts, let alone somebody else's. Spark just asks me to

Re: Looking for help with a talk

2017-06-21 Thread Dmitriy Lyubimov
Isabel, you obviously always can call me with any questions. -D On Sat, May 27, 2017 at 2:01 PM, Isabel Drost-Fromm wrote: > Hi, > > I've been invited to give a machine learning centric keynote at FrOSCon > (free and open source conference, the little sister of FOSDEM,

Re: [DISCUSS] remove getMahoutHome from sparkbindings

2017-06-21 Thread Dmitriy Lyubimov
i think the idea was that mahout has control over application jars and guarantees that the minimum amount of mahout jars is added to the application. In order to do that, it needs to be able to figure the home (even if it is an uber jar, as some devs have been pushing for for some time now, so it

Re: new committer: Dustin Vanstee

2017-06-21 Thread Dmitriy Lyubimov
welcome! On Wed, Jun 21, 2017 at 2:07 PM, Pat Ferrel wrote: > Welcome Dustin! > > Nice work so far, much needed. > > > On Jun 21, 2017, at 12:08 PM, Andrew Palumbo wrote: > > Welcome Dustin! > > > > Sent from my Verizon Wireless 4G LTE smartphone > >

Re: Proposal for changing Mahout's Git branching rules

2017-06-21 Thread Dmitriy Lyubimov
so people need to make sure their PR merges to develop instead of master? Do they need to PR against develop branch, and if not, who is responsible for conflict resolution then that is to arise from diffing and merging into different targets? On Tue, Jun 20, 2017 at 10:09 AM, Pat Ferrel

Re: Redesign

2017-05-10 Thread Dmitriy Lyubimov
I am actually totally blown away! Thanks! On Wed, May 10, 2017 at 10:32 AM, Andrew Musselman < andrew.mussel...@gmail.com> wrote: > Massive thanks to Trevor and Dustin for the work redesigning/implementing > the website; for the last mile of look and feel I reached out to a designer > who's

[jira] [Commented] (MAHOUT-1946) ViennaCL not being picked up by JNI

2017-05-09 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003767#comment-16003767 ] Dmitriy Lyubimov commented on MAHOUT-1946: -- it seems so. load library failure should equal

[jira] [Commented] (MAHOUT-1946) ViennaCL not being picked up by JNI

2017-05-09 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003640#comment-16003640 ] Dmitriy Lyubimov commented on MAHOUT-1946: -- Native backends should be able to recover from

Re: Trying to write the KMeans Clustering Using "Apache Mahout Samsara"

2017-04-21 Thread Dmitriy Lyubimov
BHARAT < h2016...@pilani.bits-pilani.ac.in> wrote: > One is the cluster ID of the Index to which the data point should be > assigned. > As per what is given in this book Apache-Mahout-Mapreduce-Dmitriy-Lyubimov > <http://www.amazon.in/Apache-Mahout-Mapreduce-Dmitriy- >

Re: Trying to write the KMeans Clustering Using "Apache Mahout Samsara"

2017-04-12 Thread Dmitriy Lyubimov
} > >//calculating the sum of squared distance between the points(Vectors) > def ssr(a: Vector, b: Vector): Double = { > (a - b) ^= 2 sum > } > > //method used to create (1|D) > def addCentriodColumn(arg: Array[Double]): Array[Double] = { > val newArr
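
The `ssr` helper quoted above computes the sum of squared differences between two points. A minimal sketch of the same arithmetic in plain Scala follows, assuming points are held as `Array[Double]` rather than Mahout `Vector`s; the object name `SquaredDistance` is hypothetical and this is not the poster's code or a Mahout API.

```scala
// Illustrative sketch only: plain Scala arrays stand in for Mahout vectors.
object SquaredDistance {
  // Sum of squared element-wise differences between two points of equal dimension.
  def ssr(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum
  }
}
```

In a k-means assignment step, each point would be assigned to the centroid with the smallest such value.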

Re: Trying to write the KMeans Clustering Using "Apache Mahout Samsara"

2017-03-31 Thread Dmitriy Lyubimov
ps1 this assumes row-wise construction of A based on training set of m n-dimensional points. ps2 since we are doing multiple passes over A it may make sense to make sure it is committed to spark cache (by using checkpoint api), if spark is used On Fri, Mar 31, 2017 at 10:53 AM, Dmitriy Lyubimov

Re: Trying to write the KMeans Clustering Using "Apache Mahout Samsara"

2017-03-31 Thread Dmitriy Lyubimov
create the augmented matrix A := (0|D) which you have > > mentioned. > > > > > > On Fri, Mar 31, 2017 at 10:10 PM, Dmitriy Lyubimov <dlie...@gmail.com> > > wrote: > > > >> was my reply for your post on @user has been a bit confusing? > >> > &g

Re: Trying to write the KMeans Clustering Using "Apache Mahout Samsara"

2017-03-31 Thread Dmitriy Lyubimov
was my reply for your post on @user has been a bit confusing? On Fri, Mar 31, 2017 at 8:40 AM, KHATWANI PARTH BHARAT < h2016...@pilani.bits-pilani.ac.in> wrote: > Sir, > I am trying to write the kmeans clustering algorithm using Mahout Samsara > but i am bit confused > about how to leverage

Re: Marketing

2017-03-24 Thread Dmitriy Lyubimov
On Fri, Mar 24, 2017 at 8:27 AM, Pat Ferrel wrote: > The multiple backend support is such a waste of time IMO. The DSL and GPU > support is super important and should be made even more distributed. The > current (as I understand it) single threaded GPU per VM is only the

Re: Contributing an algorithm for samsara

2017-03-03 Thread Dmitriy Lyubimov
can be reasonably convinced it is actually better that way than vanilla R contract, but IMO it would be really useful to retain 100% compatibility there since it is one of ideas there -- retain R-like-ness with these things. On Fri, Mar 3, 2017 at 12:31 PM, Dmitriy Lyubimov <dlie...@gmail.com>

Re: Contributing an algorithm for samsara

2017-03-03 Thread Dmitriy Lyubimov
On Fri, Mar 3, 2017 at 4:09 AM, Jim Jagielski wrote: > >> >> > >> > >> > 3) On the feature extraction per R like formula can you elaborate more >> here, are you talking about feature extraction using R like dataframes and >> operators? >> > > Yes. I would start doing generic

Re: Contributing an algorithm for samsara

2017-03-03 Thread Dmitriy Lyubimov
I am getting a little bit lost who asked what here, inline. On Fri, Mar 3, 2017 at 4:09 AM, Jim Jagielski wrote: > > > Would it make sense to keep them as-is, and "pull them out", as > it were, should they prove to be wanted/needed by the other algo users? > I would hope it

Re: Contributing an algorithm for samsara

2017-02-17 Thread Dmitriy Lyubimov
in particular, this is the samsara implementation of double-weighed als : https://github.com/apache/mahout/pull/14/files#diff-0fbeb8b848ed0c5e3f782c72569cf626 On Fri, Feb 17, 2017 at 1:33 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > Jim, > > if ALS is of interest, and as far

Re: Contributing an algorithm for samsara

2017-02-17 Thread Dmitriy Lyubimov
Jim, if ALS is of interest, and as far as weighed ALS is concerned (since we already have trivial regularized ALS in the "decompositions" package), here's uncommitted samsara-compatible patch from a while back: https://issues.apache.org/jira/browse/MAHOUT-1365 it combines weights on both data

[jira] [Commented] (MAHOUT-1940) Provide a Java API to SimilarityAnalysis and any other needed APIs

2017-02-14 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866853#comment-15866853 ] Dmitriy Lyubimov commented on MAHOUT-1940: -- Normally, one who is writing in Java, does not have

Re: [jira] [Updated] (MAHOUT-1939) fastutil version clash with spark distributions

2017-02-13 Thread Dmitriy Lyubimov
rsion clash with spark distributions > > --- > > > > Key: MAHOUT-1939 > > URL: https://issues.apache.org/jira/browse/MAHOUT-1939 > > Project: Mahout > > Issue Type: Bug > >Reporter: Dmitriy Lyubimov > >

Re: [jira] [Commented] (MAHOUT-1939) fastutil version clash with spark distributions

2017-02-13 Thread Dmitriy Lyubimov
> > > fastutil version clash with spark distributions > > --- > > > > Key: MAHOUT-1939 > > URL: https://issues.apache.org/jira/browse/MAHOUT-1939 > > Project:

[jira] [Comment Edited] (MAHOUT-1939) fastutil version clash with spark distributions

2017-02-10 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862045#comment-15862045 ] Dmitriy Lyubimov edited comment on MAHOUT-1939 at 2/11/17 12:33 AM

[jira] [Comment Edited] (MAHOUT-1939) fastutil version clash with spark distributions

2017-02-10 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862045#comment-15862045 ] Dmitriy Lyubimov edited comment on MAHOUT-1939 at 2/11/17 12:16 AM

[jira] [Commented] (MAHOUT-1939) fastutil version clash with spark distributions

2017-02-10 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862045#comment-15862045 ] Dmitriy Lyubimov commented on MAHOUT-1939: -- perhaps mahout should include fast-util in a shaded

[jira] [Created] (MAHOUT-1939) fastutil version clash with spark distributions

2017-02-10 Thread Dmitriy Lyubimov (JIRA)
Dmitriy Lyubimov created MAHOUT-1939: Summary: fastutil version clash with spark distributions Key: MAHOUT-1939 URL: https://issues.apache.org/jira/browse/MAHOUT-1939 Project: Mahout

Re: Intro from a lurker

2017-02-09 Thread Dmitriy Lyubimov
Jim, let me start by stating it's an (unexpected on my side) honor. Are you willing to get hands-on at this point in numerical problems (or have resources that can get hands-on)? Short modern Mahout story (as short as it is possible to be short) Most nagging problem: lack of support by industry

[jira] [Resolved] (MAHOUT-1916) mahout bug

2017-01-26 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov resolved MAHOUT-1916. -- Resolution: Invalid > mahout bug > -- > > Key:

Re: request to point out for my error

2017-01-11 Thread Dmitriy Lyubimov
so what's the error? On Wed, Jan 11, 2017 at 10:02 AM, Andrew Palumbo wrote: > Welcome Cherryko. > > > > Sent from my Verizon Wireless 4G LTE smartphone > > > Original message > From: Suneel Marthi > Date: 01/11/2017 9:30 AM (GMT-08:00)

Re: Newbie. Getting Started with Mahout. Found an issue in the documentation + Potential Fix

2016-12-21 Thread Dmitriy Lyubimov
we use something called Apache CMS.. We also host documentation via github (gh-pages branch) On Sat, Dec 17, 2016 at 7:53 PM, Abhishek Goswami wrote: > Hi Folks. > > I made some progress getting started with using Jekyll. I am fairly new to > Jekyll, so to start off I

[jira] [Commented] (MAHOUT-1856) Create a framework for new Mahout Clustering, Classification, and Optimization Algorithms

2016-12-21 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768521#comment-15768521 ] Dmitriy Lyubimov commented on MAHOUT-1856: -- one thing -- we usually squash working branches

Re: Git branching policy

2016-12-16 Thread Dmitriy Lyubimov
we work with much simpler PR model which i don't think fits this very well. i take it PRs will have to be the "feature" branches and will have to be posted against that develop branch instead of the master. This will complicate things unnecessarily IMO. It probably is ok model for tightly knit

[jira] [Commented] (MAHOUT-1892) Can't broadcast vector in Mahout-Shell

2016-11-15 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668009#comment-15668009 ] Dmitriy Lyubimov commented on MAHOUT-1892: -- Shell is a mystery. Obviously it tries to drag

Re: [DISCUSS] More meaningful error when running on Spark 2.0

2016-11-15 Thread Dmitriy Lyubimov
+1 on version checking. And, there's a little bug as well. this error is technically generated by something like dense(Set.empty[Vector]), i.e., it cannot form a matrix out of an empty collection of vectors. While this is true, i suppose it needs a `require(...)` insert there to generate a more
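
The `require(...)` guard suggested above can be sketched in plain Scala. This is an illustration under stated assumptions, not Mahout's `dense` implementation: `denseFromRows` and `DenseGuard` are hypothetical names, and rows are modeled as `Array[Double]` instead of Mahout `Vector`s.

```scala
// Illustrative sketch only: a hypothetical stand-in for a dense-matrix builder.
object DenseGuard {
  // Builds a row-major matrix from row vectors, failing fast with a clear
  // message instead of an obscure downstream error when the input is empty.
  def denseFromRows(rows: Seq[Array[Double]]): Array[Array[Double]] = {
    require(rows.nonEmpty, "cannot form a matrix out of an empty collection of vectors")
    val ncol = rows.head.length
    require(rows.forall(_.length == ncol), s"all rows must have $ncol elements")
    // Copy each row so the result does not alias the caller's arrays.
    rows.map(_.clone()).toArray
  }
}
```

A failed `require` throws `IllegalArgumentException` carrying the supplied message, so the caller sees the actual cause (an empty input) at the point of construction.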

[jira] [Comment Edited] (MAHOUT-1884) Allow specification of dimensions of a DRM

2016-10-04 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546663#comment-15546663 ] Dmitriy Lyubimov edited comment on MAHOUT-1884 at 10/4/16 9:10 PM

[jira] [Comment Edited] (MAHOUT-1884) Allow specification of dimensions of a DRM

2016-10-04 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546663#comment-15546663 ] Dmitriy Lyubimov edited comment on MAHOUT-1884 at 10/4/16 9:09 PM

[jira] [Comment Edited] (MAHOUT-1884) Allow specification of dimensions of a DRM

2016-10-04 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546663#comment-15546663 ] Dmitriy Lyubimov edited comment on MAHOUT-1884 at 10/4/16 9:08 PM

[jira] [Commented] (MAHOUT-1884) Allow specification of dimensions of a DRM

2016-10-04 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546663#comment-15546663 ] Dmitriy Lyubimov commented on MAHOUT-1884: -- drmWrap is not internal in the least (which is why

[jira] [Commented] (MAHOUT-1884) Allow specification of dimensions of a DRM

2016-10-03 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543437#comment-15543437 ] Dmitriy Lyubimov commented on MAHOUT-1884: -- Which api is this about specifically? wrapping

Re: [jira] [Created] (MAHOUT-1884) Allow specification of dimensions of a DRM

2016-10-03 Thread Dmitriy Lyubimov
this has been covered by drmWrap() signature from the very beginning. I vote this as non-issue. On Sun, Oct 2, 2016 at 11:51 PM, Sebastian Schelter (JIRA) wrote: > Sebastian Schelter created MAHOUT-1884: > -- > > Summary:

Re: Hi, How could I get involved into mahout?

2016-09-26 Thread Dmitriy Lyubimov
lab > and, honestly, I only have basic level of Java programing skill. But I > believe I could learn more about how to use Java by reading the codebase of > mahout, trust me ;). > > Best Regards, > MikeLing > > 2016-09-22 6:12 GMT+08:00 Dmitriy Lyubimov <dlie...@gmail.co

Re: Hi, How could I get involved into mahout?

2016-09-21 Thread Dmitriy Lyubimov
ps another way to approach it, which in fact seems to be most common motivator here, is to start with a pragmatic problem one already has at hand. Abstract tinkering rarely produces strategically useful contributions, it seems. On Wed, Sep 21, 2016 at 3:09 PM, Dmitriy Lyubimov <dlie...@gmail.

Re: Hi, How could I get involved into mahout?

2016-09-21 Thread Dmitriy Lyubimov
if you can tell us about your background a little bit, perhaps we could have ideas. frankly we have a pretty sprawling roadmap. At least a set of ideas. It's frankly more than we can realistically do, we can use help, yes. On Sat, Sep 17, 2016 at 8:52 AM, Tiramisu Ling

Re: Recommenders and MABs

2016-09-21 Thread Dmitriy Lyubimov
there's been a great blog on that somewhere on richrelevance blog... But i have a vague feeling based on what you are saying it may be all old news to you... [1] http://engineering.richrelevance.com/bandits-recommendation-systems/ and there's more in the series On Sat, Sep 17, 2016 at 3:10 PM,

Re: Machine Learning algorithm implementation

2016-09-21 Thread Dmitriy Lyubimov
We primarily think in platform-independent, R-like way now. http://mahout.apache.org/users/sparkbindings/home.html We hope it should be a good news for algebraic algorithm implementers like you. Samsara is mapped into spark, flink and H20 as it stands (no mapreduce, you are correct in that). We

Re: Mahout distro Size

2016-09-06 Thread Dmitriy Lyubimov
t least it used to) to produce > tons of lib-managed deps as the result of its build, they probably still > have? > > > Do you mean using something like Spark's dependency resolver? > > ________ > From: Dmitriy Lyubimov <dlie...@gmail.com>

Re: Mahout distro Size

2016-09-06 Thread Dmitriy Lyubimov
PS i probably should not say "probably definitely" next to each other. Definitely just definitely :) On Tue, Sep 6, 2016 at 1:46 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > 2 + 1 > 3 + 1 > > 4: other projects do something too. spark (at least it used to) to p

Re: Mahout distro Size

2016-09-06 Thread Dmitriy Lyubimov
2 + 1 3 + 1 4: other projects do something too. spark (at least it used to) to produce tons of lib-managed deps as the result of its build, they probably still have? On the other hand, the samsara only dependencies are really light. backends are really always "provided", and the rest of it is

Re: What's the ETA for next Mahout release

2016-08-18 Thread Dmitriy Lyubimov
> Can we call a vote for a maintenance release with the lucene compatibility > upgraded to 5.x from 4.x. > > I am assuming it should deserve it's own release. What do the other devs > think about a maintenance release. > > On Thu, Aug 18, 2016 at 1:07 PM, Dmitriy Lyubimov

Re: What's the ETA for next Mahout release

2016-08-18 Thread Dmitriy Lyubimov
There's no consensus. There are a couple features coming that would inspire 0.13 but it's hard to tell when they are going to be completed due to uncertainties in contributor's schedule. current estimate september-october for the 0.13. Maintenance releases could be cut pretty much at will

Re: Traits for a mahout algorithm Library.

2016-07-21 Thread Dmitriy Lyubimov
On Thu, Jul 21, 2016 at 12:35 PM, Trevor Grant wrote: > > > Finally, re data-frames. Why not leave it as vectors and matrices? > Short answer: because (imo) data frames are not vectors and matrices. Longer argumentation: Some capabilities expected of data frames are

Re: Traits for a mahout algorithm Library.

2016-07-21 Thread Dmitriy Lyubimov
sk-learn learner, transformer and predictor features sound good to me, tried-and-proven most importantly imo we need strong established type system and not repeat what i view as a problem in some other offerings. If the type system is strict and limited in size, then there's much less need in

Re: Location of JARs

2016-06-02 Thread Dmitriy Lyubimov
of things." -Virgil* > > > On Thu, Jun 2, 2016 at 12:23 PM, Dmitriy Lyubimov <dlie...@gmail.com> > wrote: > > > i already looked. my main concern is that it meddles with spark > interpreter > > code too much which may create friction with spark interpreters in

Re: Location of JARs

2016-06-02 Thread Dmitriy Lyubimov
> https://github.com/rawkintrevo > http://stackexchange.com/users/3002022/rawkintrevo > http://trevorgrant.org > > *"Fortunate is he, who is able to know the causes of things." -Virgil* > > > On Wed, Jun 1, 2016 at 12:48 PM, Dmitriy Lyubimov <dlie...@gmail.co

Re: Location of JARs

2016-06-01 Thread Dmitriy Lyubimov
On Wed, Jun 1, 2016 at 10:46 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > > > On Wed, Jun 1, 2016 at 7:47 AM, Trevor Grant <trevor.d.gr...@gmail.com> > wrote: > >> >> Other approaches? >> >> For background, Zeppelin starts a Spark Shell and

Re: Location of JARs

2016-06-01 Thread Dmitriy Lyubimov
On Wed, Jun 1, 2016 at 7:47 AM, Trevor Grant wrote: > > Other approaches? > > For background, Zeppelin starts a Spark Shell and we need to make sure all > of the required Mahout jars get loaded in the class path when spark starts. > The question is where do all of these

Re: Location of JARs

2016-06-01 Thread Dmitriy Lyubimov
I am just going to give you some design intents in the existing code. as far as i can recollect, mahout context gives complete flexibility. You can control the behavior but various degrees of overriding the default behavior and doing more or less work on context setup on your own. (I assume we

Re: Hadoop 1 Support Going forward

2016-05-26 Thread Dmitriy Lyubimov
I think this is not an issue of our choice, but an issue of capability. As far as i have witnessed during past 2 years, capability to do anything with MR is at the very least lacking on the grandest of scales, if not completely gone from this project. On Thu, May 26, 2016 at 12:37 PM, Andrew

Re: Future Mahout - Zeppelin work

2016-05-20 Thread Dmitriy Lyubimov
ar/9070e0f6916a0fbda7a5) > > > >>>>> > > > >>>>> All of this doesn't necessarily require any changing of the > > Zeppelin > > > >>>> source > > > >>>>> code, and isn't very intrusive or difficult to set up, I'll ma

Re: Future Mahout - Zeppelin work

2016-05-19 Thread Dmitriy Lyubimov
still in reply to the blog: i wish though zeppelin had a true mahout interpreter. all it basically requires is to reuse spark settings but execute proper imports and context init, and provide this "tablify" routine somehow. On Thu, May 19, 2016 at 1:55 PM, Dmitriy Lyubimov <dlie

Re: Future Mahout - Zeppelin work

2016-05-19 Thread Dmitriy Lyubimov
Trevor, terrific job on zeppelin post btw. thanks! On Thu, May 19, 2016 at 1:54 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > Trevor, left a comment on your blog before realizing i should've really be > commenting here... > > -d > > On Wed, May 18, 2016 at 9:0

Re: Future Mahout - Zeppelin work

2016-05-19 Thread Dmitriy Lyubimov
Trevor, left a comment on your blog before realizing i should've really be commenting here... -d On Wed, May 18, 2016 at 9:05 PM, Andrew Palumbo wrote: > In mahout 0.13 well be looking row reduction methods other than just > sampling to transform DRM -> matrix so that it

[jira] [Comment Edited] (MAHOUT-1791) Automatic threading for java based mmul in the front end and the backend.

2016-05-04 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271648#comment-15271648 ] Dmitriy Lyubimov edited comment on MAHOUT-1791 at 5/5/16 12:03 AM

[jira] [Commented] (MAHOUT-1791) Automatic threading for java based mmul in the front end and the backend.

2016-05-04 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271648#comment-15271648 ] Dmitriy Lyubimov commented on MAHOUT-1791: -- experiments show that native solvers + 2x backend

Re: stochastic nature

2016-05-02 Thread Dmitriy Lyubimov
think we ever override that. On Mon, May 2, 2016 at 9:39 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > by probabilistic algorithms i mostly mean inference involving monte carlo > type mechanisms (Gibbs sampling LDA which i think might still be part of > our MR collection migh

Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-05-02 Thread Dmitriy Lyubimov
graph = graft, sorry. Graft just the AtB class into 0.12 codebase. On Mon, May 2, 2016 at 9:06 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > ok. > > Nikaash, > could you perhaps do one more experiment and graph the 0.10 a'b code into > 0.12 code (or whatever branch y

Re: stochastic nature

2016-05-02 Thread Dmitriy Lyubimov
stic algorithms. If mahout doesn’t use probabilistic > algos then how does it accomplish a degree of optimal parallelization ? > Wouldn’t you need randomization to spread out the processing of tasks. > > > On May 2, 2016, at 12:13 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: &

Re: stochastic nature

2016-05-02 Thread Dmitriy Lyubimov
yes mahout has stochastic svd and pca which are described at length in the samsara book. The book examples in Andrew Palumbo's github also contain an example of computing k-means|| sketch. if you mean _probabilistic_ algorithms, although i have done some things outside the public domain, nothing

Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-05-02 Thread Dmitriy Lyubimov
stuff in the BLAS optimizer Dmitriy put in. It is something like 10x >> faster than a similar algo in Hadoop MR. This particular calc and >> generalization is not in any other Spark or now Flink lib that I know of. >> >> >> On Apr 29, 2016, at 11:24 AM, Dmitriy Lyubimov

Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-04-29 Thread Dmitriy Lyubimov
this fixes your problem, also if > you want to create a Jira it will guarantee I don’t forget. > > > On Apr 29, 2016, at 9:23 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > > yes -- i would do it as an optional option -- just like par does -- do > nothing; try auto, or tr

Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-04-29 Thread Dmitriy Lyubimov
n to comment out the .par(auto=true) and see > if it makes a difference? > > > On Apr 29, 2016, at 8:53 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote: > > can you please look into spark UI and write down how many split the job > generates in the first stage of the

Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-04-29 Thread Dmitriy Lyubimov
I was replying to Nikaash. Sorry -- list keeps rejecting replies because of the size, i had to remove the content On Fri, Apr 29, 2016 at 9:05 AM, Khurrum Nasim <khurrum.na...@useitc.com> wrote: > Is that for me Dimitry ? > > > > > On Apr 29, 2016, at 11:53 AM

Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-04-29 Thread Dmitriy Lyubimov
can you please look into spark UI and write down how many splits the job generates in the first stage of the pipeline, or anywhere else there's significant variation in # of splits in both cases? the row similarity is a very short pipeline (in comparison with what would normally be on average). so

Re: Mahout contributions

2016-04-28 Thread Dmitriy Lyubimov
there might be a concept of "contrib" sub project with totally separate code tree, some asf projects do that. that way it is easy to keep it around if it turns out to be useful, and easy to strip off if it becomes unsupported (sorry for pragmatic cynicism) On Thu, Apr 28, 2016 at 2:48 PM, Khurrum

Re: About reuters-fkmeans-centroids

2016-04-28 Thread Dmitriy Lyubimov
Prakash, (1) to be clear, the ASF trademark and branding policy is not to endorse views of the 3rd party publications and to ask 3rd party writers to do a disclosure that their views are not endorsed by ASF project. To that end, ASF project can't really tell you that some publication is

Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-04-28 Thread Dmitriy Lyubimov
(sorry for repetition, the list rejects my previous replies due to quoted message size) "Auto" just reclusters the input per given _configured cluster capacity_ (there's some safe guard there though i think that doesn't blow up # of splits if the initial number of splits is ridiculously small

Re: About reuters-fkmeans-centroids

2016-04-28 Thread Dmitriy Lyubimov
Prakash, if you are using any Mahout Mapreduce algorithm for research, please make sure to make this disclosure: all Mahout MapReduce algorithms are officially not supported and deprecated since February, 2014 (IIRC). I can dig up a specific issue regarding this. There also has been an

Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-04-27 Thread Dmitriy Lyubimov
0.11 targets 1.3+. I don't quite have anything on top of my head affecting A'B specifically, but i think there were some changes affecting in-memory multiplication (which is of course used in distributed A'B). I am not in particular familiar or remember details of row similarity on top of my

Re: Congratulations to our new Chair

2016-04-20 Thread Dmitriy Lyubimov
congrats! On Wed, Apr 20, 2016 at 4:55 PM, Suneel Marthi wrote: > Please join me in congratulating Andrew Palumbo on becoming our new Project > Chair. > > As for me, it was a pleasure to serve as Chair starting with the Mahout > 0.10.0 release and ending with the recent
