Re: Renumbering new releases

2019-04-30 Thread Dmitriy Lyubimov
I am ok with 1.0 On Thu, Apr 18, 2019 at 6:10 PM Andrew Musselman wrote: > We've been discussing moving to a 1.0 release for a few years now. This > past quarter we had a comment on our board report about whether we would > consider getting out of the 0.x releases. > > I think it makes sense es

Re: Hangouts

2018-07-31 Thread Dmitriy Lyubimov
I am on vacation this week fyi On Tue, Jul 31, 2018 at 11:36 AM, Andrew Musselman < andrew.mussel...@gmail.com> wrote: > Cool, I'll shoot for something on Friday early Pacific time and put an > invite in here; looking forward to it! > > On Sat, Jul 28, 2018 at 9:26 AM Shannon Quinn wrote: > > >

Re: Congrats Palumbo and Holden

2018-05-02 Thread Dmitriy Lyubimov
Congrats! On Wed, May 2, 2018 at 1:25 PM, Trevor Grant wrote: > Both were just elected new ASF members!! > > https://s.apache.org/D6iz >

Re: Board Report

2018-04-24 Thread Dmitriy Lyubimov
LGTM -d On Tue, Apr 24, 2018 at 9:48 AM, Andrew Palumbo wrote: > Hello all, > The Mahout PMC would like to involve the community more in filling out > board reports. This will hopefully help us to learn some of the needs of > Mahout devs and users. > > > https://docs.google.com/document/d/1q7nO

Re: Backlog - Reordered Top->Down

2018-03-06 Thread Dmitriy Lyubimov
Thank you. On Sat, Mar 3, 2018 at 1:45 PM, André Santi Oliveira wrote: > Things which were in the backlog were organized (top->down) using criteria: > > - Fix Version(s) > - Priority > - Type > > > If you don't agree with the order of some things which are there, feel free > to move around

Re: Updating Wikipedia

2018-02-19 Thread Dmitriy Lyubimov
I think Suneel was modifying it... On Sun, Feb 18, 2018 at 7:02 AM, Trevor Grant wrote: > Is anyone good at Wikipedia? > > We're still listed as being primarily running on Hadoop there. > > https://en.wikipedia.org/wiki/Apache_Mahout > > If anyone has some skills/time- an update would be cool...

Re: MathJax not renedering on Website

2017-09-12 Thread Dmitriy Lyubimov
PS http://docs.mathjax.org/en/latest/start.html "We retired our self-hosted CDN at cdn.mathjax.org in April, 2017. We recommend using cdnjs.com which uses the same provider. [...]" On Tue, Sep 12, 2017 at 9:52 AM, Dmitriy Lyubimov wrote: > mathjax we use runs on cdn. We can eith

Re: MathJax not renedering on Website

2017-09-12 Thread Dmitriy Lyubimov
mathjax we use runs on cdn. We can either re-host it ourselves to avoid the dependency (and also shield from incompatibilities introduced by version update), or follow whatever they use. their currently advertised cdn location seems to be (which indeed looks different to me from before) https://c

Re: [DISCUSS} New feature - DRM and in-core matrix sort and required test suites for modules.

2017-09-05 Thread Dmitriy Lyubimov
haps). I am just saying this problem may > benefit from some more broad thinking of the issue in optimization tree > sense, i.e., why we do it, which things will use it and which things will > preserve/mess it up etc. > > > Agreed re: more broad thinking yes- just getting the c

Re: [DISCUSS} New feature - DRM and in-core matrix sort and required test suites for modules.

2017-09-05 Thread Dmitriy Lyubimov
hings will preserve/mess it up etc. On Tue, Sep 5, 2017 at 3:01 PM, Dmitriy Lyubimov wrote: > In general, +1, don't see why not. > > Q -- is it something that you have encountered while doing algebra? I.e., > do you need the sorted DRM to continue algebraic operations between >

Re: [DISCUSS} New feature - DRM and in-core matrix sort and required test suites for modules.

2017-09-05 Thread Dmitriy Lyubimov
In general, +1, don't see why not. Q -- is it something that you have encountered while doing algebra? I.e., do you need the sorted DRM to continue algebraic operations between optimizer barriers, or you just need an RDD as the outcome of all this? if it is just an RDD, then you could just do a s

Re: Looking for help with a talk

2017-08-10 Thread Dmitriy Lyubimov
yes sure, as before, i can review On Fri, Aug 4, 2017 at 1:12 AM, Isabel Drost-Fromm wrote: > Hi, > > I have a first draft of a narrative and slide deck. If anyone has time it > would be lovely to bounce some ideas back and forth, have the draft of the > deck reviewed. > > > Isabel > >

Re: Proposal for changing Mahout's Git branching rules

2017-07-20 Thread Dmitriy Lyubimov
d the dev branch that she will then be asked to merge to. Which WILL catch the unsuspecting contributor unawares. They will find they'd have a significant divergence to overcome in order to attain the mergeability of their work. On Thu, Jul 20, 2017 at 9:06 AM, Dmitriy Lyubimov wrote:

Re: Proposal for changing Mahout's Git branching rules

2017-07-20 Thread Dmitriy Lyubimov
On Fri, Jun 23, 2017 at 8:23 AM, Pat Ferrel wrote: > I don’t know where to start here. Git flow does not address the merge > conflict problems you talk about. They have nothing to do with the process > and are made no easier or harder by following it. > I thought i did demonstrate that it does m

Re: [DISCUSS] Naming convention for multiple spark/scala combos

2017-07-07 Thread Dmitriy Lyubimov
it would seem the 2nd option is preferable if doable. Any option that has the most desirable combinations prebuilt is preferable i guess. Spark itself also releases tons of hadoop profile binary variations, so i don't have to build one myself. On Fri, Jul 7, 2017 at 8:57 AM, Trevor Grant wrote: > Hey a

Re: Density based Clustering in Mahout

2017-07-06 Thread Dmitriy Lyubimov
too (after all it is just a bunch of bytes after serialization, can't get any more basic than that). On Thu, Jul 6, 2017 at 11:21 AM, Dmitriy Lyubimov wrote: > > > On Thu, Jul 6, 2017 at 9:45 AM, Trevor Grant > wrote: > >> To Dmitriy's point (2)- I think it is

Re: Density based Clustering in Mahout

2017-07-06 Thread Dmitriy Lyubimov
On Thu, Jul 6, 2017 at 9:45 AM, Trevor Grant wrote: > To Dmitriy's point (2)- I think it is acceptable to create an R-Tree > structure, that will exist only within the algorithm for doing in-core > operations, (or maybe it lives slightly outside of the algorithm so we > don't need to recreate tre

Re: Density based Clustering in Mahout

2017-07-05 Thread Dmitriy Lyubimov
ed to similar effects as the probabilistic sketches techniques mentioned in the book, but with much more headache for the bang. I eventually turned to solving problems pre-sketched one way or another (including for density clustering problems). On Wed, Jul 5, 2017 at 8:59 AM, Dmitriy Lyubimov

Re: Density based Clustering in Mahout

2017-07-05 Thread Dmitriy Lyubimov
(1) I abandoned any attempts at DBScan and implemented another density algorithm itself (can't say which, subject to patent restrictions). The reason being, i couldn't immediately figure how to parallelize it efficiently (aside from data structure discussions), the base algorithm is inherently iter

Re: Proposal for changing Mahout's Git branching rules

2017-06-22 Thread Dmitriy Lyubimov
... when working on github, any deviation from github commonly accepted PR flows imo would be a fatal wound to the process. On Thu, Jun 22, 2017 at 4:13 PM, Dmitriy Lyubimov wrote: > should read > > And then you will face the dilemma whether to ask people to resolve merge > issues w.r.

Re: Proposal for changing Mahout's Git branching rules

2017-06-22 Thread Dmitriy Lyubimov
and plain errors. On Thu, Jun 22, 2017 at 2:48 PM, Dmitriy Lyubimov wrote: > > > On Wed, Jun 21, 2017 at 3:00 PM, Pat Ferrel wrote: > >> Which is an option part of git flow but maybe take a look at a better >> explanation than mine: http://nvie.com/posts/a-succes >> sf

Re: Proposal for changing Mahout's Git branching rules

2017-06-22 Thread Dmitriy Lyubimov
On Wed, Jun 21, 2017 at 3:00 PM, Pat Ferrel wrote: > Which is an option part of git flow but maybe take a look at a better > explanation than mine: http://nvie.com/posts/a-successful-git-branching- > model/ > > I still don’t see how this c

Re: Proposal for changing Mahout's Git branching rules

2017-06-21 Thread Dmitriy Lyubimov
PS. but i see the rationale: to have stable fixes get into the release. perhaps named release branches are still a way to go if one cuts them early enough. On Wed, Jun 21, 2017 at 2:25 PM, Dmitriy Lyubimov wrote: > > > On Wed, Jun 21, 2017 at 2:17 PM, Pat Ferrel wrote: > > Since

Re: Proposal for changing Mahout's Git branching rules

2017-06-21 Thread Dmitriy Lyubimov
On Wed, Jun 21, 2017 at 2:17 PM, Pat Ferrel wrote: Since merges are done by committers, it’s easy to retarget a contributor’s > PRs but committers would PR against develop, IMO it is anything but easy to resolve conflicts, let alone somebody else's. Spark just asks me to resolve them myself. But

Re: Looking for help with a talk

2017-06-21 Thread Dmitriy Lyubimov
Isabel, you obviously always can call me with any questions. -D On Sat, May 27, 2017 at 2:01 PM, Isabel Drost-Fromm wrote: > Hi, > > I've been invited to give a machine learning centric keynote at FrOSCon > (free and open source conference, the little sister of FOSDEM, roughly 2500 > attendees o

Re: [DISCUSS] remove getMahoutHome from sparkbindings

2017-06-21 Thread Dmitriy Lyubimov
i think the idea was that mahout has control over application jars and guarantees that the minimum amount of mahout jars is added to the application. In order to do that, it needs to be able to figure the home (even if it is an uber jar, as some devs have been pushing for for some time now, so it m

Re: new committer: Dustin Vanstee

2017-06-21 Thread Dmitriy Lyubimov
welcome! On Wed, Jun 21, 2017 at 2:07 PM, Pat Ferrel wrote: > Welcome Dustin! > > Nice work so far, much needed. > > > On Jun 21, 2017, at 12:08 PM, Andrew Palumbo wrote: > > Welcome Dustin! > > > > Sent from my Verizon Wireless 4G LTE smartphone > > > Original message > From:

Re: Proposal for changing Mahout's Git branching rules

2017-06-21 Thread Dmitriy Lyubimov
so people need to make sure their PR merges to develop instead of master? Do they need to PR against the develop branch, and if not, who is then responsible for the conflict resolution that will arise from diffing and merging into different targets? On Tue, Jun 20, 2017 at 10:09 AM, Pat Ferrel wrote: >

Re: Redesign

2017-05-10 Thread Dmitriy Lyubimov
I am actually totally blown away! Thanks! On Wed, May 10, 2017 at 10:32 AM, Andrew Musselman < andrew.mussel...@gmail.com> wrote: > Massive thanks to Trevor and Dustin for the work redesigning/implementing > the website; for the last mile of look and feel I reached out to a designer > who's inter

[jira] [Commented] (MAHOUT-1946) ViennaCL not being picked up by JNI

2017-05-09 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003767#comment-16003767 ] Dmitriy Lyubimov commented on MAHOUT-1946: -- it seems so. load library fai

[jira] [Commented] (MAHOUT-1946) ViennaCL not being picked up by JNI

2017-05-09 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003640#comment-16003640 ] Dmitriy Lyubimov commented on MAHOUT-1946: -- Native backends should be abl

Re: Welcome New Committer Nikolay Sakharnykh

2017-05-01 Thread Dmitriy Lyubimov
Welcome!! On Wed, Apr 26, 2017 at 8:05 PM, Nikolai Sakharnykh wrote: > Hello everyone, > > I’m sorry for some delay with my introduction, have been swamped with > other projects recently ☺ > > Having worked at NVIDIA for around 8 years I have seen GPUs to evolve from > specialized graphics proce

Re: Trying to write the KMeans Clustering Using "Apache Mahout Samsara"

2017-04-21 Thread Dmitriy Lyubimov
BHARAT < h2016...@pilani.bits-pilani.ac.in> wrote: > One is the cluster ID of the Index to which the data point should be > assigned. > As per what is given in this book Apache-Mahout-Mapreduce-Dmitriy-Lyubimov > <http://www.amazon.in/Apache-Mahout-Mapreduce-Dmitriy- > Lyub

Re: Trying to write the KMeans Clustering Using "Apache Mahout Samsara"

2017-04-14 Thread Dmitriy Lyubimov
ks which you have mentioned above > > > Thanks & Regards > Parth Khatwani > > > > > > On Thu, Apr 13, 2017 at 6:24 PM, KHATWANI PARTH BHARAT < > h2016...@pilani.bits-pilani.ac.in> wrote: > > > Dmitriy Sir, > > I have Created a github branch Git

Re: Trying to write the KMeans Clustering Using "Apache Mahout Samsara"

2017-04-12 Thread Dmitriy Lyubimov
> }
> }
> index
> }
>
> // calculating the sum of squared distance between the points (Vectors)
> def ssr(a: Vector, b: Vector): Double = {
>   (a - b) ^= 2 sum
> }
>
> // method used to create (1|D)
> def addCentriodColumn(arg: Array[Double]): A
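The quoted `ssr` helper computes a sum of squared differences between two vectors. A minimal plain-Python sketch of the same arithmetic follows; it is not Mahout Samsara code. The `nearest_centroid` helper is hypothetical and assumes that the truncated code above returns the index of the closest centroid, which the snippet does not show in full.

```python
from typing import Sequence


def ssr(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Build the result from fresh values, so neither input is modified.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point: Sequence[float],
                     centroids: Sequence[Sequence[float]]) -> int:
    """Hypothetical helper: index of the centroid with the smallest ssr."""
    if not centroids:
        raise ValueError("at least one centroid is required")
    distances = [ssr(point, c) for c in centroids]
    # Ties resolve to the lowest index.
    return min(range(len(centroids)), key=distances.__getitem__)
```

Squared distance is enough for picking the nearest centroid, since taking the square root does not change the ordering.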

Re: Trying to write the KMeans Clustering Using "Apache Mahout Samsara"

2017-03-31 Thread Dmitriy Lyubimov
ps1 this assumes row-wise construction of A based on training set of m n-dimensional points. ps2 since we are doing multiple passes over A it may make sense to make sure it is committed to spark cache (by using checkpoint api), if spark is used On Fri, Mar 31, 2017 at 10:53 AM, Dmitriy Lyubimov

Re: Trying to write the KMeans Clustering Using "Apache Mahout Samsara"

2017-03-31 Thread Dmitriy Lyubimov
ure out the way ahead. > > Like how to create the augmented matrix A := (0|D) which you have > > mentioned. > > > > > > On Fri, Mar 31, 2017 at 10:10 PM, Dmitriy Lyubimov > > wrote: > > > >> was my reply for your post on @user has been a bit confusing?

Re: Trying to write the KMeans Clustering Using "Apache Mahout Samsara"

2017-03-31 Thread Dmitriy Lyubimov
was my reply to your post on @user a bit confusing? On Fri, Mar 31, 2017 at 8:40 AM, KHATWANI PARTH BHARAT < h2016...@pilani.bits-pilani.ac.in> wrote: > Sir, > I am trying to write the kmeans clustering algorithm using Mahout Samsara > but i am bit confused > about how to leverage Distr

Re: Native CUDA support

2017-03-27 Thread Dmitriy Lyubimov
thanks. JCuda sounds good. :) On Fri, Mar 10, 2017 at 9:06 AM, Nikolai Sakharnykh wrote: > Hello everyone, > > We're actively working on adding native CUDA support to Apache Mahout. > Currently, GPU acceleration is enabled through ViennaCL ( > http://viennacl.sourceforge.net/). ViennaCL is a li

Re: Marketing

2017-03-24 Thread Dmitriy Lyubimov
On Fri, Mar 24, 2017 at 8:27 AM, Pat Ferrel wrote: > The multiple backend support is such a waste of time IMO. The DSL and GPU > support is super important and should be made even more distributed. The > current (as I understand it) single threaded GPU per VM is only the first > step in what will

Re: Contributing an algorithm for samsara

2017-03-03 Thread Dmitriy Lyubimov
can be reasonably convinced it is actually better that way than vanilla R contract, but IMO it would be really useful to retain 100% compatibility there since it is one of ideas there -- retain R-like-ness with these things. On Fri, Mar 3, 2017 at 12:31 PM, Dmitriy Lyubimov wrote: > >

Re: Contributing an algorithm for samsara

2017-03-03 Thread Dmitriy Lyubimov
On Fri, Mar 3, 2017 at 4:09 AM, Jim Jagielski wrote: > >> >> > >> > >> > 3) On the feature extraction per R like formula can you elaborate more >> here, are you talking about feature extraction using R like dataframes and >> operators? >> > > Yes. I would start doing generic formula parser and the

Re: Contributing an algorithm for samsara

2017-03-03 Thread Dmitriy Lyubimov
I am getting a little bit lost as to who asked what here, inline. On Fri, Mar 3, 2017 at 4:09 AM, Jim Jagielski wrote: > > > Would it make sense to keep them as-is, and "pull them out", as > it were, should they prove to be wanted/needed by the other algo users? > I would hope it is of some help (es

Re: Contributing an algorithm for samsara

2017-02-17 Thread Dmitriy Lyubimov
in particular, this is the samsara implementation of double-weighted als : https://github.com/apache/mahout/pull/14/files#diff-0fbeb8b848ed0c5e3f782c72569cf626 On Fri, Feb 17, 2017 at 1:33 PM, Dmitriy Lyubimov wrote: > Jim, > > if ALS is of interest, and as far as weighted ALS is

Re: Contributing an algorithm for samsara

2017-02-17 Thread Dmitriy Lyubimov
Jim, if ALS is of interest, and as far as weighted ALS is concerned (since we already have trivial regularized ALS in the "decompositions" package), here's uncommitted samsara-compatible patch from a while back: https://issues.apache.org/jira/browse/MAHOUT-1365 it combines weights on both data poi

[jira] [Commented] (MAHOUT-1940) Provide a Java API to SimilarityAnalysis and any other needed APIs

2017-02-14 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866853#comment-15866853 ] Dmitriy Lyubimov commented on MAHOUT-1940: -- Normally, one who is writin

Re: [jira] [Updated] (MAHOUT-1939) fastutil version clash with spark distributions

2017-02-13 Thread Dmitriy Lyubimov
> > > > Key: MAHOUT-1939 > > URL: https://issues.apache.org/jira/browse/MAHOUT-1939 > > Project: Mahout > > Issue Type: Bug > >Reporter: Dmitriy Lyubimov > >Priority: Blocker >

Re: [jira] [Commented] (MAHOUT-1939) fastutil version clash with spark distributions

2017-02-13 Thread Dmitriy Lyubimov
t; > > > fastutil version clash with spark distributions > > --- > > > > Key: MAHOUT-1939 > > URL: https://issues.apache.org/jira/browse/MAHOUT-1939 > > Proje

[jira] [Comment Edited] (MAHOUT-1939) fastutil version clash with spark distributions

2017-02-10 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862045#comment-15862045 ] Dmitriy Lyubimov edited comment on MAHOUT-1939 at 2/11/17 12:3

[jira] [Comment Edited] (MAHOUT-1939) fastutil version clash with spark distributions

2017-02-10 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862045#comment-15862045 ] Dmitriy Lyubimov edited comment on MAHOUT-1939 at 2/11/17 12:1

[jira] [Commented] (MAHOUT-1939) fastutil version clash with spark distributions

2017-02-10 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862045#comment-15862045 ] Dmitriy Lyubimov commented on MAHOUT-1939: -- perhaps mahout should include

[jira] [Created] (MAHOUT-1939) fastutil version clash with spark distributions

2017-02-10 Thread Dmitriy Lyubimov (JIRA)
Dmitriy Lyubimov created MAHOUT-1939: Summary: fastutil version clash with spark distributions Key: MAHOUT-1939 URL: https://issues.apache.org/jira/browse/MAHOUT-1939 Project: Mahout

Re: Intro from a lurker

2017-02-09 Thread Dmitriy Lyubimov
Jim, let me start by stating it's an (unexpected on my side) honor. Are you willing to get hands-on at this point in numerical problems (or have resources that can get hands-on)? Short modern Mahout story (as short as it is possible to be short) Most nagging problem: lack of support by industry a

[jira] [Resolved] (MAHOUT-1916) mahout bug

2017-01-26 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov resolved MAHOUT-1916. -- Resolution: Invalid > mahout bug > -- > > Key:

Re: request to point out for my error

2017-01-11 Thread Dmitriy Lyubimov
so what's the error? On Wed, Jan 11, 2017 at 10:02 AM, Andrew Palumbo wrote: > Welcome Cherryko. > > > > Sent from my Verizon Wireless 4G LTE smartphone > > > Original message > From: Suneel Marthi > Date: 01/11/2017 9:30 AM (GMT-08:00) > To: mahout > Cc: apailn...@gmail.com

[jira] [Commented] (MAHOUT-1882) SequentialAccessSparseVector inerateNonZeros is incorrect.

2017-01-09 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812862#comment-15812862 ] Dmitriy Lyubimov commented on MAHOUT-1882: -- the fix is as easy as filterin
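The comment above is cut off, so the exact fix is not visible here. As a sketch only, and assuming the idea is to skip explicitly stored zeros while iterating a sparse vector, the filtering could look like this in plain Python. The parallel `indices`/`values` layout and the function name are assumptions for illustration, not Mahout's `SequentialAccessSparseVector` internals.

```python
from typing import Iterator, Sequence, Tuple


def iterate_non_zeros(indices: Sequence[int],
                      values: Sequence[float]) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) pairs, skipping entries whose stored value is zero.

    A sparse vector may hold explicit zeros, for example after an element
    was assigned 0.0 in place; those entries are filtered out here.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    for i, v in zip(indices, values):
        if v != 0.0:
            yield i, v
```

Filtering at iteration time keeps the storage untouched; compacting the underlying arrays would be a separate design choice.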

Re: Newbie. Getting Started with Mahout. Found an issue in the documentation + Potential Fix

2016-12-21 Thread Dmitriy Lyubimov
we use something called Apache CMS.. We also host documentation via github (gh-pages branch) On Sat, Dec 17, 2016 at 7:53 PM, Abhishek Goswami wrote: > Hi Folks. > > I made some progress getting started with using Jekyll. I am fairly new to > Jekyll, so to start off I tried porting my existing

[jira] [Commented] (MAHOUT-1856) Create a framework for new Mahout Clustering, Classification, and Optimization Algorithms

2016-12-21 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768521#comment-15768521 ] Dmitriy Lyubimov commented on MAHOUT-1856: -- one thing -- we usually sq

Re: Git branching policy

2016-12-16 Thread Dmitriy Lyubimov
we work with much simpler PR model which i don't think fits this very well. i take it PRs will have to be the "feature" branches and will have to be posted against that develop branch instead of the master. This will complicate things unnecessarily IMO. It probably is ok model for tightly knit tea

[jira] [Commented] (MAHOUT-1892) Can't broadcast vector in Mahout-Shell

2016-11-15 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668009#comment-15668009 ] Dmitriy Lyubimov commented on MAHOUT-1892: -- Shell is a mystery. Obviousl

Re: [DISCUSS] More meaningful error when running on Spark 2.0

2016-11-15 Thread Dmitriy Lyubimov
+1 on version checking. And, there's a little bug as well. this error is technically generated by something like dense(Set.empty[Vector]), i.e., it cannot form a matrix out of an empty collection of vectors. While this is true, i suppose it needs a `require(...)` insert there to generate a more m
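The guard suggested above can be sketched in plain Python. The `dense` function below is a hypothetical stand-in for illustration, not Mahout's Scala `dense`, and the message wording is an assumption; the point is only that an empty input fails early with a message naming the actual cause.

```python
from typing import Iterable, List, Sequence


def dense(rows: Iterable[Sequence[float]]) -> List[List[float]]:
    """Build a row-major matrix, failing early with a descriptive message."""
    matrix = [list(r) for r in rows]
    # Analogue of a Scala `require(...)`: state the precondition explicitly
    # instead of letting a later step fail with an unrelated error.
    if not matrix:
        raise ValueError(
            "cannot build a matrix from an empty collection of row vectors")
    width = len(matrix[0])
    if any(len(r) != width for r in matrix):
        raise ValueError("all row vectors must have the same length")
    return matrix
```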

[jira] [Comment Edited] (MAHOUT-1884) Allow specification of dimensions of a DRM

2016-10-04 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546663#comment-15546663 ] Dmitriy Lyubimov edited comment on MAHOUT-1884 at 10/4/16 9:1

[jira] [Comment Edited] (MAHOUT-1884) Allow specification of dimensions of a DRM

2016-10-04 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546663#comment-15546663 ] Dmitriy Lyubimov edited comment on MAHOUT-1884 at 10/4/16 9:0

[jira] [Comment Edited] (MAHOUT-1884) Allow specification of dimensions of a DRM

2016-10-04 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546663#comment-15546663 ] Dmitriy Lyubimov edited comment on MAHOUT-1884 at 10/4/16 9:0

[jira] [Commented] (MAHOUT-1884) Allow specification of dimensions of a DRM

2016-10-04 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546663#comment-15546663 ] Dmitriy Lyubimov commented on MAHOUT-1884: -- drmWrap is not internal in

[jira] [Commented] (MAHOUT-1884) Allow specification of dimensions of a DRM

2016-10-03 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543437#comment-15543437 ] Dmitriy Lyubimov commented on MAHOUT-1884: -- Which api is this a

Re: [jira] [Created] (MAHOUT-1884) Allow specification of dimensions of a DRM

2016-10-03 Thread Dmitriy Lyubimov
this has been covered by drmWrap() signature from the very beginning. I vote this as non-issue. On Sun, Oct 2, 2016 at 11:51 PM, Sebastian Schelter (JIRA) wrote: > Sebastian Schelter created MAHOUT-1884: > -- > > Summary: Allow specification o

Re: Hi, How could I get involved into mahout?

2016-09-26 Thread Dmitriy Lyubimov
y, I only have basic level of Java programing skill. But I > believe I could learn more about how to use Java by reading the codebase of > mahout, trust me ;). > > Best Regards, > MikeLing > > 2016-09-22 6:12 GMT+08:00 Dmitriy Lyubimov : > > > ps another way to approach i

Re: Hi, How could I get involved into mahout?

2016-09-21 Thread Dmitriy Lyubimov
ps another way to approach it, which in fact seems to be most common motivator here, is to start with a pragmatic problem one already has at hand. Abstract tinkering rarely produces strategically useful contributions, it seems. On Wed, Sep 21, 2016 at 3:09 PM, Dmitriy Lyubimov wrote: > if

Re: Hi, How could I get involved into mahout?

2016-09-21 Thread Dmitriy Lyubimov
if you can tell us about your background a little bit, perhaps we could have ideas. frankly we have a pretty sprawling roadmap. At least a set of ideas. It's frankly more than we can realistically do, we can use help, yes. On Sat, Sep 17, 2016 at 8:52 AM, Tiramisu Ling wrote: > Hey everyone, I'm

Re: Recommenders and MABs

2016-09-21 Thread Dmitriy Lyubimov
there's been a great blog post on that somewhere on the richrelevance blog... But i have a vague feeling based on what you are saying it may be all old news to you... [1] http://engineering.richrelevance.com/bandits-recommendation-systems/ and there's more in the series On Sat, Sep 17, 2016 at 3:10 PM, Pa

Re: Machine Learning algorithm implementation

2016-09-21 Thread Dmitriy Lyubimov
We primarily think in platform-independent, R-like way now. http://mahout.apache.org/users/sparkbindings/home.html We hope it should be good news for algebraic algorithm implementers like you. Samsara is mapped into spark, flink and H2O as it stands (no mapreduce, you are correct in that). We

Re: Mahout distro Size

2016-09-06 Thread Dmitriy Lyubimov
to) to produce > tons of lib-managed deps as the result of its build, they probably still > have? > > > Do you mean using something like Spark's dependency resolver? > > ________ > From: Dmitriy Lyubimov > Sent: Tuesday, September 6, 2016 4:46:2

Re: Mahout distro Size

2016-09-06 Thread Dmitriy Lyubimov
PS i probably should not say "probably definitely" next to each other. Definitely just definitely :) On Tue, Sep 6, 2016 at 1:46 PM, Dmitriy Lyubimov wrote: > 2 + 1 > 3 + 1 > > 4: other projects do something too. spark (at least it used to) to produce > tons of lib-mana

Re: Mahout distro Size

2016-09-06 Thread Dmitriy Lyubimov
2 + 1 3 + 1 4: other projects do something too. spark (at least it used to) to produce tons of lib-managed deps as the result of its build, they probably still have? On the other hand, the samsara only dependencies are really light. backends are really always "provided", and the rest of it is fai

Re: What's the ETA for next Mahout release

2016-08-18 Thread Dmitriy Lyubimov
: > Can we call a vote for a maintenance release with the lucene compatibility > upgraded to 5.x from 4.x. > > I am assuming it should deserve it's own release. What do the other devs > think about a maintenance release. > > On Thu, Aug 18, 2016 at 1:07 PM, Dmitriy Lyubimov

Re: What's the ETA for next Mahout release

2016-08-18 Thread Dmitriy Lyubimov
There's no consensus. There are a couple features coming that would inspire 0.13 but it's hard to tell when they are going to be completed due to uncertainties in contributors' schedules. current estimate is september-october for the 0.13. Maintenance releases could be cut pretty much at will (assumi

Re: Traits for a mahout algorithm Library.

2016-07-21 Thread Dmitriy Lyubimov
On Thu, Jul 21, 2016 at 12:35 PM, Trevor Grant wrote: > > > Finally, re data-frames. Why not leave it as vectors and matrices? > Short answer: because (imo) data frames are not vectors and matrices. Longer argumentation: Some capabilities expected of data frames are as follows. DFs are colum

Re: Traits for a mahout algorithm Library.

2016-07-21 Thread Dmitriy Lyubimov
sk-learn learner, transformer and predictor features sound good to me, tried-and-proven most importantly imo we need strong established type system and not repeat what i view as a problem in some other offerings. If the type system is strict and limited in size, then there's much less need in data

Re: Location of JARs

2016-06-02 Thread Dmitriy Lyubimov
ea this afternoon / remainder of > week. > > > > Trevor Grant > Data Scientist > https://github.com/rawkintrevo > http://stackexchange.com/users/3002022/rawkintrevo > http://trevorgrant.org > > *"Fortunate is he, who is able to know the causes of things." -Vir

Re: Location of JARs

2016-06-02 Thread Dmitriy Lyubimov
//github.com/rawkintrevo > http://stackexchange.com/users/3002022/rawkintrevo > http://trevorgrant.org > > *"Fortunate is he, who is able to know the causes of things." -Virgil* > > > On Wed, Jun 1, 2016 at 12:48 PM, Dmitriy Lyubimov > wrote: > > > On Wed,

Re: Location of JARs

2016-06-01 Thread Dmitriy Lyubimov
On Wed, Jun 1, 2016 at 10:46 AM, Dmitriy Lyubimov wrote: > > > On Wed, Jun 1, 2016 at 7:47 AM, Trevor Grant > wrote: > >> >> Other approaches? >> >> For background, Zeppelin starts a Spark Shell and we need to make sure all >> of the required Mahout

Re: Location of JARs

2016-06-01 Thread Dmitriy Lyubimov
On Wed, Jun 1, 2016 at 7:47 AM, Trevor Grant wrote: > > Other approaches? > > For background, Zeppelin starts a Spark Shell and we need to make sure all > of the required Mahout jars get loaded in the class path when spark starts. > The question is where do all of these JARs relatively live. > H

Re: Location of JARs

2016-06-01 Thread Dmitriy Lyubimov
I am just going to give you some design intents in the existing code. as far as i can recollect, mahout context gives complete flexibility. You can control the behavior by various degrees of overriding the default behavior and doing more or less work on context setup on your own. (I assume we are

Re: Hadoop 1 Support Going forward

2016-05-26 Thread Dmitriy Lyubimov
I think this is not an issue of our choice, but an issue of capability. As far as i have witnessed during past 2 years, capability to do anything with MR is at the very least lacking on the grandest of scales, if not completely gone from this project. On Thu, May 26, 2016 at 12:37 PM, Andrew Palum

Re: Future Mahout - Zeppelin work

2016-05-20 Thread Dmitriy Lyubimov
> > >>> and > > > >>>>> python ( > https://gist.github.com/andershammar/9070e0f6916a0fbda7a5) > > > >>>>> > > > >>>>> All of this doesn't necessarily require any changing of the > > Zepp

Re: Future Mahout - Zeppelin work

2016-05-19 Thread Dmitriy Lyubimov
still in reply to the blog: i wish though zeppelin had a true mahout interpreter. all it basically requires is to reuse spark settings but execute proper imports and context init, and provide this "tablify" routine somehow. On Thu, May 19, 2016 at 1:55 PM, Dmitriy Lyubimov wrote:

Re: Future Mahout - Zeppelin work

2016-05-19 Thread Dmitriy Lyubimov
Trevor, terrific job on zeppelin post btw. thanks! On Thu, May 19, 2016 at 1:54 PM, Dmitriy Lyubimov wrote: > Trevor, left a comment on your blog before realizing i really should've been > commenting here... > > -d > > On Wed, May 18, 2016 at 9:05 PM, Andrew Palumbo >

Re: Future Mahout - Zeppelin work

2016-05-19 Thread Dmitriy Lyubimov
Trevor, left a comment on your blog before realizing i really should've been commenting here... -d On Wed, May 18, 2016 at 9:05 PM, Andrew Palumbo wrote: > In mahout 0.13 we'll be looking at row reduction methods other than just > sampling to transform DRM -> matrix so that it fits in memory. This i

[jira] [Comment Edited] (MAHOUT-1791) Automatic threading for java based mmul in the front end and the backend.

2016-05-04 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271648#comment-15271648 ] Dmitriy Lyubimov edited comment on MAHOUT-1791 at 5/5/16 12:0

[jira] [Commented] (MAHOUT-1791) Automatic threading for java based mmul in the front end and the backend.

2016-05-04 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271648#comment-15271648 ] Dmitriy Lyubimov commented on MAHOUT-1791: -- experiments show that na

Re: stochastic nature

2016-05-02 Thread Dmitriy Lyubimov
s, i don't think we ever override that. On Mon, May 2, 2016 at 9:39 AM, Dmitriy Lyubimov wrote: > by probabilistic algorithms i mostly mean inference involving monte carlo > type mechanisms (Gibbs sampling LDA which i think might still be part of > our MR collection might be an

Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-05-02 Thread Dmitriy Lyubimov
graph = graft, sorry. Graft just the AtB class into 0.12 codebase. On Mon, May 2, 2016 at 9:06 AM, Dmitriy Lyubimov wrote: > ok. > > Nikaash, > could you perhaps do one more experiment and graph the 0.10 a'b code into > 0.12 code (or whatever branch you say is not working t

Re: stochastic nature

2016-05-02 Thread Dmitriy Lyubimov
esn’t use probabilistic > algos then how does it accomplish a degree of optimal parallelization ? > Wouldn’t you need randomization to spread out the processing of tasks. > > > On May 2, 2016, at 12:13 PM, Dmitriy Lyubimov wrote: > > > > yes mahout has stochastic svd and pca wh

Re: stochastic nature

2016-05-02 Thread Dmitriy Lyubimov
yes mahout has stochastic svd and pca which are described at length in the samsara book. The book examples in Andrew Palumbo's github also contain an example of computing k-means|| sketch. if you mean _probabilistic_ algorithms, although i have done some things outside the public domain, nothing h

Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-05-02 Thread Dmitriy Lyubimov
Dmitriy put in. It is something like 10x >> faster than a similar algo in Hadoop MR. This particular calc and >> generalization is not in any other Spark or now Flink lib that I know of. >> >> >> On Apr 29, 2016, at 11:24 AM, Dmitriy Lyubimov wrote: >> >>

Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-04-29 Thread Dmitriy Lyubimov
rridden to None to imitate pre 0.11, or passed in >> when the app knows better. >> >> Nikaash, are you in a position to comment out the .par(auto=true) and see >> if it makes a difference? >> >> >> On Apr 29, 2016, at 8:53 AM, Dmitriy Lyubimov wrote:

Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-04-29 Thread Dmitriy Lyubimov
e) and see > if it makes a difference? > > > On Apr 29, 2016, at 8:53 AM, Dmitriy Lyubimov wrote: > > can you please look into spark UI and write down how many splits the job > generates in the first stage of the pipeline, or anywhere else there's > significant variation

Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-04-29 Thread Dmitriy Lyubimov
I was replying to Nikaash. Sorry -- list keeps rejecting replies because of the size, i had to remove the content On Fri, Apr 29, 2016 at 9:05 AM, Khurrum Nasim wrote: > Is that for me Dimitry ? > > > > > On Apr 29, 2016, at 11:53 AM, Dmitriy Lyubimov > wrote: > >

Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-04-29 Thread Dmitriy Lyubimov
can you please look into spark UI and write down how many splits the job generates in the first stage of the pipeline, or anywhere else there's significant variation in # of splits in both cases? the row similarity is a very short pipeline (in comparison with what would normally be on average). so o
