Re: Training Failure in Clustering : Execution mode singlenode

2017-07-29 Thread Matthias Boehm
Hi Krishna, I just gave Kmeans a try with your parameters and it runs fine with hybrid_spark (default) through spark submit. However, I'm able to reproduce the issue when forcing it into singlenode. Thanks for catching this - I'll take care of it. Regards, Matthias On Sat, Jul 29, 2017 at 9:34

Re: If a Vector is in Another Vector

2017-07-09 Thread Matthias Boehm
well, there are different flavors to this problem: 1) If you're only interested in integer values and in testing if all values of A appear in B, you can do the following: A = seq(3,4) B = seq(2,11) N = min(nrow(A),nrow(B)) M = min(max(A),max(B)) I1 = table(A,1,M,1)!=0; I2 = table(B,1,M,1)!=0;
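A minimal Python sketch of the indicator idea in flavor 1 above. It is not DML and not the rest of the original script, which is cut off in this listing. It assumes positive integer values, and as a simplification it sizes both indicator vectors by the larger maximum so that no value is dropped; the helper names are hypothetical.

```python
def presence_indicator(values, size):
    """Return a 0/1 list where position v-1 is 1 if v occurs in values."""
    indicator = [0] * size
    for v in values:
        indicator[v - 1] = 1
    return indicator


def all_contained(a, b):
    """True if every positive integer in a also appears in b."""
    size = max(max(a), max(b))
    in_a = presence_indicator(a, size)
    in_b = presence_indicator(b, size)
    # A value of a is missing from b wherever in_a is 1 but in_b is 0.
    return all(ia <= ib for ia, ib in zip(in_a, in_b))


A = list(range(3, 5))    # mirrors seq(3,4)
B = list(range(2, 12))   # mirrors seq(2,11)
print(all_contained(A, B))
```

Comparing the two indicator vectors elementwise is what a contingency-table approach gives you: once both vectors are built, the membership test is a single vectorized comparison.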

Re: Decaying performance of SystemML

2017-07-11 Thread Matthias Boehm
without any specifics of scripts or datasets, it's unfortunately hard, if not impossible, to help you here. However, note that the memory configuration seems wrong. Why would you configure the driver and executors with 2TB if you only have 256GB per node? Maybe you observe an issue of swapping.

Re: Spark Core

2017-07-12 Thread Matthias Boehm
Well, we explicitly clean up all intermediates that are no longer used. You can use -explain to output the runtime plan, which includes rmvar (remove variable), cpvar (copy variable) and mvvar (move variable) instructions that internally clean up intermediates. This cleanup removes data from memory,

Re: spark hybrid mode on HDFS

2017-07-17 Thread Matthias Boehm
well, at a high-level, resource negotiation and distributed storage are orthogonal concepts. Yarn, Mesos, Standalone, and Kubernetes are resource schedulers, which you can configure via master and a separate deploy mode (client/cluster). Under the covers of the HDFS API, you can also use various

Re: Apache SystemML 1.0 Release build

2017-07-12 Thread Matthias Boehm
here is the list of open tasks (disregarding open bugs), that I think would be good to have in SystemML 1.0: SYSTEMML-1284 Code generation -> all remaining subtasks SYSTEMML-1299 Language features 1.0 -> 1301, 1304/1307, 1306, 1316, 1426, 1444 SYSTEMML-1308 Runtime features 1.0 -> 1313, 1637

Re: Built-in functions and UDFs

2017-06-30 Thread Matthias Boehm
yes, I absolutely agree - we should add special handling of functions with single outputs. Quick background: Functions are initially split into separate statement blocks so they can bind their outputs (potentially many) directly to logical variable names because Hop DAGs cannot represent multiple

Re: Training Failure in Clustering : Execution mode singlenode

2017-07-29 Thread Matthias Boehm
ok this has been fixed in master - it was essentially an issue of size propagation for ctable with two sequence inputs. Regards, Matthias On Sat, Jul 29, 2017 at 1:57 PM, Matthias Boehm <mboe...@googlemail.com> wrote: > Hi Krishna, > > I just gave Kmeans a try with your paramet

Merging sequences of last-level statement blocks

2017-08-06 Thread Matthias Boehm
Hi all, we see a lot of scripts where conditional statement blocks split DAGs of operations. After constant folding of if predicates, unnecessary branches are already removed (which is important for size propagation) but we don't merge sequences of statement blocks yet. Consider the following

Re: Numerical accuracy of DML.

2017-08-19 Thread Matthias Boehm
Good question - let me separate the somewhat orthogonal aspects to it. First, for descriptive statistics such as sum, mean, skewness, or kurtosis, we already use numerically stable implementations based on Kahan Plus (see org.apache.sysml.runtime.functionobjects.KahanPlus if you're interested).
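For readers unfamiliar with the technique, here is a minimal Python sketch of Kahan (compensated) summation. It illustrates the general algorithm only and is not a port of the KahanPlus class named above; the function names are hypothetical.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation of an iterable of floats."""
    total = 0.0
    correction = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - correction            # re-apply the error from the previous step
        t = total + y                 # low-order bits of y may be lost here
        correction = (t - total) - y  # recover what was just lost
        total = t
    return total


values = [0.1] * 1000
print(naive_sum(values), kahan_sum(values), math.fsum(values))
```

The extra correction term costs a few additional floating-point operations per element but keeps the accumulated rounding error roughly independent of the number of values, which is why it suits aggregates over large matrices.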

Re: Comparing scikit-learn, Mahout Samsara and SystemML

2017-06-05 Thread Matthias Boehm
s”. > > Mahout Samsara is based on Scala. PredictionIO (predictionio.incubator. > apache.org) algorithms are based on Mahout Samsara and Scala. I asked > Mr. Matthias Boehm at a conference how one could compare Mahout Samsara to > SystemML. From what I understood, Samsara needs "

Re: Rework inter-procedural analysis

2017-06-14 Thread Matthias Boehm
kul Jindal <naku...@gmail.com> wrote: > > > > Hi Matthias, > > > > If it's not too much trouble, could you please create a design document > for > > this change. > > This will help the rest of the contributors work on this component as > well. >

Re: Parfor loop interdependencies

2017-06-14 Thread Matthias Boehm
Generally, the parfor dependency analysis applies a series of tests including traditional techniques from high-performance compilers combined with additional rules for common cases. This dependency analysis tries to prove that there are no loop-carried dependencies - so yes, false positives can

Re: Unexpected Executor Crash

2017-06-16 Thread Matthias Boehm
or > committing reserved memory. > > > I can send my Spark and YARN configurations as well if that would be > useful. Thanks a lot for your help. > > > Best, > > > Anthony > > On Thu, Jun 15, 2017 at 3:00 PM, Anthony Thomas <ahtho...@eng.ucsd.edu> > w

Re: Parfor loop interdependencies

2017-06-16 Thread Matthias Boehm
github.com/dusenberrymw > LinkedIn: linkedin.com/in/mikedusenberry > > Sent from my iPhone. > > > > On Jun 14, 2017, at 9:08 PM, Matthias Boehm <mboe...@googlemail.com> > wrote: > > > > Generally, the parfor dependency analysis applies a se

Handling of File URI Schemes

2017-06-09 Thread Matthias Boehm
just FYI: in the process of making SystemML work smoothly w/ object stores, we fixed our handling of file URI schemes. So now, you can read/write from/to different file systems, independent of the configured default fs implementations. For example you can now do something like this A =

Re: On the need for Parameter Server. ( A Model Parallel Construct )

2017-06-19 Thread Matthias Boehm
Well, at a high-level, we could emulate synchronous model parallelism via our existing parfor construct out of the box. If this is sufficient from an algorithm perspective, I would be in favor of making any necessary improvements there instead of introducing a new construct for parameter servers.

Re: PCA data gen script, no output

2017-09-16 Thread Matthias Boehm
a closer look. Regards, Matthias On Fri, Sep 15, 2017 at 11:52 PM, Krishna Kalyan <krishnakaly...@gmail.com> wrote: > Thank you so much for trying this Matthias. I will try this again with > absolute path. > > Regards, > Krishna > > On Sat, Sep 16, 2017 at 12:

Re: PCA data gen script, no output

2017-09-16 Thread Matthias Boehm
> > On Sat, Sep 16, 2017 at 7:24 AM, Matthias Boehm <mboe...@googlemail.com> > wrote: > > > well, I don't think any HDFS fs implementation resolves '~' - so it has > > probably created a directory called '~/open-source/scripts/PCA_data' in > > your user path

Consistency SystemML configuration properties

2017-09-16 Thread Matthias Boehm
Currently, our SystemML configuration properties use an inconsistent prefix scheme. For example, some properties use the prefix 'dml' (e.g., dml.yarn.appmaster), others 'systemml' (e.g., systemml.stats.finegrained), and yet others no prefix at all (e.g., localtmpdir). We discussed this before but

Re: [DISCUSS] R-Interface to SystemML

2017-09-21 Thread Matthias Boehm
I pretty much agree with Niketan and Deron. In general, it would be useful to provide an R API as well. However, I'm a bit concerned for two reasons: * Looking over the github repo, apparently R4ML is not under active development/maintenance anymore (last commit Jul 20). So who would be willing

Re: file location for `import org.apache.sysml.parser.dml.DmlParser.WhileStatementContext`

2017-09-23 Thread Matthias Boehm
I guess this is just a delayed message, right? Ted was right, it's a generated class, and hence not in the repo. However, when you build SystemML, the generated source ends up in the src directory of the org.apache.sysml.parser.dml package as well. Note that for new builtin functions such as

Re: Jenkins build became unstable: SystemML-DailyTest #1247

2017-09-20 Thread Matthias Boehm
just FYI: using the exact parameters of the failing test (randomly generated), I was able to reproduce this now locally - so we should be able to fix this flaky test once and for all. Regards, Matthias On Wed, Sep 20, 2017 at 2:21 PM, wrote: > See

Re: Minor script changes for SVM with `MLContext`, `spark_submit` etc.

2017-10-04 Thread Matthias Boehm
as mentioned on PR-673, I'm probably not the right person to comment on algorithm or API-related changes, but I'll try to have a look tomorrow. Regards, Matthias On Tue, Oct 3, 2017 at 6:52 AM, Janardhan Pulivarthi < janardhan.pulivar...@gmail.com> wrote: > Hi Matthias, > > Based on your

Re: Jenkins build is still unstable: SystemML-DailyTest #1297

2017-10-15 Thread Matthias Boehm
thanks Ted - this was my fault. A recent change toward a more aggressive use of CSR revealed a number of hidden issues, which I already fixed last night - so, the next build should be fine. Regards, Matthias On Sun, Oct 15, 2017 at 6:36 AM, Ted Yu wrote: > *There seems to

Re: [QUESTION] XOR operations in SystemML. Thanks.

2017-09-07 Thread Matthias Boehm
hich of the type` *pow(a, 2) `*), > to be consistent with the other symbols of dml. > > In this simple case: > 1. ` *a (+) b (+) c (+) d(+)...* ` > 2. ` *xor(xor(a, b), c)..) ` (*sorry, if I written this syntax wrongly) > > Your word will be final. > > Thanks, > Janardhan

Re: Parfor optimizer getting stuck

2017-09-07 Thread Matthias Boehm
thanks again for catching these issues Rajarshi. I'd like to briefly summarize their resolutions. ad 1) Most likely this was caused by a configuration issue - specifically, the default parallelism being set to less than the number of executors, leading to the number of cores per executor getting

Re: Gentle ping for help on all my PRs. Thanks.

2017-08-23 Thread Matthias Boehm
that's a fair point and we should all help to move these forward. I'll make another pass over 3 and 4 today. Regards, Matthias On Wed, Aug 23, 2017 at 9:30 AM, Janardhan Pulivarthi < janardhan.pulivar...@gmail.com> wrote: > Dear committers, > > I am feeling that my contributions are not in an

Re: JMLC and l2-svm.dml

2017-08-28 Thread Matthias Boehm
" }, new String[] { "predicted_y" }, false): is the code I posted and I don't understand how it differs from what you said on point 2: "So please change it as follows: conn.prepareScript(conn.readScript(dml), new String[] {"w", "X"}, new String[] {"predicted_

Re: SystemML 0.15 Release Candidate build (Not 1.0 release)

2017-09-02 Thread Matthias Boehm
As it turns out, SystemML master computes correct results, whereas SystemML 0.14 compiled invalid instructions for the initialization of centroids. The details are in the jira and we can go ahead with the 0.15 release. Regards, Matthias On Fri, Sep 1, 2017 at 11:40 PM, Matthias Boehm <m

Re: Memory estimates equal to zero

2017-09-04 Thread Matthias Boehm
Hi Nantia, that's a good question - if your input data is very small, there is nothing to worry about. For the explain output, the memory estimates are rounded to MB. Therefore, whenever your input is less than 65,536 cells (dense), you will see a memory estimate of 0 MB (e.g.,
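The 65,536-cell threshold follows from simple arithmetic, sketched below in Python. The sketch assumes 8 bytes per dense cell (one double), ignores any object or header overhead, and assumes round-half-up to whole MB; the function name is hypothetical and this is not SystemML's actual estimator.

```python
import math

BYTES_PER_DENSE_CELL = 8      # assumption: one double per cell, no overhead
BYTES_PER_MB = 1024 * 1024


def dense_estimate_mb(rows, cols):
    """Dense size of a rows x cols matrix, rounded half up to whole MB."""
    size_bytes = rows * cols * BYTES_PER_DENSE_CELL
    return math.floor(size_bytes / BYTES_PER_MB + 0.5)


print(dense_estimate_mb(255, 257))    # 65,535 cells, just under 0.5 MB
print(dense_estimate_mb(256, 256))    # 65,536 cells, exactly 0.5 MB
print(dense_estimate_mb(1000, 1000))  # about 7.6 MB
```

Under these assumptions, 65,536 cells times 8 bytes is exactly 0.5 MB, so anything smaller rounds down to 0 MB in the displayed plan.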

Re: [QUESTION] XOR operations in SystemML. Thanks.

2017-09-04 Thread Matthias Boehm
try for the XOR operator, with caret ` ^ ` symbol. > But, this have been reserved for exponentiation. So, another alternative > would be > > 1. ` (+) ` > 2. ` >< ` > 3. ` >-< ` > > Thanks, > Janardhan > > On Thu, Aug 31, 2017 at 7:38 PM, Matthia

Re: SystemML 0.15 Release Candidate build (Not 1.0 release)

2017-09-02 Thread Matthias Boehm
it as a blocker. Regards, Matthias On Fri, Sep 1, 2017 at 7:37 PM, Matthias Boehm <mboe...@googlemail.com> wrote: > +1 - I also think that a 0.15 release is a good idea. Many improvements > and fixes have been made since 0.14 but we're not quite ready for a 1.0 > release. > >

Re: Enabling CLA by default in SystemML 1.0

2017-08-31 Thread Matthias Boehm
-- > > > > > +1 for the change. This will give us enough time to test CLA with different > setting/algorithms/data characteristics before the release. > > > On May 28, 2017, at 2:42 PM, Matthias Boehm <mboe...@googlemail.com> > wrote: > > >

Re: [QUESTION] XOR operations in SystemML. Thanks.

2017-08-31 Thread Matthias Boehm
From a scalar operation perspective, you could of course emulate XOR via AND, OR, and negation. However, you might want to write a Java-based UDF anyway to efficiently implement this recursive operator. Down the road, we can think about a generalization of our existing cumulative operations such
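The emulation mentioned above can be sketched in a few lines of Python. This is an illustration on plain booleans, not DML and not the UDF suggested in the message; the chained form is one reading of the "recursive operator" under discussion, and the function names are hypothetical.

```python
from functools import reduce


def xor(a, b):
    """XOR built only from AND, OR, and negation."""
    return (a or b) and not (a and b)


def xor_all(flags):
    """Chain xor over a sequence: xor(xor(xor(False, f1), f2), ...)."""
    return reduce(xor, flags, False)


for a in (False, True):
    for b in (False, True):
        print(a, b, xor(a, b))

print(xor_all([True, True, True]))
```

The chained form is true exactly when an odd number of inputs are true, which is why a cumulative (prefix) variant is a natural generalization.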

Re: SystemML 1.0 release timeline

2017-12-01 Thread Matthias Boehm
would recommend to defer the RC1 for a couple of days until these issues are fixed as well. Regards, Matthias On Tue, Nov 14, 2017 at 1:31 PM, Krishna Kalyan <krishnakaly...@gmail.com> wrote: > +1 > > Regards, > Krishna > > On Sun, Nov 12, 2017 at 5:53 AM, Matthias Boehm

Re: [VOTE] Apache SystemML 1.0.0 (RC2)

2017-12-12 Thread Matthias Boehm
<http://researcher.watson.ibm.com/researcher/view.php? > person=us-npansar> > > > > > > "Glenn Weidner" ---12/11/2017 09:49:48 AM---+1 I ran Linear Regression, > > > Logistic Regression, SVM, Naive Bayes Python tests > > > > > > From:

Re: Jenkins build became unstable: SystemML-DailyTest #1424

2017-12-17 Thread Matthias Boehm
yes, that's indeed the case. The original PR was fine but I introduced this issue during some final cleanups while merging this PR. Regards, Matthias On 12/17/2017 11:49 PM, Ted Yu wrote: XorTest failure seems to be related to: [SYSTEMML-1883] New xor builtin functions over scalars On Sun,

Re: Distribution functions such as gamma, weibull etc.

2017-11-13 Thread Matthias Boehm
unfortunately, our cdf and invcdf currently only support the distributions normal, exp, chisq, f, and t and scalar inputs. So you would have to emulate this at script level. Extending the list of distribution functions and adding matrix support would be a good addition though. Regards, Matthias
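As an example of what such an emulation computes, here is a minimal Python sketch of the Weibull CDF using its standard closed form, F(x) = 1 - exp(-(x/scale)^shape) for x > 0. It is not DML; in a script the same formula would be written with the language's own exp and power operators. The function name and parameter names are hypothetical.

```python
import math


def weibull_cdf(x, shape, scale):
    """Weibull CDF: 1 - exp(-(x/scale)**shape) for x > 0, else 0."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


print(weibull_cdf(1.0, shape=1.0, scale=1.0))  # shape=1 is the exponential case
print(weibull_cdf(2.0, shape=1.5, scale=3.0))
```

With shape equal to 1 the Weibull distribution reduces to the exponential distribution, which gives a convenient cross-check against an existing exp CDF.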

SystemML 1.0 release timeline

2017-11-07 Thread Matthias Boehm
Hi all, we made some good progress regarding deep learning support, code generation, and low-latency scoring - so, I'm looking forward to our upcoming 1.0 release. Since it's our first stable release, I think it would be a good idea to allocate some extra time for QA. How about we shoot for a

Re: [VOTE] Apache SystemML 1.0.0 (RC1)

2017-12-07 Thread Matthias Boehm
-1 due to the issue mentioned by Niketan, as well as additional correctness and performance issues fixed in the last couple of days. Regards, Matthias On Tue, Dec 5, 2017 at 6:25 PM, Niketan Pansare wrote: > Soft -1 as GPU backend is in experimental mode. GPU matrix

[DISCUSS] Roadmap SystemML 1.1 and beyond

2017-12-08 Thread Matthias Boehm
Hi all, with our SystemML 1.0 release around the corner, I think we should start the discussion on the roadmap for SystemML 1.1 and beyond. Below is an initial list as a starting point, but please help to add relevant items, especially for algorithms and APIs, which are barely covered so far. 1)

Re: [VOTE] Apache SystemML 1.0.0 (RC2)

2017-12-09 Thread Matthias Boehm
+1 I ran the perftest suite with the artifact on Spark 2.2 up to 80GB without any failures or performance issues. On earlier versions, I also ran the perftest suite with Spark 2.1 and 2.2, w/ and w/o codegen, and w/ auto compression up to 800GB without remaining issues. As a minor nitpick (to be

Re: Get plans before and after rewrites

2017-10-25 Thread Matthias Boehm
cannot find anything relevant to 'org.apache.sysml.hops.rewrite'. Is > there > another file I should check? > > Thanks again, > Nantia > > 2017-10-13 23:29 GMT+03:00 Matthias Boehm <mboe...@googlemail.com>: > > > Hi Nantia, > > > > in optimization lev

Re: My life made easier, now!

2017-10-30 Thread Matthias Boehm
Janardhan, could you please elaborate a little what issues you faced? SystemML itself does not require any specific installation. Also to be productive, you might want to setup a dev environment, where you can run tests locally directly from your IDE. Regards, Matthias On Mon, Oct 30, 2017 at

Re: Questions about MNIST LeNet example

2018-05-06 Thread Matthias Boehm
Hi Guobao, that sounds very good. In general, the "model" refers to the collection of all weights and bias matrices of a given architecture. Similar to a classic regression model, we can view the weights as the "slope", i.e., multiplicative terms, while the biases are the "intercept", i.e.,
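The slope/intercept analogy can be made concrete with a single affine layer, sketched here in plain Python. This is an illustration only and not code from the MNIST LeNet example; the function name and the tiny matrices are made up for the example.

```python
def affine(x, weights, bias):
    """Compute y_j = sum_i x[i] * weights[i][j] + bias[j] for one input row."""
    return [
        sum(x_i * w_row[j] for x_i, w_row in zip(x, weights)) + bias[j]
        for j in range(len(bias))
    ]


x = [1.0, 2.0]          # one input example with two features
W = [[1.0, 0.0],        # weights: the multiplicative "slope" terms
     [0.0, 1.0]]
b = [0.5, -0.5]         # biases: the "intercept" terms
print(affine(x, W, b))
```

A network's "model" is then the collection of all such W and b pairs, one per layer.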

Re: SYSTEMML-447

2018-05-10 Thread Matthias Boehm
This particular JIRA is only partially related. Niketan and Nakul worked out the details - the only reason I show up as the reporter is that, if I remember correctly, we split a larger scoped JIRA for low-level optimizations (GPU, codegen, compression) into individual JIRAs and created the

Re: Questions about MNIST LeNet example

2018-05-10 Thread Matthias Boehm
entries via l1[7] or l2['g'] accordingly. We're still working on additional features to make the integration with IPA, functions, and size/type propagation smoother, but the basic functionality is already available. Regards, Matthias On Sun, May 6, 2018 at 1:08 PM, Matthias Boehm <mboe...@gmail.

Re: [DISCUSS] Adding SystemML to OSS Fuzz

2018-05-21 Thread Matthias Boehm
Well, in general this can be interesting. Apart from our default testsuite, we occasionally ran static code analysis tools. Having additional tests for partially valid scripts and inputs can help to find more issues. That being said, I don't think we currently qualify as a project with

Re: Release Planning SystemML 1.2

2018-06-07 Thread Matthias Boehm
> > > From: Krishna Kalyan > To: dev@systemml.apache.org > Date: 06/05/2018 10:09 PM > Subject: Re: Release Planning SystemML 1.2 > > > > +1 > > I am completely available to help with the QA cycle and help with > switching > to new perf test suite.

Re: Release Planning SystemML 1.2

2018-06-24 Thread Matthias Boehm
given the current status of open tasks and the delay with regard to QA, I think we need to push this release out by a couple of weeks. Does mid to end July sound good to everyone? Regards, Matthias On Wed, Jun 6, 2018 at 11:28 PM, Matthias Boehm wrote: > thanks Berthold - that sounds g

GSoC 2018 Student Guobao Li

2018-05-01 Thread Matthias Boehm
Hi all, please join me in welcoming Guobao Li as a GSoC 2018 student, who will be working on SYSTEMML-2083 (Language and runtime for parameter servers) this summer. We're currently in the community bonding phase, but the project will start May 14. Krishna already kindly volunteered (on the dev

Re: Passing a CoordinateMatrix to SystemML

2017-12-23 Thread Matthias Boehm
at org.apache.sysml.runtime.controlprogram.Program. execute(Program.java:130) at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram( ScriptExecutor.java:388) ... 16 more ... On Fri, Dec 22, 2017 at 5:48 AM, Matthias Boehm <mboe...@gmail.com> wrote: well, let's do the following to figure this out: 1) If the schema is

Re: Passing a CoordinateMatrix to SystemML

2017-12-24 Thread Matthias Boehm
r.java:624) at java.lang.Thread.run(Thread.java:748) Anthony On Sat, Dec 23, 2017 at 4:27 AM, Matthias Boehm <mboe...@gmail.com> wrote: Given the line numbers from the stacktrace, it seems that you use a rather old version of SystemML. Hence, I would recommend to upgrade to SystemML

Re: Passing a CoordinateMatrix to SystemML

2017-12-25 Thread Matthias Boehm
:624) at java.lang.Thread.run(Thread.java:748) Best, Anthony On Sun, Dec 24, 2017 at 3:14 AM, Matthias Boehm <mboe...@gmail.com> wrote: Thanks again for catching this issue Anthony - this IJV reblock issue with large ultra-sparse matrices is now fixed in master. It likely

Re: Passing a CoordinateMatrix to SystemML

2018-01-10 Thread Matthias Boehm
s > a dataframe of sparse vectors to a DML script without issue. Sorry for the > slow confirmation on this - I've been out of the office for the last couple > weeks. Thanks for your help debugging this! > > Best, > > Anthony > > On Mon, Dec 25, 2017 at 5:35 AM, Matthias

Re: [Discuss] GSoC 2018

2018-01-27 Thread Matthias Boehm
, Matthias On Sat, Jan 27, 2018 at 1:57 PM, Nakul Jindal <naku...@gmail.com> wrote: > This is awesome! > I am guessing the goal is to have this epic be a summer worth of > mini-projects for a single GSoC student, is that correct? > > > > On Fri, Jan 26, 2018 at 7:

Re: Release Planning

2018-02-06 Thread Matthias Boehm
Berthold Reinwald > > IBM Almaden Research Center > > office: (408) 927 2208; T/L: 457 2208 > > e-mail: reinw...@us.ibm.com > > > > > > > > From: Matthias Boehm <mboe...@gmail.com> > > To: dev@systemml.apache.org > > Date: 02/

Re: Can BITWISE_XOR be added. Thanks.

2018-01-01 Thread Matthias Boehm
Hi Janardhan, sure - adding such bitwise operations is a nice addition. There is still an open task (SYSTEMML-1931) to generalize the existing NOT, AND, OR, and XOR to matrix arguments which should be straightforward as it seamlessly fits into the existing binary operator. In a separate task, we

Re: Passing a CoordinateMatrix to SystemML

2017-12-22 Thread Matthias Boehm
well, let's do the following to figure this out: 1) If the schema is indeed [label: Integer, features: SparseVector], please change the third line to val y = input_data.select("label"). 2) For debugging, I would recommend to use a simple script like "print(sum(X));" and try converting X and

Re: Release Planning SystemML 1.2

2018-08-11 Thread Matthias Boehm
> Subject: Re: Release Planning SystemML 1.2 >> >> >> >> One more thing, I am already building a docker image. But, which image do >> you prefer >> >> 1. CentOS 7 or >> 2. Ubuntu - Later extensible to GPU very easily. >> >> This decrea

Re: [VOTE] Apache SystemML 1.2.0 (RC1)

2018-08-19 Thread Matthias Boehm
+1 I ran the perftest suite multiple times up to 80GB with and without codegen. After fixing all the issues and regressions, the entire suite ran successfully against Spark 2.2 and 2.3 and all use cases showed equal or better performance compared to SystemML 1.1. Regards, Matthias On Fri, Aug

Fwd: [ComDev] High resolution project logos wanted!

2018-08-23 Thread Matthias Boehm
Could someone with access to our logos please commit them into the mentioned repo? @Deron: I remember you gave me once the archive with all versions of our logos. Thanks. Regards, Matthias -- Forwarded message -- From: Daniel Gruno Date: Thu, Aug 23, 2018 at 1:52 PM Subject:

Re: [DISCUSS] Adding SystemML to OSS Fuzz

2018-07-19 Thread Matthias Boehm
can discuss here. If I don't hear back in 3 days, I'll recommend to close the issue at google/oss-fuzz. Regards, Matthias On Mon, May 21, 2018 at 5:29 PM, Matthias Boehm wrote: > Well, in general this can be interesting. Apart from our default > testsuite, we occasionally ran static code an

New committer: Guobao Li

2018-09-04 Thread Matthias Boehm
The Project Management Committee (PMC) for Apache SystemML has invited Guobao Li to become a committer and we are pleased to announce that he has accepted. Guobao was instrumental in creating language and runtime support for parameter servers in SystemML. This includes local and distributed

Fwd: Extending Codegen algorithm tests for heuristics

2018-03-13 Thread Matthias Boehm
-- Forwarded message -- From: Matthias Boehm <mboe...@gmail.com> Date: Tue, Mar 13, 2018 at 1:00 PM Subject: Re: Extending Codegen algorithm tests for heuristics To: Chamath Abeysinghe <abeysinghecham...@gmail.com> without debugging it's hard to tell, but usually so

Re: [DISCUSS] integrated testing for MLContext, SPARK, codegen.

2018-03-10 Thread Matthias Boehm
Hi Janardhan, in general, we prefer to compare against R because it helps detecting issues that are common across different optimizers and execution modes. So for small scripts like PCA, I would recommend to simply create an R script, which should be very similar to the dml script. However, for

Re: Sub projects in Language and run time for parameter servers [SYSTEMML-2083]

2018-03-09 Thread Matthias Boehm
Hi Chamath, ad 1: Yes, this is absolutely correct. However, it is important to realize that within the workers, we want to run dml functions, and for these we'll reuse our existing compiler, runtime, operations, and data structures. ad 2: Yes, this is also correct. Indeed we can use an existing

Re: distributed cholesky on systemml

2018-04-22 Thread Matthias Boehm
Pu <qifan...@gmail.com> wrote: > Matthias, > > Thanks so much for taking time to fix. Really appreciated it. > Does the same reasoning apply to the cholesky script? The recursive approach > also looks inherently sequential. > > Best, > Qifan > > On Sat, Apr 21, 20

Re: distributed cholesky on systemml

2018-04-23 Thread Matthias Boehm
ach operator will generate a Spark task converting things > into RDD operators. > Thanks so much for the patience and detailed instructions. I have a much > better understanding of the system now. > > On Sun, Apr 22, 2018 at 7:47 PM, Matthias Boehm <mboe...@gmail.com> wrote: >> >

Re: [SYSTEMML-2084][SYSTEMML-2085] Language and Compiler Extension, Basic Runtime Primitives.

2018-03-26 Thread Matthias Boehm
Well, SYSTEMML-2084 aims to integrate the new paramserv builtin function - as described in the JIRA - into SystemML's semantic validation and compiler. This would entail to first finalize the design of this builtin function (e.g., function signature and semantics) and then integrate it into the

Re: Draft Release Notes 1.1.0

2018-03-28 Thread Matthias Boehm
thanks for the initial draft and extensions - I would remove internals #2/#3 because they are still open, move the other internals to performance, and include (or extend) the following: * codegen extensions (operation support, extended optimizer, see SYSTEMML-2065) * new accumulator operator +=

Fwd: Contribution to SystemML

2018-03-28 Thread Matthias Boehm
-- Forwarded message -- From: Matthias Boehm <mboe...@gmail.com> Date: Wed, Mar 28, 2018 at 9:34 PM Subject: Re: Contribution to SystemML To: Govinda Malavipathirana <mp.govi...@gmail.com> well, first of all sorry that you wasted one of your proposals because it wou

Re: [VOTE] Apache SystemML 1.1.0 (RC2)

2018-03-24 Thread Matthias Boehm
+1, I reran all tests mentioned before for RC2 as well without any issues. Regards, Matthias On Fri, Mar 23, 2018 at 5:26 PM, Berthold Reinwald wrote: > Please vote on releasing the following candidate as Apache SystemML > version 1.1.0 > > The vote is open for at least 72

Re: Contribution to SystemML

2018-04-02 Thread Matthias Boehm
> select one? > regards, > Govinda > > > On Thu, Mar 29, 2018 at 10:06 AM Matthias Boehm <mboe...@gmail.com> wrote: > >> -- Forwarded message -- >> From: Matthias Boehm <mboe...@gmail.com> >> Date: Wed, Mar 28, 2018 at 9:34 PM >>

Fwd: Fw: Request for a beginner JIRA

2018-04-03 Thread Matthias Boehm
Thanks for your interest Daiki. I created two JIRAs SYSTEMML-2233 and SYSTEMML-2232 that might be a good starting point. I would recommend to begin with 2233 as a basic cleanup task, which is meant to get you comfortable. The other task is then a bit more involved but would improve our function

Fwd: Sub projects in Language and run time for parameter servers [SYSTEMML-2083]

2018-03-17 Thread Matthias Boehm
-- Forwarded message -- From: Matthias Boehm <mboe...@gmail.com> Date: Sat, Mar 17, 2018 at 5:41 PM Subject: Re: Sub projects in Language and run time for parameter servers [SYSTEMML-2083] To: Chamath Abeysinghe <abeysinghecham...@gmail.com> great to see that y

Re: [VOTE] Apache SystemML 1.1.0 (RC1)

2018-03-21 Thread Matthias Boehm
-1, sorry but I have to change my vote due to SYSTEMML-2201. The change will be in master tomorrow - once this is done, we can cut another RC. Regards, Matthias On Wed, Mar 21, 2018 at 4:12 PM, Matthias Boehm <mboe...@gmail.com> wrote: > +1 > > I ran the perftest s

Re: Apache SystemML 1.1.0 : Performance Test

2018-03-22 Thread Matthias Boehm
awesome - thanks for sharing Krishna. Regards, Matthias On Wed, Mar 21, 2018 at 2:10 AM, Krishna Kalyan wrote: > Hello All, > Sharing a small Shiny App to visualize the runtime performance for System > ML algorithms. This is work still in progress. Any feedback would

Re: Autoencoder codegen testing with R

2019-01-15 Thread Matthias Boehm
yes, you're absolutely right - in this form, results would always differ. Even if we feed a seed to both R and DML scripts, the implementation of our rand is very different as we need to ensure that, given a seed, we generate the same data in local and distributed operations. Accordingly, we

Re: upgrading hadoop version

2019-02-20 Thread Matthias Boehm
Raising the minimum version to 2.7.x is a good idea and fine by me. I would suggest updating the pom to version 2.7.7 but only mention 2.7 as a minimum requirement in the README because as far as I know no APIs changed that wouldn't allow running SystemML on older versions as well. Regards,

Re: SYSTEMML PyPi Statistics over the last six months.

2019-08-10 Thread Matthias Boehm
great - thank you so much for the summary. It would be awesome to get it for a longer history as well and aggregate it with the release/maven downloads. Regards, Matthias On 10/08/2019 16:23, Janardhan wrote: Hi, The following are the queried SystemML download statistics to understand the

Re: DML scripts under scripts/staging

2019-10-31 Thread Matthias Boehm
Hi Remy, generally these scripts are not much different from the ones found in scripts/algorithms. However, the staging scripts were either in development, or did not receive enough testing to move to algorithms yet. If you have a specific algorithm in mind, let us know and we help

Re: Roadmap Merge and Rename SystemDS

2020-04-10 Thread Matthias Boehm
, Janardhan On Tue, Mar 24, 2020 at 6:28 PM Matthias Boehm wrote: that's a good point Henry. Yes, with SystemDS 0.1.0, we removed the MapReduce compiler and runtime backend, the pydml parser and language support, the Java-UDF framework, and the script-level debugger. We are concentrating on local

Welcome 4 New Committers

2020-05-01 Thread Matthias Boehm
The Project Management Committee (PMC) for Apache SystemML (SystemDS) has invited Arnab Phani, Mark Dokter, Shafaq Siddiqi, and Kevin Innerebner to become committers, and we are pleased to announce that all four have accepted. The new committers cover many important areas of current and

Re: [#904] Can the ONNX-SystemDS implementation be reviewed. Thanks.

2020-05-12 Thread Matthias Boehm
yes this is the classic blocking issue - we only held back because you commented you want to review it in detail. We'll take care of it now. Regards, Matthias On 5/12/2020 8:05 AM, Janardhan wrote: Hi, @lukas-jkl have implemented ONNX support for SystemDS, also well documented (in code and

Re: Docker Container Organisation

2020-05-17 Thread Matthias Boehm
thanks for following-up on this. @Berthold: do I remember correctly that you looked into a similar setup a while ago? Regards, Matthias On 5/15/2020 7:46 PM, Baunsgaard, Sebastian wrote: Hi SystemDS developers. Currently we have some docker containers that are associated with my personal

Re: Roadmap Merge and Rename SystemDS

2020-03-21 Thread Matthias Boehm
of the very different objectives and because SystemDS reflects both the origin from SystemML and its new focus on data science pipelines. [1] https://issues.apache.org/jira/projects/PODLINGNAMESEARCH/issues/PODLINGNAMESEARCH-179?filter=allissues Regards, Matthias On 3/9/2020 6:37 PM, Matthias

Re: Roadmap Merge and Rename SystemDS

2020-03-24 Thread Matthias Boehm
SystemDS being merged to SystemML repository? - Henry On Sat, Mar 21, 2020 at 2:47 PM Matthias Boehm wrote: just FYI, we created a ticket for the suitable name search, and shared the related results [1]. So from my perspective, it really boils down to the question if we accept the closeness

Re: [DISCUSSION] Documentation dev & user along with builtins.

2020-05-23 Thread Matthias Boehm
thanks for the initiative and moving this discussion to the dev list. I think this would be very valuable. Regarding how to start, I would recommend to focus on external functionality first by documenting all dml-bodied and native builtin functions (see org.apache.sysds.common.Builtins) and

Re: Fwd: Help me improve my documentation search :)

2020-06-01 Thread Matthias Boehm
Janardhan, thanks for the initiative but please refrain from sending such advertisements to our dev mailing list and making any promises in the name of SystemDS/SystemML on potential inclusion of such dependencies into future releases. Right now we did not yet decide what our target

Re: Performance Question

2020-06-19 Thread Matthias Boehm
Thanks for the question and the detailed inputs - this is an effect of simplification rewrites that only apply in one of the cases. Specifically, (t(N)%*%t(M))[1,1] is rewritten to t(N)[1,] %*% t(M)[,1], which is a form of selection pushdown. You can do the following, for benchmarking*:
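To see why the rewrite named above is valid, the following Python sketch checks on a tiny example that indexing the full product at [1,1] gives the same value as multiplying only the first row of t(N) by the first column of t(M). It is an illustration in plain Python with made-up matrices, not the benchmarking suggestion from the message (which is cut off in this listing); the helper names are hypothetical.

```python
def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Naive matrix product of two lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


N = [[1, 2], [3, 4], [5, 6]]       # 3x2
M = [[7, 8, 9], [10, 11, 12]]      # 2x3
tN, tM = transpose(N), transpose(M)  # 2x3 and 3x2

# Without pushdown: compute the whole 2x2 product, then pick one cell.
full = matmul(tN, tM)[0][0]

# With pushdown: select the first row and first column, then multiply.
first_row = tN[0]
first_col = [row[0] for row in tM]
pushed = sum(x * y for x, y in zip(first_row, first_col))

print(full, pushed)
```

Both paths produce the same scalar, but the pushed-down form touches only one row and one column, which is why a benchmark of the two variants can differ sharply when the rewrite fires in only one of them.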

Re: [DISCUSSION] Website stack `systemml.apache.org`. Thanks.

2020-06-18 Thread Matthias Boehm
trying to merge something else. That said nothing bad happened except you did not get a chance to look through the PR, and it is progress in the direction discussed in this thread. Best regards Sebastian From: Matthias Boehm Sent: Saturday, June 6, 2020 7:03:19

Re: [DISCUSSION] Website stack `systemml.apache.org`. Thanks.

2020-06-06 Thread Matthias Boehm
from my perspective it would be very important to have all builtin functions in a single markdown file to allow users to search for things and users don't care how a builtin function is internally implemented. So we might want to use this opportunity to consolidate the already documented