Re: Sparse Matrix Storage Consumption Issue

2017-05-08 Thread Matthias Boehm
at 3:09 PM, Matthias Boehm <mboe...@googlemail.com> wrote: > ok thanks for sharing - I'll have a look later this week. > > Regards, > Matthias > > On Mon, May 8, 2017 at 2:20 PM, Mingyang Wang <miw...@eng.ucsd.edu> wrote: > >> Hi Matthias, >> >>

Re: Sparse Matrix Storage Consumption Issue

2017-05-08 Thread Matthias Boehm
p of 33.8 GB to disk (1 time so far) > 17/05/08 13:20:20 INFO ExternalSorter: Thread 116 spilling in-memory > map of 31.2 GB to disk (1 time so far) > > ... > > 17/05/08 13:24:50 INFO ExternalAppendOnlyMap: Thread 116 spilling > in-memory map of 26.9 GB to disk (1 time so far)

Re: Sparse Matrix Storage Consumption Issue

2017-05-06 Thread Matthias Boehm
, time, count): > > -- 1) sp_uak+ 92.597 sec 1 > > -- 2) sp_chkpoint 0.377 sec 1 > > -- 3) == 0.001 sec 1 > > -- 4) print 0.000 sec 1 > > -- 5) + 0.000 sec 1 > > -- 6) castdts 0.000 sec 1 > > -- 7) createvar 0.000 sec 3 > > -- 8) rmvar 0.000 sec 7 >

Re: Sparse Matrix Storage Consumption Issue

2017-05-03 Thread Matthias Boehm
to summarize, this was an issue of selecting serialized representations for large ultra-sparse matrices. Thanks again for sharing your feedback with us. 1) In-memory representation: In CSR every non-zero will require 12 bytes - this is 240MB in your case. The overall memory consumption,
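
A quick sanity check of the 12-bytes-per-non-zero figure, as a minimal sketch. It assumes the usual CSR layout (an 8-byte double value plus a 4-byte int column index per non-zero, and one 4-byte row pointer per row). The 20 million non-zeros are inferred from 240MB / 12 bytes and are not stated in the snippet; all names are illustrative.

```python
# Back-of-the-envelope CSR sizing; constants and names are assumptions.
VALUE_BYTES = 8      # one double-precision value per non-zero
COL_INDEX_BYTES = 4  # one int column index per non-zero
ROW_PTR_BYTES = 4    # one int row pointer per row, plus one terminator


def csr_nnz_bytes(nnz):
    """Bytes spent on values and column indexes only."""
    return nnz * (VALUE_BYTES + COL_INDEX_BYTES)


def csr_total_bytes(nnz, nrows):
    """Bytes including the row-pointer array of length nrows + 1."""
    return csr_nnz_bytes(nnz) + (nrows + 1) * ROW_PTR_BYTES


NNZ = 20_000_000  # assumption: inferred from 240MB / 12 bytes
print(csr_nnz_bytes(NNZ) / 1e6, "MB for values and column indexes")
```

For ultra-sparse matrices with many rows, the row-pointer term in csr_total_bytes can become comparable to the per-non-zero cost.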

Re: Standard code styles for DML and Java?

2017-05-02 Thread Matthias Boehm
thanks Deron for centralizing this discussion, as this could help to avoid redundancy spread across many individual JIRAs and PRs. Overall, I think it would be good to agree on individual style guides for DML and Java. I'm fine with using spaces for DML scripts because they are rarely

Re: [DISCUSS] Remove old MLContext API

2017-05-01 Thread Matthias Boehm
definitely +1 from me, although I think we already agreed upon that by properly deprecating this API in previous releases. Regards, Matthias On Mon, May 1, 2017 at 6:55 PM, Nakul Jindal wrote: > +1 > > Nakul > > On Mon, May 1, 2017 at 5:37 PM, wrote:

Re: Randomly Selecting rows from a dataframe

2017-04-30 Thread Matthias Boehm
jit chakraborty <ak...@hotmail.com> > Sent: Saturday, April 22, 2017 12:45 PM > To: dev@systemml.incubator.apache.org > Subject: Re: Randomly Selecting rows from a dataframe > > Thank you Matthias! You are most helpful! > > > Thanks again! > > Arijit > > _

Re: Build passed/failed messages for pull requests

2017-04-28 Thread Matthias Boehm
as I commented on one of these github comments, I'm strongly against these kinds of unnecessary messages because they distract from the actual discussions. I already had to change my notification settings accordingly - essentially I'm not watching SystemML's PR activity any more. Regards,

Re: Updating A Vector

2017-04-27 Thread Matthias Boehm
if your values in matrix2 are aligned as in your example, then you can do the following (which works for arbitrary values in matrix1 but you could simplify it if you have just 1s): matrix1 = matrix1*(matrix2==0) + (matrix2!=0)*2; The only problematic case would be special values such as NaNs in
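
The masking expression above can be traced element by element. The following is a plain-Python sketch on flat lists, not DML; the function name and the new_value parameter are illustrative assumptions.

```python
def update_vector(matrix1, matrix2, new_value=2):
    """Element-wise analogue of matrix1*(matrix2==0) + (matrix2!=0)*new_value.

    Where matrix2 is 0 the old value of matrix1 is kept; everywhere else
    the cell is overwritten with new_value.
    """
    return [a * (b == 0) + (b != 0) * new_value
            for a, b in zip(matrix1, matrix2)]


print(update_vector([1, 1, 1, 1], [0, 3, 0, 5]))
```

The two comparisons act as complementary 0/1 masks, so exactly one of the two terms contributes to each cell.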

Re: Evaluate a scalar DAG during compilation

2017-04-24 Thread Matthias Boehm
yes, we already do constant folding - the details are in org.apache.sysml.hops.rewrite.RewriteConstantFolding In order to ensure consistency with our runtime, we actually generate instructions for these sub dags, execute them and finally replace the dag with the computed literal. Regards,

Re: [VOTE] Apache SystemML 0.14.0-incubating (RC4)

2017-04-24 Thread Matthias Boehm
+1 I ran large-scale experiments on Spark 2.1 for L2SVM, GLM, MLogreg, LinregCG, LinregDS, and PCA over scaled versions of MNIST and ImageNet (up to 1TB, with uncompressed and compressed linear algebra) without any issues. Compared to previous experiments with SystemML 0.11 and Spark 1.6, I've

Fwd: Questions about the Compositions of Execution Time

2017-04-22 Thread Matthias Boehm
-- Forwarded message -- From: Matthias Boehm <mboe...@googlemail.com> Date: Sat, Apr 22, 2017 at 4:23 PM Subject: Re: Questions about the Compositions of Execution Time To: Mingyang Wang <miw...@eng.ucsd.edu> with the latest change from today there should not be muc

Re: function default parameters

2017-04-21 Thread Matthias Boehm
well, for arguments passed into dml scripts there is of course ifdef($b, 2) but for functions there is indeed no good support. At runtime level we still support default parameters for scalar arguments at the tail of the parameter list but I guess at one point the corresponding parser support was

Re: Randomly Selecting rows from a dataframe

2017-04-21 Thread Matthias Boehm
you can take for example a 1% sample of rows via a permutation matrix (specifically selection matrix) as follows I = (rand(rows=nrow(X), cols=1, min=0, max=1) <= 0.01); P = removeEmpty(target=diag(I), margin="rows"); Xsample = P %*% X; or via removeEmpty and selection vector I =
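
To see what the selection-matrix variant computes, here is a small sketch in plain Python with nested lists standing in for matrices. The function name, the seed parameter, and the dense construction of P are assumptions made for illustration; a real system would not materialize diag(I) densely.

```python
import random


def sample_rows(X, fraction, seed=None):
    """Mimics I = (rand <= fraction); P = removeEmpty(diag(I)); P %*% X."""
    rng = random.Random(seed)
    n = len(X)
    ncol = len(X[0]) if X else 0
    # I: one 0/1 indicator per row of X
    I = [1 if rng.random() <= fraction else 0 for _ in range(n)]
    # P: rows of diag(I) that are not entirely zero (the selection matrix)
    P = [[1 if j == i else 0 for j in range(n)] for i in range(n) if I[i] == 1]
    # Matrix product P %*% X: each row of P picks exactly one row of X
    return [[sum(P[r][k] * X[k][c] for k in range(n)) for c in range(ncol)]
            for r in range(len(P))]


X = [[1, 2], [3, 4], [5, 6]]
print(sample_rows(X, 0.5, seed=7))
```

Because each row is drawn independently, the sample size is only approximately fraction * nrow(X).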

Re: Vector of Matrix

2017-04-21 Thread Matthias Boehm
no, right now, we don't support structs or complex objects. Regards, Matthias On 4/21/2017 4:17 AM, arijit chakraborty wrote: Hi, In R (as well as in python), we can store values list within list. Say I've 2 matrix with different dimensions, x <- matrix(1:10, ncol=2) y <- matrix(1:5,

Re: Table

2017-04-21 Thread Matthias Boehm
The input vectors to table are interpreted as row indexes and column indexes, respectively. Without weights, we add 1, otherwise the corresponding weight value to the output cells. So in your example you have constant row indexes of 1 but a seq(1,10) for column indexes and hence you get a
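
The accumulation rule described above can be sketched as follows, in plain Python rather than DML. Sizing the output by the largest row and column index is an assumption of this sketch, as are the function and parameter names.

```python
def table(rows, cols, weights=None):
    """Contingency-table style accumulation over 1-based index vectors.

    Each pair (rows[i], cols[i]) addresses one output cell; the cell is
    incremented by 1, or by weights[i] if weights are given.
    """
    if weights is None:
        weights = [1] * len(rows)
    nrow, ncol = max(rows), max(cols)  # assumption: dims from max index
    out = [[0] * ncol for _ in range(nrow)]
    for r, c, w in zip(rows, cols, weights):
        out[r - 1][c - 1] += w  # convert 1-based indexes to 0-based
    return out


print(table([1, 1, 2], [1, 1, 3]))
print(table([1, 2], [2, 1], weights=[0.5, 2.0]))
```

Repeated index pairs accumulate in the same cell, which is what makes the operation usable for counting.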

Re: Questions about the Compositions of Execution Time

2017-04-21 Thread Matthias Boehm
On Thu, Apr 20, 2017 at 11:44 AM, Matthias Boehm <mboe...@googlemail.com> wrote: > 1) Understanding execution plans: Our local bufferpool reads matrices in a > lazy manner on the first singlenode, i.e., CP, operation that tries to pin > the matrix into memory. Similarly, distributed

Re: Questions about the Compositions of Execution Time

2017-04-20 Thread Matthias Boehm
le read/write script (it took quite a long time and failed). Regards, Mingyang On Thu, Apr 20, 2017 at 2:08 AM Matthias Boehm <mboe...@googlemail.com> wrote: Hi Mingyang, thanks for the questions - this is very valuable feedback. I was able to reproduce your performance issue on scenario 1

Re: Experimental code generation

2017-04-20 Thread Matthias Boehm
, at 8:32 AM, Berthold Reinwald <reinw...@us.ibm.com> wrote: This is awesome! Regards, Berthold Reinwald IBM Almaden Research Center office: (408) 927 2208; T/L: 457 2208 e-mail: reinw...@us.ibm.com From: Matthias Boehm <mboe...@googlemail.com> To: dev@systemml.incubator.apa

Re: Loss of dimensionality info in transient reads

2017-04-18 Thread Matthias Boehm
In general, there are a couple of scenarios which make size propagation challenging. This includes: * Complex function call patterns (where functions are potentially called with different sizes) * External user-defined functions * Data-dependent operators (e.g., table, aggregate, removeEmpty); *

Re: True/False flags in HOPs parameters

2017-04-18 Thread Matthias Boehm
These flags in the runtime plans (-explain runtime or recompile_runtime) are indicators if the given input operand is a literal or not. Without these flags we could not differentiate between literal strings and variable names. Regards, Matthias On Tue, Apr 18, 2017 at 12:20 PM,

Re: SystemML query

2017-04-17 Thread Matthias Boehm
if your data X is already ordered you can do the following: I = rbind(matrix(1,1,1), (X[1:nrow(X)-1,]!=X[2:nrow(X),])); dX = removeEmpty(target=X, margin="rows", select=I); Regards, Matthias On 4/17/2017 8:40 AM, arijit chakraborty wrote: Hi, I've an issue regarding finding and removing the
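
The idea behind the two DML statements above is to keep the first row and every row that differs from its predecessor. A minimal plain-Python sketch, assuming X is a single ordered column represented as a flat list (the function name is illustrative):

```python
def dedupe_sorted(X):
    """Mimics I = rbind(1, X[1:n-1] != X[2:n]); removeEmpty(X, select=I)."""
    if not X:
        return []
    # I[0] = 1 keeps the first row; I[i] = 1 where X[i] differs from X[i-1]
    I = [1] + [1 if X[i] != X[i + 1] else 0 for i in range(len(X) - 1)]
    # Keep only the rows selected by the indicator vector
    return [x for x, keep in zip(X, I) if keep]


print(dedupe_sorted([1, 1, 2, 3, 3, 3, 4]))
```

This only removes duplicates that are adjacent, which is why the input has to be ordered first.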

Re: [VOTE] Apache SystemML 0.14.0-incubating (RC3)

2017-04-15 Thread Matthias Boehm
I think SYSTEMML-1518 and SYSTEMML-1520 require a new RC and I agree that we should create a 0.14 branch along with it to unblock ongoing development. I'm happy to backport any additional fixes into this branch until we have a solid release candidate. Regards, Matthias On Thu, Apr 13, 2017 at

Re: [VOTE] Apache SystemML 0.14.0-incubating (RC1)

2017-04-04 Thread Matthias Boehm
sorry, but -1 due to SYSTEMML-1464 and SYSTEMML-1459. In detail, SYSTEMML-1464 is a blocker issue for me because it renders JMLC model scoring of text inputs with tokens that contain spaces almost unusable. Furthermore, SYSTEMML-1459 covers a rewrite issue that might corrupt hop dags for special

Re: Java compiler for code generation

2017-03-31 Thread Matthias Boehm
7 2208 > e-mail: reinw...@us.ibm.com > > > > From: Matthias Boehm <mboe...@googlemail.com> > To: dev@systemml.incubator.apache.org > Date: 03/31/2017 08:17 PM > Subject:Java compiler for code generation > > > > Hi all, > > currently, our new

Re: UDFs Within Expressions

2017-03-29 Thread Matthias Boehm
Well, this would indeed be a very useful extension - I've actually seen many use cases, where new users ran into issues with simple expressions like X[i,i] = foo(). In the general case, the problem with UDFs is that they can have - in contrast to builtin functions - multiple returns. These

Re: [HELP] Undesired Benchmark Results

2017-03-24 Thread Matthias Boehm
hu, Mar 23, 2017 at 11:36 PM Matthias Boehm <mboe...@googlemail.com> wrote: well, after thinking some more about this issue, I have to correct myself but the workarounds still apply. The problem is not the "in-memory reblock" but the collect of the reblocked RDD, whic

Re: Build failed in Jenkins: SystemML-DailyTest #870

2017-03-16 Thread Matthias Boehm
sorry for the issues - I'll fix it with the next change. Regards, Matthias On Thu, Mar 16, 2017 at 2:56 AM, <jenk...@spark.tc> wrote: > See <https://sparktc.ibmcloud.com/jenkins/job/SystemML- > DailyTest/870/changes> > > Changes: > > [Matthias Boehm] [SYSTEMML-1402

Re: Release cadence

2017-03-12 Thread Matthias Boehm
) now rather than waiting additional months. Also I would like > to > >> be able to correctly identify our next version in the online > documentation. > >> > >> > > How about just make SystemML Next and change the release name when we do > > the relea

Re: Next Steps in the graduation process

2017-03-07 Thread Matthias Boehm
I could help doing this assessment. Btw, here is a working link: https://community.apache.org/apache-way/apache-project-maturity-model.html Regards, Matthias On Tue, Mar 7, 2017 at 1:38 PM, Luciano Resende wrote: > On Tue, Mar 7, 2017 at 11:59 AM, Arvind Surve

Dropping Java 6 and 7 support

2017-03-06 Thread Matthias Boehm
Hi all, I'd like to drop the support for Java 6 and 7 in our SystemML 1.0 release. Our build still refers to a java compliance level 6, which has not been changed for more than 5 years now. Spark >= 1.5 anyway requires Java 7 and there has been some discussion on removing Java 7 as well because

Re: Release cadence

2017-03-04 Thread Matthias Boehm
e contributors each month. > > If the overhead slows us down too much, then we can go to a slower release > cycle. > > Deron > > > > > On Thu, Jan 5, 2017 at 1:50 PM, <dusenberr...@gmail.com> wrote: > > > +1 for adopting a 1 month release cycle. > > >

Re: [DISCUSS] SystemML Graduation

2017-03-03 Thread Matthias Boehm
Thanks for starting this discussion Luciano. I think it's a good point in time to graduate SystemML as we have shown readiness by creating an open and positive community, and it would send a great signal to potential new users and developers. From my perspective, we should aim for a top-level

Re: incubator-systemml git commit: [maven-release-plugin] prepare for next development iteration

2017-02-22 Thread Matthias Boehm
Could we please change the target version to 1.0 instead of 0.14 to make clear that master is now open for 1.0 features? Regards, Matthias On Mon, Feb 20, 2017 at 12:08 PM, wrote: > Repository: incubator-systemml > Updated Branches: > refs/heads/master 07f26ca4e ->

Re: Minimum required Spark version

2017-02-21 Thread Matthias Boehm
excellent - thanks for the quick fix Deron. Regards, Matthias On 2/21/2017 1:09 AM, Deron Eriksson wrote: Note that MLContext has been updated to log a warning rather than throw an exception to the user for Spark versions previous to 2.1.0. Deron On Mon, Feb 20, 2017 at 2:29 PM, Matthias

Weighted Statistical Estimates

2017-02-18 Thread Matthias Boehm
Going toward our 1.0 release, I'd like to create consistency across our weighted statistics. Conceptually, these weights represent frequency counts, i.e., multiplicities of input values. So far, our documentation does not state any restrictions on these weights but some runtime operations

Re: Operators in HOP DAG

2017-02-17 Thread Matthias Boehm
ad 1: t(-*): ternary minus mult (for patterns like X-s*Y) ad 2: ua(+RC): unary aggregate with aggregation function + (at runtime level you will see k+ for Kahan plus) and direction RC, i.e., full aggregate over rows and columns. ad 3: lix: matrix or frame left indexing (for patterns like

Re: Removal of workaround flags

2017-02-13 Thread Matthias Boehm
neck, thus leading to the creation of > SYSTEMML-1140. Specifically, what did you use to attempt to reproduce 1140? > > > -Mike > > -- > > Mike Dusenberry > GitHub: github.com/dusenberrymw > LinkedIn: linkedin.com/in/mikedusenberry > > Sent from my iPhone. >

Re: Build failed in Jenkins: SystemML-DailyTest #805

2017-02-12 Thread Matthias Boehm
org.apache.sysml.test.integration.functions.transform.TransformCSVFrameEncodeReadTest Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 29.677 sec - in org.apache.sysml.test.integration.functions.transform.TransformCSVFrameEncodeReadTest On Sun, Feb 12, 2017 at 12:26 AM, Matthias Boehm <mboe...@googlemail.com> wrote: could someone pleas

Namespace handling w/ imports

2017-02-12 Thread Matthias Boehm
While debugging our mnist_lenet script, I encountered an issue with our namespace handling with imports. Here is the related function call graph (after inlining): FUNCTION CALL GRAPH --MAIN PROGRAM .\mnist_lenet.dml::train --.\nn/layers/dropout.dml::forward

Removal of workaround flags

2017-02-12 Thread Matthias Boehm
just a little heads-up: I intend to remove the recently added workaround flags DISABLE_SPARSE and DISABLE_CACHING because any underlying issues should be directly addressed. Furthermore, I was not able to reproduce the issues reported in SYSTEMML-1140, probably due to improvements that

Re: Build failed in Jenkins: SystemML-DailyTest #805

2017-02-12 Thread Matthias Boehm
ins/job/SystemML-DailyTest/805/changes> Changes: [Matthias Boehm] [SYSTEMML-1244] Fix robustness csv text read (quoted recoded maps) [Matthias Boehm] [SYSTEMML-1243] Fix size update wdivmm/wsigmoid/wumm on rewrite [Matthias Boehm] [SYSTEMML-1248] Fix loop rewrite update-in-place (exclude

Re: [DISCUSS] Enable Python Tests on Jenkins

2017-02-03 Thread Matthias Boehm
this is fine, but please make sure that it gets integrated into our existing testsuite which can be run through maven or junit. Regards, Matthias On 2/3/2017 9:10 PM, Deron Eriksson wrote: +1 for enabling the Python tests in the test suite. Since we use multiple languages and it's not always

Re: February Podling Report

2017-02-01 Thread Matthias Boehm
optionally, we could include the following paper that we presented at CIDR'17 in January. Tarek Elgamal, Shangyu Luo, Matthias Boehm, Alexandre V. Evfimievski, Shirish Tatikonda, Berthold Reinwald, Prithviraj Sen: SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine

Re: HOP and LOP DAGs

2017-01-30 Thread Matthias Boehm
Hi Nantia, good question - so far the documentation of tools like explain and stats is indeed very sparse. However, there are some overview slides from a tutorial we gave last year at the BOSS workshop: http://boss.dima.tu-berlin.de/media/BOSS16-Tutorial-mboehm.pdf (slides 10-15) If you

Re: [VOTE] Apache SystemML 0.12.0-incubating (RC2)

2017-01-27 Thread Matthias Boehm
Thanks Glenn. Could you please also share the measurements (maybe in a jira). Furthermore, seeing that you ran only a subset of multinomial experiments, makes me wonder if you used the current default configuration of 150 classes? In the recent past, we usually ran this perftest with a

Re: Build failed in Jenkins: SystemML-DailyTest #761

2017-01-23 Thread Matthias Boehm
g/jira/browse/SYSTEMML-541 under: https://issues.apache.org/jira/browse/SYSTEMML-1188 Thanks, Glenn [image: Inactive hide details for Matthias Boehm ---01/21/2017 02:21:04 AM---Let's keep the test, collect the used seeds, and fix it. T]Matthias Boehm ---01/21/2017 02:21:04 AM---Let's keep th

Re: SystemML optimizer design

2017-01-17 Thread Matthias Boehm
Hi Dylan, these are very interesting questions - let me answer them one by one: 0. SPOOF: We developed the SPOOF compiler framework in a separate fork that will be integrated back into SystemML master soon. Initially, we will add the code generation part as an experimental feature, likely in

Re: Time To Release 0.13

2017-01-17 Thread Matthias Boehm
I agree with Arvind here as the 8GB case would mostly run as singlenode, in-memory operations and not test the Spark 2.x integration. Regards, Matthias On 1/17/2017 5:33 AM, Arvind Surve wrote: We are planning to have 80GB testing for 0.13 release (to support Spark 2.0). It will add couple

[DISCUSS] Roadmap SystemML 1.0

2017-01-03 Thread Matthias Boehm
I'd like to initiate the discussion of a concrete roadmap for our next release. According to previous discussions, I'd think it's fair to say that we agree on calling it SystemML 1.0. We should carefully plan this release as it's an opportunity to change APIs and remove some older deprecated

Re: Performance differences between SystemML LibMatrixMult and Breeze with native BLAS

2016-12-01 Thread Matthias Boehm
require setting up a more "scientific" benchmark suite than my little test here. Felix Am 01.12.2016 01:00 schrieb Matthias Boehm: ok, then let's sort this out one by one 1) Benchmarks: There are a couple of things we should be aware of for these native/java benchmarks. First, please

Re: Performance differences between SystemML LibMatrixMult and Breeze with native BLAS

2016-11-30 Thread Matthias Boehm
,1000,1000,100)x(false,1000,1000,100) in 251.290325 MM k=8 (false,1000,1000,100)x(false,1000,1000,100) in 265.851277 MM k=8 (false,1000,1000,100)x(false,1000,1000,100) in 240.902494 Am 01.12.2016 00:08 schrieb Matthias Boehm: Could you please make sure you're comparing the right

Re: Performance differences between SystemML LibMatrixMult and Breeze with native BLAS

2016-11-30 Thread Matthias Boehm
Could you please make sure you're comparing the right thing. Even on old sandy bridge CPUs our matrix mult for 1kx1k usually takes 40-50ms. We also did the same experiments with larger matrices and SystemML was about 2x faster compared to Breeze. Please uncomment the timings in

Re: Build and distribution related issues for GPU support

2016-11-24 Thread Matthias Boehm
the cuda compiler that ships with that version of the toolkit and compile the .cu files in the project and commit the resulting .ptx files. Thoughts, comments? -Nakul On Wed, Nov 23, 2016 at 2:43 PM, Matthias Boehm <mboe...@googlemail.com> wrote: thanks for sharing Nakul. Could you

Re: Build and distribution related issues for GPU support

2016-11-23 Thread Matthias Boehm
thanks for sharing Nakul. Could you please also comment on the PTX story for custom kernels and different PTX versions? Regards, Matthias On 11/23/2016 10:13 PM, Nakul Jindal wrote: Hi, SystemML has experimental GPU support, which we are working to solidify. Currently, GPU is supported in CP

Re: Parfor semantics

2016-11-23 Thread Matthias Boehm
model over the full dataset using a mini-batch SGD approach. Has the `parfor` construct been used for this purpose before? -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. On Nov 22, 2016, at 2:01 PM, Matthias Boehm <m

Re: Parfor semantics

2016-11-22 Thread Matthias Boehm
: The constrained optimizer doesn't seem to know about a REMOTE_SPARK execution mode and either sets CP or REMOTE_MR. I can open a jira for that and provide a fix. Felix Am 22.11.2016 02:07 schrieb Matthias Boehm: yes, this came up several times - initially we only supported opt=NONE where users

Re: Parfor semantics

2016-11-21 Thread Matthias Boehm
yes, this came up several times - initially we only supported opt=NONE where users had to specify all other parameters. Meanwhile, there is a so-called "constrained optimizer" that does the same as the rule-based optimizer but respects any given parameters. Please try something like this:

Re: [DISCUSS] Adding tensorboard-like functionality to SystemML

2016-10-28 Thread Matthias Boehm
Thanks for putting this together Niketan. However, could we please postpone this discussion until after our 1.0 release? Right now, I'm concerned to see that we're adding many experimental features without really getting them done. This includes for example, the GPU backend, the new MLContext API,

Re: Local versions of Linear Algebra Operators in DML

2016-10-24 Thread Matthias Boehm
functions at compile time depending on what intermediates they produce ... Meaning you may still end up with java heap space OOM at runtime. Regards, Berthold Reinwald IBM Almaden Research Center office: (408) 927 2208; T/L: 457 2208 e-mail: reinw...@us.ibm.com From: Matthias Boehm <m

Re: Local versions of Linear Algebra Operators in DML

2016-10-24 Thread Matthias Boehm
mail: npansar At us.ibm.com http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar Matthias Boehm ---10/21/2016 01:00:51 PM---thanks Nakul for reaching out before starting work on this. Actually, the introduction of these CP- From: Matthias Boehm <mboe...@googlemail.com> To: dev@systemml

Re: Local versions of Linear Algebra Operators in DML

2016-10-21 Thread Matthias Boehm
thanks Nakul for reaching out before starting work on this. Actually, the introduction of these CP-only builtin functions was a big mistake because (as you already mentioned) they mistakenly suggest that we provide distributed operations for them too. The intent was to support them in later

Re: [VOTE] SystemML New Logo Ideas

2016-10-21 Thread Matthias Boehm
this is. Deron On Fri, Oct 21, 2016 at 12:15 PM, Matthias Boehm <mboe...@googlemail.com> wrote: Thanks for these proposals. For all the options, I'd prefer to remove the TM - it's just a little odd for an open source project with no intentions to register a trademark. I know, the new Spark lo

Re: [VOTE] SystemML New Logo Ideas

2016-10-21 Thread Matthias Boehm
Thanks for these proposals. For all the options, I'd prefer to remove the TM - it's just a little odd for an open source project with no intentions to register a trademark. I know, the new Spark logo has it too but it's probably a different context, especially since there are discussions to

Re: [VOTE] Apache SystemML 0.11.0-incubating (RC2)

2016-10-10 Thread Matthias Boehm
I hate to say it, but -1. There have been a couple of important fixes since we've cut the rc and unfortunately, additional (so far unresolved) blocking issues showed up. In detail the fixed issues are: * SYSTEMML-1023: Fix csv line parsing (the quote-aware column-splitting was hanging on a

Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)

2016-10-05 Thread Matthias Boehm
Apache SystemML 0.11.0-incubating (RC1) Imran has opened Jira 1013. -Arvind From: Matthias Boehm <mbo...@us.ibm.com> To: dev@systemml.incubator.apache.org Sent: Tuesday, October 4, 2016 5:43 PM Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1) ok, SYSTEMML-1009 has

Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)

2016-10-04 Thread Matthias Boehm
> end, we are ready to go. > > -Mike > > -- > > Mike Dusenberry > GitHub: github.com/dusenberrymw > LinkedIn: linkedin.com/in/mikedusenberry > > Sent from my iPhone. > > > > On Oct 4, 2016, at 2:02 PM, Matthias Boehm <mbo...@us.ibm.com> wrote: >

Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)

2016-10-04 Thread Matthias Boehm
rry > > Sent from my iPhone. > > > > On Oct 2, 2016, at 8:35 PM, Matthias Boehm <mbo...@us.ibm.com> wrote: > > > > yes, I just closed them - I left them open for Mike to confirm, but we > resolved all known issues yesterday together. We should be good to go. >

Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)

2016-10-02 Thread Matthias Boehm
folks forgot to close the jiras ? Or are there things that still need to be handled here ? On Sat, Oct 1, 2016 at 2:41 PM, Matthias Boehm <mbo...@us.ibm.com> wrote: > ok the blocking issues SYSTEMML-993, 994, and 995 have been resolved - > from my perspective we're ready to cut a n

Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)

2016-10-01 Thread Matthias Boehm
ok the blocking issues SYSTEMML-993, 994, and 995 have been resolved - from my perspective we're ready to cut a new RC. Regards, Matthias From: Matthias Boehm/Almaden/IBM@IBMUS To: dev@systemml.incubator.apache.org Date: 09/29/2016 10:44 PM Subject:Re: [VOTE] Apache SystemML

Re: Enhancing SystemML JavaDocs

2016-09-30 Thread Matthias Boehm
actually, I would prefer to leave the empty (automatically generated) javadoc comments - at least in eclipse, this provides a better overview of parameters and exceptions. Regards, Matthias From: Deron Eriksson To: dev@systemml.incubator.apache.org Date:

Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)

2016-09-28 Thread Matthias Boehm
-1, unfortunately, SYSTEMML-964 and SYSTEMML-968 are blocking the release right now but we should be able to resolve them by tomorrow. Regards, Matthias From: Luciano Resende To: dev@systemml.incubator.apache.org Date: 09/28/2016 11:53 AM Subject:[VOTE]

[DISCUSS] SystemML releases 0.11 and 1.0

2016-09-19 Thread Matthias Boehm
Hi all, we already discussed and agreed that it would be good to make our next release relatively soon. However, there was also a discussion around making the major 1.0 release but this would require substantially more time because it is our opportunity to remove APIs and cleanup the language.

Re: [DISCUSS] Migration to Spark 2.0.0

2016-08-24 Thread Matthias Boehm
e able to focus most > > of our efforts towards the future rather than the past. > > > > Deron > > > > > > On Thu, Aug 4, 2016 at 10:59 AM, Luciano Resende <luckbr1...@gmail.com> > > wrote: > > > > > That was going to be my suggestion..

Re: [DISCUSS] Migration to Spark 2.0.0

2016-08-04 Thread Matthias Boehm
I would recommend starting an investigation into whether we could support both the 1.x and 2.x lines with a single code base. It seems feasible to refactor the code a bit, compile against 2.0 (or with profiles), and run on either 1.6 or 2.0. For example, by creating a wrapper that implements both Iterable

Re: Draft for August monthly report

2016-08-02 Thread Matthias Boehm
this looks already pretty good - thanks Deron for pulling it together. Furthermore, you could include the following paper, published July 29: Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning, PVLDB 9

SystemML at BOSS'16

2016-08-01 Thread Matthias Boehm
just FYI: there will be a SystemML tutorial at the BOSS workshop, co-located with VLDB 2016: https://research.cs.wisc.edu/dbworld/messages/2016-08/1470069574.html Regards, Matthias

Re: Restricted Boltzmann Machine scripts

2016-07-08 Thread Matthias Boehm
thanks for reaching out Nikolay, 1) Scripts: Could you please create a PR to add them to /scripts/staging? This is the place we typically use to share new scripts. Once they are tested for accuracy and runtime, we would migrate them into scripts/algorithms along with some basic documentation.

Re: print a value in a frame?

2016-07-05 Thread Matthias Boehm
2016 at 1:11 PM, Matthias Boehm <mbo...@us.ibm.com> wrote: > quick correction: I meant to say, option 2 because you have a frame of > strings (option 3 is only possible if you have numeric/boolean data). Btw, > it's fixed now - so please go ahead and give it a try. Thanks. > > >

Re: print a value in a frame?

2016-07-03 Thread Matthias Boehm
dev@systemml.incubator.apache.org Date: 06/29/2016 01:40 PM Subject:Re: print a value in a frame? Thanks for the quick reply. I'll use the toString() for now (for a unit test). Deron On Wed, Jun 29, 2016 at 1:28 PM, Matthias Boehm <mbo...@us.ibm.com> wrote: > optio

Re: print a value in a frame?

2016-06-29 Thread Matthias Boehm
option 3 is possible but probably needs a fix. Alternatively, you can use print(toString(M)) which is implemented similar to the matrix toString(). Regards, Matthias From: Deron Eriksson To: dev@systemml.incubator.apache.org Date: 06/29/2016 01:23 PM Subject:

Re: Build failed in Jenkins: SystemML-DailyTest #338

2016-06-24 Thread Matthias Boehm
tps://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/338/changes> Changes: From: jenk...@spark.tc To: Michael W Dusenberry/San Francisco/IBM@IBMUS, lrese...@apache.org, dev@systemml.incubator.apache.org, Matthias Boehm/Almaden/IBM@IBMUS Date: 06/24/2016 12:32 AM Subject: Build failed in J

Re: [VOTE] Apache SystemML 0.10.0-incubating (RC2)

2016-06-01 Thread Matthias Boehm
+1, but if there is a third rc, let us please create a branch or cut the release as of today to ensure no new features are leaking in. Regards, Matthias From: Luciano Resende To: dev@systemml.incubator.apache.org Date: 05/31/2016 10:05 PM Subject:[VOTE]

Re: Executing DMLScript in Eclipse on Windows

2016-05-27 Thread Matthias Boehm
just put the following parameters into the VM arguments of your run configuration: -Dhadoop.home.dir=\src\test\config\hadoop_bin_windows -Djava.library.path=\src\test\config\hadoop_bin_windows\bin Regards, Matthias From: Deron Eriksson To:

Re: Starting a SystemML 0.10 release?

2016-05-19 Thread Matthias Boehm
sounds good to me - in addition to PR167, I'd also like to get PR162 into this release. Furthermore, it would be good to run our full performance testsuite (at least up to 80GB) but this could be done on the RC too. Thanks guys for taking care of the release again. Regards, Matthias From:

Re: Citations

2016-05-12 Thread Matthias Boehm
Indeed, various of our ML algorithms [4] and our matrix multiplication chain rewrite [8] are based on existing textbook algorithms. This means that we implemented these artifacts (loosely) based on the ideas or pseudo-code described in these references but never directly took over existing code.

Re: Refactor ML code logic to reduce duplicate codes

2016-04-24 Thread Matthias Boehm
that is a good point - the compilation chain is indeed replicated in various places (DMLScript, JMLC, MLContext, Debugger, and potentially new MLContext). However, it is not a plain code duplication but differently composed compilation chains and slightly different primitives (e.g., read script

Re: parfor fails

2016-04-16 Thread Matthias Boehm
the local server, and it subdirectories named '_p22748_127.0.0.1' etc. It looks like other SystemML jobs had no trouble writing to it. The stderr and one failed MR log is attached. Thanks, Ethan On Thu, Apr 14, 2016 at 11:14 PM, Matthias Boehm <mbo...@us.ibm.com> wrote: just for completeness,

Re: 'sample.dml' replaces rows with 0's

2016-04-14 Thread Matthias Boehm
well, it looks like an issue of incorrect meta data propagation (wrong propagation of dimensions through mr pmm instructions). The data itself looks good if I write a 20% sample to textcell (what is used in our testsuite). @Shirish: thanks for looking into it. Just fyi, while testing this on an

Re: parfor fails

2016-04-14 Thread Matthias Boehm
just for completeness, this issue is tracked with https://issues.apache.org/jira/browse/SYSTEMML-635 and the fix will be available tomorrow. Regards, Matthias From: Matthias Boehm/Almaden/IBM@IBMUS To: dev@systemml.incubator.apache.org Cc: "Ethan Xu" <ethan.yifa...@gm

Re: machine learning - Some tests failure when build systemML project -StackOverflow

2016-04-12 Thread Matthias Boehm
- Original ------ From: "Matthias Boehm";<mbo...@us.ibm.com>; Date: Tue, Apr 12, 2016 01:18 PM To: "dev"<dev@systemml.incubator.apache.org>; Cc: "葡萄??"<281165...@qq.com>; Subject: Re: machine learning - Some tests failure when b

Re: Change commons-math3 to compile scope?

2016-04-06 Thread Matthias Boehm
well, we don't want to get into having multiple commons math versions in the classpath and newer hadoop distributions have it by default. So I would rather add it to a trouble shooting guide. Alternatively, we could have two different 'distribution' profiles for releases. Regards, Matthias

Re: Discussion SYSTEMML-593 MLContext Resign

2016-04-04 Thread Matthias Boehm
iks...@gmail.com> To: dev@systemml.incubator.apache.org Date: 04/04/2016 02:38 PM Subject:Re: Discussion SYSTEMML-593 MLContext Resign Hi Matthias, On Sat, Apr 2, 2016 at 9:34 PM, Matthias Boehm <mbo...@us.ibm.com> wrote: > > Also rather than introducing anoth

Re: Gxuides about running SystemML by spark cluster

2016-04-03 Thread Matthias Boehm
ion on the selection process. You can evaluate this model on a hold out test set or run some form of cross validation. However, keep in mind that for accuracy experiments, you might want to be very careful with random data. Regards, Matthias From: Wenjie Zhuang <ka...@vt.edu> To: Matth

Re: Gxuides about running SystemML by spark cluster

2016-04-02 Thread Matthias Boehm
too. Regards, Matthias From: Wenjie Zhuang <ka...@vt.edu> To: dev@systemml.incubator.apache.org Cc: Matthias Boehm/Almaden/IBM@IBMUS Date: 04/02/2016 07:50 PM Subject:Re: Gxuides about running SystemML by spark cluster Hi, I try to run StepLinearRegDS.dml by spar

Re: Remove "Scratch Space" In Favor Of Temp Folder

2016-04-02 Thread Matthias Boehm
just to clarify, the configuration 'scratch' (remote tmp working directory) is a user-defined configuration coming out of SystemML-config.xml with internal default set to ./scratch_space if not specified and it is always accessed as dfs (which depending on your hadoop configuration might use

Re: Logical indexing?

2016-03-31 Thread Matthias Boehm
just a quick correction of option 2: Ind = (X[,1]>10); Y = removeEmpty(target=X, select=Ind); Regards, Matthias From: Matthias Boehm/Almaden/IBM@IBMUS To: dev@systemml.incubator.apache.org Date: 03/31/2016 10:14 AM Subject:Re: Logical indexing? that's a good quest

Re: Logical indexing?

2016-03-31 Thread Matthias Boehm
that's a good question - no SystemML does not support set indexing yet but you can emulate it via permutation matrices or similar transformations. Here are some examples: # option 1: via permutation (aka selection) matrices P = removeEmpty(target=diag(X[,1]>10), margin="rows"); Y = P %*% X; #

Design discussion distributed frame representations

2016-03-27 Thread Matthias Boehm
Hi all, I just added the initial design of our distributed frame representations to the related JIRA https://issues.apache.org/jira/browse/SYSTEMML-560. Any comments are very welcome! Regards, Matthias
