Re: Next SystemML Release

2015-11-20 Thread Matthias Boehm
first week of December for our 0.8.1 release sounds good to me. Apart from bug fixes and performance features, the focus should be mostly on increasing robustness of our spark backend. Since 0.8.0, we already added important features like "partitioned broadcasts" (in order to overcome the 2GB bro

Re: How to contribute SystemML

2015-12-01 Thread Matthias Boehm
Hi Tatsuya, thanks for your interest, we'd love to help you get started. Although we do have various APIs, including MLContext that allows you to invoke DML scripts from Spark's interactive shell, we don't have an actual REPL interface yet. Niketan built an initial prototype of a related API. @N

Runtime package refactoring

2015-12-04 Thread Matthias Boehm
Hi all, just a quick heads-up, I'd like to do a refactoring of our runtime package. The goals are (1) to separate out all mr-related classes (cleanup), and (2) to prepare our core matrix block runtime for packaging as an individual jar which would make it consumable as a small-footprint library.

Re: Runtime package refactoring

2015-12-05 Thread Matthias Boehm
e: 12/05/2015 01:13 PM Subject:Re: Runtime package refactoring On Fri, Dec 4, 2015 at 5:16 PM, Matthias Boehm wrote: > > > Hi all, > > just a quick heads-up, I'd like to do a refactoring of our runtime package. > The goals are (1) to separate out all mr-relate

Re: Apache JIRA project key for SystemML

2015-12-11 Thread Matthias Boehm
I'm fine with either one but I would prefer SYSML in order to keep common prefixes of commit messages as short as possible. Regards, Matthias From: Deron Eriksson To: dev@systemml.incubator.apache.org Date: 12/11/2015 09:03 AM Subject:Apache JIRA project key for SystemML Hi

Re: Is there any equivalent of kron function in DML ?

2015-12-15 Thread Matthias Boehm
Hi Sourav, well, we do not support Kronecker products K = kron(X,Y) yet. However, as a workaround you could indeed create a custom permutation matrix P from X, where P[1:nrow(Y),] = diag(matrix(as.scalar(X[1,1]),nrow(Y),1)) P[nrow(Y)+1:2*nrow(Y),] = diag(matrix(as.scalar(X[1,2]),nrow(Y),1)) ...
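For reference, whatever workaround is used has to reproduce the block matrix whose (i,j) block is X[i,j] * Y. A minimal sketch of that target in plain Python (not DML); the helper name `kron` and the sample matrices are purely illustrative and are not taken from the thread:

```python
def kron(X, Y):
    """Kronecker product of two matrices given as lists of rows.

    Block (i, j) of the result equals X[i][j] * Y.
    """
    rows_y, cols_y = len(Y), len(Y[0])
    K = []
    for x_row in X:
        for i in range(rows_y):
            # One output row: every entry of x_row scales row i of Y.
            K.append([x * Y[i][j] for x in x_row for j in range(cols_y)])
    return K


# Illustrative inputs (assumed, not from the original mail).
X = [[1, 2], [3, 4]]
Y = [[0, 1], [1, 0]]
K = kron(X, Y)
```

The result has nrow(X)*nrow(Y) rows and ncol(X)*ncol(Y) columns, which is a quick sanity check for any DML-side emulation.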

Re: Is there any equivalent of kron function in DML ?

2015-12-15 Thread Matthias Boehm
bug in certain algebraic simplification rewrites. I'll deliver the fix in the next couple of days, which will allow (table(seq(1,N), Svl, Xvl, N, nrow(Y))) as well. Regards, Matthias From: Matthias Boehm/Almaden/IBM@IBMUS To: dev@systemml.incubator.apache.org Date: 12/15/2015 09:

Re: parser.Token class and related exceptions

2015-12-16 Thread Matthias Boehm
Hi Deron, thanks for bringing up this topic - I'm strongly in favor of this cleanup. Originally, 'ParseException' was a JavaCC generated class and the purpose of DMLParseException was to extend its functionality without modifying the generated class. When we removed JavaCC we simplified ParseExcep

Re: DML example on main SystemML website

2015-12-16 Thread Matthias Boehm
please include the computation of the objective function (for convergence checks or at least print outs). Thanks. while( iter < max_iterations ){ iter = iter + 1; H = (H * (t(W) %*% (V/(W%*%H))))/t(colSums(W)); W = (W * ((V/(W%*%H)) %*% t(H)))/t(rowSums(H)); obj =
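The loop quoted above is a multiplicative-update scheme for non-negative matrix factorization. The objective itself is cut off in the archived snippet, so the following plain-Python sketch (not DML) assumes the generalized KL divergence, which is the objective these particular updates are commonly paired with. The helper names, the sample data, and the iteration count are all illustrative assumptions:

```python
import math


def matmul(A, B):
    """Dense matrix product for lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def kl_objective(V, W, H):
    """Generalized KL divergence between V and W %*% H (assumed objective)."""
    WH = matmul(W, H)
    total = 0.0
    for v_row, wh_row in zip(V, WH):
        for v, wh in zip(v_row, wh_row):
            total += (v * math.log(v / wh) if v > 0 else 0.0) - v + wh
    return total


def nmf_step(V, W, H):
    """One pass of the two multiplicative updates from the quoted loop."""
    r = len(H)
    # H = (H * (t(W) %*% (V / (W %*% H)))) / t(colSums(W))
    WH = matmul(W, H)
    ratio = [[v / wh for v, wh in zip(vr, whr)] for vr, whr in zip(V, WH)]
    num_h = matmul(transpose(W), ratio)
    col_sums_w = [sum(col) for col in zip(*W)]
    H = [[H[k][j] * num_h[k][j] / col_sums_w[k] for j in range(len(H[0]))]
         for k in range(r)]
    # W = (W * ((V / (W %*% H)) %*% t(H))) / t(rowSums(H))
    WH = matmul(W, H)
    ratio = [[v / wh for v, wh in zip(vr, whr)] for vr, whr in zip(V, WH)]
    num_w = matmul(ratio, transpose(H))
    row_sums_h = [sum(row) for row in H]
    W = [[W[i][k] * num_w[i][k] / row_sums_h[k] for k in range(r)]
         for i in range(len(W))]
    return W, H


# Small strictly positive example (assumed data, rank 2).
V = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]]
W = [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0]]
H = [[1.0, 0.5, 0.2], [0.3, 1.0, 0.8]]

objs = [kl_objective(V, W, H)]
for _ in range(20):
    W, H = nmf_step(V, W, H)
    objs.append(kl_objective(V, W, H))
```

Tracking `objs` per iteration is what makes a convergence check (or a simple print-out) possible; under the assumed objective the sequence should not increase from one iteration to the next.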

Re: [DISCUSS] Project Roadmap

2015-12-21 Thread Matthias Boehm
From my perspective, our roadmap for 2016 should cover the following SystemML engine extensions with regard to runtime (R), optimizer (O), as well as language and tools (L). Each sub-bullet in the following list will be further broken down into multiple JIRAs. R1) Extended Scale-Up Backend * Sup

Re: [DISCUSS] Project Roadmap

2015-12-31 Thread Matthias Boehm
3) academic papers (4) blog posts (5) post information to forums such as stackoverflow Deron On Mon, Dec 21, 2015 at 3:09 AM, Matthias Boehm wrote: > From my perspective, our roadmap for 2016 should cover the following > SystemML engine extensions with regard to runtime (R), optimize

Re: [DISCUSS] Project Roadmap

2016-01-01 Thread Matthias Boehm
racy and scalability of the DML algorithm, in addition to SystemML's customizability applied to a real-world business case. It also showcased the power of DML utilizing a very compact piece of code. Deron On Thu, Dec 31, 2015 at 9:36 AM, Matthias Boehm wrote: > That's a good p

Cleanup SparkTC/systemml repository

2016-01-02 Thread Matthias Boehm
Hi all, I'd like to delete our old SparkTC/systemml repository because it's causing unnecessary confusion and it's anyway outdated. For example, even "developerWorks Open" is still referring to the old repository. @Luciano: Could you please delete the SparkTC/systemml repository if nobody object

Re: POC Eclipse IDE for DML

2016-01-02 Thread Matthias Boehm
just to clarify, our parser grammar does not include builtin functions because they are -- in contrast to keywords and language constructs -- not part of the DML/PyDML syntax. This is important for both maintainability and flexibility. For example, it allows you to define a variable or user-define

Re: POC Eclipse IDE for DML

2016-01-02 Thread Matthias Boehm
function and I will be less confused. :-) Maybe I am overthinking things. The DML Language Reference does contain a tremendous amount of useful information with regards to the language capabilities. Deron On Sat, Jan 2, 2016 at 12:25 PM, Matthias Boehm wrote: > just to clarify, our

Re: January 2016 SystemML Incubator Podling Report - Draft

2016-01-05 Thread Matthias Boehm
from contributors, committers have discussed these pull requests with their contributors, and contributions have been merged into the project. Matthias Boehm has presented talks regarding the SystemML Optimizer at TU Dresden, HTW Dresden, and TU Berlin. How has the project developed sinc

Re: Cleanup SparkTC/systemml repository

2016-01-06 Thread Matthias Boehm
lease grant me access to see > the private repository? > > Thanks! > Deron > > > On Sun, Jan 3, 2016 at 9:32 PM, Luciano Resende > wrote: > >> On Sat, Jan 2, 2016 at 12:11 PM, Matthias Boehm >> wrote: >> >> > >> > Hi all, >> >

Re: Cleanup SparkTC/systemml repository

2016-01-08 Thread Matthias Boehm
:48:42 PM---On Wednesday, January 6, 2016, Matthias Boehm wrote: > Actually, I'd like to pro From: Luciano Resende To: "dev@systemml.incubator.apache.org" Date: 01/06/2016 09:48 PM Subject: Re: Cleanup SparkTC/systemml repository On Wednesday, January 6, 2016, Mat

Re: [DISCUSS] Project Roadmap

2016-01-09 Thread Matthias Boehm
ons [SYSTEMML-450] Extended spark interfaces Regards, Matthias From: Matthias Boehm/Almaden/IBM@IBMUS To: dev@systemml.incubator.apache.org Date: 01/01/2016 03:40 PM Subject:Re: [DISCUSS] Project Roadmap thanks for the comments Deron. The note on additional algorithms, howe

Re: Starting a SystemML 0.9 release

2016-01-11 Thread Matthias Boehm
great - thanks everybody. Let's get these two fixes in and close the release. Until that point, please no new features. The version number 0.9 is fine with me since it's not really a pure maintenance release as many new features went in too. Down the road, however, we need to think about release b

Re: SystemML JIRA Site Is Live!

2016-01-12 Thread Matthias Boehm
Could we please disable sending notifications for every JIRA update to our dev list? Thanks. Regards, Matthias From: Deron Eriksson To: dev@systemml.incubator.apache.org Date: 01/08/2016 01:31 PM Subject:Re: SystemML JIRA Site Is Live! Apparently it is being handled by htt

Re: SystemML JIRA Site Is Live!

2016-01-12 Thread Matthias Boehm
; > Work Logged On Issue > > Work Started On Issue > > Work Stopped On Issue > > Issue Worklog Updated > > Issue Worklog Deleted > > Generic Event > > The following event had no notifications: > > Issue Comment Deleted > > > > Deron &

Re: SystemML JIRA Site Is Live!

2016-01-14 Thread Matthias Boehm
fications. I posted the old scheme to the > > INFRA-10714 > > >>> comments. > > >>> > > >>> Sounds like there is a general consensus regarding the notifications. > > 15 > > >>> minutes ago I added a comment on > > >>

New sparse matrix block representation

2016-01-14 Thread Matthias Boehm
Just a heads-up: the week after our 0.9 release, I'll make major changes to our core matrix block library by introducing a new abstraction for sparse matrix blocks (SYSTEMML-377). This will affect all operations of all backends. The benefit is a more memory-efficient representation, which will si

Re: Starting a SystemML 0.9 release

2016-01-15 Thread Matthias Boehm
3D%202015-10-27 > > > > And was wondering if we could all move this to 0.9 release. > > > > Could someone please help me verify. > > > > Thanks > > > > > > On Mon, Jan 11, 2016 at 11:43 AM, Luciano Resende > > wrote: > > > &g

Re: Starting a SystemML 0.9 release

2016-01-17 Thread Matthias Boehm
ng jekyll so I'm not sure if that would >> help anyone if the *.md files aren't included in the release distributions. >> Or is this a different README? >> >> Deron >> >> >> On Fri, Jan 15, 2016 at 12:00 PM, Matthias Boehm >> wrote: >> >&

Re: Workflow for assigning issues to users

2016-01-20 Thread Matthias Boehm
well, I would prefer to use common sense here instead of a fixed workflow. There are many different scenarios of external contributions w/ or w/o JIRA - one size does not fit all. In general, we should create an atmosphere where we encourage people to contribute. However, if a new user likes to tak

Re: SystemML Hadoop version support

2016-01-20 Thread Matthias Boehm
Just to clarify because there seems to be some confusion: we are supporting any Spark version >=1.4 and Hadoop version >=2.4. The only reason why our build sticks to older versions is to avoid exploiting new features too eagerly which would lose backwards compatibility if done without care. With

Re: [VOTE] Release SystemML 0.9.0-incubating (RC3)

2016-01-30 Thread Matthias Boehm
+1 Just for the backlog of a potential maintenance release 0.9.1 - here is the list of fixes that we would need to backport: 0a2b587 Fix automatic vectorization of left indexing chains (size updates) ad3fa90 Fix spark matrix-scalar builtin instructions log/log_nz (opcode checks) 10d1afc Fix erro

Re: Compatibility with MR1 Cloudera cdh4.2.1

2016-02-04 Thread Matthias Boehm
well, we did indeed not run on MR v1 for a while now. However, I don't want to get that far and say we don't support it anymore. I'll fix this particular issue by tomorrow. In the next couple of weeks we should run our full performance testsuite (for broad coverage) over an MR v1 cluster and syst

Re: Compatibility with MR1 Cloudera cdh4.2.1

2016-02-04 Thread Matthias Boehm
udera.com/artifactory/repo/org/apache/hadoop/hadoop-common/2.4.1/ ) does appear to have the getDouble method. It's possible that adding that jar to your classpath may fix your problem, as Shirish pointed out. It sounds like Matthias may have another fix. Deron On Thu, Feb 4, 2016 at 6:40

Re: Compatibility with MR1 Cloudera cdh4.2.1

2016-02-05 Thread Matthias Boehm
4.3 > > 1.4.1 > > > > > > > > > > > > Am I supposed to modify the hadoop.version before build? > > > > Thanks again, > > > > Ethan > > > > > > > > On Fri, Feb 5, 201

Re: Compatibility with MR1 Cloudera cdh4.2.1

2016-02-05 Thread Matthias Boehm
Child$4.run(Child.java:268) >at java.security.AccessController.doPrivileged(Native Method) >at javax.security.auth.Subject.doAs(Subject.java:415) > at org.apache.hadoop.security.UserGroupInformation.doAs (UserGroupInformation.java:1408) >at

Re: Project folder structure

2016-02-09 Thread Matthias Boehm
-1 I don't see a compelling argument for this unnecessary change to a more complex project structure just to follow Spark which is not directly comparable - both in project size and content. For example, our algorithms are at the same time a library of algorithms as well as samples for how to wri

Re: Turn off parallelism in parfor?

2016-02-12 Thread Matthias Boehm
yes, this is because the parfor optimizer overwrites this specification. You could use either one of the following 1) "parfor(opt=NONE, par=1)" (disables optimization, uses defaults, and overwrites the specified parameters) 2) "parfor(opt=CONSTRAINED, par=1)" (optimizes as usual under the constrai

Re: Matrix Market format with metadata file

2016-02-15 Thread Matthias Boehm
The meta data file is still useful in order to get the format. In case of matrix market, errors will be raised if included meta data is inconsistent. So no, we should not disallow specifying the meta data. In general, we anyway recommend using text (textcell) instead of mm (matrix market) for scalabi

Re: Add Apache Flink as new backend

2016-03-02 Thread Matthias Boehm
Thanks guys, for sharing the details of this prototype. In general, I really like the idea of having a Flink backend in SystemML. We just need to structure the code (similar to our Spark backend) in a way that Flink libraries are not necessarily required when running in Spark or MapReduce execution

Re: DMLRuntimeException

2016-03-19 Thread Matthias Boehm
thanks Deron for bringing this up. Generally, I'm in favor of this change since it simplifies our internal APIs. The behavior should not change as we're already very careful about propagating exceptions all the way up to the APIs. One important thing, however, is to keep the concatenation of line

Buffer pool integration of frames

2016-03-19 Thread Matthias Boehm
Hi all, just a heads-up: in the next couple of days, I'll introduce the basic buffer pool integration of frames (SYSTEMML-567) and generalize our existing buffer pool (caching framework for matrices) along the way. This might destabilize SystemML as it affects all operations of all backends and

Re: Creating a new SystemML maintenance release: 0.9.1

2016-03-21 Thread Matthias Boehm
ok here is the list of fixes to backport for the 0.9.1 release (in reverse chronological order) - any volunteers? 1) Source code fixes: * [SYSTEMML-585] Fix JMLC connection to disable any multi-threaded ops https://github.com/apache/incubator-systemml/commit/59a4a50acb9432f2cb976e5c26fd3daea323a

Design discussion distributed frame representations

2016-03-27 Thread Matthias Boehm
Hi all, I just added the initial design of our distributed frame representations to the related JIRA https://issues.apache.org/jira/browse/SYSTEMML-560. Any comments are very welcome! Regards, Matthias

Re: Logical indexing?

2016-03-31 Thread Matthias Boehm
that's a good question - no SystemML does not support set indexing yet but you can emulate it via permutation matrices or similar transformations. Here are some examples: # option 1: via permutation (aka selection) matrices P = removeEmpty(target=diag(X[,1]>10), margin="rows"); Y = P %*% X; # op

Re: Logical indexing?

2016-03-31 Thread Matthias Boehm
just a quick correction of option 2: Ind = (X[,1]>10); Y = removeEmpty(target=X, select=Ind); Regards, Matthias From: Matthias Boehm/Almaden/IBM@IBMUS To: dev@systemml.incubator.apache.org Date: 03/31/2016 10:14 AM Subject:Re: Logical indexing? that's a good quest
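To make the two options concrete: option 1 from the earlier message builds a selection matrix by dropping the all-zero rows of diag(X[,1]>10) and multiplies it with X, while option 2 above filters the rows of X directly with the indicator vector. A minimal plain-Python sketch of both (not DML; the data and variable names are assumptions for illustration) shows they select the same rows:

```python
def matmul(A, B):
    """Dense matrix product for lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Illustrative data (assumed): select rows whose first column exceeds 10.
X = [[5, 1], [12, 2], [30, 3], [7, 4]]
ind = [1 if row[0] > 10 else 0 for row in X]

# Option 1: diag(ind) with empty rows removed gives a selection matrix P.
n = len(X)
D = [[ind[i] if i == j else 0 for j in range(n)] for i in range(n)]
P = [row for row in D if any(row)]
Y1 = matmul(P, X)

# Option 2: keep exactly the rows flagged by the indicator vector.
Y2 = [row for row, keep in zip(X, ind) if keep]
```

Option 2 avoids materializing the n-by-n diagonal matrix, which is why the direct filter is the more economical formulation in this sketch.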

Re: Remove "Scratch Space" In Favor Of Temp Folder

2016-04-02 Thread Matthias Boehm
just to clarify, the configuration 'scratch' (remote tmp working directory) is a user-defined configuration coming out of SystemML-config.xml with internal default set to ./scratch_space if not specified and it is always accessed as dfs (which depending on your hadoop configuration might use diffe

Discussion SYSTEMML-593 MLContext Resign

2016-04-02 Thread Matthias Boehm
thanks Deron for initiating the discussion around the rework of our MLContext API (https://issues.apache.org/jira/browse/SYSTEMML-593). Here are a couple of thoughts: (1) Simplicity: Given that the primary usecase of MLContext calls a script exactly once, I'm wondering if the separation into Scr

Re: Logical indexing?

2016-04-02 Thread Matthias Boehm
= (X[,1] > 10);' is acceptable, so aggregation would work with ind = (X[,1] > 10) + 1; F = aggregate(target = X[,2], groups = ind, fn = "sum"); Ethan On Thu, Mar 31, 2016 at 1:22 PM, Matthias Boehm wrote: > just a quick correction of option 2: > > Ind =
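The grouping trick quoted here turns the 0/1 indicator into 1-based group ids (1 where the condition is false, 2 where it is true) and then sums the second column per group. A rough plain-Python equivalent of that semantics (not DML; the sample data is assumed for illustration):

```python
# Illustrative data (assumed): column 1 is tested, column 2 is summed.
X = [[5, 1.0], [12, 2.0], [30, 3.0], [7, 10.0]]

# ind = (X[,1] > 10) + 1  ->  group 1: condition false, group 2: condition true
groups = [int(row[0] > 10) + 1 for row in X]

# Sum of the second column per group; entry g-1 holds the total for group g.
F = [0.0] * max(groups)
for row, g in zip(X, groups):
    F[g - 1] += row[1]
```

The `+ 1` matters because group ids are 1-based here; without it the rows failing the condition would fall into a group 0.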

Re: Gxuides about running SystemML by spark cluster

2016-04-02 Thread Matthias Boehm
too. Regards, Matthias From: Wenjie Zhuang To: dev@systemml.incubator.apache.org Cc: Matthias Boehm/Almaden/IBM@IBMUS Date: 04/02/2016 07:50 PM Subject:Re: Gxuides about running SystemML by spark cluster Hi, I try to run StepLinearRegDS.dml by spark yarn mode today

Re: Gxuides about running SystemML by spark cluster

2016-04-03 Thread Matthias Boehm
n on the selection process. You can evaluate this model on a hold out test set or run some form of cross validation. However, keep in mind that for accuracy experiments, you might want to be very careful with random data. Regards, Matthias From: Wenjie Zhuang To: Matthias Boehm/Almaden/I

Re: Gxuides about running SystemML by spark cluster

2016-04-04 Thread Matthias Boehm
n on the selection process. You can evaluate this model on a hold out test set or run some form of cross validation. However, keep in mind that for accuracy experiments, you might want to be very careful with random data. Regards, Matthias From: Wenjie Zhuang To: Matthias Boehm/Almaden/I

Re: Gxuides about running SystemML by spark cluster

2016-04-04 Thread Matthias Boehm
There are no practically relevant size restrictions. Also, if there are issues, please share some more information on it. Thanks. Regards, Matthias From: Wenjie Zhuang To: Matthias Boehm/Almaden/IBM@IBMUS Cc: dev@systemml.incubator.apache.org Date: 04/04/2016 04:37 AM Subject

Re: Discussion SYSTEMML-593 MLContext Resign

2016-04-04 Thread Matthias Boehm
dev@systemml.incubator.apache.org Date: 04/04/2016 02:38 PM Subject:Re: Discussion SYSTEMML-593 MLContext Resign Hi Matthias, On Sat, Apr 2, 2016 at 9:34 PM, Matthias Boehm wrote: > > Also rather than introducing another exception class, couldn't we just > reuse DML

Re: Gxuides about running SystemML by spark cluster

2016-04-05 Thread Matthias Boehm
position parameters (to be used with -args). So if you invoke the script with "-args foo bar", $1 and $2 refer to foo and bar respectively. Regards, Matthias From: Wenjie Zhuang To: Matthias Boehm/Almaden/IBM@IBMUS Cc: dev@systemml.incubator.apache.org Date: 04/05/201

Re: Change commons-math3 to compile scope?

2016-04-06 Thread Matthias Boehm
well, we don't want to get into having multiple commons math versions in the classpath and newer hadoop distributions have it by default. So I would rather add it to a trouble shooting guide. Alternatively, we could have two different 'distribution' profiles for releases. Regards, Matthias Fro

Re: machine learning - Some tests failure when build systemML project - Stack Overflow

2016-04-11 Thread Matthias Boehm
well the error is not coming from R but from SystemML's runtime. Could you please provide the full stacktrace to see what is going on here? Regards, Matthias From: 281165...@qq.com To: "dev" Date: 04/11/2016 08:22 PM Subject:machine learning - Some tests failure when build sys

Re: machine learning - Some tests failure when build systemML project -Stack Overflow

2016-04-11 Thread Matthias Boehm
d you like help to see it? -- Original -- From: "Matthias Boehm";; Date: Tue, Apr 12, 2016 11:31 AM To: "dev"; Cc: "葡萄爸爸"<281165...@qq.com>; Subject: Re: machine learning - Some tests failure when build systemML project -Stack O

Re: machine learning - Some tests failure when build systemML project -StackOverflow

2016-04-11 Thread Matthias Boehm
.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) -- Original --

Re: parfor fails

2016-04-14 Thread Matthias Boehm
Hi Ethan, thanks for catching this issue. The parfor script itself is perfectly fine but you encountered an interesting runtime bug. Usually, you can find the actual cause at the bottom of the stacktrace or in previous exceptions. I was able to reproduce this issue if NO systemml config file is p

Re: parfor fails

2016-04-14 Thread Matthias Boehm
just for completeness, this issue is tracked with https://issues.apache.org/jira/browse/SYSTEMML-635 and the fix will be available tomorrow. Regards, Matthias From: Matthias Boehm/Almaden/IBM@IBMUS To: dev@systemml.incubator.apache.org Cc: "Ethan Xu" Date: 04/14/201

Re: 'sample.dml' replaces rows with 0's

2016-04-14 Thread Matthias Boehm
well, it looks like an issue of incorrect meta data propagation (wrong propagation of dimensions through mr pmm instructions). The data itself looks good if I write a 20% sample to textcell (what is used in our testsuite). @Shirish: thanks for looking into it. Just fyi, while testing this on an u

Re: parfor fails

2016-04-16 Thread Matthias Boehm
server, and it subdirectories named '_p22748_127.0.0.1' etc. It looks like other SystemML jobs had no trouble writing to it. The stderr and one failed MR log is attached. Thanks, Ethan On Thu, Apr 14, 2016 at 11:14 PM, Matthias Boehm wrote: just for completeness, this issue is tracked with h

Re: remove castAsScalar?

2016-04-21 Thread Matthias Boehm
Let's be careful not to unnecessarily break backwards compatibility. How about we collect all instances of language builtin functions that we want to remove and clean them up with our 1.0 release later this year? There are other instances like ppred that do not exist in R and meanwhile redundant i

Re: Refactor ML code logic to reduce duplicate codes

2016-04-24 Thread Matthias Boehm
that is a good point - the compilation chain is indeed replicated in various places (DMLScript, JMLC, MLContext, Debugger, and potentially new MLContext). However, it is not a plain code duplication but differently composed compilation chains and slightly different primitives (e.g., read script fr

Re: ALS Algorithm

2016-05-06 Thread Matthias Boehm
ALS iteratively computes updates of left/right factors in an alternating fashion. The script runs "max_iter*2" iterations because we conservatively count a full update of both factors as an iteration. However, a jira/fix for improving the script documentation is certainly useful. Regards, Matthia

Re: Citations

2016-05-12 Thread Matthias Boehm
Indeed, various of our ML algorithms [4] and our matrix multiplication chain rewrite [8] are based on existing textbook algorithms. This means that we implemented these artifacts (loosely) based on the ideas or pseudo-code described in these references but never directly took over existing code. I

Re: package dml files in folders in jars for 0.10.0?

2016-05-18 Thread Matthias Boehm
+1 for the cleanup of packaged scripts. You might want to also remove obsolete algorithms. Regards, Matthias From: Niketan Pansare/Almaden/IBM@IBMUS To: dev@systemml.incubator.apache.org Date: 05/18/2016 01:21 PM Subject:Re: package dml files in folders in jars for 0.10.0? Hi

Re: Starting a SystemML 0.10 release?

2016-05-19 Thread Matthias Boehm
sounds good to me - in addition to PR167, I'd also like to get PR162 into this release. Furthermore, it would be good to run our full performance testsuite (at least up to 80GB) but this could be done on the RC too. Thanks guys for taking care of the release again. Regards, Matthias From: L

Re: Formalize a release candidate review process?

2016-05-23 Thread Matthias Boehm
as Deron mentioned, running all experiments up to 80GB is a good compromise. Over the weekend, I ran exactly that on Spark 1.6.1 and it took less than a day. This approach would allow us to run MR and different Spark versions instead. Regarding the original mail, I think we can deduplicate the li

Re: synchronized prints?

2016-05-23 Thread Matthias Boehm
a simple workaround is to place an "if(1==1){}" cut between the prints to force their order. Regards, Matthias From: Deron Eriksson To: dev@systemml.incubator.apache.org Date: 05/23/2016 01:55 PM Subject:synchronized prints? Hi, Is there a way to make sure that print state

Re: synchronized prints?

2016-05-23 Thread Matthias Boehm
din.com/in/mikedusenberry Sent from my iPhone. > On May 23, 2016, at 2:02 PM, Matthias Boehm wrote: > > a simple workaround is to place an "if(1==1){}" cut between the prints to force their order. > > Regards, > Matthias > > Deron Eriksson ---05/23/2016 01:55:17 PM-

Re: Fw: Questions/query about recode / transform in systemML

2016-05-24 Thread Matthias Boehm
M on 05/23/2016 10:04 PM - From: Matthias Boehm/Almaden/IBM To: Alok Singh/San Francisco/IBM@IBMUS Cc: Arvind Surve/San Jose/IBM@IBMUS Date: 05/23/2016 09:02 PM Subject:Re: Questions/query about recode / transform in systemML Hi Alok, would you mind posting this ques

Re: [VOTE] Apache SystemML 0.10.0-incubating (RC1)

2016-05-24 Thread Matthias Boehm
+1 In detail, I ran our performance testsuite on Spark 1.6.1 for data sizes {80MB, 800MB, 8GB, 80GB}, sparse/dense, intercept 0/1/2, and the algorithm classes binomial (Mlogreg, L2SVM, MSVM), multinomial (Mlogreg, MSVM, Naive Bayes), regression (LinregCG, LinregDS, GLM poisson-log, GLM gamma-log,

Re: Discussion on GPU backend

2016-05-24 Thread Matthias Boehm
Generally, I think we should really stick to (3) as done in the past, i.e., bring up major features in the roadmap discussions, create jira epics and try to break them into rather isolated tasks. This works for almost any major/minor feature. The only exception are features, where it is initially

Re: missing release candidate checksums?

2016-05-24 Thread Matthias Boehm
good catch Deron - and I agree, adding the checksums should be fine. Regards, Matthias From: Deron Eriksson To: dev@systemml.incubator.apache.org Date: 05/24/2016 08:42 PM Subject:Re: missing release candidate checksums? In my opinion, I don't think restarting the vote is n

Re: [VOTE] Apache SystemML 0.10.0-incubating (RC1)

2016-05-26 Thread Matthias Boehm
eleasemanagement.html> > > > > === > > == How can I help test this release? == > > === > > If you are a SystemML user, you can help us test this release by taking > an > > existing Algorithm or workl

Re: Executing DMLScript in Eclipse on Windows

2016-05-27 Thread Matthias Boehm
just put the following parameters into the VM arguments of your run configuration: -Dhadoop.home.dir=\src\test\config\hadoop_bin_windows -Djava.library.path=\src\test\config\hadoop_bin_windows\bin Regards, Matthias From: Deron Eriksson To: dev@systemml.incubator.apache.org Date: 05/2

Re: [VOTE] Apache SystemML 0.10.0-incubating (RC2)

2016-06-01 Thread Matthias Boehm
+1, but if there is a third rc, let us please create a branch or cut the release as of today to ensure no new features are leaking in. Regards, Matthias From: Luciano Resende To: dev@systemml.incubator.apache.org Date: 05/31/2016 10:05 PM Subject:[VOTE] Apache SystemML 0.10.0-

Default execution modes

2016-06-03 Thread Matthias Boehm
just FYI - so far our default exec mode was always hybrid (cp+mr), no matter if invoked in a hadoop client or spark driver process. Following Mike's suggestion (SYSTEMML-490), we now automatically switch the default exec mode to hybrid_spark (cp+spark), if invoked in the spark driver process. In

Re: Release notes for 0.10.0

2016-06-13 Thread Matthias Boehm
ok, here is a first draft of the release notes - please feel free to extend or prune this: a) [SYSTEMML-377] Different types of spark matrix blocks * Supported internal formats: MCSR (default), CSR, COO * Automatic MCSR->CSR on Spark read/caching (for memory efficiency) * Automatic MCSR->CS

Re: Release notes for 0.10.0

2016-06-14 Thread Matthias Boehm
; T/L: 457 2208 e-mail: reinw...@us.ibm.com From: Matthias Boehm/Almaden/IBM@IBMUS To: dev@systemml.incubator.apache.org Date: 06/13/2016 11:25 PM Subject:Re: Release notes for 0.10.0 ok, here is a first draft of the release notes - please feel free to extend or prune this: a) [SY

Re: incubator-systemml git commit: [HOTFIX] Replacing CSV file with Windows line-endings with version that uses Unix line-endings.

2016-06-14 Thread Matthias Boehm
well, this was my fault; I'm using the EGit plugin (which ignores this file) and I've forgotten to manually double check the line separators. Regards, Matthias From: Luciano Resende To: dev@systemml.incubator.apache.org Date: 06/14/2016 02:25 PM Subject:Re: incubator-systemml

Re: Build failed in Jenkins: SystemML-DailyTest #332

2016-06-21 Thread Matthias Boehm
just FYI - this seems to be a race condition (that occasionally leads to checksum errors), caused by the recent update-in-place extension and I'm looking into it. Regards, Matthias From: jenk...@spark.tc To: Michael W Dusenberry/San Francisco/IBM@IBMUS, lrese...@apache.org, d

[DISCUSS] Version-specific documentation

2016-06-21 Thread Matthias Boehm
In the context of SYSTEMML-554, we aim to introduce native frame data type support. While porting the file-based transform, I intend to drop the existing transform scaling functionality (mean subtraction, z-scoring) as it is more naturally expressed over matrices. However, this change raises a g

BOSS - Public Vote Starting

2016-06-23 Thread Matthias Boehm
just FYI - here is the voting link for the SystemML tutorial at the BOSS workshop, co-located with VLDB'16: http://goo.gl/forms/k7yit0y5pUkz6BhO2 Regards, Matthias

Re: Build failed in Jenkins: SystemML-DailyTest #338

2016-06-24 Thread Matthias Boehm
t; https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/338/changes> Changes: From: jenk...@spark.tc To: Michael W Dusenberry/San Francisco/IBM@IBMUS, lrese...@apache.org, dev@systemml.incubator.apache.org, Matthias Boehm/Almaden/IBM@IBMUS Date: 06/24/2016 12:32 AM Subject: Build failed

Re: print a value in a frame?

2016-06-29 Thread Matthias Boehm
option 3 is possible but probably needs a fix. Alternatively, you can use print(toString(M)) which is implemented similar to the matrix toString(). Regards, Matthias From: Deron Eriksson To: dev@systemml.incubator.apache.org Date: 06/29/2016 01:23 PM Subject:print a value in a

Re: print a value in a frame?

2016-07-03 Thread Matthias Boehm
e.org Date: 06/29/2016 01:40 PM Subject:Re: print a value in a frame? Thanks for the quick reply. I'll use the toString() for now (for a unit test). Deron On Wed, Jun 29, 2016 at 1:28 PM, Matthias Boehm wrote: > option 3 is possible but probably needs a fix. Alternatively

Re: print a value in a frame?

2016-07-05 Thread Matthias Boehm
ing: Caused by: java.lang.ClassCastException: org.apache.sysml.runtime.controlprogram.caching.FrameObject cannot be cast to org.apache.sysml.runtime.controlprogram.caching.MatrixObject at org.apache.sysml.hops.recompile.LiteralReplacement.replaceLiteralValueTypeCastRightIndexing (LiteralReplacement.java:306) Deron On Sun, Jul 3, 201

Re: Restricted Boltzmann Machine scripts

2016-07-08 Thread Matthias Boehm
thanks for reaching out Nikolay, 1) Scripts: Could you please create a PR to add them to /scripts/staging? This is the place we typically use to share new scripts. Once they are tested for accuracy and runtime, we would migrate them into scripts/algorithms along with some basic documentation. Tha

[DISCUSS] SystemML 0.11 release

2016-07-27 Thread Matthias Boehm
Soon, we'll be done with the native frame support and various API changes. This seems to be a good point in time to create our next 0.11 release. What do you think? In case the majority is in favor, let's collect the open features and issues here in this thread. Regards, Matthias

SystemML at BOSS'16

2016-08-01 Thread Matthias Boehm
just FYI: there will be a SystemML tutorial at the BOSS workshop, co-located with VLDB 2016: https://research.cs.wisc.edu/dbworld/messages/2016-08/1470069574.html Regards, Matthias

Re: Draft for August monthly report

2016-08-02 Thread Matthias Boehm
this looks already pretty good - thanks Deron for pulling it together. Furthermore, you could include the following paper, published July 29: Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning, PVLDB 9

Re: [DISCUSS] Migration to Spark 2.0.0

2016-08-04 Thread Matthias Boehm
I would recommend to start an investigation if we could support both the 1.x and 2.x lines with a single code base. It seems feasible to refactor the code a bit, compile against 2.0 (or with profiles), and run on either 1.6 or 2.0. For example, by creating a wrapper that implements both Iterable a

Re: [DISCUSS] Migration to Spark 2.0.0

2016-08-24 Thread Matthias Boehm
h minimal inconvenience. > > > > However, I would lean towards Fred's approach (Spark 1.6 release followed > > shortly by a Spark 2 release). If possible, I want to be able to focus most > > of our efforts towards the future rather than the past. > > > >

Re: [DISCUSS] Apache SystemML Release 1.0.0

2016-08-25 Thread Matthias Boehm
I'm still not fully convinced that we need to drop Spark 1.x support, instead of supporting both 1.x and 2.x. I would appreciate if we could first conclude the discussion around migrating to Spark 2.0. Furthermore, I think that creating a dependency to Spark versioning would unnecessarily complic

Simplification of MLContext and related APIs

2016-09-11 Thread Matthias Boehm
It's great to see the ongoing progress on MLContext and related APIs. However, one aspect that really concerns me is the creation of many redundant data types and exposition of various internal data structures. For example, exposing MatrixObject and FrameObject at API level is dangerous because i

Re: Simplification of MLContext and related APIs

2016-09-12 Thread Matthias Boehm
uld be > > justifiable with rationale. > > I have introduced FrameObject as oversight. It should have been private > > method instead of public method. I can fix it soon. But there are more > > changes you have proposed I will let Deron to respond. > > Thanks for catchi

Changed Binary Format of Frames

2016-09-16 Thread Matthias Boehm
just a quick heads-up, I'm about to change the serialized representation of our frame blocks. The implication is that we lose binary backwards compatibility to previously materialized frames in binary block format. However, there is a workaround: convert the existing data with an old jar to s

[DISCUSS] SystemML releases 0.11 and 1.0

2016-09-19 Thread Matthias Boehm
Hi all, we already discussed and agreed that it would be good to make our next release relatively soon. However, there was also a discussion around making the major 1.0 release but this would require substantially more time because it is our opportunity to remove APIs and cleanup the language.

Re: Proof of Concept: Embedded Scala DSL

2016-09-24 Thread Matthias Boehm
thanks for sharing the summary - this is very nice. While looking over the example, I had the following questions: 1) Output handling: It would be great to see an example how the results of Algorithm.execute() are consumed. Do you intend to hand out our binary matrix representation or MLContext's

Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)

2016-09-28 Thread Matthias Boehm
-1, unfortunately, SYSTEMML-964 and SYSTEMML-968 are blocking the release right now but we should be able to resolve them by tomorrow. Regards, Matthias From: Luciano Resende To: dev@systemml.incubator.apache.org Date: 09/28/2016 11:53 AM Subject:[VOTE] Apache SystemML 0.11.0-
