at 3:09 PM, Matthias Boehm <mboe...@googlemail.com>
wrote:
> ok thanks for sharing - I'll have a look later this week.
>
> Regards,
> Matthias
>
> On Mon, May 8, 2017 at 2:20 PM, Mingyang Wang <miw...@eng.ucsd.edu> wrote:
>
>> Hi Matthias,
>>
>>
p of 33.8 GB to disk (1 time so far)
> 17/05/08 13:20:20 INFO ExternalSorter: Thread 116 spilling in-memory
> map of 31.2 GB to disk (1 time so far)
>
> ...
>
> 17/05/08 13:24:50 INFO ExternalAppendOnlyMap: Thread 116 spilling
> in-memory map of 26.9 GB to disk (1 time so far)
, time, count):
> > -- 1) sp_uak+ 92.597 sec 1
> > -- 2) sp_chkpoint 0.377 sec 1
> > -- 3) == 0.001 sec 1
> > -- 4) print 0.000 sec 1
> > -- 5) + 0.000 sec 1
> > -- 6) castdts 0.000 sec 1
> > -- 7) createvar 0.000 sec 3
> > -- 8) rmvar 0.000 sec 7
>
to summarize, this was an issue of selecting serialized representations
for large ultra-sparse matrices. Thanks again for sharing your feedback
with us.
1) In-memory representation: In CSR every non-zero will require 12 bytes
- this is 240MB in your case. The overall memory consumption,
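As a rough check of the 12-bytes-per-non-zero figure, here is a back-of-the-envelope sketch in plain Python (not SystemML code). It assumes an 8-byte double value plus a 4-byte integer column index per non-zero, and the non-zero count of 20 million is a hypothetical value picked only because it is consistent with 240MB:

```python
# Back-of-the-envelope CSR size estimate (illustrative only, not SystemML code).
# Assumption: each non-zero stores an 8-byte double value plus a 4-byte int
# column index, i.e., the 12 bytes per non-zero mentioned above.
BYTES_PER_VALUE = 8
BYTES_PER_COL_INDEX = 4

def csr_nonzero_bytes(nnz):
    """Bytes needed for the value and column-index arrays of a CSR matrix."""
    return nnz * (BYTES_PER_VALUE + BYTES_PER_COL_INDEX)

# Hypothetical non-zero count, chosen to match the 240MB figure.
nnz = 20_000_000
size_mb = csr_nonzero_bytes(nnz) / 1_000_000
print(f"{size_mb:.0f} MB")
```

The row-pointer array is left out here because it scales with the number of rows, not with the number of non-zeros.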
thanks Deron for centralizing this discussion, as this could help to
avoid redundancy spread across many individual JIRAs and PRs. Overall, I
think it would be good to agree on individual style guides for DML and
Java.
I'm fine with using spaces for DML scripts because they are rarely
definitely +1 from me, although I think we already agreed upon that by
properly deprecating this API in previous releases.
Regards,
Matthias
On Mon, May 1, 2017 at 6:55 PM, Nakul Jindal wrote:
> +1
>
> Nakul
>
> On Mon, May 1, 2017 at 5:37 PM, wrote:
jit chakraborty <ak...@hotmail.com>
> Sent: Saturday, April 22, 2017 12:45 PM
> To: dev@systemml.incubator.apache.org
> Subject: Re: Randomly Selecting rows from a dataframe
>
> Thank you Matthias! You are most helpful!
>
>
> Thanks again!
>
> Arijit
>
> _
as I commented on one of these github comments, I'm strongly against
these kinds of unnecessary messages because they distract from the actual
discussions. I already had to change my notification settings
accordingly - essentially I'm not watching SystemML's PR activity any
more.
Regards,
if your values in matrix2 are aligned as in your example, then you can do
the following (which works for arbitrary values in matrix1 but you could
simplify it if you have just 1s):
matrix1 = matrix1*(matrix2==0) + (matrix2!=0)*2;
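To see what the expression above does cell by cell, here is a small plain-Python emulation (a sketch only, not DML; the function name `replace_where_nonzero` and the sample matrices are made up for illustration). Comparisons are treated as 0/1 indicators:

```python
# Plain-Python emulation of: matrix1 = matrix1*(matrix2==0) + (matrix2!=0)*2
# Matrices are lists of rows; comparisons yield 0/1 indicators.
def replace_where_nonzero(matrix1, matrix2, new_value=2):
    result = []
    for row1, row2 in zip(matrix1, matrix2):
        result.append([
            a * (1 if b == 0 else 0) + (1 if b != 0 else 0) * new_value
            for a, b in zip(row1, row2)
        ])
    return result

# Hypothetical sample data: cells where m2 is non-zero become 2,
# all other cells keep their value from m1.
m1 = [[1, 1, 0], [0, 1, 1]]
m2 = [[0, 5, 0], [7, 0, 0]]
print(replace_where_nonzero(m1, m2))
```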
The only problematic case would be special values such as NaNs in
yes, we already do constant folding - the details are in
org.apache.sysml.hops.rewrite.RewriteConstantFolding
In order to ensure consistency with our runtime, we actually generate
instructions for these sub dags, execute them and finally replace the dag
with the computed literal.
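The general idea (evaluate literal-only sub-DAGs and replace them with the result) can be illustrated with a toy example in plain Python. This is only a sketch of the technique, not the RewriteConstantFolding implementation; the tuple-based expression tree and the `fold` function are hypothetical:

```python
import operator

# Toy expression DAG: a node is a number (literal), a string (variable
# name), or a tuple (op, left, right). Hypothetical representation.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Replace every literal-only subtree with its computed literal."""
    if not isinstance(node, tuple):
        return node  # literal or variable: nothing to fold
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        # "Execute" the sub-expression and substitute the result.
        return OPS[op](left, right)
    return (op, left, right)

# X + 2 * (3 + 4): only the right-hand subtree consists of literals.
expr = ("+", "X", ("*", 2, ("+", 3, 4)))
print(fold(expr))
```

Evaluating through the same operator table that would run the expression at runtime mirrors the consistency argument above: the folded literal is whatever the runtime would have computed.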
Regards,
+1
I ran large-scale experiments on Spark 2.1 for L2SVM, GLM, MLogreg,
LinregCG, LinregDS, and PCA over scaled versions of MNIST and ImageNet (up
to 1TB, with uncompressed and compressed linear algebra) without any
issues.
Compared to previous experiments with SystemML 0.11 and Spark 1.6, I've
-- Forwarded message --
From: Matthias Boehm <mboe...@googlemail.com>
Date: Sat, Apr 22, 2017 at 4:23 PM
Subject: Re: Questions about the Compositions of Execution Time
To: Mingyang Wang <miw...@eng.ucsd.edu>
with the latest change from today there should not be muc
well, for arguments passed into dml scripts there is of course ifdef($b, 2)
but for functions there is indeed no good support. At runtime level we
still support default parameters for scalar arguments at the tail of the
parameter list but I guess at one point the corresponding parser support
was
you can take for example a 1% sample of rows via a permutation matrix
(specifically selection matrix) as follows
I = (rand(rows=nrow(X), cols=1, min=0, max=1) <= 0.01);
P = removeEmpty(target=diag(I), margin="rows");
Xsample = P %*% X;
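For readers who want to trace the three steps above by hand, here is a plain-Python emulation (a sketch, not DML; `sample_rows` and the sample data are hypothetical). It builds the indicator vector, the selection matrix, and the product explicitly:

```python
import random

def sample_rows(X, fraction, rng):
    """Emulate I = (rand <= fraction); P = removeEmpty(diag(I)); P %*% X."""
    n = len(X)
    # Indicator vector: 1 for rows drawn into the sample, else 0.
    I = [1 if rng.random() <= fraction else 0 for _ in range(n)]
    # diag(I) with empty rows removed: one unit row per selected index.
    P = [[1 if j == i else 0 for j in range(n)] for i in range(n) if I[i] == 1]
    # Matrix product P %*% X picks out the selected rows of X.
    return [[sum(p[k] * X[k][c] for k in range(n)) for c in range(len(X[0]))]
            for p in P]

# Hypothetical data: 200 rows, roughly 1% expected in the sample.
X = [[i, 10 * i] for i in range(1, 201)]
sample = sample_rows(X, 0.01, random.Random(42))
print(len(sample), "of", len(X), "rows sampled")
```

Note that the sample size is random: each row is kept independently, so the result holds about 1% of the rows on average, not exactly 1%.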
or via removeEmpty and selection vector
I =
no, right now, we don't support structs or complex objects.
Regards,
Matthias
On 4/21/2017 4:17 AM, arijit chakraborty wrote:
Hi,
In R (as well as in Python), we can store lists within lists. Say I have 2
matrices with different dimensions,
x <- matrix(1:10, ncol=2)
y <- matrix(1:5,
The input vectors to table are interpreted as row indexes and column
indexes, respectively. Without weights, we add 1, otherwise the
corresponding weight value to the output cells.
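The semantics described above can be emulated in a few lines of plain Python (a sketch, not the SystemML implementation). It assumes 1-based indexes and that the output dimensions are given by the largest row and column index; the inputs are made-up examples:

```python
def table(row_idx, col_idx, weights=None):
    """Emulate the described semantics with 1-based indexes."""
    if weights is None:
        weights = [1] * len(row_idx)  # without weights, add 1 per pair
    # Assumption: output size is determined by the largest indexes seen.
    n_rows, n_cols = max(row_idx), max(col_idx)
    out = [[0] * n_cols for _ in range(n_rows)]
    for r, c, w in zip(row_idx, col_idx, weights):
        out[r - 1][c - 1] += w
    return out

# Two pairs hit cell (1,1), one pair hits cell (2,3).
print(table([1, 1, 2], [1, 1, 3]))
# With weights, the weight is added instead of 1.
print(table([1, 2], [2, 1], weights=[0.5, 4]))
```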
So in your example you have constant row indexes of 1 but a seq(1,10)
for column indexes and hence you get a
On Thu, Apr 20, 2017 at 11:44 AM, Matthias Boehm <mboe...@googlemail.com>
wrote:
> 1) Understanding execution plans: Our local bufferpool reads matrices in a
> lazy manner on the first singlenode, i.e., CP, operation that tries to pin
> the matrix into memory. Similarly, distributed
le
read/write script (it took quite a long time and failed).
Regards,
Mingyang
On Thu, Apr 20, 2017 at 2:08 AM Matthias Boehm <mboe...@googlemail.com>
wrote:
Hi Mingyang,
thanks for the questions - this is very valuable feedback. I was able to
reproduce your performance issue on scenario 1
, at 8:32 AM, Berthold Reinwald <reinw...@us.ibm.com> wrote:
This is awesome!
Regards,
Berthold Reinwald
IBM Almaden Research Center
office: (408) 927 2208; T/L: 457 2208
e-mail: reinw...@us.ibm.com
From: Matthias Boehm <mboe...@googlemail.com>
To: dev@systemml.incubator.apa
In general, there are a couple of scenarios which make size propagation
challenging. This includes:
* Complex function call patterns (where functions are potentially called
with different sizes)
* External user-defined functions
* Data-dependent operators (e.g., table, aggregate, removeEmpty);
*
These flags in the runtime plans (-explain runtime or recompile_runtime)
are indicators if the given input operand is a literal or not. Without
these flags we could not differentiate between literal strings and variable
names.
Regards,
Matthias
On Tue, Apr 18, 2017 at 12:20 PM,
if your data X is already ordered you can do the following:
I = rbind(matrix(1,1,1), (X[1:nrow(X)-1,]!=X[2:nrow(X),]));
dX = removeEmpty(target=X, margin="rows", select=I);
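The same idea in plain Python, as a sketch for checking the logic on a small example (not DML; `distinct_sorted_rows` and the data are hypothetical): keep the first row, then keep every row that differs from its predecessor.

```python
def distinct_sorted_rows(X):
    """Keep row 1 and every row that differs from its predecessor."""
    if not X:
        return []
    # Indicator vector: 1 for the first row, then 1 wherever a row
    # differs from the row directly above it.
    I = [1] + [1 if prev != cur else 0 for prev, cur in zip(X, X[1:])]
    # Select the rows flagged in I.
    return [row for row, keep in zip(X, I) if keep == 1]

# Hypothetical single-column, already ordered input.
X = [[1], [1], [2], [3], [3], [3], [5]]
print(distinct_sorted_rows(X))
```

This only removes duplicates that are adjacent, which is why the input has to be ordered first.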
Regards,
Matthias
On 4/17/2017 8:40 AM, arijit chakraborty wrote:
Hi,
I've an issue regarding finding and removing the
I think SYSTEMML-1518 and SYSTEMML-1520 require a new RC and I agree that
we should create a 0.14 branch along with it to unblock ongoing
development. I'm happy to backport any additional fixes into this branch
until we have a solid release candidate.
Regards,
Matthias
On Thu, Apr 13, 2017 at
sorry, but -1 due to SYSTEMML-1464 and SYSTEMML-1459.
In detail, SYSTEMML-1464 is a blocker issue for me because it renders JMLC
model scoring of text inputs with tokens that contain spaces almost
unusable. Furthermore, SYSTEMML-1459 covers a rewrite issue that might
corrupt hop dags for special
7 2208
> e-mail: reinw...@us.ibm.com
>
>
>
> From: Matthias Boehm <mboe...@googlemail.com>
> To: dev@systemml.incubator.apache.org
> Date: 03/31/2017 08:17 PM
> Subject:Java compiler for code generation
>
>
>
> Hi all,
>
> currently, our new
Well, this would indeed be a very useful extension - I've actually seen
many use cases where new users ran into issues with simple expressions
like X[i,i] = foo(). In the general case, the problem with UDFs is that
they can have - in contrast to builtin functions - multiple returns. These
hu, Mar 23, 2017 at 11:36 PM Matthias Boehm <mboe...@googlemail.com>
wrote:
well, after thinking some more about this issue, I have to correct myself
but the workarounds still apply. The problem is not the "in-memory reblock"
but the collect of the reblocked RDD, whic
sorry for the issues - I'll fix it with the next change.
Regards,
Matthias
On Thu, Mar 16, 2017 at 2:56 AM, <jenk...@spark.tc> wrote:
> See <https://sparktc.ibmcloud.com/jenkins/job/SystemML-
> DailyTest/870/changes>
>
> Changes:
>
> [Matthias Boehm] [SYSTEMML-1402
) now rather than waiting additional months. Also I would like to
> >> be able to correctly identify our next version in the online documentation.
> >>
> >>
> > How about just make SystemML Next and change the release name when we do
> > the relea
I could help doing this assessment. Btw, here is a working link:
https://community.apache.org/apache-way/apache-project-maturity-model.html
Regards,
Matthias
On Tue, Mar 7, 2017 at 1:38 PM, Luciano Resende
wrote:
> On Tue, Mar 7, 2017 at 11:59 AM, Arvind Surve
Hi all,
I'd like to drop the support for Java 6 and 7 in our SystemML 1.0 release.
Our build still refers to a java compliance level 6, which has not been
changed for more than 5 years now. Spark >= 1.5 anyway requires Java 7 and
there has been some discussion on removing Java 7 as well because
e contributors each month.
>
> If the overhead slows us down too much, then we can go to a slower release
> cycle.
>
> Deron
>
>
>
>
> On Thu, Jan 5, 2017 at 1:50 PM, <dusenberr...@gmail.com> wrote:
>
> > +1 for adopting a 1 month release cycle.
> >
>
Thanks for starting this discussion Luciano. I think it's a good point in
time to graduate SystemML as we have shown readiness by creating an open
and positive community, and it would send a great signal to potential new
users and developers. From my perspective, we should aim for a top-level
Could we please change the target version to 1.0 instead of 0.14 to make
clear that master is now open for 1.0 features?
Regards,
Matthias
On Mon, Feb 20, 2017 at 12:08 PM, wrote:
> Repository: incubator-systemml
> Updated Branches:
> refs/heads/master 07f26ca4e ->
excellent - thanks for the quick fix Deron.
Regards,
Matthias
On 2/21/2017 1:09 AM, Deron Eriksson wrote:
Note that MLContext has been updated to log a warning rather than throw an
exception to the user for Spark versions previous to 2.1.0.
Deron
On Mon, Feb 20, 2017 at 2:29 PM, Matthias
Going toward our 1.0 release, I'd like to create consistency across our
weighted statistics. Conceptually, these weights represent frequency
counts, i.e., multiplicities of input values.
So far, our documentation does not state any restrictions on these weights
but some runtime operations
ad 1: t(-*): ternary minus mult (for patterns like X-s*Y)
ad 2: ua(+RC): unary aggregate with aggregation function + (at runtime
level you will see k+ for Kahan plus) and direction RC, i.e., full
aggregate over rows and columns.
ad 3: lix: matrix or frame left indexing (for patterns like
neck, thus leading to the creation of
> SYSTEMML-1140. Specifically, what did you use to attempt to reproduce 1140?
>
>
> -Mike
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>
org.apache.sysml.test.integration.functions.transform.TransformCSVFrameEncodeReadTest
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 29.677 sec
- in
org.apache.sysml.test.integration.functions.transform.TransformCSVFrameEncodeReadTest
On Sun, Feb 12, 2017 at 12:26 AM, Matthias Boehm <mboe...@googlemail.com>
wrote:
could someone pleas
While debugging our mnist_lenet script, I encountered an issue with our
namespace handling with imports. Here is the related function call graph
(after inlining):
FUNCTION CALL GRAPH
--MAIN PROGRAM
.\mnist_lenet.dml::train
--.\nn/layers/dropout.dml::forward
just a little heads-up: I intend to remove the recently added
workaround flags DISABLE_SPARSE and DISABLE_CACHING because any
underlying issues should be directly addressed. Furthermore, I was not
able to reproduce the issues reported in SYSTEMML-1140, probably due to
improvements that
ins/job/SystemML-DailyTest/805/changes>
Changes:
[Matthias Boehm] [SYSTEMML-1244] Fix robustness csv text read (quoted recoded
maps)
[Matthias Boehm] [SYSTEMML-1243] Fix size update wdivmm/wsigmoid/wumm on rewrite
[Matthias Boehm] [SYSTEMML-1248] Fix loop rewrite update-in-place (exclude
this is fine, but please make sure that it gets integrated into our
existing testsuite which can be run through maven or junit.
Regards,
Matthias
On 2/3/2017 9:10 PM, Deron Eriksson wrote:
+1 for enabling the Python tests in the test suite.
Since we use multiple languages and it's not always
optionally, we could include the following paper that we presented at
CIDR'17 in January.
Tarek Elgamal, Shangyu Luo, Matthias Boehm, Alexandre V. Evfimievski,
Shirish Tatikonda, Berthold Reinwald, Prithviraj Sen: SPOOF: Sum-Product
Optimization and Operator Fusion for Large-Scale Machine
Hi Nantia,
good question - so far the documentation of tools like explain and stats
is indeed very sparse. However, there are some overview slides from a
tutorial we gave last year at the BOSS workshop:
http://boss.dima.tu-berlin.de/media/BOSS16-Tutorial-mboehm.pdf
(slides 10-15)
If you
Thanks Glenn. Could you please also share the measurements (maybe in a
jira).
Furthermore, seeing that you ran only a subset of multinomial
experiments makes me wonder if you used the current default
configuration of 150 classes? In the recent past, we usually ran this
perftest with a
g/jira/browse/SYSTEMML-541 under:
https://issues.apache.org/jira/browse/SYSTEMML-1188
Thanks,
Glenn
Matthias Boehm ---01/21/2017 02:21:04 AM---Let's keep the test, collect the used seeds, and fix it. T
Hi Dylan,
these are very interesting questions - let me answer them one by one:
0. SPOOF: We developed the SPOOF compiler framework in a separate fork
that will be integrated back into SystemML master soon. Initially, we
will add the code generation part as an experimental feature, likely in
I agree with Arvind here as the 8GB case would mostly run as singlenode,
in-memory operations and not test the Spark 2.x integration.
Regards,
Matthias
On 1/17/2017 5:33 AM, Arvind Surve wrote:
We are planning to have 80GB testing for 0.13 release (to support Spark 2.0).
It will add couple
I'd like to initiate the discussion of a concrete roadmap for our next
release. According to previous discussions, I'd think it's fair to say
that we agree on calling it SystemML 1.0. We should carefully plan this
release as it's an opportunity to change APIs and remove some older
deprecated
require setting up a more "scientific" benchmark
suite than my little test here.
Felix
Am 01.12.2016 01:00 schrieb Matthias Boehm:
ok, then let's sort this out one by one
1) Benchmarks: There are a couple of things we should be aware of for
these native/java benchmarks. First, please
,1000,1000,100)x(false,1000,1000,100) in 251.290325
MM k=8 (false,1000,1000,100)x(false,1000,1000,100) in 265.851277
MM k=8 (false,1000,1000,100)x(false,1000,1000,100) in 240.902494
Am 01.12.2016 00:08 schrieb Matthias Boehm:
Could you please make sure you're comparing the right
Could you please make sure you're comparing the right thing. Even on old
sandy bridge CPUs our matrix mult for 1kx1k usually takes 40-50ms. We
also did the same experiments with larger matrices and SystemML was
about 2x faster compared to Breeze. Please uncomment the timings in
the cuda compiler that ships with that
version of the toolkit and compile the .cu files in the project and commit
the resulting .ptx files.
Thoughts, comments?
-Nakul
On Wed, Nov 23, 2016 at 2:43 PM, Matthias Boehm <mboe...@googlemail.com>
wrote:
thanks for sharing Nakul. Could you
thanks for sharing Nakul. Could you please also comment on the PTX story
for custom kernels and different PTX versions?
Regards,
Matthias
On 11/23/2016 10:13 PM, Nakul Jindal wrote:
Hi,
SystemML has experimental GPU support, which we are working to solidify.
Currently, GPU is supported in CP
model over the full dataset using a mini-batch SGD approach. Has the
`parfor` construct been used for this purpose before?
--
Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry
Sent from my iPhone.
On Nov 22, 2016, at 2:01 PM, Matthias Boehm <m
:
The constrained optimizer doesn't seem to know about a REMOTE_SPARK
execution mode and either sets CP or REMOTE_MR. I can open a jira for
that and provide a fix.
Felix
Am 22.11.2016 02:07 schrieb Matthias Boehm:
yes, this came up several times - initially we only supported opt=NONE
where users
yes, this came up several times - initially we only supported opt=NONE
where users had to specify all other parameters. Meanwhile, there is a
so-called "constrained optimizer" that does the same as the rule-based
optimizer but respects any given parameters. Please try something like this:
Thanks for putting this together Niketan. However, could we please
postpone this discussion until after our 1.0 release? Right now, I'm concerned
to see that we're adding many experimental features without really
getting them done. This includes for example, the GPU backend, the new
MLContext API,
functions at compile time
depending on what intermediates they produce ... Meaning you may still end
up with java heap space OOM at runtime.
Regards,
Berthold Reinwald
IBM Almaden Research Center
office: (408) 927 2208; T/L: 457 2208
e-mail: reinw...@us.ibm.com
From: Matthias Boehm <m
mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
Matthias Boehm ---10/21/2016 01:00:51 PM---thanks Nakul for reaching out
before starting work on this. Actually, the introduction of these CP-
From: Matthias Boehm <mboe...@googlemail.com>
To: dev@systemml
thanks Nakul for reaching out before starting work on this. Actually,
the introduction of these CP-only builtin functions was a big mistake
because (as you already mentioned) they mistakenly suggest that we
provide distributed operations for them too. The intent was to support
them in later
this is.
Deron
On Fri, Oct 21, 2016 at 12:15 PM, Matthias Boehm <mboe...@googlemail.com>
wrote:
Thanks for these proposals. For all the options, I'd prefer to remove the
TM - it's just a little odd for an open source project with no intentions
to register a trademark. I know, the new Spark lo
Thanks for these proposals. For all the options, I'd prefer to remove
the TM - it's just a little odd for an open source project with no
intentions to register a trademark. I know, the new Spark logo has it
too but it's probably a different context, especially since there are
discussions to
I hate to say it, but -1. There have been a couple of important fixes since
we've cut the rc and unfortunately, additional (so far unresolved) blocking
issues showed up.
In detail, the fixed issues are:
* SYSTEMML-1023: Fix csv line parsing (the quote-aware column-splitting was
hanging on a
Apache SystemML 0.11.0-incubating (RC1)
Imran has opened Jira 1013.
-Arvind
From: Matthias Boehm <mbo...@us.ibm.com>
To: dev@systemml.incubator.apache.org
Sent: Tuesday, October 4, 2016 5:43 PM
Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)
ok, SYSTEMML-1009 has
> end, we are ready to go.
>
> -Mike
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>
>
> > On Oct 4, 2016, at 2:02 PM, Matthias Boehm <mbo...@us.ibm.com> wrote:
>
rry
>
> Sent from my iPhone.
>
>
> > On Oct 2, 2016, at 8:35 PM, Matthias Boehm <mbo...@us.ibm.com> wrote:
> >
> > yes, I just closed them - I left them open for Mike to confirm, but we
> resolved all known issues yesterday together. We should be good to go.
>
folks forgot to close the jiras? Or are there things that still need
to be handled here?
On Sat, Oct 1, 2016 at 2:41 PM, Matthias Boehm <mbo...@us.ibm.com> wrote:
> ok the blocking issues SYSTEMML-993, 994, and 995 have been resolved -
> from my perspective we're ready to cut a n
ok the blocking issues SYSTEMML-993, 994, and 995 have been resolved - from
my perspective we're ready to cut a new RC.
Regards,
Matthias
From: Matthias Boehm/Almaden/IBM@IBMUS
To: dev@systemml.incubator.apache.org
Date: 09/29/2016 10:44 PM
Subject:Re: [VOTE] Apache SystemML
actually, I would prefer to leave the empty (automatically generated)
javadoc comments - at least in eclipse, this provides a better overview of
parameters and exceptions.
Regards,
Matthias
From: Deron Eriksson
To: dev@systemml.incubator.apache.org
Date:
-1, unfortunately, SYSTEMML-964 and SYSTEMML-968 are blocking the release
right now but we should be able to resolve them by tomorrow.
Regards,
Matthias
From: Luciano Resende
To: dev@systemml.incubator.apache.org
Date: 09/28/2016 11:53 AM
Subject:[VOTE]
Hi all,
we already discussed and agreed that it would be good to make our next
release relatively soon. However, there was also a discussion around making
the major 1.0 release but this would require substantially more time
because it is our opportunity to remove APIs and clean up the language.
e able to focus most
> > of our efforts towards the future rather than the past.
> >
> > Deron
> >
> >
> > On Thu, Aug 4, 2016 at 10:59 AM, Luciano Resende <luckbr1...@gmail.com>
> > wrote:
> >
> > > That was going to be my suggestion..
I would recommend investigating whether we could support both the
1.x and 2.x lines with a single code base. It seems feasible to refactor
the code a bit, compile against 2.0 (or with profiles), and run on either
1.6 or 2.0. For example, by creating a wrapper that implements both
Iterable
this looks already pretty good - thanks Deron for pulling it together.
Furthermore, you could include the following paper, published July 29:
Ahmed Elgohary, Matthias Boehm, Peter J. Haas, Frederick R. Reiss, Berthold
Reinwald: Compressed Linear Algebra for Large-Scale Machine Learning, PVLDB
9
just FYI: there will be a SystemML tutorial at the BOSS workshop,
co-located with VLDB 2016:
https://research.cs.wisc.edu/dbworld/messages/2016-08/1470069574.html
Regards,
Matthias
thanks for reaching out Nikolay,
1) Scripts: Could you please create a PR to add them to /scripts/staging?
This is the place we typically use to share new scripts. Once they are
tested for accuracy and runtime, we would migrate them into
scripts/algorithms along with some basic documentation.
2016 at 1:11 PM, Matthias Boehm <mbo...@us.ibm.com> wrote:
> quick correction: I meant to say, option 2 because you have a frame of
> strings (option 3 is only possible if you have numeric/boolean data). Btw,
> it's fixed now - so please go ahead and give it a try. Thanks.
>
>
>
dev@systemml.incubator.apache.org
Date: 06/29/2016 01:40 PM
Subject:Re: print a value in a frame?
Thanks for the quick reply. I'll use the toString() for now (for a unit
test).
Deron
On Wed, Jun 29, 2016 at 1:28 PM, Matthias Boehm <mbo...@us.ibm.com> wrote:
> optio
option 3 is possible but probably needs a fix. Alternatively, you can use
print(toString(M)) which is implemented similarly to the matrix toString().
Regards,
Matthias
From: Deron Eriksson
To: dev@systemml.incubator.apache.org
Date: 06/29/2016 01:23 PM
Subject:
tps://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/338/changes>
Changes:
From: jenk...@spark.tc
To: Michael W Dusenberry/San Francisco/IBM@IBMUS, lrese...@apache.org,
dev@systemml.incubator.apache.org, Matthias Boehm/Almaden/IBM@IBMUS
Date: 06/24/2016 12:32 AM
Subject: Build failed in J
+1, but if there is a third rc, let us please create a branch or cut the
release as of today to ensure no new features are leaking in.
Regards,
Matthias
From: Luciano Resende
To: dev@systemml.incubator.apache.org
Date: 05/31/2016 10:05 PM
Subject:[VOTE]
just put the following parameters into the VM arguments of your run
configuration:
-Dhadoop.home.dir=\src\test\config\hadoop_bin_windows
-Djava.library.path=\src\test\config\hadoop_bin_windows\bin
Regards,
Matthias
From: Deron Eriksson
To:
sounds good to me - in addition to PR167, I'd also like to get PR162 into
this release. Furthermore, it would be good to run our full performance
testsuite (at least up to 80GB) but this could be done on the RC too.
Thanks guys for taking care of the release again.
Regards,
Matthias
From:
Indeed, several of our ML algorithms [4] and our matrix multiplication
chain rewrite [8] are based on existing textbook algorithms. This means
that we implemented these artifacts (loosely) based on the ideas or
pseudo-code described in these references but never directly took over
existing code.
that is a good point - the compilation chain is indeed replicated in
various places (DMLScript, JMLC, MLContext, Debugger, and potentially new
MLContext). However, it is not a plain code duplication but differently
composed compilation chains and slightly different primitives (e.g., read
script
the local server,
and its subdirectories named '_p22748_127.0.0.1' etc. It looks like other
SystemML jobs had no trouble writing to it.
The stderr and one failed MR log is attached.
Thanks,
Ethan
On Thu, Apr 14, 2016 at 11:14 PM, Matthias Boehm <mbo...@us.ibm.com> wrote:
just for completeness,
well, it looks like an issue of incorrect meta data propagation (wrong
propagation of dimensions through mr pmm instructions). The data itself
looks good if I write a 20% sample to textcell (what is used in our
testsuite).
@Shirish: thanks for looking into it. Just fyi, while testing this on an
just for completeness, this issue is tracked with
https://issues.apache.org/jira/browse/SYSTEMML-635 and the fix will be
available tomorrow.
Regards,
Matthias
From: Matthias Boehm/Almaden/IBM@IBMUS
To: dev@systemml.incubator.apache.org
Cc: "Ethan Xu" <ethan.yifa...@gm
- Original ------
From: "Matthias Boehm";<mbo...@us.ibm.com>;
Date: Tue, Apr 12, 2016 01:18 PM
To: "dev"<dev@systemml.incubator.apache.org>;
Cc: "葡萄??"<281165...@qq.com>;
Subject: Re: machine learning - Some tests failure when b
well, we don't want to get into having multiple commons math versions in
the classpath and newer hadoop distributions have it by default. So I would
rather add it to a troubleshooting guide. Alternatively, we could have two
different 'distribution' profiles for releases.
Regards,
Matthias
iks...@gmail.com>
To: dev@systemml.incubator.apache.org
Date: 04/04/2016 02:38 PM
Subject:Re: Discussion SYSTEMML-593 MLContext Resign
Hi Matthias,
On Sat, Apr 2, 2016 at 9:34 PM, Matthias Boehm <mbo...@us.ibm.com> wrote:
>
> Also rather than introducing anoth
ion on the selection
process. You can evaluate this model on a hold out test set or run some
form of cross validation. However, keep in mind that for accuracy
experiments, you might want to be very careful with random data.
Regards,
Matthias
From: Wenjie Zhuang <ka...@vt.edu>
To: Matth
too.
Regards,
Matthias
From: Wenjie Zhuang <ka...@vt.edu>
To: dev@systemml.incubator.apache.org
Cc: Matthias Boehm/Almaden/IBM@IBMUS
Date: 04/02/2016 07:50 PM
Subject:Re: Gxuides about running SystemML by spark cluster
Hi,
I try to run StepLinearRegDS.dml by spar
just to clarify, the configuration 'scratch' (remote tmp working directory)
is a user-defined configuration coming out of SystemML-config.xml with
internal default set to ./scratch_space if not specified and it is always
accessed as dfs (which depending on your hadoop configuration might use
just a quick correction of option 2:
Ind = (X[,1]>10);
Y = removeEmpty(target=X, select=Ind);
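For anyone who wants to sanity-check the selection logic outside of DML, here is a plain-Python emulation (a sketch only; `remove_empty_rows` and the sample data are hypothetical, and it assumes that rows with a non-zero entry in the select vector are the ones kept):

```python
def remove_empty_rows(X, select):
    """Emulate row selection via a 0/1 select vector."""
    return [row for row, keep in zip(X, select) if keep]

# Hypothetical data with the filter column in first position.
X = [[5, 1], [12, 2], [10, 3], [42, 4]]
# Ind = (X[,1] > 10): indicator over the first column.
Ind = [1 if row[0] > 10 else 0 for row in X]
Y = remove_empty_rows(X, Ind)
print(Y)
```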
Regards,
Matthias
From: Matthias Boehm/Almaden/IBM@IBMUS
To: dev@systemml.incubator.apache.org
Date: 03/31/2016 10:14 AM
Subject:Re: Logical indexing?
that's a good quest
that's a good question - no, SystemML does not support set indexing yet but
you can emulate it via permutation matrices or similar transformations.
Here are some examples:
# option 1: via permutation (aka selection) matrices
P = removeEmpty(target=diag(X[,1]>10), margin="rows");
Y = P %*% X;
#
Hi all,
I just added the initial design of our distributed frame representations to
the related JIRA
https://issues.apache.org/jira/browse/SYSTEMML-560. Any comments are very
welcome!
Regards,
Matthias