Re: [VOTE] Apache SystemML 0.10.0-incubating (RC2)

2016-06-01 Thread Shirish Tatikonda
+1 On Jun 1, 2016 12:40 AM, "Matthias Boehm" wrote: > +1, but if there is a third rc, let us please create a branch or cut the > release as of today to ensure no new features are leaking in. > > Regards, > Matthias

Re: "sparse" metadata attribute default value for writing csv

2016-02-16 Thread Shirish Tatikonda
Deron, It should be *false*. When we created that capability, the default value we started with was *true* (i.e., do not write out zeros) but then, after getting feedback from users, we changed the default to *false* (i.e., write out zeros). I guess the documentation was never updated. Good
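To make the two settings concrete, here is a minimal Python sketch; it is not SystemML's CSV writer. The function name `format_csv_row` is hypothetical, and rendering an omitted zero as an empty field between separators is an assumption about what "do not write out zeros" looks like in the file.

```python
def format_csv_row(row, sparse=False, sep=","):
    """Format one matrix row as a CSV line (illustrative sketch only).

    sparse=False (the default described above): every value, zeros
    included, is written out.
    sparse=True: zero cells are not written; here they are left as empty
    fields, which is an ASSUMED rendering, not confirmed SystemML output.
    """
    cells = []
    for value in row:
        if sparse and value == 0:
            cells.append("")  # zero omitted, field left empty (assumption)
        else:
            cells.append(repr(float(value)))
    return sep.join(cells)


print(format_csv_row([1, 0, 2.5]))               # zeros written out
print(format_csv_row([1, 0, 2.5], sparse=True))  # zeros omitted
```

With `sparse=False` a reader never has to guess whether an empty field means zero or missing data, which is one plausible reason users preferred it as the default.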

Re: Matrix Market format with metadata file

2016-02-15 Thread Shirish Tatikonda
> > either it was a typo or the 4th row contains all zeros. > > On Mon, Feb 15, 2016 at 4:26 PM, Shirish Tatikonda < > > shirish.tatiko...@gmail.com> wrote: > > > Both "mm" and "text" formats are identical ex

Re: Matrix Market format with metadata file

2016-02-15 Thread Shirish Tatikonda
Btw (just to be precise), in your example "mm" file the metadata is "4 3 6", but the non-zero values that follow only go up to row 3. So either it was a typo or the 4th row contains all zeros. On Mon, Feb 15, 2016 at 4:26 PM, Shirish Tatikonda < shirish.tat
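The consistency check described above can be sketched in Python. This is an illustrative helper, not part of SystemML: `check_mm_coordinate` is a hypothetical name, it assumes the "coordinate real" Matrix Market layout (size line "rows cols nnz", then 1-based "i j value" lines), and the example file is made up to match the shape described, since the original file is not shown.

```python
def check_mm_coordinate(text):
    """Compare a Matrix Market coordinate file's size line with its entries.

    Assumes the "coordinate real" layout: an optional %%MatrixMarket banner
    and %-comments, then a "rows cols nnz" size line, then one "i j value"
    line per non-zero with 1-based indices.
    Returns ((rows, cols, nnz), (max_row, max_col, entry_count)).
    """
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("%")]
    rows, cols, nnz = (int(x) for x in lines[0].split())
    max_row = max_col = 0
    for ln in lines[1:]:
        i, j, _value = ln.split()
        max_row = max(max_row, int(i))
        max_col = max(max_col, int(j))
    declared = (rows, cols, nnz)
    observed = (max_row, max_col, len(lines) - 1)
    return declared, observed


# Hypothetical file: size line "4 3 6", but all six entries are in rows 1-3.
example = """%%MatrixMarket matrix coordinate real general
4 3 6
1 1 1.0
1 3 2.0
2 2 3.0
2 3 4.0
3 1 5.0
3 2 6.0
"""
declared, observed = check_mm_coordinate(example)
if observed[0] < declared[0]:
    print(f"rows {observed[0] + 1}..{declared[0]} have no entries: "
          "either a typo in the size line or those rows are all zeros")
```

Because the coordinate format lists only non-zeros, a declared row count larger than the highest row index is still a valid file; the check can only flag it, not decide which case applies.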

Re: Compatibility with MR1 Cloudera cdh4.2.1

2016-02-04 Thread Shirish Tatikonda
Hi Ethan, The getDouble() method is actually part of org.apache.hadoop.conf.Configuration.java, which is in hadoop-common, not hadoop-core (see [1]). It seems it used to be part of hadoop-core a long time ago. Also, the pom.xml in SystemML project does specify hadoop-common as the

Re: User friendly output of univariate statistics

2016-02-04 Thread Shirish Tatikonda
Just to clarify: the current output is actually a matrix, in which rows denote stats and columns denote input variables. So, the output you see is simply the univariate stats matrix in IJV format. In general, the primary data type for input/output and computations in SystemML is a *matrix
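For a more readable view, the IJV triples can be folded back into a table. The Python sketch below is a hypothetical post-processing step, not a SystemML feature: `ijv_to_table` is an illustrative name, the statistic labels are supplied by the caller (the real row order should come from the univariate-stats script's documentation), and it assumes, as in IJV text output, that zero cells are simply not listed.

```python
def ijv_to_table(ijv_lines, stat_names):
    """Rebuild a stats-by-variables table from IJV text lines (sketch).

    Each line is "i j v": 1-based row index i (a statistic), 1-based column
    index j (an input variable), and value v. Cells absent from the IJV
    output are assumed to be zero. stat_names labels rows 1..len(stat_names).
    """
    triples = []
    for line in ijv_lines:
        if line.strip():
            i, j, v = line.split()
            triples.append((int(i), int(j), float(v)))
    n_cols = max((j for _, j, _ in triples), default=0)
    table = [[0.0] * n_cols for _ in stat_names]
    for i, j, v in triples:
        table[i - 1][j - 1] = v
    return {name: row for name, row in zip(stat_names, table)}


# Hypothetical labels and values, for illustration only.
for name, row in ijv_to_table(["1 1 2.0", "1 2 5.0", "2 2 3.5"],
                              ["min", "max"]).items():
    print(name, row)
```

Keeping the output as a plain matrix is what lets it flow into further DML computations; the labeling is purely a presentation step on top.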

Re: [VOTE] Release SystemML 0.9-incubating (RC1)

2016-01-20 Thread Shirish Tatikonda
+1 On Tue, Jan 19, 2016 at 9:46 PM, Luciano Resende wrote: > Please vote on releasing the following candidate as Apache SystemML version > 0.9.0! > > The vote is open for at least 72 hours and will close on Saturday, January > 23 and passes if a majority of at least 3 +1

Re: DML example on main SystemML website

2015-12-16 Thread Shirish Tatikonda
Deron, Along with such a complete algorithm, we could also include one or two common, useful DML snippets. We could also create a "DML Cookbook" with such snippets and keep adding more over time. Some example snippets are below; note that I created them quite a while back, and they may need

Re: Link from old GitHub project to incubator GitHub project

2015-12-09 Thread Shirish Tatikonda
Can we redirect to the new repo automatically? On Tue, Dec 8, 2015 at 3:43 PM, Deron Eriksson wrote: > I noticed that at the top of our old GitHub project page ( > https://github.com/SparkTC/systemml), there is a link that points to the > Apache project website. Down

Re: Using GLM-predict

2015-12-08 Thread Shirish Tatikonda
Hi Sourav, Yes, GLM-predict.dml outputs only the probabilities. You can apply a threshold to the resulting probabilities to get the actual class labels; for example, treat prob > 0.5 as positive and prob <= 0.5 as negative. The exact threshold value typically depends on the data and the application.
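The thresholding step can be sketched in a few lines of Python. This assumes one positive-class probability per row has already been read out of the GLM-predict output; the function name `probabilities_to_labels` and the 1/0 label encoding are illustrative choices, not SystemML conventions.

```python
def probabilities_to_labels(probs, threshold=0.5):
    """Map positive-class probabilities to binary labels (sketch).

    p > threshold  -> 1 (positive)
    p <= threshold -> 0 (negative)
    The 1/0 encoding is an assumption; remap it to whatever labels
    the training data used.
    """
    return [1 if p > threshold else 0 for p in probs]


print(probabilities_to_labels([0.2, 0.5, 0.51, 0.9]))
print(probabilities_to_labels([0.2, 0.5, 0.51, 0.9], threshold=0.3))
```

Making the threshold a parameter reflects the point above: 0.5 is only a starting value, and a different cut-off may suit data with imbalanced classes or unequal error costs.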