[jira] [Closed] (SYSTEMML-547) Implement built-in functions for max and average pooling

2016-05-27 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare closed SYSTEMML-547.

   Resolution: Fixed
Fix Version/s: SystemML 0.10

https://github.com/apache/incubator-systemml/commit/c334c2c85bc9cbb343e63b5b28ff3a1c5098c7fa

> Implement built-in functions for max and average pooling
> 
>
> Key: SYSTEMML-547
> URL: https://issues.apache.org/jira/browse/SYSTEMML-547
> Project: SystemML
>  Issue Type: New Feature
>  Components: Parser, Runtime
>Reporter: Niketan Pansare
>Assignee: Nakul Jindal
>Priority: Minor
> Fix For: SystemML 0.10
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> pool2d(input, pool_size, stride_length, border_mode="valid", pool_mode="max")
> Performs downscaling of the input matrix.
> The arguments to this function are:
> 1. input is a 2-dimensional matrix.
> 2. pool_size is a required integer parameter.
> 3. stride_length is an optional Int parameter. The default value is 1.
> 4. border_mode is an optional String parameter. The valid values are "same" 
> and "valid".
> 5. pool_mode is an optional String parameter. The valid values are "max" and 
> "avg". We can later add additional operators here (such as sum).
> For detailed documentation, see Theano's pool_2d function: 
> https://github.com/Theano/Theano/blob/master/theano/tensor/signal/pool.py#L40
> As an example, our pool2d(input=X, pool_size=2, stride_length=1, 
> border_mode="valid", pool_mode="avg") invocation is similar to Theano's 
> pool_2d(X, ds=(2,2), st=(1,1), ignore_border=True, padding=(0, 0), 
> mode="average_exc_pad")
> Since padding=(0,0) is the most common padding (probably the only one most 
> people will use), I thought of simplifying the interface by borrowing 
> concepts from TensorFlow's functions max_pool and avg_pool. See 
> https://www.tensorflow.org/versions/r0.7/api_docs/python/nn.html#avg_pool
> The above example translates into the following TensorFlow code:
> tf.nn.avg_pool(X, ksize=(1,2,2,1), strides=(1,1,1,1), padding="VALID")
> Another good reference for understanding the pooling operation is 
> http://cs231n.github.io/convolutional-networks/#pool
> [~mwdus...@us.ibm.com], [~nakul02], [~prithvi_r_s], [~reinw...@us.ibm.com]
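
The pool2d semantics proposed above can be sketched in plain Python. This is only an illustration of the description, not the SystemML implementation: it assumes the matrix is a list of lists, reuses the parameter names from the proposal, and handles border_mode="valid" only.

```python
def pool2d(x, pool_size, stride_length=1, border_mode="valid", pool_mode="max"):
    """Downscale a 2-D matrix (list of lists) with square pooling windows."""
    if border_mode != "valid":
        # "same" would require padding; it is left out of this sketch.
        raise NotImplementedError("only border_mode='valid' is sketched here")
    if pool_mode not in ("max", "avg"):
        raise ValueError("pool_mode must be 'max' or 'avg'")

    rows, cols = len(x), len(x[0])
    out = []
    # Slide the window so that it always lies fully inside the input.
    for r in range(0, rows - pool_size + 1, stride_length):
        out_row = []
        for c in range(0, cols - pool_size + 1, stride_length):
            window = [x[r + i][c + j]
                      for i in range(pool_size)
                      for j in range(pool_size)]
            if pool_mode == "max":
                out_row.append(max(window))
            else:
                out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


X = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(pool2d(X, pool_size=2, stride_length=1, pool_mode="avg"))  # [[3.0, 4.0], [6.0, 7.0]]
print(pool2d(X, pool_size=2, stride_length=1, pool_mode="max"))  # [[5, 6], [8, 9]]
```

With "valid" borders every window contains only real input cells, so the average never includes padding.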



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SYSTEMML-692) Generate DML from Caffe solver/net proto files

2016-05-27 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-692:
-
Issue Type: Task  (was: Epic)

> Generate DML from Caffe solver/net proto files
> --
>
> Key: SYSTEMML-692
> URL: https://issues.apache.org/jira/browse/SYSTEMML-692
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>






[jira] [Created] (SYSTEMML-719) Create proto file for Autoencoder and test the generated DML on MNIST dataset for accuracy

2016-05-27 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-719:


 Summary: Create proto file for Autoencoder and test the generated 
DML on MNIST dataset for accuracy
 Key: SYSTEMML-719
 URL: https://issues.apache.org/jira/browse/SYSTEMML-719
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare
Assignee: Niketan Pansare








[jira] [Created] (SYSTEMML-720) Implement additional loss layers required by the Autoencoder proto

2016-05-27 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-720:


 Summary: Implement additional loss layers required by the 
Autoencoder proto
 Key: SYSTEMML-720
 URL: https://issues.apache.org/jira/browse/SYSTEMML-720
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare
Assignee: Niketan Pansare








[jira] [Commented] (SYSTEMML-704) Host jcu*.jar libraries on mvn repo

2016-05-27 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304772#comment-15304772
 ] 

Niketan Pansare commented on SYSTEMML-704:
--

As per the discussion on the dev mailing list, the PR 
https://github.com/apache/incubator-systemml/pull/165 will be merged when this 
issue is resolved.

> Host jcu*.jar libraries on mvn repo
> ---
>
> Key: SYSTEMML-704
> URL: https://issues.apache.org/jira/browse/SYSTEMML-704
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>Priority: Minor
>
> The PR https://github.com/apache/incubator-systemml/pull/165/ uses system 
> scope for jcu*.jar as they are not published on Maven Central. Since we are 
> planning to include them in SystemML, it would be good to host them in a 
> repo we maintain and use provided scope instead. If, for LICENSE or some 
> other reason, we are not able to host them, I am fine with rejecting this 
> issue too. From jcuda's website "JCuda is published under the terms of the 
> MIT/X11 License".
> The current version depends on jcu*-0.7.5b.jar (except jcudnn-0.7.5.jar). The 
> jars are available for download from 
> http://www.jcuda.org/downloads/downloads.html. The source is available at 
> https://github.com/jcuda
> [~nakul02] [~deron] [~luciano resende]





[jira] [Created] (SYSTEMML-734) Implement additional loss layers required by the GoogleNet proto

2016-05-27 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-734:


 Summary: Implement additional loss layers required by the 
GoogleNet proto
 Key: SYSTEMML-734
 URL: https://issues.apache.org/jira/browse/SYSTEMML-734
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare








[jira] [Updated] (SYSTEMML-692) Generate DML from Caffe solver/net proto files

2016-05-27 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-692:
-
Issue Type: Epic  (was: Task)

> Generate DML from Caffe solver/net proto files
> --
>
> Key: SYSTEMML-692
> URL: https://issues.apache.org/jira/browse/SYSTEMML-692
> Project: SystemML
>  Issue Type: Epic
>Reporter: Niketan Pansare
>






[jira] [Commented] (SYSTEMML-717) Create proto file for Lenet and test the generated DML on MNIST dataset for accuracy

2016-05-27 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304735#comment-15304735
 ] 

Niketan Pansare commented on SYSTEMML-717:
--

Created 
https://github.com/niketanpansare/incubator-systemml/blob/d9e6efaf297b1a22fcbe3eb0b7f75f07e19969db/samples/caffe/Lenet.proto
 which generates a DML file with the same network and parameters as 
https://github.com/apache/incubator-systemml/blob/master/scripts/staging/lenet-train.dml
 ... [~prithvi_r_s]

> Create proto file for Lenet and test the generated DML on MNIST dataset for 
> accuracy
> 
>
> Key: SYSTEMML-717
> URL: https://issues.apache.org/jira/browse/SYSTEMML-717
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>






[jira] [Created] (SYSTEMML-730) Add fused GPU instructions for LSTM/RNN

2016-05-27 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-730:


 Summary: Add fused GPU instructions for LSTM/RNN
 Key: SYSTEMML-730
 URL: https://issues.apache.org/jira/browse/SYSTEMML-730
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare


When we decide to move to CuDNN v5, this will call the respective CuDNN 
functions.





[jira] [Created] (SYSTEMML-731) Conduct initial performance experiments for mat mult

2016-05-27 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-731:


 Summary: Conduct initial performance experiments for mat mult
 Key: SYSTEMML-731
 URL: https://issues.apache.org/jira/browse/SYSTEMML-731
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare
Assignee: Niketan Pansare


Before the PR https://github.com/apache/incubator-systemml/pull/165 gets 
merged, initial performance experiments need to be conducted for dense-dense 
matrix multiplication.

[~nakul02] [~mboehm7]





[jira] [Created] (SYSTEMML-717) Create proto file for Lenet and test the generated DML on MNIST dataset for accuracy

2016-05-27 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-717:


 Summary: Create proto file for Lenet and test the generated DML on 
MNIST dataset for accuracy
 Key: SYSTEMML-717
 URL: https://issues.apache.org/jira/browse/SYSTEMML-717
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare








[jira] [Assigned] (SYSTEMML-692) Create initial prototype for generating DML from Caffe solver/net proto files

2016-05-27 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare reassigned SYSTEMML-692:


Assignee: Niketan Pansare

> Create initial prototype for generating DML from Caffe solver/net proto files
> -
>
> Key: SYSTEMML-692
> URL: https://issues.apache.org/jira/browse/SYSTEMML-692
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>Assignee: Niketan Pansare
>






[jira] [Created] (SYSTEMML-718) Implement generator for the layers used in Lenet proto

2016-05-27 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-718:


 Summary: Implement generator for the layers used in Lenet proto
 Key: SYSTEMML-718
 URL: https://issues.apache.org/jira/browse/SYSTEMML-718
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare
Assignee: Niketan Pansare


Implemented in the PR 
https://github.com/niketanpansare/incubator-systemml/tree/d9e6efaf297b1a22fcbe3eb0b7f75f07e19969db/src/main/java/org/apache/sysml/api/dl/layer





[jira] [Created] (SYSTEMML-725) Implement generator for the layers used in AlexNet proto

2016-05-27 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-725:


 Summary: Implement generator for the layers used in AlexNet proto
 Key: SYSTEMML-725
 URL: https://issues.apache.org/jira/browse/SYSTEMML-725
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare


An example of such a layer is LRN. 





[jira] [Created] (SYSTEMML-726) Explore additional solver generators (for example: L-BFGS, Conjugate gradient)

2016-05-27 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-726:


 Summary: Explore additional solver generators (for example: 
L-BFGS, Conjugate gradient)
 Key: SYSTEMML-726
 URL: https://issues.apache.org/jira/browse/SYSTEMML-726
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare








[jira] [Created] (SYSTEMML-729) Add GPU instructions that utilizes CuDNN v4's conv2d and pooling related functions

2016-05-27 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-729:


 Summary: Add GPU instructions that utilizes CuDNN v4's conv2d and 
pooling related functions
 Key: SYSTEMML-729
 URL: https://issues.apache.org/jira/browse/SYSTEMML-729
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare








[jira] [Comment Edited] (SYSTEMML-732) Explore different memory management policy for GPU

2016-05-27 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304794#comment-15304794
 ] 

Niketan Pansare edited comment on SYSTEMML-732 at 5/27/16 9:17 PM:
---

In the initial PR https://github.com/apache/incubator-systemml/pull/165, we are 
using a naive eviction policy.


was (Author: niketanpansare):
In initial PR, we are using a naive eviction policy.

> Explore different memory management policy for GPU
> --
>
> Key: SYSTEMML-732
> URL: https://issues.apache.org/jira/browse/SYSTEMML-732
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>
> The issues that need to be addressed are:
> 1. Eviction policy 
> 2. Lazy/Eager synchronization between CP/GPU





[jira] [Commented] (SYSTEMML-732) Explore different memory management policy for GPU

2016-05-27 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304794#comment-15304794
 ] 

Niketan Pansare commented on SYSTEMML-732:
--

In the initial PR, we are using a naive eviction policy.

> Explore different memory management policy for GPU
> --
>
> Key: SYSTEMML-732
> URL: https://issues.apache.org/jira/browse/SYSTEMML-732
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>
> The issues that need to be addressed are:
> 1. Eviction policy 
> 2. Lazy/Eager synchronization between CP/GPU





[jira] [Commented] (SYSTEMML-721) Integrate Barista api into DMLScript for direct invocation

2016-05-27 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304745#comment-15304745
 ] 

Niketan Pansare commented on SYSTEMML-721:
--

This issue depends on the following PR being merged into master: 
https://github.com/apache/incubator-systemml/pull/158

> Integrate Barista api into DMLScript for direct invocation
> --
>
> Key: SYSTEMML-721
> URL: https://issues.apache.org/jira/browse/SYSTEMML-721
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>
> Barista is the class that generates DML from Caffe proto.
> Once this task is completed, the user should be able to invoke a Caffe proto 
> file using the following command (as an example):
> hadoop jar SystemML.jar -f Caffe.proto -caffe 





[jira] [Created] (SYSTEMML-727) Add bufferpool integration logic to CUDA backend

2016-05-27 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-727:


 Summary: Add bufferpool integration logic to CUDA backend
 Key: SYSTEMML-727
 URL: https://issues.apache.org/jira/browse/SYSTEMML-727
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare


This work is done in the PR: 
https://github.com/apache/incubator-systemml/pull/165





[jira] [Created] (SYSTEMML-733) Create proto file for GoogleNet and test the generated DML on ImageNet dataset for accuracy

2016-05-27 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-733:


 Summary: Create proto file for GoogleNet and test the generated 
DML on ImageNet dataset for accuracy
 Key: SYSTEMML-733
 URL: https://issues.apache.org/jira/browse/SYSTEMML-733
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare








[jira] [Created] (SYSTEMML-737) Explore Multi-GPU instructions on the driver

2016-05-29 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-737:


 Summary: Explore Multi-GPU instructions on the driver 
 Key: SYSTEMML-737
 URL: https://issues.apache.org/jira/browse/SYSTEMML-737
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare








[jira] [Created] (SYSTEMML-739) Explore model-parallel constructs in DML

2016-05-29 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-739:


 Summary: Explore model-parallel constructs in DML
 Key: SYSTEMML-739
 URL: https://issues.apache.org/jira/browse/SYSTEMML-739
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare


An example of such a construct is providing access to the parameter server.





[jira] [Commented] (SYSTEMML-704) Host jcu*.jar libraries on mvn repo

2016-05-29 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15306109#comment-15306109
 ] 

Niketan Pansare commented on SYSTEMML-704:
--

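An alternative option is to use:

<repositories>
  <repository>
    <id>mavenized-jcuda-mvn-repo</id>
    <url>https://raw.github.com/MysterionRise/mavenized-jcuda/mvn-repo/</url>
    <snapshots>
      <enabled>true</enabled>
      <updatePolicy>always</updatePolicy>
    </snapshots>
  </repository>
</repositories>

<dependency>
  <groupId>org.mystic</groupId>
  <artifactId>mavenized-jcuda</artifactId>
  <version>0.7.5b</version>
  <scope>provided</scope>
</dependency>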

This is added to the PR 165 in the commit: 
https://github.com/apache/incubator-systemml/pull/165/commits/1a52bd8908773593fb8b240b38ed0e81a33ebb3f

> Host jcu*.jar libraries on mvn repo
> ---
>
> Key: SYSTEMML-704
> URL: https://issues.apache.org/jira/browse/SYSTEMML-704
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>Priority: Minor
>
> The PR https://github.com/apache/incubator-systemml/pull/165/ uses system 
> scope for jcu*.jar as they are not published on Maven Central. Since we are 
> planning to include them in SystemML, it would be good to host them in a 
> repo we maintain and use provided scope instead. If, for LICENSE or some 
> other reason, we are not able to host them, I am fine with rejecting this 
> issue too. From jcuda's website "JCuda is published under the terms of the 
> MIT/X11 License".
> The current version depends on jcu*-0.7.5b.jar (except jcudnn-0.7.5.jar). The 
> jars are available for download from 
> http://www.jcuda.org/downloads/downloads.html. The source is available at 
> https://github.com/jcuda
> [~nakul02] [~deron] [~luciano resende]





[jira] [Commented] (SYSTEMML-762) Fix the bug that causes local MR-Jobs when running in non-singlenode mode on MNIST data for Lenet script

2016-06-22 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345582#comment-15345582
 ] 

Niketan Pansare commented on SYSTEMML-762:
--

[~mwdus...@us.ibm.com] Local MR jobs were created for matrix multiplication 
(mapmm) because n could not be computed until nrow was executed (Note: conv2d = 
reshape_col(filter %*% im2col(image)))

{code:none}
end = beg + BATCH_SIZE - 1
if(end > num_images) end = num_images
#pulling out the batch
Xb = X[beg:end,]
n = nrow(Xb)
H1_activations = conv2d(Xb, conv_layer1_wts, padding=[2,2], stride=[1,1], 
input_shape=[n,1,img_height,img_width], 
filter_shape=[n1,1,kernel_height,kernel_width])
{code}

If you change the code to the following, you will no longer see local MR jobs:
{code:none}
end = beg + BATCH_SIZE - 1
if(end > num_images) {
  beg = 1
  end = beg + BATCH_SIZE - 1
  #end = num_images
}

#pulling out the batch
Xb = X[beg:end,] # Note: you will be missing the last few records
n = BATCH_SIZE
{code}
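
The index arithmetic of this workaround can be checked outside DML. The following is a small Python sketch, assuming 1-based inclusive bounds as in the DML snippet and a batch size no larger than the number of images; the function name batch_bounds is hypothetical.

```python
def batch_bounds(beg, batch_size, num_images):
    """Return 1-based inclusive (beg, end) bounds for a fixed-size batch.

    If the batch would run past the last image, wrap around to the first
    batch instead of truncating, so every batch has exactly batch_size rows.
    """
    end = beg + batch_size - 1
    if end > num_images:
        beg = 1
        end = beg + batch_size - 1
    return beg, end


print(batch_bounds(65, 64, 150))   # (65, 128)
print(batch_bounds(129, 64, 150))  # (1, 64)
```

Because the number of rows in every batch equals the constant batch size, it no longer has to be derived from the sliced matrix at runtime, at the cost of skipping the trailing records.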

I have added debug information to identify situations like this in the 
commit: 
https://github.com/apache/incubator-systemml/commit/873229f30527c8bfe6dc9399f53fd9f6dbb5b10e

Since we recompile at the loop-level, I am not sure we can fix the former case.
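
For readers unfamiliar with the identity conv2d = reshape_col(filter %*% im2col(image)) mentioned above, here is a minimal Python sketch for a single-channel image with stride 1 and no padding. The helper names are hypothetical and the code only illustrates the idea; it is not the SystemML operator.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch of the image into one column."""
    h, w = len(image), len(image[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    patches = []
    for r in range(out_h):
        for c in range(out_w):
            patches.append([image[r + i][c + j]
                            for i in range(kh)
                            for j in range(kw)])
    # Transpose so that each patch becomes a column: (kh*kw) x (out_h*out_w).
    columns = [list(row) for row in zip(*patches)]
    return columns, out_h, out_w


def matmul(a, b):
    """Plain matrix multiplication on lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def conv2d_single(image, kernel):
    """Convolution (cross-correlation, no kernel flip) via im2col + matmul."""
    kh, kw = len(kernel), len(kernel[0])
    columns, out_h, out_w = im2col(image, kh, kw)
    flat_filter = [[v for row in kernel for v in row]]  # 1 x (kh*kw)
    product = matmul(flat_filter, columns)[0]           # out_h*out_w values
    # Reshape the flat result back into an out_h x out_w matrix.
    return [product[r * out_w:(r + 1) * out_w] for r in range(out_h)]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
print(conv2d_single(image, kernel))  # [[6, 8], [12, 14]]
```

The matrix multiplication dominates the cost, which is why the dimensions of its inputs matter for choosing the execution plan.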

> Fix the bug that causes local MR-Jobs when running in non-singlenode mode on 
> MNIST data for Lenet script
> 
>
> Key: SYSTEMML-762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-762
> Project: SystemML
>  Issue Type: Bug
>Reporter: Niketan Pansare
>Assignee: Niketan Pansare
> Attachments: log.txt, log2.txt, log3.txt
>
>






[jira] [Commented] (SYSTEMML-762) Fix the bug that causes local MR-Jobs when running in non-single node on MNIST data for Lenet script

2016-06-15 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332610#comment-15332610
 ] 

Niketan Pansare commented on SYSTEMML-762:
--

[~mwdus...@us.ibm.com] [~mboehm7] [~reinwald] Will deliver the fix soon.

> Fix the bug that causes local MR-Jobs when running in non-single node on 
> MNIST data for Lenet script
> 
>
> Key: SYSTEMML-762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-762
> Project: SystemML
>  Issue Type: Bug
>Reporter: Niketan Pansare
>Assignee: Niketan Pansare
>






[jira] [Created] (SYSTEMML-762) Fix the bug that causes local MR-Jobs when running in non-single node on MNIST data for Lenet script

2016-06-15 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-762:


 Summary: Fix the bug that causes local MR-Jobs when running in 
non-single node on MNIST data for Lenet script
 Key: SYSTEMML-762
 URL: https://issues.apache.org/jira/browse/SYSTEMML-762
 Project: SystemML
  Issue Type: Bug
Reporter: Niketan Pansare








[jira] [Assigned] (SYSTEMML-762) Fix the bug that causes local MR-Jobs when running in non-single node on MNIST data for Lenet script

2016-06-15 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare reassigned SYSTEMML-762:


Assignee: Niketan Pansare

> Fix the bug that causes local MR-Jobs when running in non-single node on 
> MNIST data for Lenet script
> 
>
> Key: SYSTEMML-762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-762
> Project: SystemML
>  Issue Type: Bug
>Reporter: Niketan Pansare
>Assignee: Niketan Pansare
>






[jira] [Updated] (SYSTEMML-762) Fix the bug that causes local MR-Jobs when running in non-singlenode mode on MNIST data for Lenet script

2016-06-15 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-762:
-
Summary: Fix the bug that causes local MR-Jobs when running in 
non-singlenode mode on MNIST data for Lenet script  (was: Fix the bug that 
causes local MR-Jobs when running in non-single mode on MNIST data for Lenet 
script)

> Fix the bug that causes local MR-Jobs when running in non-singlenode mode on 
> MNIST data for Lenet script
> 
>
> Key: SYSTEMML-762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-762
> Project: SystemML
>  Issue Type: Bug
>Reporter: Niketan Pansare
>Assignee: Niketan Pansare
>






[jira] [Commented] (SYSTEMML-762) Fix the bug that causes local MR-Jobs when running in non-singlenode mode on MNIST data for Lenet script

2016-06-24 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348913#comment-15348913
 ] 

Niketan Pansare commented on SYSTEMML-762:
--

Yes, this is expected as no metadata is available for CSV. Please note that no 
MR jobs are created for im2col or ba+*.  Also, out of 41.459 seconds of im2col, 
33.114 seconds is spent in Cache release (which most likely occurs during 
validation and/or testing). I am working on optimized conv2d* operators that 
will help avoid this cost and the improvement should be in soon :)

Also, as a related sidenote, we should provide additional converter utils (if 
necessary) and encourage users to test their deep learning scripts using binary 
format for more accurate profiling.

> Fix the bug that causes local MR-Jobs when running in non-singlenode mode on 
> MNIST data for Lenet script
> 
>
> Key: SYSTEMML-762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-762
> Project: SystemML
>  Issue Type: Bug
>Reporter: Niketan Pansare
>Assignee: Niketan Pansare
> Attachments: log.txt, log2.txt, log3.txt, log4.txt, log5.txt, 
> log6.txt, log7.txt
>
>






[jira] [Comment Edited] (SYSTEMML-762) Fix the bug that causes local MR-Jobs when running in non-singlenode mode on MNIST data for Lenet script

2016-06-24 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348913#comment-15348913
 ] 

Niketan Pansare edited comment on SYSTEMML-762 at 6/24/16 11:53 PM:


Yes, this is expected as no metadata is available for CSV. Please note that no 
MR jobs are created for im2col or ba+*.  Also, out of 41.459 seconds of im2col, 
33.114 seconds is spent in Cache release (which most likely occurs during 
validation and/or testing). I am working on optimized conv2d* operators that 
will help avoid this cost and the improvement should be in soon :)

Also, as a related sidenote, we should provide additional converter utils (if 
necessary) and encourage users to test their deep learning scripts using binary 
format for more accurate profiling.


was (Author: niketanpansare):
Yes, this is expected as no metadata is available for CSV. Please note that no 
MR jobs are created for im2col or ba+*.  Also, out of 41.459 seconds of im2col, 
33.114 seconds is spent in Cache release (which most likely occurs during 
validation and/or testing). I am working on optimized conv2d* operators that 
will help avoid this cost and the improvement should be in soon :)

Also, as a related sidenote, we should provide additional converter utils (if 
necessary) and encourage users to test their deep learning scripts using binary 
format for more accurate profiling.

> Fix the bug that causes local MR-Jobs when running in non-singlenode mode on 
> MNIST data for Lenet script
> 
>
> Key: SYSTEMML-762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-762
> Project: SystemML
>  Issue Type: Bug
>Reporter: Niketan Pansare
>Assignee: Niketan Pansare
> Attachments: log.txt, log2.txt, log3.txt, log4.txt, log5.txt, 
> log6.txt, log7.txt
>
>






[jira] [Commented] (SYSTEMML-762) Fix the bug that causes local MR-Jobs when running in non-singlenode mode on MNIST data for Lenet script

2016-06-15 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333009#comment-15333009
 ] 

Niketan Pansare commented on SYSTEMML-762:
--

Fixed by the commit: 
https://github.com/apache/incubator-systemml/commit/55c8ee7d6e3c1fcdf5c2583eee3f0a287d4baac9

> Fix the bug that causes local MR-Jobs when running in non-singlenode mode on 
> MNIST data for Lenet script
> 
>
> Key: SYSTEMML-762
> URL: https://issues.apache.org/jira/browse/SYSTEMML-762
> Project: SystemML
>  Issue Type: Bug
>Reporter: Niketan Pansare
>Assignee: Niketan Pansare
>






[jira] [Commented] (SYSTEMML-590) Assume Parent's Namespace for Nested UDF calls.

2016-03-23 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209042#comment-15209042
 ] 

Niketan Pansare commented on SYSTEMML-590:
--

Hi [~mboehm7], I have not looked into this yet.

[~mwdus...@us.ibm.com], I agree that this is important!

> Assume Parent's Namespace for Nested UDF calls.
> ---
>
> Key: SYSTEMML-590
> URL: https://issues.apache.org/jira/browse/SYSTEMML-590
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Mike Dusenberry
>
> Currently, if a UDF body involves calling another UDF, the default global 
> namespace is assumed, unless a namespace is explicitly indicated.  This 
> becomes a problem when a file contains UDFs, and is then sourced from another 
> script.
> Imagine a file {{funcs.dml}} as follows:
> {code}
> f = function(double x, int a) return (double ans) {
>   x2 = g(x)
>   ans = a * x2
> }
> g = function(double x) return (double ans) {
>   ans = x * x
> }
> {code}
> Then, let's try to call {{f}}:
> {code}
> script = """
> source ("funcs.dml") as funcs
> ans = funcs::f(3, 1)
> print(ans)
> """
> ml.reset()
> ml.executeScript(script)
> {code}
> This results in an error since {{f}} is in the {{funcs}} namespace, but the 
> call to {{g}} assumes {{g}} is still in the default namespace.  Clearly, the 
> user intends to use the {{g}} that is located in the same file.
> Currently, we would need to adjust {{funcs.dml}} as follows to explicitly 
> assume that {{f}} and {{g}} are in a {{funcs}} namespace:
> {code}
> f = function(double x, int a) return (double ans) {
>   x2 = funcs::g(x)
>   ans = a * x2
> }
> g = function(double x) return (double ans) {
>   ans = x * x
> }
> {code}
> Instead, it would be better to simply first look for {{g}} in its parent's 
> namespace.  In this case, the "parent" would be the function {{f}}, and the 
> namespace we have selected is {{funcs}}, although that choice would be left 
> up to the end-user.  Then, namespace assumptions would not be necessary.
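
The lookup order proposed above can be sketched as follows. This is a hypothetical Python model of the resolution rule, not SystemML's parser code: functions are keyed by (namespace, name), and the name of the default namespace is an assumption made for illustration.

```python
DEFAULT_NAMESPACE = ".defaultNS"  # assumed label for the global namespace


def resolve(functions, caller_namespace, name, explicit_namespace=None):
    """Resolve a function call made from inside a UDF.

    An explicit namespace always wins. Otherwise the caller's own namespace
    is searched first, followed by the default namespace.
    """
    if explicit_namespace is not None:
        candidates = [explicit_namespace]
    else:
        candidates = [caller_namespace, DEFAULT_NAMESPACE]
    for namespace in candidates:
        if (namespace, name) in functions:
            return namespace, name
    raise KeyError("function %r not found in %r" % (name, candidates))


# funcs.dml was sourced as "funcs", so both f and g live in that namespace.
functions = {("funcs", "f"), ("funcs", "g")}

# The call to g inside funcs::f resolves without an explicit funcs:: prefix.
print(resolve(functions, caller_namespace="funcs", name="g"))  # ('funcs', 'g')
```

Under this rule the body of {{f}} would not need to know which namespace name the sourcing script chose.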





[jira] [Commented] (SYSTEMML-538) Decoding ID columns to string

2016-03-06 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182506#comment-15182506
 ] 

Niketan Pansare commented on SYSTEMML-538:
--

It is a useful feature to have. This raises an interesting question: Should we 
generalize frame or implement an additional function for this operation? 
transform(frame) => matrix
aboveMentionedOp(matrix) => frame or aboveMentionedOp(frame) => frame

> Decoding ID columns to string
> -
>
> Key: SYSTEMML-538
> URL: https://issues.apache.org/jira/browse/SYSTEMML-538
> Project: SystemML
>  Issue Type: New Feature
>  Components: APIs
>Reporter: Prithviraj Sen
>
> Currently, the transform operation allows one to consume a frame containing 
> strings and replace the strings with integer IDs. However, there is no 
> operation that provides the inverse of this functionality. In other words, it 
> would be nice to have an operation that allows one to use a recode map 
> produced by a previously invoked transform operation and replace the integer 
> IDs with the corresponding strings provided in the recode map.
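
The requested inverse operation can be illustrated with a short Python sketch. The data layout is an assumption made for the example: the recode map is modeled as a dictionary per column index that maps each string to its integer ID, and the function name decode is hypothetical.

```python
def decode(rows, recode_maps):
    """Replace integer IDs with their original strings.

    rows        -- matrix as a list of rows
    recode_maps -- {column_index: {string: integer_id}} as produced by a
                   previous recoding step (layout assumed for this sketch)
    Columns without a recode map are passed through unchanged.
    """
    inverse = {col: {code: token for token, code in mapping.items()}
               for col, mapping in recode_maps.items()}
    return [[inverse[col][value] if col in inverse else value
             for col, value in enumerate(row)]
            for row in rows]


recode_maps = {0: {"red": 1, "blue": 2}}
matrix = [[1, 0.5],
          [2, 1.5],
          [1, 2.5]]
print(decode(matrix, recode_maps))
# [['red', 0.5], ['blue', 1.5], ['red', 2.5]]
```

Since the result mixes strings and numbers, it fits a frame rather than a matrix.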



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SYSTEMML-579) Packing our algorithm scripts into JAR

2016-04-21 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253104#comment-15253104
 ] 

Niketan Pansare commented on SYSTEMML-579:
--

Added script in 
https://github.com/apache/incubator-systemml/commit/4d3987e60b36695764d7bf1f73ac0cd09178c647

> Packing our algorithm scripts into JAR
> --
>
> Key: SYSTEMML-579
> URL: https://issues.apache.org/jira/browse/SYSTEMML-579
> Project: SystemML
>  Issue Type: Task
>  Components: Algorithms, APIs
>Affects Versions: SystemML 0.9
>Reporter: Tommy Yu
>Priority: Minor
>
> Pack our algorithm scripts into the JAR so that they can be used without looking into the user's filesystem.
> We should look into the possibility of packing our algorithm scripts into the 
> JAR during build time as perhaps a Maven "resource" that would be available 
> to the Java process without needing to look into the user's filesystem.  This 
> should help with the Scala API introduced in SYSTEMML-580.  One issue I see 
> with the current approach is if a user wishes to attach the SystemML JAR to a 
> cloud notebook (such as Databricks Cloud) in which an environment variable 
> may not be able to be set, the API will not function.
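
Since a JAR is a ZIP archive, the idea of reading a bundled script without touching the user's filesystem can be sketched with Python's standard `zipfile` module. The entry path and script text below are hypothetical, and the in-memory archive merely stands in for the SystemML JAR; in Java, the equivalent would presumably be a classpath resource lookup.

```python
import io
import zipfile

# Build a tiny in-memory archive standing in for the JAR.
# The entry name and the script content are made up for this sketch.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as jar:
    jar.writestr("scripts/algorithms/Example.dml", "print('hello');\n")


def load_bundled_script(jar_bytes, entry):
    """Read a script from inside the archive; no environment variable
    or path on the user's filesystem is needed."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return jar.read(entry).decode("utf-8")


script_text = load_bundled_script(
    buffer.getvalue(), "scripts/algorithms/Example.dml"
)
print(script_text)
```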





[jira] [Created] (SYSTEMML-704) Host jcu*.jar libraries on mvn repo

2016-05-20 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-704:


 Summary: Host jcu*.jar libraries on mvn repo
 Key: SYSTEMML-704
 URL: https://issues.apache.org/jira/browse/SYSTEMML-704
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare
Assignee: Alan Chin
Priority: Minor


The PR https://github.com/apache/incubator-systemml/pull/165/ uses system scope 
for jcu*.jar because they are not published on Maven Central. Since we are planning to 
include them in SystemML, it would be good to host them in a repo we 
maintain and use provided scope instead. If, for licensing or other reasons, 
we are not able to host them, I am fine with rejecting this issue too. From 
JCuda's website: "JCuda is published under the terms of the MIT/X11 License".

The current version depends on jcu*-0.7.5b.jar (except jcudnn-0.7.5.jar). The 
jars are available for download from 
http://www.jcuda.org/downloads/downloads.html. The source is available at 
https://github.com/jcuda

[~nakul02] [~deron] [~luciano resende]





[jira] [Assigned] (SYSTEMML-703) Install CUDA along with CuDNN on Jenkins

2016-05-20 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare reassigned SYSTEMML-703:


Assignee: Alan Chin

> Install CUDA along with CuDNN on Jenkins
> 
>
> Key: SYSTEMML-703
> URL: https://issues.apache.org/jira/browse/SYSTEMML-703
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>Assignee: Alan Chin
>Priority: Minor
>
> Please install:
> 1. CUDA 7.5
> 2. CuDNN v4 from 
> http://developer.download.nvidia.com/compute/redist/cudnn/v4/cudnn-7.0-win-x64-v4.0-prod.zip
> 3. Download JCuda binaries version 0.7.5b and JCudnn version 0.7.5. Link: 
> http://www.jcuda.org/downloads/downloads.html ... The library path for 
> test-cases is set in AutomatedTestbase class: 
> https://github.com/apache/incubator-systemml/pull/165/files#diff-bcda036e4c3ff62cb2648acbbd19f61aR113
> Once these changes are in (and once the GPU backend is feature-complete), we can 
> set the TEST_GPU flag in the AutomatedTestbase class to true. Since it will take a few 
> weeks to make the GPU backend feature-complete, this is a low-priority task.
> [~nakul02] [~akchin]





[jira] [Updated] (SYSTEMML-702) Implement GPU sparse matrix multiplication

2016-05-20 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-702:
-
Assignee: (was: Nakul Jindal)

> Implement GPU sparse matrix multiplication
> --
>
> Key: SYSTEMML-702
> URL: https://issues.apache.org/jira/browse/SYSTEMML-702
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>
> Please add sparse matrix multiplication to LibMatrixCUDA library as a static 
> method 
> (https://github.com/apache/incubator-systemml/pull/165/files#diff-3299e54b4019b2ee294c9b15acde6885R205)
>  to be consistent with other libraries in SystemML. This task depends 
> on https://issues.apache.org/jira/browse/SYSTEMML-701
> For initial testing, we can use the following test case: 
> https://github.com/apache/incubator-systemml/pull/165/files#diff-04c75c8061cd63e05bb7d8f8f2e89285R82
>  with TEST_GPU flag turned on 
> (https://github.com/apache/incubator-systemml/pull/165/files#diff-bcda036e4c3ff62cb2648acbbd19f61aR90).





[jira] [Updated] (SYSTEMML-703) Install CUDA along with CuDNN on Jenkins

2016-05-20 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-703:
-
Assignee: (was: Alan Chin)

> Install CUDA along with CuDNN on Jenkins
> 
>
> Key: SYSTEMML-703
> URL: https://issues.apache.org/jira/browse/SYSTEMML-703
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>Priority: Minor
>
> Please install:
> 1. CUDA 7.5
> 2. CuDNN v4 from 
> http://developer.download.nvidia.com/compute/redist/cudnn/v4/cudnn-7.0-win-x64-v4.0-prod.zip
> 3. Download JCuda binaries version 0.7.5b and JCudnn version 0.7.5. Link: 
> http://www.jcuda.org/downloads/downloads.html ... The library path for 
> test-cases is set in AutomatedTestbase class: 
> https://github.com/apache/incubator-systemml/pull/165/files#diff-bcda036e4c3ff62cb2648acbbd19f61aR113
> Once these changes are in (and once the GPU backend is feature-complete), we can 
> set the TEST_GPU flag in the AutomatedTestbase class to true. Since it will take a few 
> weeks to make the GPU backend feature-complete, this is a low-priority task.
> [~nakul02] [~akchin]





[jira] [Commented] (SYSTEMML-704) Host jcu*.jar libraries on mvn repo

2016-05-20 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15294196#comment-15294196
 ] 

Niketan Pansare commented on SYSTEMML-704:
--

I agree. I assume that by artifact you mean only the jars, not the system-level 
dependencies (such as .dll/.so). We can ask users to install the JCuda native 
libraries either themselves or through mavenized JCuda when executing under the GPU backend.

> Host jcu*.jar libraries on mvn repo
> ---
>
> Key: SYSTEMML-704
> URL: https://issues.apache.org/jira/browse/SYSTEMML-704
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>Assignee: Alan Chin
>Priority: Minor
>
> The PR https://github.com/apache/incubator-systemml/pull/165/ uses system 
> scope for jcu*.jar because they are not published on Maven Central. Since we are 
> planning to include them in SystemML, it would be good to host them in a 
> repo we maintain and use provided scope instead. If, for licensing or 
> other reasons, we are not able to host them, I am fine with rejecting this 
> issue too. From JCuda's website: "JCuda is published under the terms of the 
> MIT/X11 License".
> The current version depends on jcu*-0.7.5b.jar (except jcudnn-0.7.5.jar). The 
> jars are available for download from 
> http://www.jcuda.org/downloads/downloads.html. The source is available at 
> https://github.com/jcuda
> [~nakul02] [~deron] [~luciano resende]





[jira] [Updated] (SYSTEMML-701) Implement functionality to transfer CP sparse matrixblock to GPU (and back)

2016-05-20 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-701:
-
Assignee: (was: Nakul Jindal)

> Implement functionality to transfer CP sparse matrixblock to GPU (and back)
> ---
>
> Key: SYSTEMML-701
> URL: https://issues.apache.org/jira/browse/SYSTEMML-701
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>
> This task involves adding functionality to transfer CP sparse matrix to GPU 
> (and back):
> 1. Copy data from CP to GPU: 
> https://github.com/apache/incubator-systemml/pull/165/files#diff-47d92e7e77a0c3762d9a98ac51adf449R63
> 2. Copy data from GPU to CP: 
> https://github.com/apache/incubator-systemml/pull/165/files#diff-47d92e7e77a0c3762d9a98ac51adf449R81
> 3. Other utility functions: 
> https://github.com/apache/incubator-systemml/pull/165/files#diff-47d92e7e77a0c3762d9a98ac51adf449R110
> https://github.com/apache/incubator-systemml/pull/165/files#diff-47d92e7e77a0c3762d9a98ac51adf449R46





[jira] [Commented] (SYSTEMML-703) Install CUDA along with CuDNN on Jenkins

2016-05-20 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15294505#comment-15294505
 ] 

Niketan Pansare commented on SYSTEMML-703:
--

Thanks [~akchin] :)

> Install CUDA along with CuDNN on Jenkins
> 
>
> Key: SYSTEMML-703
> URL: https://issues.apache.org/jira/browse/SYSTEMML-703
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>Priority: Minor
>
> Please install:
> 1. CUDA 7.5
> 2. CuDNN v4 from 
> http://developer.download.nvidia.com/compute/redist/cudnn/v4/cudnn-7.0-win-x64-v4.0-prod.zip
> 3. Download JCuda binaries version 0.7.5b and JCudnn version 0.7.5. Link: 
> http://www.jcuda.org/downloads/downloads.html ... The library path for 
> test-cases is set in AutomatedTestbase class: 
> https://github.com/apache/incubator-systemml/pull/165/files#diff-bcda036e4c3ff62cb2648acbbd19f61aR113
> Once these changes are in (and once the GPU backend is feature-complete), we can 
> set the TEST_GPU flag in the AutomatedTestbase class to true. Since it will take a few 
> weeks to make the GPU backend feature-complete, this is a low-priority task.
> [~nakul02] [~akchin]





[jira] [Commented] (SYSTEMML-593) MLContext Redesign

2016-05-11 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280832#comment-15280832
 ] 

Niketan Pansare commented on SYSTEMML-593:
--

Thanks [~deron] for creating the design document. It improves the usability of 
MLContext a lot.

I like the common interface "in" that allows users to pass both data as well as 
command-line arguments. I also like that we use $prefix for commandline 
variables in the "in" method. Thereby, in(String, RDD/DataFrame) maps to 
registerInput and in(String, boolean/double/float/int/string) maps to 
command-line arguments. I also like that this design avoids the need to cast 
boolean/double/float/int into String.

I also like the Script abstraction as it avoids overloaded execute methods (for 
example: PyDML, DML, ...).

A few thoughts/suggestions:
1. The current MLContext allows users to pass an RDD/DataFrame to the script using 
"registerInput". In the proposed document, we pass the RDD/DataFrame through 
".in(...)". In addition, the registerInput method allows passing the format and 
the meta-data information. In some cases, the format is required but the meta-data 
is optional, and in other cases both are required. We need to add 
appropriate guards in our new MLContext.
For example: we should not support `script.in("A", sc.textFile("m.csv"))`, as 
the RDD can refer to either "csv" or "text" format. Also, `script.in("A", 
sc.textFile("m.text"), "text")` should throw an error stating that meta-data is 
required.

2. The DML language semantics should be respected. For example: if the script has 
the following line `X = read($fileX)`, then providing .in("X", ...) but not 
.in("$fileX", ...) should throw an error.

3. Please remember that a DataFrame is an unordered collection, whereas we return a matrix, 
which is an ordered structure. So, please remember to return the DataFrame with an 
"ID" column as we do in our current MLOutput class; otherwise we are potentially 
breaking the contract.

4. Please support the following types of DataFrame:
- With an ID column and one DF column of type double for every column of the 
matrix. This is a safe way for the user to pass a DataFrame to SystemML and still be 
able to do pre-processing.
- Without an ID column, but with one DF column of type double for every column 
of the matrix. This is potentially unsafe; the user must ensure that the rows are sorted.
- With an ID column and a DF column of Vector DataType. This is often used 
in MLPipeline wrappers.
- Without an ID column, but with a DF column of Vector DataType. This is 
often used in MLPipeline wrappers.

5. With the exception of DataFrame, all the RDDs that we pass map to the formats we 
support in read(): RDD/JavaRDD/JavaPairRDD/... for csv and text format + RDD/JavaPairRDD for 
binaryblock. For non-read formats, we implement RDDConverterUtils.

Please support all the read formats either directly or via an abstraction (for 
example: the proposed BinaryBlockMatrix, which is a wrapper of JavaPairRDD and 
MC). In particular, users might prefer to stick with BinaryBlockMatrix if they 
want to pass it to another DML script, but might want a DataFrame if they want to 
apply SQL. Why? For extremely wide matrices, DataFrame is an extremely 
inefficient format.

An alternate suggestion: you could support registering only one type of 
DataFrame/RDD and have many constructors/factory methods for it. For example: 
Please see org.apache.sysml.api.MLMatrix (for reference implementation of 
BinaryBlockMatrix) which essentially is a two column DataFrame that supports 
simple Matrix algebra. It also fits well into Spark Datasource API: 
ml.read(sqlContext, "W_small.mtx", "binary").
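
To illustrate point 3 above (why the "ID" column matters), here is a minimal plain-Python sketch. It does not use Spark or SystemML; the row dictionaries, the column names `C1`/`C2`, and the helper `to_matrix` are assumptions made for the example.

```python
import random

# A 3x2 matrix stored as rows tagged with a 1-based "ID" (row index) column,
# mimicking a DataFrame with an ID column plus one double column per
# matrix column. All names here are illustrative.
rows = [
    {"ID": 1, "C1": 1.0, "C2": 2.0},
    {"ID": 2, "C1": 3.0, "C2": 4.0},
    {"ID": 3, "C1": 5.0, "C2": 6.0},
]


def to_matrix(rows, columns):
    """Rebuild the ordered matrix by sorting on the ID column."""
    ordered = sorted(rows, key=lambda r: r["ID"])
    return [[r[c] for c in columns] for r in ordered]


# Simulate the unordered nature of a distributed collection.
shuffled = rows[:]
random.Random(0).shuffle(shuffled)

matrix = to_matrix(shuffled, ["C1", "C2"])
print(matrix)
```

Without the ID column, the row order after the shuffle could not be recovered, which is the contract issue raised in point 3.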

[~reinwald] [~mboehm7] [~mwdus...@us.ibm.com]

> MLContext Redesign
> --
>
> Key: SYSTEMML-593
> URL: https://issues.apache.org/jira/browse/SYSTEMML-593
> Project: SystemML
>  Issue Type: Improvement
>  Components: APIs
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
> Attachments: Design Document - MLContext API Redesign.pdf
>
>
> This JIRA proposes a redesign of the Java MLContext API with several goals:
> • Simplify the user experience
> • Encapsulate primary entities using object-oriented concepts
> • Make API extensible for external users
> • Make API extensible for SystemML developers
> • Locate all user-interaction classes, interfaces, etc under a single API 
> package
> • Extensive Javadocs for all classes in the API
> • Potentially fold JMLC API into MLContext so as to have a single 
> programmatic API





[jira] [Created] (SYSTEMML-685) Add documentation to map the NumPy tensor calls to DML expressions.

2016-05-12 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-685:


 Summary: Add documentation to map the NumPy tensor calls to DML 
expressions.
 Key: SYSTEMML-685
 URL: https://issues.apache.org/jira/browse/SYSTEMML-685
 Project: SystemML
  Issue Type: Documentation
Reporter: Niketan Pansare








[jira] [Created] (SYSTEMML-686) Implement Spark instructions for convolution and pooling functions

2016-05-12 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-686:


 Summary: Implement Spark instructions for convolution and pooling 
functions
 Key: SYSTEMML-686
 URL: https://issues.apache.org/jira/browse/SYSTEMML-686
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare








[jira] [Created] (SYSTEMML-687) Optimize CP convolution/pooling instructions for sparse inputs

2016-05-12 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-687:


 Summary: Optimize CP convolution/pooling instructions for sparse 
inputs
 Key: SYSTEMML-687
 URL: https://issues.apache.org/jira/browse/SYSTEMML-687
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare








[jira] [Commented] (SYSTEMML-695) Incorrect rand normal w/ fused scalar operation

2016-05-14 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283601#comment-15283601
 ] 

Niketan Pansare commented on SYSTEMML-695:
--

Thanks Matthias

> Incorrect rand normal w/ fused scalar operation
> ---
>
> Key: SYSTEMML-695
> URL: https://issues.apache.org/jira/browse/SYSTEMML-695
> Project: SystemML
>  Issue Type: Bug
>  Components: Compiler
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
>






[jira] [Created] (SYSTEMML-692) Automatic DML generation using the model specification of existing Deep Learning libraries such as Caffe

2016-05-13 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-692:


 Summary: Automatic DML generation using the model specification of 
existing Deep Learning libraries such as Caffe
 Key: SYSTEMML-692
 URL: https://issues.apache.org/jira/browse/SYSTEMML-692
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare








[jira] [Closed] (SYSTEMML-742) Implement MMTSJGPUInstruction instruction for GPU backend along with corresponding Hops/Lops

2016-08-11 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare closed SYSTEMML-742.

   Resolution: Fixed
Fix Version/s: SystemML 0.11

> Implement MMTSJGPUInstruction instruction for GPU backend along with 
> corresponding Hops/Lops
> 
>
> Key: SYSTEMML-742
> URL: https://issues.apache.org/jira/browse/SYSTEMML-742
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>Assignee: Tanuj Kr Aasawat
> Fix For: SystemML 0.11
>
>
> 1. Add a new MMTSJGPUInstruction instruction under package 
> org.apache.sysml.runtime.instructions.gpu.
> 2. Add appropriate hooks at runtime parser. For example:
> String2GPUInstructionType.put( "tsmm"   , CPINSTRUCTION_TYPE.MMTSJ);
> 3. Add appropriate hooks at Hops/Lops.
> 4. Add a new function in org.apache.sysml.runtime.matrix.data.LibMatrixCUDA 
> library to perform tsmm: transposeSelfMatrixMultOperations
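
For readers unfamiliar with tsmm, the operation can be sketched in plain Python. This assumes tsmm here denotes the "left" transpose-self product t(X) %*% X; the function name `tsmm_left` is made up for the example, and the sketch says nothing about how the CUDA implementation is written.

```python
def tsmm_left(X):
    """Compute t(X) %*% X for a dense matrix given as a list of rows.

    The result is symmetric, so only the upper triangle is computed
    and then mirrored into the lower triangle.
    """
    n_rows, n_cols = len(X), len(X[0])
    out = [[0.0] * n_cols for _ in range(n_cols)]
    for i in range(n_cols):
        for j in range(i, n_cols):
            s = sum(X[k][i] * X[k][j] for k in range(n_rows))
            out[i][j] = s
            out[j][i] = s
    return out


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
result = tsmm_left(X)
print(result)
```

Exploiting the symmetry roughly halves the multiply-add work compared with a general matrix multiplication of t(X) and X.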





[jira] [Updated] (SYSTEMML-742) Implement MMTSJGPUInstruction instruction for GPU backend along with corresponding Hops/Lops

2016-08-11 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-742:
-
Assignee: Tanuj Kr Aasawat

> Implement MMTSJGPUInstruction instruction for GPU backend along with 
> corresponding Hops/Lops
> 
>
> Key: SYSTEMML-742
> URL: https://issues.apache.org/jira/browse/SYSTEMML-742
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>Assignee: Tanuj Kr Aasawat
> Fix For: SystemML 0.11
>
>
> 1. Add a new MMTSJGPUInstruction instruction under package 
> org.apache.sysml.runtime.instructions.gpu.
> 2. Add appropriate hooks at runtime parser. For example:
> String2GPUInstructionType.put( "tsmm"   , CPINSTRUCTION_TYPE.MMTSJ);
> 3. Add appropriate hooks at Hops/Lops.
> 4. Add a new function in org.apache.sysml.runtime.matrix.data.LibMatrixCUDA 
> library to perform tsmm: transposeSelfMatrixMultOperations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (SYSTEMML-727) Add bufferpool integration logic to CUDA backend

2016-08-11 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare closed SYSTEMML-727.

   Resolution: Fixed
Fix Version/s: SystemML 0.11

> Add bufferpool integration logic to CUDA backend
> 
>
> Key: SYSTEMML-727
> URL: https://issues.apache.org/jira/browse/SYSTEMML-727
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>Assignee: Niketan Pansare
> Fix For: SystemML 0.11
>
>
> This work is done in the PR: 
> https://github.com/apache/incubator-systemml/pull/165





[jira] [Updated] (SYSTEMML-446) Exploit GPU BLAS libraries (integration)

2016-08-11 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-446:
-
Assignee: Tanuj Kr Aasawat

> Exploit GPU BLAS libraries (integration)
> 
>
> Key: SYSTEMML-446
> URL: https://issues.apache.org/jira/browse/SYSTEMML-446
> Project: SystemML
>  Issue Type: Task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Tanuj Kr Aasawat
>






[jira] [Created] (SYSTEMML-824) Improve the performance of binary cell-wise operations

2016-07-20 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-824:


 Summary: Improve the performance of binary cell-wise operations
 Key: SYSTEMML-824
 URL: https://issues.apache.org/jira/browse/SYSTEMML-824
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare


The cell-wise (matrix-matrix as well as matrix-scalar) operations take a 
significant amount of time while training LeNet. Here are a few ways to improve 
the performance of cell-wise operations:
1. Inject in-place updates [1] (saving on zero-ing out the matrix).
2. Fused cell-wise operations (as an example, recently added axpy operations). 
3. Parallelize cell-wise operations (an initial investigation needs to be conducted 
before proceeding in this direction, especially in the sparse case: 
https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/runtime/matrix/data/LibMatrixBincell.java#L274).
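
The first two ideas can be sketched in plain Python on flat lists of cells. This is only an illustration under the assumption that axpy means a*x + y applied cell by cell; the function names are made up and do not correspond to SystemML code.

```python
def axpy_unfused(a, x, y):
    """Two passes: materialize a*x as a temporary, then add y."""
    tmp = [a * xi for xi in x]
    return [ti + yi for ti, yi in zip(tmp, y)]


def axpy_fused(a, x, y):
    """One pass, no temporary: each output cell is a*x[i] + y[i]."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def axpy_inplace(a, x, y):
    """Fused and in place: overwrite y instead of allocating
    (and zeroing out) a fresh output."""
    for i, xi in enumerate(x):
        y[i] = a * xi + y[i]
    return y


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
unfused = axpy_unfused(2.0, x, y)
fused = axpy_fused(2.0, x, y)
inplace = axpy_inplace(2.0, x, list(y))  # copy so y itself stays unchanged
print(unfused, fused, inplace)
```

All three variants produce the same cells; they differ only in how many passes and how many intermediate buffers they need.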

[~nakul02] [~mwdus...@us.ibm.com] [~prithvi_r_s] [~mboehm7] [~reinwald]

Reference:
[1] http://www.diku.dk/hjemmesider/ansatte/torbenm/ICD/Register.pdf





[jira] [Updated] (SYSTEMML-824) Improve the performance of binary cell-wise operations

2016-07-20 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-824:
-
Description: 
The cell-wise (matrix-matrix as well as matrix-scalar) operations take a 
significant amount of time while training LeNet. Here are a few ways to improve 
the performance of cell-wise operations:
1. Inject in-place updates [1] (saving on zero-ing out the matrix).
2. Fused cell-wise operations (as an example, recently added axpy operations: 
https://github.com/apache/incubator-systemml/commit/b584aecf6b3a1eb96ff83b78cc3ad7c7c6d15baa).
 
3. Parallelize cell-wise operations (an initial investigation needs to be conducted 
before proceeding in this direction, especially in the sparse case: 
https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/runtime/matrix/data/LibMatrixBincell.java#L274).

[~nakul02] [~mwdus...@us.ibm.com] [~prithvi_r_s] [~mboehm7] [~reinwald]

Reference:
[1] http://www.diku.dk/hjemmesider/ansatte/torbenm/ICD/Register.pdf

  was:
The cell-wise (matrix-matrix as well as matrix-scalar) operations take a 
significant amount of time while training LeNet. Here are a few ways to improve 
the performance of cell-wise operations:
1. Inject in-place updates [1] (saving on zero-ing out the matrix).
2. Fused cell-wise operations (as an example, recently added axpy operations). 
3. Parallelize cell-wise operations (an initial investigation needs to be conducted 
before proceeding in this direction, especially in the sparse case: 
https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/runtime/matrix/data/LibMatrixBincell.java#L274).

[~nakul02] [~mwdus...@us.ibm.com] [~prithvi_r_s] [~mboehm7] [~reinwald]

Reference:
[1] http://www.diku.dk/hjemmesider/ansatte/torbenm/ICD/Register.pdf


> Improve the performance of binary cell-wise operations
> --
>
> Key: SYSTEMML-824
> URL: https://issues.apache.org/jira/browse/SYSTEMML-824
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>
> The cell-wise (matrix-matrix as well as matrix-scalar) operations take a 
> significant amount of time while training LeNet. Here are a few ways to improve 
> the performance of cell-wise operations:
> 1. Inject in-place updates [1] (saving on zero-ing out the matrix).
> 2. Fused cell-wise operations (as an example, recently added axpy operations: 
> https://github.com/apache/incubator-systemml/commit/b584aecf6b3a1eb96ff83b78cc3ad7c7c6d15baa).
>  
> 3. Parallelize cell-wise operations (an initial investigation needs to be 
> conducted before proceeding in this direction, especially in the sparse case: 
> https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/runtime/matrix/data/LibMatrixBincell.java#L274).
> [~nakul02] [~mwdus...@us.ibm.com] [~prithvi_r_s] [~mboehm7] [~reinwald]
> Reference:
> [1] http://www.diku.dk/hjemmesider/ansatte/torbenm/ICD/Register.pdf





[jira] [Commented] (SYSTEMML-845) Compare Performance of LeNet Scripts With & Without Using SystemML-NN

2016-08-05 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410063#comment-15410063
 ] 

Niketan Pansare commented on SYSTEMML-845:
--

[~mwdus...@us.ibm.com] The performance of 
https://issues.apache.org/jira/secure/attachment/12822211/lenet-train-spark-explain.log
 is as expected. I am planning to deliver a commit with performance improvement 
to maxpool_bwd by next week.

Regarding 
https://issues.apache.org/jira/secure/attachment/12822212/mnist_lenet-train-spark-explain.log,
 am I correct in assuming that both scripts have identical DML, except that 
mnist_lenet-train uses UDFs? Can you please run the scripts again with `-explain 
recompile_hops`?

> Compare Performance of LeNet Scripts With & Without Using SystemML-NN
> -
>
> Key: SYSTEMML-845
> URL: https://issues.apache.org/jira/browse/SYSTEMML-845
> Project: SystemML
>  Issue Type: Improvement
>Reporter: Mike Dusenberry
> Attachments: convert.dml, lenet-train-spark-explain.log, 
> log08.03.16-1470268602.txt, mnist_lenet-train-spark-explain.log, perf.sh, 
> run.sh
>
>
> This JIRA issue tracks the comparison of the performance of the LeNet scripts 
> with & without using SystemML-NN.  The goal is that they should have equal 
> performance in terms of both accuracy and time.  Any difference will 
> indicate areas for engine improvement.
> Scripts:
> * [mnist_lenet-train.dml | 
> https://github.com/apache/incubator-systemml/blob/master/scripts/staging/SystemML-NN/examples/mnist_lenet-train.dml]
>  - LeNet script that *does* use the SystemML-NN library.
> * [lenet-train.dml | 
> https://github.com/apache/incubator-systemml/blob/master/scripts/staging/lenet-train.dml]
>  - LeNet script that *does not* use the SystemML-NN library.
> To fully reproduce, I basically created a directory, placed the two attached 
> bash scripts in it, grabbed a copy of the NN library and placed it into the 
> directory, ran the examples/get_mnist_data.sh script from the library to get 
> the data (placed into examples/data), then used the attached convert.dml to 
> create binary copies of the data for both scripts, then ran run.sh. Also, I 
> copied examples/data to the base directory as well.  Adjust the {{EXEC}} and 
> related variables in {{perf.sh}} to switch between standalone, Spark, memory 
> sizes, explain, stats, etc.





[jira] [Created] (SYSTEMML-854) Change Python MLContext to support new MLContext

2016-08-09 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-854:


 Summary: Change Python MLContext to support new MLContext
 Key: SYSTEMML-854
 URL: https://issues.apache.org/jira/browse/SYSTEMML-854
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare








[jira] [Created] (SYSTEMML-855) Add a "Get Started" tutorial for Python users

2016-08-09 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-855:


 Summary: Add a "Get Started" tutorial for Python users
 Key: SYSTEMML-855
 URL: https://issues.apache.org/jira/browse/SYSTEMML-855
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare


As an example, this tutorial could have the following sections:
1. Steps to start Python shell (or cloud service like datascientistworkbench) 
with SystemML support:
wget 
https://raw.githubusercontent.com/apache/incubator-systemml/master/src/main/java/org/apache/sysml/api/python/SystemML.py
wget https://sparktc.ibmcloud.com/repo/latest/SystemML.jar

2. Give context for one of the algorithms, for example linear regression. We 
can borrow the technical detail from 
http://apache.github.io/incubator-systemml/algorithms-regression.html#description

3. Explain steps to download data we will use and how to implement Linear 
regression DS using embedded Python DSL:
https://github.com/apache/incubator-systemml/pull/197

```
import numpy as np
from sklearn import datasets
# Load the diabetes dataset
diabetes = datasets.load_diabetes()
# Use only one feature
diabetes_X = diabetes.data[:, np.newaxis, 2]
# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]
```

4. Explain how to use our algorithm instead:
http://apache.github.io/incubator-systemml/algorithms-regression.html#examples

5. To explain the tradeoffs of using NumPy or scikit-learn vs. SystemML's embedded 
DSL or SystemML's mllearn, increase the data size. For example: use a Twitter 
feed.
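
As a companion to step 3, here is a minimal sketch of a closed-form fit for the single-feature case, assuming "DS" refers to a direct (closed-form) solve. It is plain Python, not the embedded Python DSL from the PR, and it uses tiny synthetic data rather than the diabetes dataset; the function name is made up for the example.

```python
def fit_simple_linear_regression(x, y):
    """Closed-form least squares for y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic training data lying exactly on y = 2x + 1.
x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_simple_linear_regression(x_train, y_train)
prediction = slope * 4.0 + intercept
print(slope, intercept, prediction)
```

The tutorial could show the same computation with matrix operations once more than one feature is used.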





[jira] [Updated] (SYSTEMML-855) Add a "Get Started" tutorial for Python users

2016-08-09 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-855:
-
Description: 
As an example, this tutorial could have the following sections:
1. Steps to start Python shell (or cloud service like datascientistworkbench) 
with SystemML support:
wget 
https://raw.githubusercontent.com/apache/incubator-systemml/master/src/main/java/org/apache/sysml/api/python/SystemML.py
wget https://sparktc.ibmcloud.com/repo/latest/SystemML.jar

2. Give context for one of the algorithms, for example linear regression. We 
can borrow the technical detail from 
http://apache.github.io/incubator-systemml/algorithms-regression.html#description

3. Explain steps to download data we will use and how to implement Linear 
regression DS using embedded Python DSL:
https://github.com/apache/incubator-systemml/pull/197


import numpy as np
from sklearn import datasets
# Load the diabetes dataset
diabetes = datasets.load_diabetes()
# Use only one feature
diabetes_X = diabetes.data[:, np.newaxis, 2]
# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]


4. Explain how to use our algorithm instead:
http://apache.github.io/incubator-systemml/algorithms-regression.html#examples

5. To explain the tradeoffs of using NumPy or scikit-learn vs. SystemML's embedded 
DSL or SystemML's mllearn, increase the data size. For example: use a Twitter 
feed.

  was:
As an example, this tutorial could have the following sections:
1. Steps to start Python shell (or cloud service like datascientistworkbench) 
with SystemML support:
wget 
https://raw.githubusercontent.com/apache/incubator-systemml/master/src/main/java/org/apache/sysml/api/python/SystemML.py
wget https://sparktc.ibmcloud.com/repo/latest/SystemML.jar

2. Give context for one of the algorithms, for example linear regression. We 
can borrow the technical detail from 
http://apache.github.io/incubator-systemml/algorithms-regression.html#description

3. Explain steps to download data we will use and how to implement Linear 
regression DS using embedded Python DSL:
https://github.com/apache/incubator-systemml/pull/197

```
import numpy as np
from sklearn import datasets
# Load the diabetes dataset
diabetes = datasets.load_diabetes()
# Use only one feature
diabetes_X = diabetes.data[:, np.newaxis, 2]
# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]
```

4. Explain how to use our algorithm instead:
http://apache.github.io/incubator-systemml/algorithms-regression.html#examples

5. To explain tradeoffs of using NumPy or Scikit-Learn v/s SystemML's embedded 
DSL or SystemML's mllearn, increase the data size. For example: use twitter 
feed.


> Add a "Get Started" tutorial for Python users
> -
>
> Key: SYSTEMML-855
> URL: https://issues.apache.org/jira/browse/SYSTEMML-855
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>
> As an example, this tutorial could have following sections:
> 1. Steps to start Python shell (or cloud service like datascientistworkbench) 
> with SystemML support:
> wget 
> https://raw.githubusercontent.com/apache/incubator-systemml/master/src/main/java/org/apache/sysml/api/python/SystemML.py
> wget https://sparktc.ibmcloud.com/repo/latest/SystemML.jar
> 2. Give context for one of the algorithm: For example: Linear regression. We 
> can borrow the technical detail from 
> http://apache.github.io/incubator-systemml/algorithms-regression.html#description
> 3. Explain steps to download data we will use and how to implement Linear 
> regression DS using embedded Python DSL:
> https://github.com/apache/incubator-systemml/pull/197
> 
> import numpy as np
> from sklearn import datasets
> # Load the diabetes dataset
> diabetes = datasets.load_diabetes()
> # Use only one feature
> diabetes_X = diabetes.data[:, np.newaxis, 2]
> # Split the data into training/testing sets
> diabetes_X_train = diabetes_X[:-20]
> diabetes_X_test = diabetes_X[-20:]
> # Split the targets into training/testing sets
> diabetes_y_train = diabetes.target[:-20]
> diabetes_y_test = diabetes.target[-20:]
> 
> 4. Explain how to use our algorithm instead:
> http://apache.github.io/incubator-systemml/algorithms-regression.html#examples
> 5. To explain tradeoffs of using NumPy or Scikit-Learn v/s SystemML's 
> embedded DSL or SystemML's mllearn, increase the data size. For example: use 
> twitter feed.
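
The slicing in step 3 keeps a single feature column and holds out the last 20 rows for testing. A minimal sketch of that split follows, using plain Python lists in place of NumPy arrays so it runs without extra packages. The `rows` and `targets` values are made-up stand-ins with the same shape as the diabetes dataset (442 rows, 10 features), not the real data.

```python
# Stand-in data: 442 rows x 10 features, mirroring the shape of
# sklearn's diabetes dataset. The values are synthetic, not real data.
rows = [[float(i * 10 + j) for j in range(10)] for i in range(442)]
targets = [float(i) for i in range(442)]

# Keep only feature index 2 as a one-column matrix, the list
# equivalent of diabetes.data[:, np.newaxis, 2].
feature = [[row[2]] for row in rows]

# Hold out the last 20 rows for testing, as in the tutorial snippet.
x_train, x_test = feature[:-20], feature[-20:]
y_train, y_test = targets[:-20], targets[-20:]

print(len(x_train), len(x_test))
```

With 442 rows, the negative slices leave 422 rows for training and 20 for testing.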



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SYSTEMML-855) Add a "Get Started" tutorial for Python users

2016-08-09 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-855:
-
Description: 
As an example, this tutorial could have the following sections:
1. Steps to start Python shell (or cloud service like datascientistworkbench) 
with SystemML support:
wget 
https://raw.githubusercontent.com/apache/incubator-systemml/master/src/main/java/org/apache/sysml/api/python/SystemML.py
wget https://sparktc.ibmcloud.com/repo/latest/SystemML.jar

OR

Use pip installer.

2. Give context for one of the algorithms, for example linear regression. We 
can borrow the technical details from 
http://apache.github.io/incubator-systemml/algorithms-regression.html#description

3. Explain steps to download data we will use and how to implement Linear 
regression DS using embedded Python DSL:
https://github.com/apache/incubator-systemml/pull/197

import numpy as np
from sklearn import datasets
diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

4. Explain how to use our algorithm instead:
http://apache.github.io/incubator-systemml/algorithms-regression.html#examples

5. To explain the tradeoffs of using NumPy or scikit-learn vs. SystemML's embedded 
DSL or SystemML's mllearn, increase the data size. For example, use a Twitter 
feed.

  was:
As an example, this tutorial could have following sections:
1. Steps to start Python shell (or cloud service like datascientistworkbench) 
with SystemML support:
wget 
https://raw.githubusercontent.com/apache/incubator-systemml/master/src/main/java/org/apache/sysml/api/python/SystemML.py
wget https://sparktc.ibmcloud.com/repo/latest/SystemML.jar

2. Give context for one of the algorithm: For example: Linear regression. We 
can borrow the technical detail from 
http://apache.github.io/incubator-systemml/algorithms-regression.html#description

3. Explain steps to download data we will use and how to implement Linear 
regression DS using embedded Python DSL:
https://github.com/apache/incubator-systemml/pull/197

import numpy as np
from sklearn import datasets
diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

4. Explain how to use our algorithm instead:
http://apache.github.io/incubator-systemml/algorithms-regression.html#examples

5. To explain tradeoffs of using NumPy or Scikit-Learn v/s SystemML's embedded 
DSL or SystemML's mllearn, increase the data size. For example: use twitter 
feed.


> Add a "Get Started" tutorial for Python users
> -
>
> Key: SYSTEMML-855
> URL: https://issues.apache.org/jira/browse/SYSTEMML-855
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>
> As an example, this tutorial could have following sections:
> 1. Steps to start Python shell (or cloud service like datascientistworkbench) 
> with SystemML support:
> wget 
> https://raw.githubusercontent.com/apache/incubator-systemml/master/src/main/java/org/apache/sysml/api/python/SystemML.py
> wget https://sparktc.ibmcloud.com/repo/latest/SystemML.jar
> OR
> Use pip installer.
> 2. Give context for one of the algorithm: For example: Linear regression. We 
> can borrow the technical detail from 
> http://apache.github.io/incubator-systemml/algorithms-regression.html#description
> 3. Explain steps to download data we will use and how to implement Linear 
> regression DS using embedded Python DSL:
> https://github.com/apache/incubator-systemml/pull/197
> import numpy as np
> from sklearn import datasets
> diabetes = datasets.load_diabetes()
> diabetes_X = diabetes.data[:, np.newaxis, 2]
> diabetes_X_train = diabetes_X[:-20]
> diabetes_X_test = diabetes_X[-20:]
> diabetes_y_train = diabetes.target[:-20]
> diabetes_y_test = diabetes.target[-20:]
> 4. Explain how to use our algorithm instead:
> http://apache.github.io/incubator-systemml/algorithms-regression.html#examples
> 5. To explain tradeoffs of using NumPy or Scikit-Learn v/s SystemML's 
> embedded DSL or SystemML's mllearn, increase the data size. For example: use 
> twitter feed.





[jira] [Updated] (SYSTEMML-855) Add a "Get Started" tutorial for Python users

2016-08-09 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-855:
-
Description: 
As an example, this tutorial could have the following sections:
1. Steps to start Python shell (or cloud service like datascientistworkbench) 
with SystemML support:
wget 
https://raw.githubusercontent.com/apache/incubator-systemml/master/src/main/java/org/apache/sysml/api/python/SystemML.py
wget https://sparktc.ibmcloud.com/repo/latest/SystemML.jar

2. Give context for one of the algorithms, for example linear regression. We 
can borrow the technical details from 
http://apache.github.io/incubator-systemml/algorithms-regression.html#description

3. Explain steps to download data we will use and how to implement Linear 
regression DS using embedded Python DSL:
https://github.com/apache/incubator-systemml/pull/197

import numpy as np
from sklearn import datasets
# Load the diabetes dataset
diabetes = datasets.load_diabetes()
# Use only one feature
diabetes_X = diabetes.data[:, np.newaxis, 2]
# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

4. Explain how to use our algorithm instead:
http://apache.github.io/incubator-systemml/algorithms-regression.html#examples

5. To explain the tradeoffs of using NumPy or scikit-learn vs. SystemML's embedded 
DSL or SystemML's mllearn, increase the data size. For example, use a Twitter 
feed.

  was:
As an example, this tutorial could have following sections:
1. Steps to start Python shell (or cloud service like datascientistworkbench) 
with SystemML support:
wget 
https://raw.githubusercontent.com/apache/incubator-systemml/master/src/main/java/org/apache/sysml/api/python/SystemML.py
wget https://sparktc.ibmcloud.com/repo/latest/SystemML.jar

2. Give context for one of the algorithm: For example: Linear regression. We 
can borrow the technical detail from 
http://apache.github.io/incubator-systemml/algorithms-regression.html#description

3. Explain steps to download data we will use and how to implement Linear 
regression DS using embedded Python DSL:
https://github.com/apache/incubator-systemml/pull/197


import numpy as np
from sklearn import datasets
# Load the diabetes dataset
diabetes = datasets.load_diabetes()
# Use only one feature
diabetes_X = diabetes.data[:, np.newaxis, 2]
# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]


4. Explain how to use our algorithm instead:
http://apache.github.io/incubator-systemml/algorithms-regression.html#examples

5. To explain tradeoffs of using NumPy or Scikit-Learn v/s SystemML's embedded 
DSL or SystemML's mllearn, increase the data size. For example: use twitter 
feed.


> Add a "Get Started" tutorial for Python users
> -
>
> Key: SYSTEMML-855
> URL: https://issues.apache.org/jira/browse/SYSTEMML-855
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>
> As an example, this tutorial could have following sections:
> 1. Steps to start Python shell (or cloud service like datascientistworkbench) 
> with SystemML support:
> wget 
> https://raw.githubusercontent.com/apache/incubator-systemml/master/src/main/java/org/apache/sysml/api/python/SystemML.py
> wget https://sparktc.ibmcloud.com/repo/latest/SystemML.jar
> 2. Give context for one of the algorithm: For example: Linear regression. We 
> can borrow the technical detail from 
> http://apache.github.io/incubator-systemml/algorithms-regression.html#description
> 3. Explain steps to download data we will use and how to implement Linear 
> regression DS using embedded Python DSL:
> https://github.com/apache/incubator-systemml/pull/197
> import numpy as np
> from sklearn import datasets
> # Load the diabetes dataset
> diabetes = datasets.load_diabetes()
> # Use only one feature
> diabetes_X = diabetes.data[:, np.newaxis, 2]
> # Split the data into training/testing sets
> diabetes_X_train = diabetes_X[:-20]
> diabetes_X_test = diabetes_X[-20:]
> # Split the targets into training/testing sets
> diabetes_y_train = diabetes.target[:-20]
> diabetes_y_test = diabetes.target[-20:]
> 4. Explain how to use our algorithm instead:
> http://apache.github.io/incubator-systemml/algorithms-regression.html#examples
> 5. To explain tradeoffs of using NumPy or Scikit-Learn v/s SystemML's 
> embedded DSL or SystemML's mllearn, increase the data size. For example: use 
> twitter feed.





[jira] [Updated] (SYSTEMML-855) Add a "Get Started" tutorial for Python users

2016-08-09 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-855:
-
Description: 
As an example, this tutorial could have the following sections:
1. Steps to start Python shell (or cloud service like datascientistworkbench) 
with SystemML support:
wget 
https://raw.githubusercontent.com/apache/incubator-systemml/master/src/main/java/org/apache/sysml/api/python/SystemML.py
wget https://sparktc.ibmcloud.com/repo/latest/SystemML.jar
pyspark --master local[*] --driver-class-path SystemML.jar

OR

Use pip installer.

2. Give context for one of the algorithms, for example linear regression. We 
can borrow the technical details from 
http://apache.github.io/incubator-systemml/algorithms-regression.html#description

3. Explain steps to download data we will use and how to implement Linear 
regression DS using embedded Python DSL:
https://github.com/apache/incubator-systemml/pull/197

import numpy as np
from sklearn import datasets
diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

4. Explain how to use our algorithm instead:
http://apache.github.io/incubator-systemml/algorithms-regression.html#examples

5. To explain the tradeoffs of using NumPy or scikit-learn vs. SystemML's embedded 
DSL or SystemML's mllearn, increase the data size. For example, use a Twitter 
feed.

  was:
As an example, this tutorial could have following sections:
1. Steps to start Python shell (or cloud service like datascientistworkbench) 
with SystemML support:
wget 
https://raw.githubusercontent.com/apache/incubator-systemml/master/src/main/java/org/apache/sysml/api/python/SystemML.py
wget https://sparktc.ibmcloud.com/repo/latest/SystemML.jar

OR

Use pip installer.

2. Give context for one of the algorithm: For example: Linear regression. We 
can borrow the technical detail from 
http://apache.github.io/incubator-systemml/algorithms-regression.html#description

3. Explain steps to download data we will use and how to implement Linear 
regression DS using embedded Python DSL:
https://github.com/apache/incubator-systemml/pull/197

import numpy as np
from sklearn import datasets
diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

4. Explain how to use our algorithm instead:
http://apache.github.io/incubator-systemml/algorithms-regression.html#examples

5. To explain tradeoffs of using NumPy or Scikit-Learn v/s SystemML's embedded 
DSL or SystemML's mllearn, increase the data size. For example: use twitter 
feed.


> Add a "Get Started" tutorial for Python users
> -
>
> Key: SYSTEMML-855
> URL: https://issues.apache.org/jira/browse/SYSTEMML-855
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>
> As an example, this tutorial could have following sections:
> 1. Steps to start Python shell (or cloud service like datascientistworkbench) 
> with SystemML support:
> wget 
> https://raw.githubusercontent.com/apache/incubator-systemml/master/src/main/java/org/apache/sysml/api/python/SystemML.py
> wget https://sparktc.ibmcloud.com/repo/latest/SystemML.jar
> pyspark --master local[*] --driver-class-path SystemML.jar
> OR
> Use pip installer.
> 2. Give context for one of the algorithm: For example: Linear regression. We 
> can borrow the technical detail from 
> http://apache.github.io/incubator-systemml/algorithms-regression.html#description
> 3. Explain steps to download data we will use and how to implement Linear 
> regression DS using embedded Python DSL:
> https://github.com/apache/incubator-systemml/pull/197
> import numpy as np
> from sklearn import datasets
> diabetes = datasets.load_diabetes()
> diabetes_X = diabetes.data[:, np.newaxis, 2]
> diabetes_X_train = diabetes_X[:-20]
> diabetes_X_test = diabetes_X[-20:]
> diabetes_y_train = diabetes.target[:-20]
> diabetes_y_test = diabetes.target[-20:]
> 4. Explain how to use our algorithm instead:
> http://apache.github.io/incubator-systemml/algorithms-regression.html#examples
> 5. To explain tradeoffs of using NumPy or Scikit-Learn v/s SystemML's 
> embedded DSL or SystemML's mllearn, increase the data size. For example: use 
> twitter feed.





[jira] [Updated] (SYSTEMML-855) Add a "Get Started" tutorial for Python users

2016-08-09 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-855:
-
Description: 
As an example, this tutorial could have the following sections:
1. Steps to start Python shell (or cloud service like datascientistworkbench) 
with SystemML support:
wget 
https://raw.githubusercontent.com/apache/incubator-systemml/master/src/main/java/org/apache/sysml/api/python/SystemML.py
wget https://sparktc.ibmcloud.com/repo/latest/SystemML.jar

2. Give context for one of the algorithms, for example linear regression. We 
can borrow the technical details from 
http://apache.github.io/incubator-systemml/algorithms-regression.html#description

3. Explain steps to download data we will use and how to implement Linear 
regression DS using embedded Python DSL:
https://github.com/apache/incubator-systemml/pull/197

import numpy as np
from sklearn import datasets
diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

4. Explain how to use our algorithm instead:
http://apache.github.io/incubator-systemml/algorithms-regression.html#examples

5. To explain the tradeoffs of using NumPy or scikit-learn vs. SystemML's embedded 
DSL or SystemML's mllearn, increase the data size. For example, use a Twitter 
feed.

  was:
As an example, this tutorial could have following sections:
1. Steps to start Python shell (or cloud service like datascientistworkbench) 
with SystemML support:
wget 
https://raw.githubusercontent.com/apache/incubator-systemml/master/src/main/java/org/apache/sysml/api/python/SystemML.py
wget https://sparktc.ibmcloud.com/repo/latest/SystemML.jar

2. Give context for one of the algorithm: For example: Linear regression. We 
can borrow the technical detail from 
http://apache.github.io/incubator-systemml/algorithms-regression.html#description

3. Explain steps to download data we will use and how to implement Linear 
regression DS using embedded Python DSL:
https://github.com/apache/incubator-systemml/pull/197

import numpy as np
from sklearn import datasets
# Load the diabetes dataset
diabetes = datasets.load_diabetes()
# Use only one feature
diabetes_X = diabetes.data[:, np.newaxis, 2]
# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

4. Explain how to use our algorithm instead:
http://apache.github.io/incubator-systemml/algorithms-regression.html#examples

5. To explain tradeoffs of using NumPy or Scikit-Learn v/s SystemML's embedded 
DSL or SystemML's mllearn, increase the data size. For example: use twitter 
feed.


> Add a "Get Started" tutorial for Python users
> -
>
> Key: SYSTEMML-855
> URL: https://issues.apache.org/jira/browse/SYSTEMML-855
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>
> As an example, this tutorial could have following sections:
> 1. Steps to start Python shell (or cloud service like datascientistworkbench) 
> with SystemML support:
> wget 
> https://raw.githubusercontent.com/apache/incubator-systemml/master/src/main/java/org/apache/sysml/api/python/SystemML.py
> wget https://sparktc.ibmcloud.com/repo/latest/SystemML.jar
> 2. Give context for one of the algorithm: For example: Linear regression. We 
> can borrow the technical detail from 
> http://apache.github.io/incubator-systemml/algorithms-regression.html#description
> 3. Explain steps to download data we will use and how to implement Linear 
> regression DS using embedded Python DSL:
> https://github.com/apache/incubator-systemml/pull/197
> import numpy as np
> from sklearn import datasets
> diabetes = datasets.load_diabetes()
> diabetes_X = diabetes.data[:, np.newaxis, 2]
> diabetes_X_train = diabetes_X[:-20]
> diabetes_X_test = diabetes_X[-20:]
> diabetes_y_train = diabetes.target[:-20]
> diabetes_y_test = diabetes.target[-20:]
> 4. Explain how to use our algorithm instead:
> http://apache.github.io/incubator-systemml/algorithms-regression.html#examples
> 5. To explain tradeoffs of using NumPy or Scikit-Learn v/s SystemML's 
> embedded DSL or SystemML's mllearn, increase the data size. For example: use 
> twitter feed.





[jira] [Updated] (SYSTEMML-855) Add a "Get Started" tutorial for Python users

2016-08-09 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-855:
-
Description: 
As an example, this tutorial could have the following sections:
1. Steps to start Python shell (or cloud service like datascientistworkbench) 
with SystemML support:
wget 
https://raw.githubusercontent.com/apache/incubator-systemml/master/src/main/java/org/apache/sysml/api/python/SystemML.py
wget https://sparktc.ibmcloud.com/repo/latest/SystemML.jar
pyspark --master local[*] --driver-class-path SystemML.jar

OR

Use pip installer.

2. Give context for one of the algorithms, for example linear regression. We 
can borrow the technical details from 
http://apache.github.io/incubator-systemml/algorithms-regression.html#description

3. Explain steps to download data we will use and how to implement Linear 
regression DS using embedded Python DSL:
https://github.com/apache/incubator-systemml/pull/197

import numpy as np
from sklearn import datasets
diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

4. Explain how to use our algorithm instead:
http://apache.github.io/incubator-systemml/algorithms-regression.html#examples

5. To explain the tradeoffs of using NumPy or scikit-learn vs. SystemML's embedded 
DSL or SystemML's mllearn, increase the data size. For example, use a Twitter 
feed.

By the end of the tutorial, the programmer should understand, at a very high level:
1. Moving to SystemML is painless, almost as simple as changing an "import".
2. SystemML has a sophisticated optimizer that allows it to adapt to different 
data/cluster characteristics and allows the code and algorithm to scale.

  was:
As an example, this tutorial could have following sections:
1. Steps to start Python shell (or cloud service like datascientistworkbench) 
with SystemML support:
wget 
https://raw.githubusercontent.com/apache/incubator-systemml/master/src/main/java/org/apache/sysml/api/python/SystemML.py
wget https://sparktc.ibmcloud.com/repo/latest/SystemML.jar
pyspark --master local[*] --driver-class-path SystemML.jar

OR

Use pip installer.

2. Give context for one of the algorithm: For example: Linear regression. We 
can borrow the technical detail from 
http://apache.github.io/incubator-systemml/algorithms-regression.html#description

3. Explain steps to download data we will use and how to implement Linear 
regression DS using embedded Python DSL:
https://github.com/apache/incubator-systemml/pull/197

import numpy as np
from sklearn import datasets
diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

4. Explain how to use our algorithm instead:
http://apache.github.io/incubator-systemml/algorithms-regression.html#examples

5. To explain tradeoffs of using NumPy or Scikit-Learn v/s SystemML's embedded 
DSL or SystemML's mllearn, increase the data size. For example: use twitter 
feed.


> Add a "Get Started" tutorial for Python users
> -
>
> Key: SYSTEMML-855
> URL: https://issues.apache.org/jira/browse/SYSTEMML-855
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>
> As an example, this tutorial could have following sections:
> 1. Steps to start Python shell (or cloud service like datascientistworkbench) 
> with SystemML support:
> wget 
> https://raw.githubusercontent.com/apache/incubator-systemml/master/src/main/java/org/apache/sysml/api/python/SystemML.py
> wget https://sparktc.ibmcloud.com/repo/latest/SystemML.jar
> pyspark --master local[*] --driver-class-path SystemML.jar
> OR
> Use pip installer.
> 2. Give context for one of the algorithm: For example: Linear regression. We 
> can borrow the technical detail from 
> http://apache.github.io/incubator-systemml/algorithms-regression.html#description
> 3. Explain steps to download data we will use and how to implement Linear 
> regression DS using embedded Python DSL:
> https://github.com/apache/incubator-systemml/pull/197
> import numpy as np
> from sklearn import datasets
> diabetes = datasets.load_diabetes()
> diabetes_X = diabetes.data[:, np.newaxis, 2]
> diabetes_X_train = diabetes_X[:-20]
> diabetes_X_test = diabetes_X[-20:]
> diabetes_y_train = diabetes.target[:-20]
> diabetes_y_test = diabetes.target[-20:]
> 4. Explain how to use our algorithm instead:
> http://apache.github.io/incubator-systemml/algorithms-regression.html#examples
> 5. To explain tradeoffs of using NumPy or Scikit-Learn v/s SystemML's 
> embedded DSL or SystemML's mllearn, increase the data size. For example: use 
> 

[jira] [Commented] (SYSTEMML-1238) Python test failing for LinearRegCG

2017-02-08 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858442#comment-15858442
 ] 

Niketan Pansare commented on SYSTEMML-1238:
---

It looks like both scripts have the same plan. This looks like an algorithm-related or 
repeatability issue, as the statistics after training are as follows:

python_LinearReg_test_spark.1.6.log:
||r|| initial value = 64725.64237405237,  target value = 0.06472564237405237
Iteration 1:  ||r|| / ||r init|| = 0.013822097249150999
Iteration 2:  ||r|| / ||r init|| = 7.063617429825055E-14
The CG algorithm is done.
Computing the statistics...
938.237
152.919
AVG_TOT_Y,153.36255924170615
STDEV_TOT_Y,77.21853383600028
AVG_RES_Y,-1.081722178918495E-11
STDEV_RES_Y,63.03850633761024
DISPERSION,3973.8532812769263
PLAIN_R2,0.3351312506863876
ADJUSTED_R2,0.33354822985468857
PLAIN_R2_NOBIAS,0.3351312506863876
ADJUSTED_R2_NOBIAS,0.33354822985468857

python_LinearReg_test_spark.2.1.log:
||r|| initial value = 64725.64237405237,  target value = 0.06472564237405237
Iteration 1:  ||r|| / ||r init|| = 0.0137881395137
Iteration 2:  ||r|| / ||r init|| = 4.3730800595678527E-14
The CG algorithm is done.
Computing the statistics...
458.489
153.146
AVG_TOT_Y,153.36255924170615
STDEV_TOT_Y,77.21853383600028
AVG_RES_Y,-6.688193969161777E-12
STDEV_RES_Y,67.06389890324985
DISPERSION,4497.566536105316
PLAIN_R2,0.24750834362605834
ADJUSTED_R2,0.24571669682516795
PLAIN_R2_NOBIAS,0.24750834362605834
ADJUSTED_R2_NOBIAS,0.24571669682516795
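
The R2 figures in both logs can be cross-checked from the logged variances alone. The sketch below assumes the usual definitions (residual variance with n - p degrees of freedom, total variance with n - 1) and assumes 422 training rows (the 442-row diabetes dataset minus the 20 held-out rows) and 2 parameters (intercept plus one slope). The function name and constants are illustrative, not taken from the DML script.

```python
def r_squared(var_res, var_tot, n, p):
    """Return (plain R^2, adjusted R^2) from sample variances.

    var_res is the residual variance with n - p degrees of freedom,
    var_tot is the variance of the targets with n - 1 degrees of freedom.
    """
    adjusted = 1.0 - var_res / var_tot
    plain = 1.0 - ((n - p) * var_res) / ((n - 1) * var_tot)
    return plain, adjusted


# Values copied from the two logs above.
STDEV_TOT_Y = 77.21853383600028
DISPERSION_SPARK_1_6 = 3973.8532812769263
DISPERSION_SPARK_2_1 = 4497.566536105316

# Assumptions: 422 training rows and 2 parameters (intercept + slope).
N_ROWS, N_PARAMS = 422, 2

for label, dispersion in [("spark 1.6", DISPERSION_SPARK_1_6),
                          ("spark 2.1", DISPERSION_SPARK_2_1)]:
    plain, adjusted = r_squared(dispersion, STDEV_TOT_Y ** 2, N_ROWS, N_PARAMS)
    print(label, round(plain, 4), round(adjusted, 4))
```

Under these assumptions the recomputed values agree with the logged PLAIN_R2 and ADJUSTED_R2 entries, which suggests the statistics step is consistent in both runs and the difference lies in the fitted betas.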

> Python test failing for LinearRegCG
> ---
>
> Key: SYSTEMML-1238
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1238
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, APIs
>Affects Versions: SystemML 0.13
>Reporter: Imran Younus
> Attachments: python_LinearReg_test_spark.1.6.log, 
> python_LinearReg_test_spark.2.1.log
>
>
> [~deron] discovered that one of the Python tests ({{test_mllearn_df.py}}) 
> with Spark 2.1.0 was failing because the test score from linear regression 
> was very low ({{~ 0.24}}). I did some investigation, and it turns out that 
> the model parameters computed by the DML script are incorrect. In 
> SystemML 0.12, the values of the betas from the linear regression model are 
> {{\[152.919, 938.237\]}}. This is what we expect from the normal equation. (I 
> also tested this with sklearn.) But the values of the betas from SystemML 0.13 
> (with Spark 2.1.0) come out to be {{\[153.146, 458.489\]}}. These are not 
> correct, and therefore the test score is much lower than expected. The data 
> going into the DML script is correct. I printed out the values of {{X}} and {{Y}} 
> in DML and I didn't see any issue there.
> Attached are the log files for two different tests (SystemML 0.12 and 0.13) 
> with the explain flag.
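
The description compares the CG result with what the normal equation gives. A minimal sketch of that comparison for a two-parameter model (intercept plus one slope) is below. It is not the LinearRegCG script itself: the data is hypothetical and all function names are made up for illustration. For a symmetric positive-definite 2x2 system, conjugate gradient needs at most two steps in exact arithmetic, which fits the two iterations seen in the logs.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def normal_equations(xs, ys):
    """Return A = X^T X and b = X^T y for the model y = b0 + b1 * x."""
    n = float(len(xs))
    sx, sy = sum(xs), sum(ys)
    return [[n, sx], [sx, dot(xs, xs)]], [sy, dot(xs, ys)]


def solve_direct(A, b):
    """Solve the 2x2 system A w = b with Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


def solve_cg(A, b, tol=1e-12, max_iter=50):
    """Conjugate gradient for a symmetric positive-definite A, from w = 0."""
    w = [0.0] * len(b)
    r = list(b)                 # residual b - A w at w = 0
    p = list(r)                 # first search direction
    rr = dot(r, r)
    target = tol * rr ** 0.5    # stop once ||r|| drops below tol * ||r init||
    iterations = 0
    while iterations < max_iter and rr ** 0.5 > target:
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        w = [wi + alpha * pi for wi, pi in zip(w, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
        iterations += 1
    return w, iterations


# Hypothetical data, roughly y = 1 + 2x with some noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

A, b = normal_equations(xs, ys)
direct = solve_direct(A, b)
cg, iterations = solve_cg(A, b)
print(direct, cg, iterations)
```

When both solvers see the same A and b, their betas should agree to within the CG tolerance, so a gap as large as the one reported points at the inputs to the solver or the solver steps rather than at rounding.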



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (SYSTEMML-1238) Python test failing for LinearRegCG

2017-02-08 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858442#comment-15858442
 ] 

Niketan Pansare edited comment on SYSTEMML-1238 at 2/8/17 7:36 PM:
---

It looks like both scripts have the same plan. This looks like an algorithm-related or 
repeatability issue, as the statistics after training are as follows:

python_LinearReg_test_spark.1.6.log:
{code}
||r|| initial value = 64725.64237405237,  target value = 0.06472564237405237
Iteration 1:  ||r|| / ||r init|| = 0.013822097249150999
Iteration 2:  ||r|| / ||r init|| = 7.063617429825055E-14
The CG algorithm is done.
Computing the statistics...
938.237
152.919
AVG_TOT_Y,153.36255924170615
STDEV_TOT_Y,77.21853383600028
AVG_RES_Y,-1.081722178918495E-11
STDEV_RES_Y,63.03850633761024
DISPERSION,3973.8532812769263
PLAIN_R2,0.3351312506863876
ADJUSTED_R2,0.33354822985468857
PLAIN_R2_NOBIAS,0.3351312506863876
ADJUSTED_R2_NOBIAS,0.33354822985468857
{code}

python_LinearReg_test_spark.2.1.log:
||r|| initial value = 64725.64237405237,  target value = 0.06472564237405237
Iteration 1:  ||r|| / ||r init|| = 0.0137881395137
Iteration 2:  ||r|| / ||r init|| = 4.3730800595678527E-14
The CG algorithm is done.
Computing the statistics...
458.489
153.146
AVG_TOT_Y,153.36255924170615
STDEV_TOT_Y,77.21853383600028
AVG_RES_Y,-6.688193969161777E-12
STDEV_RES_Y,67.06389890324985
DISPERSION,4497.566536105316
PLAIN_R2,0.24750834362605834
ADJUSTED_R2,0.24571669682516795
PLAIN_R2_NOBIAS,0.24750834362605834
ADJUSTED_R2_NOBIAS,0.24571669682516795


was (Author: niketanpansare):
Looks like both script have same plan. This looks like an algorithm-related or 
repeatability issue as the statistics after training are as follows:

python_LinearReg_test_spark.1.6.log:
||r|| initial value = 64725.64237405237,  target value = 0.06472564237405237
Iteration 1:  ||r|| / ||r init|| = 0.013822097249150999
Iteration 2:  ||r|| / ||r init|| = 7.063617429825055E-14
The CG algorithm is done.
Computing the statistics...
938.237
152.919
AVG_TOT_Y,153.36255924170615
STDEV_TOT_Y,77.21853383600028
AVG_RES_Y,-1.081722178918495E-11
STDEV_RES_Y,63.03850633761024
DISPERSION,3973.8532812769263
PLAIN_R2,0.3351312506863876
ADJUSTED_R2,0.33354822985468857
PLAIN_R2_NOBIAS,0.3351312506863876
ADJUSTED_R2_NOBIAS,0.33354822985468857

python_LinearReg_test_spark.2.1.log:
||r|| initial value = 64725.64237405237,  target value = 0.06472564237405237
Iteration 1:  ||r|| / ||r init|| = 0.0137881395137
Iteration 2:  ||r|| / ||r init|| = 4.3730800595678527E-14
The CG algorithm is done.
Computing the statistics...
458.489
153.146
AVG_TOT_Y,153.36255924170615
STDEV_TOT_Y,77.21853383600028
AVG_RES_Y,-6.688193969161777E-12
STDEV_RES_Y,67.06389890324985
DISPERSION,4497.566536105316
PLAIN_R2,0.24750834362605834
ADJUSTED_R2,0.24571669682516795
PLAIN_R2_NOBIAS,0.24750834362605834
ADJUSTED_R2_NOBIAS,0.24571669682516795

> Python test failing for LinearRegCG
> ---
>
> Key: SYSTEMML-1238
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1238
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, APIs
>Affects Versions: SystemML 0.13
>Reporter: Imran Younus
> Attachments: python_LinearReg_test_spark.1.6.log, 
> python_LinearReg_test_spark.2.1.log
>
>
> [~deron] discovered that one of the Python tests ({{test_mllearn_df.py}}) 
> with Spark 2.1.0 was failing because the test score from linear regression 
> was very low ({{~ 0.24}}). I did some investigation, and it turns out that 
> the model parameters computed by the DML script are incorrect. In 
> SystemML 0.12, the values of the betas from the linear regression model are 
> {{\[152.919, 938.237\]}}. This is what we expect from the normal equation. (I 
> also tested this with sklearn.) But the values of the betas from SystemML 0.13 
> (with Spark 2.1.0) come out to be {{\[153.146, 458.489\]}}. These are not 
> correct, and therefore the test score is much lower than expected. The data 
> going into the DML script is correct. I printed out the values of {{X}} and {{Y}} 
> in DML and I didn't see any issue there.
> Attached are the log files for two different tests (SystemML 0.12 and 0.13) 
> with the explain flag.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (SYSTEMML-1238) Python test failing for LinearRegCG

2017-02-08 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858442#comment-15858442
 ] 

Niketan Pansare edited comment on SYSTEMML-1238 at 2/8/17 7:36 PM:
---

Looks like both scripts have the same plan. This looks like an algorithm-related or 
repeatability issue, as the statistics after training are as follows:

python_LinearReg_test_spark.1.6.log:
{code}
||r|| initial value = 64725.64237405237,  target value = 0.06472564237405237
Iteration 1:  ||r|| / ||r init|| = 0.013822097249150999
Iteration 2:  ||r|| / ||r init|| = 7.063617429825055E-14
The CG algorithm is done.
Computing the statistics...
938.237
152.919
AVG_TOT_Y,153.36255924170615
STDEV_TOT_Y,77.21853383600028
AVG_RES_Y,-1.081722178918495E-11
STDEV_RES_Y,63.03850633761024
DISPERSION,3973.8532812769263
PLAIN_R2,0.3351312506863876
ADJUSTED_R2,0.33354822985468857
PLAIN_R2_NOBIAS,0.3351312506863876
ADJUSTED_R2_NOBIAS,0.33354822985468857
{code}

python_LinearReg_test_spark.2.1.log:
{code}
||r|| initial value = 64725.64237405237,  target value = 0.06472564237405237
Iteration 1:  ||r|| / ||r init|| = 0.0137881395137
Iteration 2:  ||r|| / ||r init|| = 4.3730800595678527E-14
The CG algorithm is done.
Computing the statistics...
458.489
153.146
AVG_TOT_Y,153.36255924170615
STDEV_TOT_Y,77.21853383600028
AVG_RES_Y,-6.688193969161777E-12
STDEV_RES_Y,67.06389890324985
DISPERSION,4497.566536105316
PLAIN_R2,0.24750834362605834
ADJUSTED_R2,0.24571669682516795
PLAIN_R2_NOBIAS,0.24750834362605834
ADJUSTED_R2_NOBIAS,0.24571669682516795
{code}
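As a sanity check on the two logs above, the printed statistics can be cross-checked against each other. The sketch below is not part of the SystemML scripts. It assumes, based only on the printed numbers, that DISPERSION is the square of STDEV_RES_Y and that ADJUSTED_R2 equals 1 - DISPERSION / STDEV_TOT_Y^2. The helper names are made up for illustration.

```python
import math

# Statistics copied from the two logs above.
LOG_SPARK_1_6 = """\
AVG_TOT_Y,153.36255924170615
STDEV_TOT_Y,77.21853383600028
STDEV_RES_Y,63.03850633761024
DISPERSION,3973.8532812769263
ADJUSTED_R2,0.33354822985468857
"""

LOG_SPARK_2_1 = """\
AVG_TOT_Y,153.36255924170615
STDEV_TOT_Y,77.21853383600028
STDEV_RES_Y,67.06389890324985
DISPERSION,4497.566536105316
ADJUSTED_R2,0.24571669682516795
"""


def parse_stats(text):
    """Turn 'NAME,value' lines into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        name, _, value = line.partition(",")
        if value:
            stats[name.strip()] = float(value)
    return stats


def is_consistent(stats, rel_tol=1e-6):
    """Check the two assumed identities between the printed statistics."""
    dispersion_ok = math.isclose(
        stats["DISPERSION"], stats["STDEV_RES_Y"] ** 2, rel_tol=rel_tol)
    adjusted_ok = math.isclose(
        stats["ADJUSTED_R2"],
        1.0 - stats["DISPERSION"] / stats["STDEV_TOT_Y"] ** 2,
        rel_tol=rel_tol)
    return dispersion_ok and adjusted_ok


stats_1_6 = parse_stats(LOG_SPARK_1_6)
stats_2_1 = parse_stats(LOG_SPARK_2_1)
print(is_consistent(stats_1_6), is_consistent(stats_2_1))
```

Both logs pass the check and report identical AVG_TOT_Y and STDEV_TOT_Y, which suggests the same target vector was used in both runs and that the difference lies in the fitted coefficients.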


was (Author: niketanpansare):
Looks like both scripts have the same plan. This looks like an algorithm-related or 
repeatability issue, as the statistics after training are as follows:

python_LinearReg_test_spark.1.6.log:
{code}
||r|| initial value = 64725.64237405237,  target value = 0.06472564237405237
Iteration 1:  ||r|| / ||r init|| = 0.013822097249150999
Iteration 2:  ||r|| / ||r init|| = 7.063617429825055E-14
The CG algorithm is done.
Computing the statistics...
938.237
152.919
AVG_TOT_Y,153.36255924170615
STDEV_TOT_Y,77.21853383600028
AVG_RES_Y,-1.081722178918495E-11
STDEV_RES_Y,63.03850633761024
DISPERSION,3973.8532812769263
PLAIN_R2,0.3351312506863876
ADJUSTED_R2,0.33354822985468857
PLAIN_R2_NOBIAS,0.3351312506863876
ADJUSTED_R2_NOBIAS,0.33354822985468857
{code}

python_LinearReg_test_spark.2.1.log:
||r|| initial value = 64725.64237405237,  target value = 0.06472564237405237
Iteration 1:  ||r|| / ||r init|| = 0.0137881395137
Iteration 2:  ||r|| / ||r init|| = 4.3730800595678527E-14
The CG algorithm is done.
Computing the statistics...
458.489
153.146
AVG_TOT_Y,153.36255924170615
STDEV_TOT_Y,77.21853383600028
AVG_RES_Y,-6.688193969161777E-12
STDEV_RES_Y,67.06389890324985
DISPERSION,4497.566536105316
PLAIN_R2,0.24750834362605834
ADJUSTED_R2,0.24571669682516795
PLAIN_R2_NOBIAS,0.24750834362605834
ADJUSTED_R2_NOBIAS,0.24571669682516795

> Python test failing for LinearRegCG
> ---
>
> Key: SYSTEMML-1238
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1238
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, APIs
>Affects Versions: SystemML 0.13
>Reporter: Imran Younus
> Attachments: python_LinearReg_test_spark.1.6.log, 
> python_LinearReg_test_spark.2.1.log
>
>
> [~deron] discovered that one of the Python tests ({{test_mllearn_df.py}}) 
> with Spark 2.1.0 was failing because the test score from linear regression 
> was very low ({{~ 0.24}}). I did some investigation, and it turns out that 
> the model parameters computed by the DML script are incorrect. In 
> SystemML 0.12, the values of the betas from the linear regression model are 
> {{\[152.919, 938.237\]}}. This is what we expect from the normal equation. (I 
> also tested this with sklearn.) But the values of the betas from SystemML 0.13 
> (with Spark 2.1.0) come out to be {{\[153.146, 458.489\]}}. These are not 
> correct, and therefore the test score is much lower than expected. The data 
> going into the DML script is correct. I printed out the values of {{X}} and {{Y}} 
> in DML and I didn't see any issue there.
> Attached are the log files for two different tests (SystemML 0.12 and 0.13) 
> with the explain flag.





[jira] [Commented] (SYSTEMML-1140) Sparse/Caching performance bugs related to deep learning scripts

2017-01-31 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15847664#comment-15847664
 ] 

Niketan Pansare commented on SYSTEMML-1140:
---

Sorry, I forgot to update this JIRA with the series of improvements related to this 
PR:
1. Many CP convolution operators now have sparse support (except im2col). 
However, since CuDNN does not have a sparse equivalent, we only support dense 
convolution on GPU.
2. Fused operators such as relu_maxpooling and relu_backward have been added to 
reduce the conversion overhead of sparsity-introducing operators such as relu. 
In fact, the performance of relu_maxpooling is exactly the same as that of 
maxpooling in CP, making relu a no-op in the fused implementation :)
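To illustrate why relu can be absorbed into max pooling at no extra cost, here is a minimal pure-Python sketch. It is not the SystemML implementation; the function names and the toy matrix are made up. The idea is that max(0, max(window)) is the same as a running maximum that starts at 0 instead of negative infinity, so the fused variant never materializes the (sparse) relu output.

```python
def relu(matrix):
    """Element-wise max(0, x); produces many zeros for negative inputs."""
    return [[max(0.0, v) for v in row] for row in matrix]


def max_pool(matrix, size, stride, init=float("-inf")):
    """Valid (no padding) max pooling; `init` seeds the running maximum."""
    rows, cols = len(matrix), len(matrix[0])
    out = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            best = init
            for i in range(r, r + size):
                for j in range(c, c + size):
                    best = max(best, matrix[i][j])
            out_row.append(best)
        out.append(out_row)
    return out


def relu_max_pool(matrix, size, stride):
    """Fused variant: seeding the maximum with 0 applies relu for free."""
    return max_pool(matrix, size, stride, init=0.0)


X = [
    [1.0, -2.0, 3.0, -4.0],
    [-5.0, -6.0, 7.0, 8.0],
    [-1.0, -2.0, -3.0, -4.0],
    [-5.0, -6.0, -7.0, -8.0],
]

unfused = max_pool(relu(X), size=2, stride=2)
fused = relu_max_pool(X, size=2, stride=2)
print(fused)  # [[1.0, 8.0], [0.0, 0.0]]
```

The fused call does the same number of comparisons as plain max pooling, which matches the observation that relu becomes a no-op.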

[~mboehm7] I used Mike's LeNet script with the MNIST dataset as an example. Please 
see 
https://github.com/apache/incubator-systemml/blob/master/scripts/staging/SystemML-NN/examples/Example%20-%20MNIST%20LeNet.ipynb
 ... Here are the cache statistics from a sample run after adding the 
above-mentioned fused operators (date: Jan 13th, 2017):

Cache hits (Mem, WB, FS, HDFS): 1096424/0/0/2.
Cache writes (WB, FS, HDFS): 603950/15/8.
Cache times (ACQr/m, RLS, EXP): 3.659/0.456/273.799/1.275 sec.

I have seen anywhere between 250 and 500 seconds spent in cache times.
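For reference, a small sketch of how the "Cache times" line above breaks down. The label expansion (ACQr = acquire read, ACQm = acquire modify, RLS = release, EXP = export) is my reading of the abbreviations and should be treated as an assumption; the parsing helper is made up for illustration.

```python
import re

line = "Cache times (ACQr/m, RLS, EXP): 3.659/0.456/273.799/1.275 sec."


def parse_cache_times(text):
    """Split the slash-separated timings and attach the assumed labels."""
    labels = ("ACQr", "ACQm", "RLS", "EXP")
    values = re.search(r":\s*([\d./]+)\s*sec", text).group(1).split("/")
    return dict(zip(labels, map(float, values)))


times = parse_cache_times(line)
total = sum(times.values())
release_share = times["RLS"] / total
print(f"total={total:.3f}s, release share={release_share:.1%}")
```

In this sample run almost all of the cache time is spent in release, not in acquire or export.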

You can also use Mike's Breast Cancer Project as an example workload.

> Sparse/Caching performance bugs related to deep learning scripts
> 
>
> Key: SYSTEMML-1140
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1140
> Project: SystemML
>  Issue Type: Bug
>Affects Versions: SystemML 1.0
>Reporter: Niketan Pansare
>Priority: Blocker
>
> We have identified two performance bugs that frequently occur in deep 
> learning scripts.
> First, we repeatedly perform unnecessary conversions to sparse format. Also, 
> operations such as matrix multiplication (including BLAS and CuBLAS) are 
> optimized for dense.
>   
> Second, even with a large memory budget, we sometimes spend almost 20-30% of 
> the time in caching.
> [~mboehm7] [~reinwald] [~mwdus...@us.ibm.com] I am labeling this bug as a 
> blocker for SystemML 1.0. Please feel free to assign this issue to yourself.





[jira] [Closed] (SYSTEMML-1190) Allow Python/Scala UDF to be passed to SystemML

2017-01-24 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare closed SYSTEMML-1190.
-
   Resolution: Won't Fix
Fix Version/s: SystemML 1.0

> Allow Python/Scala UDF to be passed to SystemML
> ---
>
> Key: SYSTEMML-1190
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1190
> Project: SystemML
>  Issue Type: New Feature
>Reporter: Niketan Pansare
> Fix For: SystemML 1.0
>
>






[jira] [Created] (SYSTEMML-1187) Documentation for removeEmpty with select is missing in DML Language reference

2017-01-20 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-1187:
-

 Summary: Documentation for removeEmpty with select is missing in 
DML Language reference
 Key: SYSTEMML-1187
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1187
 Project: SystemML
  Issue Type: Documentation
Reporter: Niketan Pansare








[jira] [Assigned] (SYSTEMML-769) Improve the performance of im2col for dense input

2017-02-20 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare reassigned SYSTEMML-769:


Assignee: Niketan Pansare

> Improve the performance of im2col for dense input
> -
>
> Key: SYSTEMML-769
> URL: https://issues.apache.org/jira/browse/SYSTEMML-769
> Project: SystemML
>  Issue Type: Bug
>Reporter: Niketan Pansare
>Assignee: Niketan Pansare
>
> This will involve investigation of high cache release time
> [~mwdus...@us.ibm.com] [~mboehm7] [~prithvi_r_s]





[jira] [Resolved] (SYSTEMML-769) Improve the performance of im2col for dense input

2017-02-20 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare resolved SYSTEMML-769.
--
   Resolution: Fixed
Fix Version/s: SystemML 0.13

> Improve the performance of im2col for dense input
> -
>
> Key: SYSTEMML-769
> URL: https://issues.apache.org/jira/browse/SYSTEMML-769
> Project: SystemML
>  Issue Type: Bug
>Reporter: Niketan Pansare
>Assignee: Niketan Pansare
> Fix For: SystemML 0.13
>
>
> This will involve investigation of high cache release time
> [~mwdus...@us.ibm.com] [~mboehm7] [~prithvi_r_s]





[jira] [Updated] (SYSTEMML-703) Install CUDA along with CuDNN on Jenkins

2017-02-20 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-703:
-
Affects Version/s: SystemML 1.0

> Install CUDA along with CuDNN on Jenkins
> 
>
> Key: SYSTEMML-703
> URL: https://issues.apache.org/jira/browse/SYSTEMML-703
> Project: SystemML
>  Issue Type: Task
>Affects Versions: SystemML 1.0
>Reporter: Niketan Pansare
>Assignee: Alan Chin
>Priority: Minor
>
> Please install:
> 1. CUDA 7.5
> 2. CuDNN v4 from 
> http://developer.download.nvidia.com/compute/redist/cudnn/v4/cudnn-7.0-win-x64-v4.0-prod.zip
> 3. Download JCuda binaries version 0.7.5b and JCudnn version 0.7.5. Link: 
> http://www.jcuda.org/downloads/downloads.html ... The library path for 
> test-cases is set in AutomatedTestbase class: 
> https://github.com/apache/incubator-systemml/pull/165/files#diff-bcda036e4c3ff62cb2648acbbd19f61aR113
> Once these changes are in (and once the GPU backend is feature-complete) we can 
> set the TEST_GPU flag in the AutomatedTestbase class to true. Since it will take a 
> few weeks to make the GPU backend feature-complete, this is a low-priority task.
> [~nakul02] [~akchin]





[jira] [Updated] (SYSTEMML-1159) Enable Remote Hyperparameter Tuning

2017-02-20 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-1159:
--
Affects Version/s: SystemML 1.0

> Enable Remote Hyperparameter Tuning
> ---
>
> Key: SYSTEMML-1159
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1159
> Project: SystemML
>  Issue Type: Improvement
>Affects Versions: SystemML 1.0
>Reporter: Mike Dusenberry
>Priority: Blocker
>
> Training a parameterized machine learning model (such as a large neural net 
> in deep learning) requires learning a set of ideal model parameters from the 
> data, as well as determining appropriate hyperparameters (or "settings") for 
> the training process itself.  In the latter case, the hyperparameters (e.g. 
> learning rate, regularization strength, dropout percentage, model 
> architecture) cannot be learned from the data, and instead are 
> determined via a search across a space for each hyperparameter.  For large 
> numbers of hyperparameters (such as in deep learning models), the current 
> literature points to performing staged, randomized grid searches over the 
> space to produce distributions of performance, narrowing the space after each 
> search \[1].  Thus, for efficient hyperparameter optimization, it is 
> desirable to train several models in parallel, with each model trained over 
> the full dataset.  For deep learning models, a mini-batch training approach 
> is currently state-of-the-art, and thus separate models with different 
> hyperparameters could, conceivably, be easily trained on each of the nodes in 
> a cluster.
> In order to allow for the training of deep learning models, SystemML needs to 
> determine a solution to enable this scenario with the Spark backend.  
> Specifically, if the user has a {{train}} function that takes a set of 
> hyperparameters and trains a model with a mini-batch approach (and thus is 
> only making use of single-node instructions within the function), the user 
> should be able to wrap this function with, for example, a remote {{parfor}} 
> construct that samples hyperparameters and calls the {{train}} function on 
> each machine in parallel.
> To be clear, each model would need access to the entire dataset, and each 
> model would be trained independently.
> \[1]: http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
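The scenario described above (sample hyperparameters, train one independent model per sample, run the trainings in parallel) can be sketched in plain Python as follows. This is only an illustration of the control flow, not a proposal for the DML or {{parfor}} syntax: `train` is a hypothetical stand-in that returns a made-up validation loss, and a local thread pool stands in for the cluster nodes.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def sample_hyperparameters(rng, lr_range, reg_range):
    """Draw one configuration, log-uniformly within each range."""
    def log_uniform(low, high):
        return math.exp(rng.uniform(math.log(low), math.log(high)))
    return {"lr": log_uniform(*lr_range), "reg": log_uniform(*reg_range)}


def train(params):
    """Hypothetical stand-in for a mini-batch training run.

    Returns a fake validation loss that is smallest near lr=1e-2, reg=1e-4.
    """
    return ((math.log10(params["lr"]) + 2) ** 2
            + (math.log10(params["reg"]) + 4) ** 2)


def random_search(num_trials, lr_range, reg_range, seed=0, workers=4):
    """Sample configurations, train them in parallel, sort by loss."""
    rng = random.Random(seed)
    configs = [sample_hyperparameters(rng, lr_range, reg_range)
               for _ in range(num_trials)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        losses = list(pool.map(train, configs))
    return sorted(zip(losses, configs), key=lambda pair: pair[0])


results = random_search(20, lr_range=(1e-4, 1e-1), reg_range=(1e-6, 1e-2))
best_loss, best_params = results[0]
print(best_loss, best_params)
```

A staged search in the sense of \[1] would repeat `random_search` with ranges narrowed around the best configurations of the previous stage.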





[jira] [Commented] (SYSTEMML-824) Improve the performance of binary cell-wise operations

2017-02-20 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875005#comment-15875005
 ] 

Niketan Pansare commented on SYSTEMML-824:
--

In the context of deep learning, we should consider adding fused operators for 
common updates. For example: 
https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/udf/lib/SGDNesterovUpdate.java
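To make the idea of a fused update concrete, here is a minimal pure-Python sketch. It uses the common Nesterov momentum formulation (v = mu * v_prev - lr * dX; X = X - mu * v_prev + (1 + mu) * v), which I have not checked against the linked UDF, so treat the exact formula and the function names as assumptions. The unfused version makes three cell-wise passes with intermediates; the fused version makes one pass and updates in place.

```python
def nesterov_update_unfused(x, dx, v, lr, mu):
    """Three separate cell-wise passes, each allocating an intermediate."""
    v_new = [mu * vi - lr * gi for vi, gi in zip(v, dx)]
    step = [(1 + mu) * vn - mu * vi for vn, vi in zip(v_new, v)]
    x_new = [xi + si for xi, si in zip(x, step)]
    return x_new, v_new


def nesterov_update_fused(x, dx, v, lr, mu):
    """Single pass that updates x and v in place, with no intermediates."""
    for i in range(len(x)):
        v_prev = v[i]
        v[i] = mu * v_prev - lr * dx[i]
        x[i] += (1 + mu) * v[i] - mu * v_prev
    return x, v


# Toy values, made up for the example.
x0 = [1.0, -2.0, 0.5]
grad = [0.1, -0.3, 0.2]
v0 = [0.0, 0.05, -0.02]

x_a, v_a = nesterov_update_unfused(x0, grad, v0, lr=0.01, mu=0.9)
x_b, v_b = nesterov_update_fused(list(x0), grad, list(v0), lr=0.01, mu=0.9)
print(x_a, x_b)
```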

> Improve the performance of binary cell-wise operations
> --
>
> Key: SYSTEMML-824
> URL: https://issues.apache.org/jira/browse/SYSTEMML-824
> Project: SystemML
>  Issue Type: Task
>Affects Versions: SystemML 1.0
>Reporter: Niketan Pansare
>
> The cell-wise (matrix-matrix as well as matrix-scalar) operations take a 
> significant amount of time while training LeNet. Here are a few ways to improve 
> the performance of cell-wise operations:
> 1. Inject in-place updates [1] (saving on zeroing out the matrix).
> 2. Fused cell-wise operations (as an example, the recently added axpy operations: 
> https://github.com/apache/incubator-systemml/commit/b584aecf6b3a1eb96ff83b78cc3ad7c7c6d15baa).
>  
> 3. Parallelize cell-wise operations (an initial investigation needs to be 
> conducted before proceeding in this direction, especially in the sparse case: 
> https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/runtime/matrix/data/LibMatrixBincell.java#L274).
> [~nakul02] [~mwdus...@us.ibm.com] [~prithvi_r_s] [~mboehm7] [~reinwald]
> Reference:
> [1] http://www.diku.dk/hjemmesider/ansatte/torbenm/ICD/Register.pdf





[jira] [Updated] (SYSTEMML-1160) Enable Prefetching of Mini-Batches

2017-02-20 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-1160:
--
Affects Version/s: SystemML 1.0

> Enable Prefetching of Mini-Batches
> --
>
> Key: SYSTEMML-1160
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1160
> Project: SystemML
>  Issue Type: New Feature
>Affects Versions: SystemML 1.0
>Reporter: Mike Dusenberry
>Priority: Blocker
>
> For efficient training of large deep learning models, a mini-batch training 
> approach is preferred.  On SystemML with the Spark backend, this currently 
> equates to grabbing a mini-batch from an RDD (via a PartitionPruning RDD -- 
> see SYSTEMML-951), and then using entirely single-node instructions for each 
> mini-batch.  While the fetching of partitions has been made efficient, we 
> currently have to pause after each training step to grab the next partition.  
> For large models, training time is already an issue even for GPUs with 
> saturated input pipelines.  Thus, we need to enable prefetching of 
> mini-batches that runs in parallel to the training loop.  One possibility 
> would be to create an input queue that is fed from a prefetch thread, and 
> that then feeds the training loop. 
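The queue-plus-prefetch-thread idea mentioned at the end of the description can be sketched with the Python standard library as below. This only illustrates the pattern and is not SystemML code; `prefetching_batches` and `make_batches` are hypothetical names, and slicing a list stands in for fetching a partition from an RDD.

```python
import queue
import threading


def prefetching_batches(batch_source, capacity=2):
    """Yield batches while a background thread reads ahead into a bounded queue."""
    buffer = queue.Queue(maxsize=capacity)
    sentinel = object()  # marks the end of the stream

    def producer():
        for batch in batch_source:
            buffer.put(batch)  # blocks while the queue is full
        buffer.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = buffer.get()  # blocks until the next batch is ready
        if batch is sentinel:
            return
        yield batch


def make_batches(data, batch_size):
    """Stand-in for the (slow) mini-batch fetch."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


rows = list(range(10))
seen = []
for batch in prefetching_batches(make_batches(rows, 4)):
    seen.append(batch)  # the training step would go here
print(seen)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The bounded queue caps how far the prefetch thread can run ahead, so memory use stays at roughly `capacity` mini-batches. Error handling in the producer is left out of this sketch.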





[jira] [Commented] (SYSTEMML-824) Improve the performance of binary cell-wise operations

2017-02-20 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875024#comment-15875024
 ] 

Niketan Pansare commented on SYSTEMML-824:
--

Sounds good. Let's wait for https://issues.apache.org/jira/browse/SYSTEMML-1284 
and then explore potential fused operators at runtime.

> Improve the performance of binary cell-wise operations
> --
>
> Key: SYSTEMML-824
> URL: https://issues.apache.org/jira/browse/SYSTEMML-824
> Project: SystemML
>  Issue Type: Task
>Affects Versions: SystemML 1.0
>Reporter: Niketan Pansare
>
> The cell-wise (matrix-matrix as well as matrix-scalar) operations take a 
> significant amount of time while training LeNet. Here are a few ways to improve 
> the performance of cell-wise operations:
> 1. Inject in-place updates [1] (saving on zeroing out the matrix).
> 2. Fused cell-wise operations (as an example, the recently added axpy operations: 
> https://github.com/apache/incubator-systemml/commit/b584aecf6b3a1eb96ff83b78cc3ad7c7c6d15baa).
>  
> 3. Parallelize cell-wise operations (an initial investigation needs to be 
> conducted before proceeding in this direction, especially in the sparse case: 
> https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/runtime/matrix/data/LibMatrixBincell.java#L274).
> [~nakul02] [~mwdus...@us.ibm.com] [~prithvi_r_s] [~mboehm7] [~reinwald]
> Reference:
> [1] http://www.diku.dk/hjemmesider/ansatte/torbenm/ICD/Register.pdf





[jira] [Updated] (SYSTEMML-824) Improve the performance of binary cell-wise operations

2017-02-20 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-824:
-
Affects Version/s: SystemML 1.0

> Improve the performance of binary cell-wise operations
> --
>
> Key: SYSTEMML-824
> URL: https://issues.apache.org/jira/browse/SYSTEMML-824
> Project: SystemML
>  Issue Type: Task
>Affects Versions: SystemML 1.0
>Reporter: Niketan Pansare
>
> The cell-wise (matrix-matrix as well as matrix-scalar) operations take a 
> significant amount of time while training LeNet. Here are a few ways to improve 
> the performance of cell-wise operations:
> 1. Inject in-place updates [1] (saving on zeroing out the matrix).
> 2. Fused cell-wise operations (as an example, the recently added axpy operations: 
> https://github.com/apache/incubator-systemml/commit/b584aecf6b3a1eb96ff83b78cc3ad7c7c6d15baa).
>  
> 3. Parallelize cell-wise operations (an initial investigation needs to be 
> conducted before proceeding in this direction, especially in the sparse case: 
> https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/runtime/matrix/data/LibMatrixBincell.java#L274).
> [~nakul02] [~mwdus...@us.ibm.com] [~prithvi_r_s] [~mboehm7] [~reinwald]
> Reference:
> [1] http://www.diku.dk/hjemmesider/ansatte/torbenm/ICD/Register.pdf





[jira] [Updated] (SYSTEMML-446) Phase 1: Exploit GPU BLAS libraries (integration)

2017-02-20 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-446:
-
Affects Version/s: SystemML 1.0

> Phase 1: Exploit GPU BLAS libraries (integration)
> -
>
> Key: SYSTEMML-446
> URL: https://issues.apache.org/jira/browse/SYSTEMML-446
> Project: SystemML
>  Issue Type: Task
>  Components: Compiler, Runtime
>Affects Versions: SystemML 1.0
>Reporter: Matthias Boehm
>Assignee: Niketan Pansare
> Fix For: SystemML 0.13
>
>






[jira] [Created] (SYSTEMML-1339) Autoencoder script for acoustic signal modeling

2017-02-20 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-1339:
-

 Summary: Autoencoder script for acoustic signal modeling
 Key: SYSTEMML-1339
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1339
 Project: SystemML
  Issue Type: Task
Reporter: Niketan Pansare
Assignee: Prithviraj Sen








[jira] [Resolved] (SYSTEMML-1339) Autoencoder script for acoustic signal modeling

2017-02-20 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare resolved SYSTEMML-1339.
---
   Resolution: Fixed
Fix Version/s: SystemML 1.0

Fixed in the commit 
https://github.com/apache/incubator-systemml/commit/8eed1ec94b8070710d532358906a050cd4f727fc

> Autoencoder script for acoustic signal modeling
> ---
>
> Key: SYSTEMML-1339
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1339
> Project: SystemML
>  Issue Type: Task
>Reporter: Niketan Pansare
>Assignee: Prithviraj Sen
> Fix For: SystemML 1.0
>
>






[jira] [Commented] (SYSTEMML-1238) Python test failing for LinearRegCG

2017-02-16 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871102#comment-15871102
 ] 

Niketan Pansare commented on SYSTEMML-1238:
---

1. I have verified that the mllearn API in 0.12.0 produces correct results.
2. No changes have been introduced in the Python/Scala wrappers that would affect 
this. The only change I see in the algorithm since 0.12.0 is cbind. The bug is 
likely due to a side-effect of some other change.
3. I verified that the Python wrappers are passing correct inputs to the DML script 
by writing the inputs X, y to a file and comparing them with the original Python data.

I tested LinRegDS:
A. commandline:
{code}
 ~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit SystemML.jar -f LinearRegDS.dml \
   -nvargs X=X.csv Y=y.csv B=B.csv fmt=csv icpt=1 tol=0.01 reg=1
Calling the Direct Solver...
Computing the statistics...
17/02/16 21:02:52 INFO MapPartitionsRDD: Removing RDD 17 from persistence list
17/02/16 21:02:52 INFO BlockManager: Removing RDD 17
AVG_TOT_Y,152.13348416289594
STDEV_TOT_Y,77.09300453299106
AVG_RES_Y,-2.935409582574532E-14
STDEV_RES_Y,66.48545020578437
DISPERSION,4420.315089065834
PLAIN_R2,0.2579428201690507
ADJUSTED_R2,0.2562563265785258
PLAIN_R2_NOBIAS,0.2579428201690507
ADJUSTED_R2_NOBIAS,0.2562563265785258
Writing the output matrix...
END LINEAR REGRESSION SCRIPT
{code}

B. mllearn:
{code}
Calling the Direct Solver...
Computing the statistics...
AVG_TOT_Y,153.36255924170615
STDEV_TOT_Y,77.21853383600028
AVG_RES_Y,4.8020565933360324E-14
STDEV_RES_Y,67.06389890324985
DISPERSION,4497.566536105316
PLAIN_R2,0.24750834362605834
ADJUSTED_R2,0.24571669682516795
PLAIN_R2_NOBIAS,0.24750834362605834
ADJUSTED_R2_NOBIAS,0.24571669682516795
Writing the output matrix...
END LINEAR REGRESSION SCRIPT
lr
{code}

> Python test failing for LinearRegCG
> ---
>
> Key: SYSTEMML-1238
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1238
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, APIs
>Affects Versions: SystemML 0.13
>Reporter: Imran Younus
>Assignee: Niketan Pansare
> Attachments: python_LinearReg_test_spark.1.6.log, 
> python_LinearReg_test_spark.2.1.log
>
>
> [~deron] discovered that one of the Python tests ({{test_mllearn_df.py}}) 
> with Spark 2.1.0 was failing because the test score from linear regression 
> was very low ({{~ 0.24}}). I did some investigation, and it turns out that 
> the model parameters computed by the DML script are incorrect. In 
> SystemML 0.12, the values of the betas from the linear regression model are 
> {{\[152.919, 938.237\]}}. This is what we expect from the normal equation. (I 
> also tested this with sklearn.) But the values of the betas from SystemML 0.13 
> (with Spark 2.1.0) come out to be {{\[153.146, 458.489\]}}. These are not 
> correct, and therefore the test score is much lower than expected. The data 
> going into the DML script is correct. I printed out the values of {{X}} and {{Y}} 
> in DML and I didn't see any issue there.
> Attached are the log files for two different tests (SystemML 0.12 and 0.13) 
> with the explain flag.





[jira] [Resolved] (SYSTEMML-1238) Python test failing for LinearRegCG

2017-02-17 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare resolved SYSTEMML-1238.
---
   Resolution: Fixed
Fix Version/s: SystemML 0.13

Fixed in the commit 
https://github.com/apache/incubator-systemml/commit/9d0087cbbd250c9b486923555b450602f816cf19
 by setting regularization to 0 (similar to that of scikit-learn).
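To illustrate why the regularization setting matters here, the sketch below fits a one-feature line with an L2 penalty on the slope, in pure Python. It is a generic ridge illustration with made-up toy data, not the code of the DML script, and it assumes the intercept is left unpenalized. For a single feature the penalized slope is Sxy / (Sxx + reg), so when the feature values are small (Sxx below 1), reg=1 shrinks the slope substantially while the intercept barely moves. That is the same pattern as in the betas reported in the description.

```python
def fit_line(xs, ys, reg=0.0):
    """Least-squares fit of y = intercept + slope * x with penalty reg * slope**2.

    Assumption: the intercept is not penalized.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / (sxx + reg)
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Toy data (made up): small-magnitude feature values, exactly linear target.
xs = [-0.5, -0.3, -0.1, 0.1, 0.3, 0.5]
ys = [10.0 + 40.0 * x for x in xs]

b0_plain, b1_plain = fit_line(xs, ys, reg=0.0)
b0_ridge, b1_ridge = fit_line(xs, ys, reg=1.0)
print(b0_plain, b1_plain)
print(b0_ridge, b1_ridge)
```

With reg=0 the fit recovers the true slope of 40; with reg=1 the slope drops to roughly 16.5 while the intercept stays at about 10.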

> Python test failing for LinearRegCG
> ---
>
> Key: SYSTEMML-1238
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1238
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, APIs
>Affects Versions: SystemML 0.13
>Reporter: Imran Younus
>Assignee: Niketan Pansare
> Fix For: SystemML 0.13
>
> Attachments: python_LinearReg_test_spark.1.6.log, 
> python_LinearReg_test_spark.2.1.log
>
>
> [~deron] discovered that one of the Python tests ({{test_mllearn_df.py}}) 
> with Spark 2.1.0 was failing because the test score from linear regression 
> was very low ({{~ 0.24}}). I did some investigation, and it turns out that 
> the model parameters computed by the DML script are incorrect. In 
> SystemML 0.12, the values of the betas from the linear regression model are 
> {{\[152.919, 938.237\]}}. This is what we expect from the normal equation. (I 
> also tested this with sklearn.) But the values of the betas from SystemML 0.13 
> (with Spark 2.1.0) come out to be {{\[153.146, 458.489\]}}. These are not 
> correct, and therefore the test score is much lower than expected. The data 
> going into the DML script is correct. I printed out the values of {{X}} and {{Y}} 
> in DML and I didn't see any issue there.
> Attached are the log files for two different tests (SystemML 0.12 and 0.13) 
> with the explain flag.





[jira] [Commented] (SYSTEMML-1238) Python test failing for LinearRegCG

2017-02-17 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872777#comment-15872777
 ] 

Niketan Pansare commented on SYSTEMML-1238:
---

Thanks Imran :)

> Python test failing for LinearRegCG
> ---
>
> Key: SYSTEMML-1238
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1238
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, APIs
>Affects Versions: SystemML 0.13
>Reporter: Imran Younus
>Assignee: Niketan Pansare
> Fix For: SystemML 0.13
>
> Attachments: python_LinearReg_test_spark.1.6.log, 
> python_LinearReg_test_spark.2.1.log
>
>
> [~deron] discovered that one of the Python tests ({{test_mllearn_df.py}}) 
> with Spark 2.1.0 was failing because the test score from linear regression 
> was very low ({{~ 0.24}}). I did some investigation, and it turns out that 
> the model parameters computed by the DML script are incorrect. In 
> SystemML 0.12, the values of the betas from the linear regression model are 
> {{\[152.919, 938.237\]}}. This is what we expect from the normal equation. (I 
> also tested this with sklearn.) But the values of the betas from SystemML 0.13 
> (with Spark 2.1.0) come out to be {{\[153.146, 458.489\]}}. These are not 
> correct, and therefore the test score is much lower than expected. The data 
> going into the DML script is correct. I printed out the values of {{X}} and {{Y}} 
> in DML and I didn't see any issue there.
> Attached are the log files for two different tests (SystemML 0.12 and 0.13) 
> with the explain flag.





[jira] [Comment Edited] (SYSTEMML-1238) Python test failing for LinearRegCG

2017-02-16 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871170#comment-15871170
 ] 

Niketan Pansare edited comment on SYSTEMML-1238 at 2/17/17 5:33 AM:


I am able to reproduce this bug (not sure if it is one) with the command line as 
well. Here is the output of GLM-predict (after running LinRegDS):
{code}
$ cat y_predicted.csv
189.09660701586185
133.3260601238074
157.3739106185465
132.8144037303023
135.88434209133283
154.81562865102103
194.2131709509127
136.3959984848379
125.13955782772601
137.41931127184807
178.35182275225503
123.60458864721075
152.7690030770007
141.0009060263837
116.95305553164462
161.46716176658717
144.58250078091928
144.58250078091928
170.67697684967874
117.4647119251497
{code}

Here is the output of Python mllearn:
{code}
>>> import numpy as np
>>> from pyspark.context import SparkContext
>>> from pyspark.ml import Pipeline
>>> from pyspark.ml.feature import HashingTF, Tokenizer
>>> from pyspark.sql import SparkSession
>>> from sklearn import datasets, metrics, neighbors
>>> from sklearn.datasets import fetch_20newsgroups
>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>>
>>> from systemml.mllearn import LinearRegression, LogisticRegression, NaiveBayes, SVM
>>> diabetes = datasets.load_diabetes()
>>> diabetes_X = diabetes.data[:, np.newaxis, 2]
>>> diabetes_X_train = diabetes_X[:-20]
>>> diabetes_X_test = diabetes_X[-20:]
>>> diabetes_y_train = diabetes.target[:-20]
>>> diabetes_y_test = diabetes.target[-20:]
>>> sparkSession = SparkSession.builder.getOrCreate()
>>> regr = LinearRegression(sparkSession, solver="direct-solve")
>>> regr.fit(diabetes_X_train, diabetes_y_train)

Welcome to Apache SystemML!

17/02/16 22:39:21 WARN RewriteRemovePersistentReadWrite: Non-registered 
persistent write of variable 'X' (line 87).
17/02/16 22:39:21 WARN RewriteRemovePersistentReadWrite: Non-registered 
persistent write of variable 'y' (line 88).
BEGIN LINEAR REGRESSION SCRIPT
Reading X and Y...
Calling the Direct Solver...
Computing the statistics...
AVG_TOT_Y,153.36255924170615
STDEV_TOT_Y,77.21853383600028
AVG_RES_Y,4.8020565933360324E-14
STDEV_RES_Y,67.06389890324985
DISPERSION,4497.566536105316
PLAIN_R2,0.24750834362605834
ADJUSTED_R2,0.24571669682516795
PLAIN_R2_NOBIAS,0.24750834362605834
ADJUSTED_R2_NOBIAS,0.24571669682516795
Writing the output matrix...
END LINEAR REGRESSION SCRIPT
lr
>>> regr.predict(diabetes_X_test)
17/02/16 22:39:35 WARN Expression: WARNING: null -- line 149, column 4 -- Read 
input file does not exist on FS (local mode):
17/02/16 22:39:35 WARN Expression: Metadata file:  .mtd not provided
array([[ 188.84521284],
   [ 134.98127765],
   [ 158.20701117],
   [ 134.4871131 ],
   [ 137.45210036],
   [ 155.73618846],
   [ 193.78685827],
   [ 137.94626491],
   [ 127.07464496],
   [ 138.93459399],
   [ 178.46775744],
   [ 125.59215133],
   [ 153.75953028],
   [ 142.39374579],
   [ 119.16801227],
   [ 162.16032752],
   [ 145.8528976 ],
   [ 145.8528976 ],
   [ 171.05528929],
   [ 119.66217681]])
{code}
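
To put a number on the gap between the two outputs above, here is a minimal sketch in plain Python. The values are copied from the GLM-predict output and the mllearn predict() output above, rounded to four decimals; the variable names are only illustrative.

```python
# Predictions from GLM-predict on the command line (rounded to 4 decimals).
cmdline = [
    189.0966, 133.3261, 157.3739, 132.8144, 135.8843,
    154.8156, 194.2132, 136.3960, 125.1396, 137.4193,
    178.3518, 123.6046, 152.7690, 141.0009, 116.9531,
    161.4672, 144.5825, 144.5825, 170.6770, 117.4647,
]

# Predictions from Python mllearn's regr.predict (rounded to 4 decimals).
mllearn = [
    188.8452, 134.9813, 158.2070, 134.4871, 137.4521,
    155.7362, 193.7869, 137.9463, 127.0746, 138.9346,
    178.4678, 125.5922, 153.7595, 142.3937, 119.1680,
    162.1603, 145.8529, 145.8529, 171.0553, 119.6622,
]

# Row-wise difference (command line minus mllearn) and summary statistics.
diffs = [c - m for c, m in zip(cmdline, mllearn)]
abs_diffs = [abs(d) for d in diffs]
max_abs_diff = max(abs_diffs)
worst_row = abs_diffs.index(max_abs_diff)  # 0-based row index
mean_abs_diff = sum(abs_diffs) / len(abs_diffs)

print("max abs diff: %.4f (row %d)" % (max_abs_diff, worst_row))
print("mean abs diff: %.4f" % mean_abs_diff)
```

The two sets of predictions are close but not identical, so the difference is larger than what floating-point noise alone would explain.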

To reproduce the command-line output, please dump the test data into csv:
{code}
import numpy as np
from sklearn import datasets
diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]
diabetes_X_test.tofile('X_test.csv', sep="\n")
diabetes_X.tofile('X.csv', sep="\n")
diabetes.target.tofile('y.csv', sep="\n")
{code}

Then execute the following commands (you may have to edit the DML script to add the format, or create a metadata file):
{code}
~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit SystemML.jar -f LinearRegDS.dml \
  -nvargs X=X.csv Y=y.csv B=B.csv fmt=csv icpt=1 tol=0.01 reg=1
~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit SystemML.jar -f GLM-predict.dml \
  -nvargs X=X_test.csv M=y_predicted.csv B=B.csv fmt=csv icpt=1 tol=0.01 reg=1
{code}
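
If you go the metadata-file route, a small helper along these lines may be enough. This is only a sketch: write_mtd is a hypothetical name, and the field names follow the .mtd layout for CSV matrices as I recall it from the DML language reference, so please check them against the SystemML version you are running.

```python
import json


def write_mtd(data_path, rows, cols):
    """Write a metadata (.mtd) file next to a headerless CSV matrix file.

    The field names below are assumed from the SystemML documentation and
    should be verified against the version in use.
    """
    meta = {
        "data_type": "matrix",
        "value_type": "double",
        "rows": rows,
        "cols": cols,
        "format": "csv",
        "header": False,
        "sep": ",",
    }
    mtd_path = data_path + ".mtd"
    with open(mtd_path, "w") as f:
        json.dump(meta, f, indent=4)
    return mtd_path
```

For the files dumped above, the calls would be write_mtd("X.csv", 442, 1), write_mtd("y.csv", 442, 1) and write_mtd("X_test.csv", 20, 1), since the diabetes dataset has 442 rows and only one feature column is used.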

I also tested using SystemML 0.12.0 and got the same predictions:
{code}
$ ~/spark-1.6.1-bin-hadoop2.6/bin/spark-submit systemml-0.12.0-incubating.jar 
-f LinearRegDS.dml -nvargs X=X.csv 

[jira] [Created] (SYSTEMML-1340) Implement relu_maxpooling instruction for GPU

2017-02-21 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-1340:
-

 Summary: Implement relu_maxpooling instruction for GPU
 Key: SYSTEMML-1340
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1340
 Project: SystemML
  Issue Type: Sub-task
  Components: Runtime
Affects Versions: SystemML 1.0
Reporter: Niketan Pansare
Assignee: Niketan Pansare






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1341) Implement conv2d_bias_add instruction for GPU

2017-02-21 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-1341:
-

 Summary: Implement conv2d_bias_add instruction for GPU
 Key: SYSTEMML-1341
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1341
 Project: SystemML
  Issue Type: Sub-task
  Components: Runtime
Affects Versions: SystemML 1.0
Reporter: Niketan Pansare
Assignee: Niketan Pansare








[jira] [Created] (SYSTEMML-1342) Implement conv2d_bias_add instruction for GPU

2017-02-21 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-1342:
-

 Summary: Implement conv2d_bias_add instruction for GPU
 Key: SYSTEMML-1342
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1342
 Project: SystemML
  Issue Type: Sub-task
  Components: Runtime
Affects Versions: SystemML 1.0
Reporter: Niketan Pansare
Assignee: Niketan Pansare








[jira] [Closed] (SYSTEMML-1342) Implement conv2d_bias_add instruction for GPU

2017-02-21 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare closed SYSTEMML-1342.
-
Resolution: Duplicate

> Implement conv2d_bias_add instruction for GPU
> -
>
> Key: SYSTEMML-1342
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1342
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Runtime
>Affects Versions: SystemML 1.0
>Reporter: Niketan Pansare
>Assignee: Niketan Pansare
> Fix For: SystemML 0.13
>
>






[jira] [Created] (SYSTEMML-1343) Implement map-side instruction for prediction

2017-02-21 Thread Niketan Pansare (JIRA)
Niketan Pansare created SYSTEMML-1343:
-

 Summary: Implement map-side instruction for prediction
 Key: SYSTEMML-1343
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1343
 Project: SystemML
  Issue Type: Sub-task
Reporter: Niketan Pansare
Assignee: Niketan Pansare


This task includes implementing a map-side instruction for convolution forward 
and max-pooling forward. This approach incurs the added penalty of reblocking 
the input RDD into rectangular format (if not already done). This operator will 
serve as an initial cut (and also as a baseline).

[~mwdus...@us.ibm.com]





[jira] [Commented] (SYSTEMML-941) Add support for cusparse axpy

2017-02-14 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866491#comment-15866491
 ] 

Niketan Pansare commented on SYSTEMML-941:
--

[~nakul02] Are you OK with closing this JIRA as resolved?

> Add support for cusparse axpy
> -
>
> Key: SYSTEMML-941
> URL: https://issues.apache.org/jira/browse/SYSTEMML-941
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Niketan Pansare
>Assignee: Nakul Jindal
> Fix For: SystemML 0.11
>
>
> See LibMatrixCUDA's vectorScalarMultiply()
> [~nakul02]





[jira] [Commented] (SYSTEMML-931) Error while allocating CSRPointer

2017-02-14 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866487#comment-15866487
 ] 

Niketan Pansare commented on SYSTEMML-931:
--

[~nakul02] I believe this issue has been fixed. Can you please confirm?

> Error while allocating CSRPointer
> -
>
> Key: SYSTEMML-931
> URL: https://issues.apache.org/jira/browse/SYSTEMML-931
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Niketan Pansare
>
> org.apache.sysml.test.integration.functions.reorg.FullTransposeTest's 
> testTransposeRowVectorSparseSP() and testTransposeMatrixSparseSP() test cases 
> are failing while allocating CSRPointer.
> Caused by: jcuda.CudaException: cudaErrorMemoryAllocation
>   at jcuda.runtime.JCuda.checkResult(JCuda.java:437)
>   at jcuda.runtime.JCuda.cudaMalloc(JCuda.java:3811)
>   at 
> org.apache.sysml.runtime.instructions.gpu.context.JCudaObject$CSRPointer.allocateEmpty(JCudaObject.java:156)
>   at 
> org.apache.sysml.runtime.instructions.gpu.context.JCudaObject.allocateMemoryOnDevice(JCudaObject.java:464)
> [~nakul02]





[jira] [Assigned] (SYSTEMML-1033) Add LU and QR functionality to GPU backend

2017-02-14 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare reassigned SYSTEMML-1033:
-

Assignee: Nakul Jindal

> Add LU and QR functionality to GPU backend
> --
>
> Key: SYSTEMML-1033
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1033
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Runtime
>Reporter: Niketan Pansare
>Assignee: Nakul Jindal
> Fix For: SystemML 1.0
>
>
> For LU: See JCublas2's cublasDgetrfBatched method
> For QR: See JCublas2's cublasDgeqrfBatched method
> The key changes required:
> 1. Add GPU backend in 
> https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/hops/FunctionOp.java#L239
> 2. Add a MultiReturnBuiltinGPUInstruction that invokes the above functions 
> either directly or through LibMatrixCUDA.
> [~nakul02] Do you want to take a pass at this?





[jira] [Commented] (SYSTEMML-938) Make sparse memory estimation robust by handling unknown nnz.

2017-02-14 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866615#comment-15866615
 ] 

Niketan Pansare commented on SYSTEMML-938:
--

[~nakul02] Can you confirm whether this issue is resolved?

> Make sparse memory estimation robust by handling unknown nnz.
> -
>
> Key: SYSTEMML-938
> URL: https://issues.apache.org/jira/browse/SYSTEMML-938
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Niketan Pansare
>
> What if CSRPointer.estimateSize(mat.getNnz(), mat.getNumRows()) ?
> [~nakul02]
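
For context, a CSR (compressed sparse row) matrix stores one value and one column index per non-zero, plus rows + 1 row offsets. The sketch below shows one way a size estimate could fall back to a worst case when nnz is unknown. Everything in it is an assumption for illustration: the function name, the extra cols argument, the convention that a negative nnz means "unknown", the dense fallback, and the 8-byte/4-byte element sizes. It is not SystemML's actual CSRPointer.estimateSize.

```python
DOUBLE_BYTES = 8  # assumed size of one stored value
INT_BYTES = 4     # assumed size of one column index or row offset


def estimate_csr_bytes(nnz, rows, cols):
    """Estimate the bytes needed for a CSR matrix.

    A negative nnz is treated as "unknown" (an assumption of this sketch),
    in which case the fully dense worst case rows * cols is used instead.
    """
    if nnz < 0:
        nnz = rows * cols  # worst case: every cell is non-zero
    values = nnz * DOUBLE_BYTES       # non-zero values
    col_idx = nnz * INT_BYTES         # one column index per non-zero
    row_ptr = (rows + 1) * INT_BYTES  # row offsets
    return values + col_idx + row_ptr
```

A worst-case fallback over-estimates memory, but it avoids passing a negative or meaningless nnz into the size computation.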





[jira] [Commented] (SYSTEMML-939) LibMatrixCUDA's vectorScalarMultiply() produces incorrect results.

2017-02-14 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866496#comment-15866496
 ] 

Niketan Pansare commented on SYSTEMML-939:
--

[~nakul02] I believe this issue has been fixed. Can you please confirm?

> LibMatrixCUDA's vectorScalarMultiply() produces incorrect results.
> --
>
> Key: SYSTEMML-939
> URL: https://issues.apache.org/jira/browse/SYSTEMML-939
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Niketan Pansare
>
> Please use org.apache.sysml.test.integration.functions.aggregate.MinTest's 
> testGeneral() and uncomment line 79 in MatrixScalarArithmeticGPUInstruction 
> once it is tested.
> [~tanuj]





[jira] [Closed] (SYSTEMML-731) Conduct initial performance experiments for mat mult

2017-02-14 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare closed SYSTEMML-731.

   Resolution: Fixed
Fix Version/s: SystemML 0.12

> Conduct initial performance experiments for mat mult
> 
>
> Key: SYSTEMML-731
> URL: https://issues.apache.org/jira/browse/SYSTEMML-731
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Niketan Pansare
>Assignee: Nakul Jindal
> Fix For: SystemML 0.12
>
>
> Before the PR https://github.com/apache/incubator-systemml/pull/165 gets 
> merged, initial performance experiments need to be conducted for dense-dense 
> mat mult.
> [~nakul02] [~mboehm7]





[jira] [Assigned] (SYSTEMML-938) Make sparse memory estimation robust by handling unknown nnz.

2017-02-14 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare reassigned SYSTEMML-938:


Assignee: Nakul Jindal

> Make sparse memory estimation robust by handling unknown nnz.
> -
>
> Key: SYSTEMML-938
> URL: https://issues.apache.org/jira/browse/SYSTEMML-938
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Niketan Pansare
>Assignee: Nakul Jindal
>
> What if CSRPointer.estimateSize(mat.getNnz(), mat.getNumRows()) ?
> [~nakul02]





[jira] [Commented] (SYSTEMML-943) Create documentation explaining setup/usage for the GPU backend

2017-02-14 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866621#comment-15866621
 ] 

Niketan Pansare commented on SYSTEMML-943:
--

The documentation 
https://github.com/apache/incubator-systemml/blob/master/docs/devdocs/gpu-backend.md
 needs to be moved to the doc folder once the GPU backend is marked as stable.

> Create documentation explaining setup/usage for the GPU backend
> ---
>
> Key: SYSTEMML-943
> URL: https://issues.apache.org/jira/browse/SYSTEMML-943
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Niketan Pansare
> Fix For: SystemML 0.11
>
>






[jira] [Assigned] (SYSTEMML-731) Conduct initial performance experiments for mat mult

2017-02-14 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare reassigned SYSTEMML-731:


Assignee: Nakul Jindal  (was: Niketan Pansare)

> Conduct initial performance experiments for mat mult
> 
>
> Key: SYSTEMML-731
> URL: https://issues.apache.org/jira/browse/SYSTEMML-731
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Niketan Pansare
>Assignee: Nakul Jindal
> Fix For: SystemML 0.12
>
>
> Before the PR https://github.com/apache/incubator-systemml/pull/165 gets 
> merged, initial performance experiments need to be conducted for dense-dense 
> mat mult.
> [~nakul02] [~mboehm7]





[jira] [Updated] (SYSTEMML-943) Create documentation explaining setup/usage for the GPU backend

2017-02-14 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare updated SYSTEMML-943:
-
Fix Version/s: (was: SystemML 0.11)
   SystemML 1.0

> Create documentation explaining setup/usage for the GPU backend
> ---
>
> Key: SYSTEMML-943
> URL: https://issues.apache.org/jira/browse/SYSTEMML-943
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Niketan Pansare
> Fix For: SystemML 1.0
>
>






[jira] [Assigned] (SYSTEMML-943) Create documentation explaining setup/usage for the GPU backend

2017-02-14 Thread Niketan Pansare (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketan Pansare reassigned SYSTEMML-943:


Assignee: Niketan Pansare

> Create documentation explaining setup/usage for the GPU backend
> ---
>
> Key: SYSTEMML-943
> URL: https://issues.apache.org/jira/browse/SYSTEMML-943
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler, Runtime
>Reporter: Niketan Pansare
>Assignee: Niketan Pansare
> Fix For: SystemML 1.0
>
>





