Re: Improve SystemML execution speed in Spark

2017-05-15 Thread arijit chakraborty
failed. So thank you for nudging me towards looking at cmd. I'm sharing all this information here to let you know that I could solve the issue raised here, and thank you for that. Secondly, it can help someone if they have the same issue. Thanks again! Arijit

Re: Error while Testing with Larger dataset

2017-05-12 Thread arijit chakraborty
stion. A quick fix in this situation is to increase driver/executor memory. > On May 12, 2017, at 6:44 AM, arijit chakraborty wrote: > > Hi, > > > > I was testing my code with 10,000 observations. But the code is failing. Please find the log below. The code is working perfectly

Re: Improve SystemML execution speed in Spark

2017-05-12 Thread arijit chakraborty
h JVM sizes, number of executors, etc), pyspark script (containing mlcontext code) and DML script. This will help me reproduce those numbers on my cluster. Maybe Matthias can comment on the buildTree part. Thanks Niketan > On May 12, 2017, at 5:07 AM, arijit chakraborty wrote: > > Hi Niketa

Error while Testing with Larger dataset

2017-05-12 Thread arijit chakraborty
Hi, I was testing my code with 10,000 observations, but the code is failing. Please find the log below. The code works perfectly with smaller datasets. In R it takes around 2 hours to run this model. I'm using a 4-core PC and running Spark through a Jupyter notebook. In python: ---

Re: Improve SystemML execution speed in Spark

2017-05-12 Thread arijit chakraborty
_______ From: arijit chakraborty Sent: Friday, May 12, 2017 2:32:07 AM To: dev@systemml.incubator.apache.org Subject: Re: Improve SystemML execution speed in Spark Hi Niketan, Thank you for your suggestion! I tried what you suggested. ## Changed it here: from pyspark.sql

Re: Improve SystemML execution speed in Spark

2017-05-11 Thread arijit chakraborty
Also, since dataframe creation is lazy, you may need to do persist() followed by an action such as count() to ensure you are measuring it correctly. > On May 11, 2017, at 1:27 PM, arijit chakraborty wrote: > > Thank you Niketan for your reply! I was actually putting the timer in the dml > code part
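The point about lazy creation can be illustrated without Spark. The sketch below is only an analogy in plain Python: the generator lazy_rows is a hypothetical stand-in for a lazily built dataframe, and list(...) plays the role that persist() followed by count() plays in the advice above. It is not Spark or SystemML code.

```python
import time

work_done = 0  # counts how many rows were actually computed


def lazy_rows(n):
    """Hypothetical stand-in for a lazily built dataset: nothing runs until consumed."""
    global work_done
    for i in range(n):
        work_done += 1
        yield i * i


# "Creation" returns immediately because no row has been computed yet.
start = time.perf_counter()
rows = lazy_rows(100_000)
creation_seconds = time.perf_counter() - start
work_after_creation = work_done

# Forcing the data (the analogue of an action) is where the real cost shows up.
start = time.perf_counter()
materialized = list(rows)
action_seconds = time.perf_counter() - start
work_after_action = work_done

print(f"creation: {creation_seconds:.6f}s, rows computed so far: {work_after_creation}")
print(f"action:   {action_seconds:.6f}s, rows computed so far: {work_after_action}")
```

A timer wrapped only around the creation step would report almost nothing, which is why the data should be forced before timing the step that consumes it.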

Re: Improve SystemML execution speed in Spark

2017-05-11 Thread arijit chakraborty
taFrame(test_data)) Also, you can pass pandas data frame directly to MLContext :) Thanks Niketan > On May 10, 2017, at 10:31 AM, arijit chakraborty wrote: > > Hi, > > > I'm creating a process in SystemML, and running it through spark. I'm running > the co

Improve SystemML execution speed in Spark

2017-05-10 Thread arijit chakraborty
Hi, I'm creating a process in SystemML and running it through Spark. I'm running the code in the following way: # Spark Specifications: import os import sys import pandas as pd import numpy as np spark_path = r"C:\spark" os.environ['SPARK_HOME'] = spark_path os.environ['HADOOP_HOME'] = spar

Re: Combining Multiple Matrix

2017-05-03 Thread arijit chakraborty
ve listing) is an inexpensive operation. Fred arijit chakraborty ---05/02/2017 09:48:13 AM---Hi, Our process is generating multiple matrix, of same size, but numb

Combining Multiple Matrix

2017-05-02 Thread arijit chakraborty
Hi, Our process generates multiple matrices of the same size, but the number of matrices is random. Finally, we combine all the smaller matrices to get a final large matrix. The way we create the large matrix is that we create a blank matrix and then update it once each smaller matrix

Re: Updating A Vector

2017-05-01 Thread arijit chakraborty
you can use our replace builtin function to eliminate NaNs before that. Regards, Matthias On Thu, Apr 27, 2017 at 9:46 AM, arijit chakraborty wrote: > Hi, > > > I've 2 matrix: > > > matrix1 = matrix(1, 10, 1) > > matrix2 = matrix(seq(1,10,1),1,1) > > >

Re: Randomly Selecting rows from a dataframe

2017-05-01 Thread arijit chakraborty
Empty with selection vector expects a non-zero indicator by position not by value (e.g., non-zero in 7th cell indicates that you want to select the 7th row which ignores the actual value you feed in). Regards, Matthias On Sun, Apr 30, 2017 at 1:47 AM, arijit chakraborty wrote: > Hi, > >
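The positional behaviour described in this reply can be sketched in plain Python. This is an illustration of the semantics only, not SystemML code: the helper names indices_to_indicator and select_rows are hypothetical, and the data is made up for the example.

```python
def indices_to_indicator(indices, n_rows):
    """Turn 1-based row numbers into a 0/1 vector of length n_rows."""
    indicator = [0] * n_rows
    for idx in indices:
        indicator[idx - 1] = 1
    return indicator


def select_rows(rows, indicator):
    """Keep each row whose position holds a non-zero flag; the flag's value is ignored."""
    return [row for row, flag in zip(rows, indicator) if flag != 0]


# Ten made-up rows: row r holds [10r, 10r+1, 10r+2].
data = [[10 * r + c for c in range(3)] for r in range(1, 11)]

# Row numbers we want (for example, the output of a sampling step).
wanted = [7, 2, 9]
indicator = indices_to_indicator(wanted, len(data))
subset = select_rows(data, indicator)

# A non-zero value in the 2nd cell selects row 2, even though the value is 5.
by_position = select_rows(data, [0, 5, 0, 0, 0, 0, 0, 0, 0, 0])

print(indicator)
print(subset)
print(by_position)
```

So a vector of sampled row numbers has to be converted into a full-length indicator before it can act as a selection vector.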

Re: Randomly Selecting rows from a dataframe

2017-04-30 Thread arijit chakraborty
in the data. I also tried following "sample.dml" in the "utils" folder, but that is also not giving me the answer I'm looking for. We can use a for-loop in this case with the "data_sample_matrix" matrix, but I want to avoid looping. Can anyone please help? Thank you!

Updating A Vector

2017-04-27 Thread arijit chakraborty
Hi, I have 2 matrices: matrix1 = matrix(1, 10, 1) matrix2 = matrix(seq(1,10,1),1,1) The values of matrix1 will be updated based on matrix2. E.g. suppose in matrix2 only positions 2, 3, 4 have a value and the rest are 0; then matrix1 will be updated to, say, 2 for positions 2, 3, 4. So the new matrix looks
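The update described in this question can be sketched in plain Python as follows. This is not the answer given in the thread and not SystemML code; masked_update is a hypothetical helper, and the contents of matrix2 are made up to match the example (non-zero only at positions 2, 3, 4).

```python
def masked_update(target, mask_source, new_value):
    """Replace target entries wherever mask_source is non-zero; keep the rest."""
    return [new_value if m != 0 else t for t, m in zip(target, mask_source)]


matrix1 = [1] * 10                        # ten ones, as in the question
matrix2 = [0, 5, 7, 9, 0, 0, 0, 0, 0, 0]  # assumed values: non-zero at positions 2, 3, 4
updated = masked_update(matrix1, matrix2, 2)

print(updated)
```

The same result can be written without a loop as target * (mask == 0) + new_value * (mask != 0), which is the kind of element-wise expression a matrix language can vectorize.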

Re: Vector of Matrix

2017-04-23 Thread arijit chakraborty
don't support structs or complex objects. Regards, Matthias On 4/21/2017 4:17 AM, arijit chakraborty wrote: > Hi, > > > In R (as well as in python), we can store values list within list. Say I've 2 > matrix with different dimensions, > > x <- matrix(1:10,

Re: Randomly Selecting rows from a dataframe

2017-04-22 Thread arijit chakraborty
ax=1) <= 0.01); Xsample = removeEmpty(target=X, margin="rows", select=I); Both should be compiled internally to very similar plans. Regards, Matthias On Fri, Apr 21, 2017 at 1:42 PM, arijit chakraborty wrote: > Hi, > > > Suppose I've a dataframe of 10 variables (X1-X10) a
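The indicator-based sampling in this reply can be sketched in plain Python. This is an illustration of the idea, not SystemML code: bernoulli_indicator and select_rows are hypothetical helpers, and the fraction 0.2 is an assumption chosen to give roughly 200 of 1000 rows, as asked in the original question.

```python
import random


def bernoulli_indicator(n_rows, fraction, seed=None):
    """Flag each row independently with probability `fraction`."""
    rng = random.Random(seed)
    return [1 if rng.random() <= fraction else 0 for _ in range(n_rows)]


def select_rows(rows, indicator):
    """Keep the rows whose position is flagged, preserving their order."""
    return [row for row, flag in zip(rows, indicator) if flag != 0]


# 1000 made-up rows with 10 columns each.
data = [[r * 10 + c for c in range(10)] for r in range(1000)]

indicator = bernoulli_indicator(len(data), fraction=0.2, seed=42)
sample = select_rows(data, indicator)

print(len(sample), "rows selected out of", len(data))
```

Note that the sample size is itself random with this approach: it is about fraction times the number of rows, not exactly that many.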

Randomly Selecting rows from a dataframe

2017-04-21 Thread arijit chakraborty
Hi, Suppose I have a dataframe with 10 variables (X1-X10) and 1000 rows. Now I want to randomly select rows so that I have a subset of the dataset. Can anyone please help me solve this problem? I tried the following code: randSample = sample(nrow(dataframe), 200); This gives me a colum

Re: Table

2017-04-21 Thread arijit chakraborty
seq(1,10) for column indexes and hence you get a 1x10 output matrix. Regards, Matthias On 4/21/2017 4:00 AM, arijit chakraborty wrote: > Hi, > > > I was trying to understand what the "table" function does. In the documents, > it says: > > > "Returns the cont
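The 1x10 result mentioned in this reply follows from the documented definition (max(A) rows, max(B) columns). The plain-Python sketch below mimics that definition; the function table here is a hypothetical re-implementation for illustration, not the SystemML builtin.

```python
def table(a, b):
    """Count how often each (a[i], b[i]) pair occurs, using 1-based cell positions."""
    n_rows, n_cols = max(a), max(b)
    counts = [[0] * n_cols for _ in range(n_rows)]
    for i, j in zip(a, b):
        counts[i - 1][j - 1] += 1
    return counts


A = [1] * 10              # ten ones: every pair lands in row 1
B = list(range(1, 11))    # 1..10: each pair lands in a different column
F = table(A, B)

print(len(F), "x", len(F[0]))
print(F)
```

Because every row index is 1 and the column indexes run from 1 to 10, the output has a single row with one count in each of the ten columns.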

Vector of Matrix

2017-04-21 Thread arijit chakraborty
Hi, In R (as well as in Python), we can store lists within a list. Say I have 2 matrices with different dimensions, x <- matrix(1:10, ncol=2) y <- matrix(1:5, ncol=1) FinalList <- c(x, y) Is something like this possible in SystemML? I'm not looking for cbind or rbind. Thank you! Arijit

Table

2017-04-21 Thread arijit chakraborty
Hi, I was trying to understand what the "table" function does. The documentation says: "Returns the contingency table of two vectors A and B. The resulting table F consists of max(A) rows and max(B) columns." Suppose I have 2 matrices A and B of this form: A = matrix(1, 1, 10) B = matrix(s

Re: Distinct Item of a column

2017-04-17 Thread arijit chakraborty
mes (http://apache.github.io/incubator-systemml/dml-language-reference.html#frames) that simplifies common data transformation operations such as recoding, dummy coding, binning and handling of missing values. arijit chakraborty ---04/17/2017 08:50:51 AM---Hi, I'm

Re: SystemML query

2017-04-17 Thread arijit chakraborty
= rbind(matrix(1,1,1), (X[1:nrow(X)-1,]!=X[2:nrow(X),])); dX = removeEmpty(target=X, margin="rows", select=I); Regards, Matthias On 4/17/2017 8:40 AM, arijit chakraborty wrote: > > Hi, > > I've an issue regarding finding and removing the duplicate in a column/ row > of a
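The neighbour-comparison trick in this reply can be sketched in plain Python. This is an illustration only, not SystemML code: distinct_sorted is a hypothetical helper, and the sketch assumes the column has already been sorted, since the start of the reply is cut off here.

```python
def distinct_sorted(values):
    """Return distinct values of an already sorted list via neighbour comparison."""
    if not values:
        return []
    # The first entry is always kept; later entries are kept only if they
    # differ from their predecessor.
    indicator = [1] + [1 if prev != cur else 0 for prev, cur in zip(values, values[1:])]
    return [v for v, flag in zip(values, indicator) if flag]


column = [3, 1, 2, 3, 1, 1]
unique_values = distinct_sorted(sorted(column))

print(unique_values)
```

On unsorted input the comparison only removes adjacent repeats, which is why the sort step matters.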

Distinct Item of a column

2017-04-17 Thread arijit chakraborty
Hi, I'm curious to know what the advantage of SystemML over pyspark is, especially in terms of performance. I tried looking for some reading on it, but could hardly find any. Thank you! Arijit

Re: SystemML query

2017-04-17 Thread arijit chakraborty
oing it? In SystemML, we don't have a method to find/remove/count duplicate values. Thank you! Arijit From: Niketan Pansare Sent: Friday, April 14, 2017 9:14 PM To: arijit chakraborty Cc: Matthias Boehm1; Berthold Reinwald Subject: Re: SystemML query Th