If possible, could you share your code? What kind of operations are you doing on the DataFrame?
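
In the meantime, here is a minimal sketch of the pattern I would expect (a sketch only, assuming the Spark 1.5.x SparkR API; the path and column names are placeholders). Since DataFrames are immutable, withColumn simply returns a new DataFrame, so there is no need to register and drop a temp table for every change:

  sqlContext <- sparkRHive.init(sc)   # assumes an existing SparkContext `sc`

  sales <- read.df(sqlContext, "hdfs://sample.csv",
                   source = "com.databricks.spark.csv",
                   header = "true", inferSchema = "true")

  # withColumn returns a new, immutable DataFrame; the SQL/tempTable
  # round trip is not needed just to add a constant column
  sales1 <- withColumn(sales, "C1", lit(607))

  head(sales1)   # pulls the first rows back as a local R data.frame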
On Mon, Nov 23, 2015 at 5:10 PM, Vipul Rai <vipulrai8...@gmail.com> wrote:

> Hi Jeff,
>
> Thanks for the reply, but could you tell me why it is taking so much
> time, and what could be wrong? Also, when I remove the DataFrame from
> memory using rm(), the object is deleted but the memory is not freed.
>
> Also, what about the R functions which are not supported in SparkR,
> like ddply?
>
> And how do I access the nth row of a SparkR DataFrame?
>
> Regards,
> Vipul
>
> On 23 November 2015 at 14:25, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> >>> Do I need to create a new DataFrame for every update to the
>> >>> DataFrame, like the addition of a new column, or do I need to
>> >>> update the original sales DataFrame?
>>
>> Yes, DataFrame is immutable, and every mutation of a DataFrame will
>> produce a new DataFrame.
>>
>> On Mon, Nov 23, 2015 at 4:44 PM, Vipul Rai <vipulrai8...@gmail.com>
>> wrote:
>>
>>> Hello Rui,
>>>
>>> Sorry, what I meant was that adding a new column to the original
>>> DataFrame gives a new DataFrame.
>>>
>>> Please check this for more:
>>>
>>> https://spark.apache.org/docs/1.5.1/api/R/index.html
>>>
>>> Check for withColumn.
>>>
>>> Thanks,
>>> Vipul
>>>
>>> On 23 November 2015 at 12:42, Sun, Rui <rui....@intel.com> wrote:
>>>
>>>> Vipul,
>>>>
>>>> Not sure if I understand your question. DataFrame is immutable. You
>>>> can't update a DataFrame.
>>>>
>>>> Could you paste some log info for the OOM error?
>>>>
>>>> -----Original Message-----
>>>> From: vipulrai [mailto:vipulrai8...@gmail.com]
>>>> Sent: Friday, November 20, 2015 12:11 PM
>>>> To: user@spark.apache.org
>>>> Subject: SparkR DataFrame, Out of memory exception for very small file
>>>>
>>>> Hi Users,
>>>>
>>>> I have a general question regarding DataFrames in SparkR.
>>>>
>>>> I am trying to read a file from Hive, and it gets created as a
>>>> DataFrame:
>>>>
>>>> sqlContext <- sparkRHive.init(sc)
>>>>
>>>> # DF
>>>> sales <- read.df(sqlContext, "hdfs://sample.csv", header = 'true',
>>>>                  source = "com.databricks.spark.csv",
>>>>                  inferSchema = 'true')
>>>>
>>>> registerTempTable(sales, "Sales")
>>>>
>>>> Do I need to create a new DataFrame for every update to the DataFrame,
>>>> like the addition of a new column, or do I need to update the original
>>>> sales DataFrame?
>>>>
>>>> sales1 <- SparkR::sql(sqlContext, "SELECT a.*, 607 AS C1 FROM Sales
>>>> AS a")
>>>>
>>>> Please help me with this, as the original file is only 20MB but it
>>>> throws an out of memory exception on a cluster with a 4GB master and
>>>> two 4GB workers.
>>>>
>>>> Also, what is the logic with DataFrames: do I need to register and
>>>> drop a tempTable after every update?
>>>>
>>>> Thanks,
>>>> Vipul
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-DataFrame-Out-of-memory-exception-for-very-small-file-tp25435.html
>>>> Sent from the Apache Spark User List mailing list archive at
>>>> Nabble.com.
>>>
>>>
>>> --
>>> Regards,
>>> Vipul Rai
>>> www.vipulrai.me
>>> +91-8892598819
>>> <http://in.linkedin.com/in/vipulrai/>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>
>
> --
> Regards,
> Vipul Rai
> www.vipulrai.me
> +91-8892598819
> <http://in.linkedin.com/in/vipulrai/>


--
Best Regards

Jeff Zhang
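
For the questions above that went unanswered, a rough sketch (again assuming the Spark 1.5.x SparkR API; the table and column names `region` and `amount` are hypothetical). groupBy plus agg is the closest SparkR analogue to plyr's ddply; a distributed DataFrame has no positional row index, so one workaround is to collect the first n rows with take(); and rm() only drops the local R handle, so cluster-side memory is released with unpersist() and dropTempTable():

  # ddply-style aggregation: groupBy + agg
  by_region <- agg(groupBy(sales, "region"),
                   total = sum(sales$amount),
                   mean  = avg(sales$amount))
  head(by_region)

  # "nth row": collect the first n rows locally and pick the last one
  # (only sensible for small n)
  n <- 5
  nth_row <- take(sales, n)[n, ]

  # rm(sales) removes only the R-side reference; free Spark memory with:
  unpersist(sales)                     # only if cache()/persist() was used
  dropTempTable(sqlContext, "Sales")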