Hi Jeff,

This is only part of the actual code. My questions are in the comments near
the relevant lines.

SALES <- SparkR::sql(hiveContext, "select * from sales")
PRICING <- SparkR::sql(hiveContext, "select * from pricing")

## Renaming of columns ##

# Sales file #
# Is this right? Do we have to create a new DF for every column addition
# to the original DF? And if we do that, will the older DFs also take up
# memory?
names(SALES)[which(names(SALES) == "div_no")] <- "DIV_NO"
names(SALES)[which(names(SALES) == "store_no")] <- "STORE_NO"

# Pricing file #
names(PRICING)[which(names(PRICING) == "price_type_cd")] <- "PRICE_TYPE"
names(PRICING)[which(names(PRICING) == "price_amt")] <- "PRICE_AMT"

registerTempTable(SALES, "sales")
registerTempTable(PRICING, "pricing")

# Merging sales and pricing files #
merg_sales_pricing <- SparkR::sql(hiveContext, "select .....................")
head(merg_sales_pricing)

Thanks,
Vipul

On 23 November 2015 at 14:52, Jeff Zhang <zjf...@gmail.com> wrote:

> If possible, could you share your code? What kind of operation are you
> doing on the dataframe?
>
> On Mon, Nov 23, 2015 at 5:10 PM, Vipul Rai <vipulrai8...@gmail.com> wrote:
>
>> Hi Jeff,
>>
>> Thanks for the reply, but could you tell me why it is taking so much
>> time, and what could be wrong? Also, when I remove the DataFrame from
>> memory using rm(), the object is deleted but the memory is not cleared.
>>
>> Also, what about the R functions which are not supported in SparkR,
>> like ddply?
>>
>> How do I access the nth row of a SparkR DataFrame?
>>
>> Regards,
>> Vipul
>>
>> On 23 November 2015 at 14:25, Jeff Zhang <zjf...@gmail.com> wrote:
>>
>>> Do I need to create a new DataFrame for every update to the DataFrame,
>>> like addition of a new column, or do I need to update the original
>>> sales DataFrame?
>>>
>>> Yes, DataFrame is immutable, and every mutation of a DataFrame will
>>> produce a new DataFrame.
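A note on the renaming approach above: the base-R `names(...) <-` idiom is not guaranteed to behave on a SparkR DataFrame the way it does on a local data.frame, and since SparkR DataFrames are immutable, each transformation returns a new DataFrame anyway. A minimal sketch using `withColumnRenamed` from the SparkR 1.5 API, assuming the SALES and PRICING DataFrames defined above:

```r
# Each call returns a new DataFrame; rebinding the same variable lets the
# old reference be garbage-collected on the R side. Transformations are
# lazy, so intermediate DataFrames hold query plans, not materialized data.
SALES   <- withColumnRenamed(SALES, "div_no", "DIV_NO")
SALES   <- withColumnRenamed(SALES, "store_no", "STORE_NO")
PRICING <- withColumnRenamed(PRICING, "price_type_cd", "PRICE_TYPE")
PRICING <- withColumnRenamed(PRICING, "price_amt", "PRICE_AMT")
```

Because the superseded DataFrame objects are only plans held on the driver, rebinding the name and letting R's garbage collector reclaim them is normally enough; the data itself is not duplicated per renamed column.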
>>>
>>> On Mon, Nov 23, 2015 at 4:44 PM, Vipul Rai <vipulrai8...@gmail.com> wrote:
>>>
>>>> Hello Rui,
>>>>
>>>> Sorry, what I meant was that adding a new column to the original
>>>> DataFrame gives a new DataFrame as the result.
>>>>
>>>> Please check this for more:
>>>>
>>>> https://spark.apache.org/docs/1.5.1/api/R/index.html
>>>>
>>>> Check for withColumn.
>>>>
>>>> Thanks,
>>>> Vipul
>>>>
>>>> On 23 November 2015 at 12:42, Sun, Rui <rui....@intel.com> wrote:
>>>>
>>>>> Vipul,
>>>>>
>>>>> Not sure if I understand your question. DataFrame is immutable. You
>>>>> can't update a DataFrame.
>>>>>
>>>>> Could you paste some log info for the OOM error?
>>>>>
>>>>> -----Original Message-----
>>>>> From: vipulrai [mailto:vipulrai8...@gmail.com]
>>>>> Sent: Friday, November 20, 2015 12:11 PM
>>>>> To: user@spark.apache.org
>>>>> Subject: SparkR DataFrame, Out of memory exception for very small file.
>>>>>
>>>>> Hi Users,
>>>>>
>>>>> I have a general doubt regarding DataFrames in SparkR.
>>>>>
>>>>> I am trying to read a file from Hive, and it gets created as a
>>>>> DataFrame.
>>>>>
>>>>> sqlContext <- sparkRHive.init(sc)
>>>>>
>>>>> # DF
>>>>> sales <- read.df(sqlContext, "hdfs://sample.csv", header = 'true',
>>>>>                  source = "com.databricks.spark.csv",
>>>>>                  inferSchema = 'true')
>>>>>
>>>>> registerTempTable(sales, "Sales")
>>>>>
>>>>> Do I need to create a new DataFrame for every update to the DataFrame,
>>>>> like addition of a new column, or do I need to update the original
>>>>> sales DataFrame?
>>>>>
>>>>> sales1 <- SparkR::sql(sqlContext, "Select a.*, 607 as C1 from Sales as a")
>>>>>
>>>>> Please help me with this, as the original file is only 20MB but it
>>>>> throws an out-of-memory exception on a cluster with a 4GB master and
>>>>> two workers of 4GB each.
>>>>>
>>>>> Also, what is the logic with DataFrames? Do I need to register and
>>>>> drop a tempTable after every update?
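Following the withColumn pointer above, the column addition from the original message can be written without SQL and without re-registering a temp table. A sketch against the SparkR 1.5 API, assuming `lit` is available in your SparkR build and `sales` is the DataFrame read earlier:

```r
# withColumn returns a new DataFrame with the extra constant column C1;
# the original sales DataFrame is unchanged. lit() wraps 607 as a Column.
sales1 <- withColumn(sales, "C1", lit(607))
head(sales1)
```

Registering a temp table is only needed when you want to query the new DataFrame through `SparkR::sql`; purely DataFrame-API pipelines can skip it.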
>>>>>
>>>>> Thanks,
>>>>> Vipul
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-DataFrame-Out-of-memory-exception-for-very-small-file-tp25435.html
>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>> Nabble.com.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>
>>>> --
>>>> Regards,
>>>> Vipul Rai
>>>> www.vipulrai.me
>>>> +91-8892598819
>>>> <http://in.linkedin.com/in/vipulrai/>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>
>> --
>> Regards,
>> Vipul Rai
>> www.vipulrai.me
>> +91-8892598819
>> <http://in.linkedin.com/in/vipulrai/>
>
> --
> Best Regards
>
> Jeff Zhang

--
Regards,
Vipul Rai
www.vipulrai.me
+91-8892598819
<http://in.linkedin.com/in/vipulrai/>
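Following up on the ddply question raised earlier in the thread: common split-apply-combine aggregations map onto SparkR's `groupBy`/`agg`. A sketch, assuming the renamed SALES DataFrame and a hypothetical sales_amt column:

```r
# Group by division and sum sales_amt; in SparkR 1.5, agg accepts
# column = "function" pairs. The result is itself a DataFrame.
totals <- agg(groupBy(SALES, "DIV_NO"), sales_amt = "sum")
head(totals)
```

For the nth-row question, SparkR has no direct positional indexing on a distributed DataFrame; `head(df, n)` or `take(df, n)` returns a local data.frame that can then be subset with ordinary R indexing.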