Hi Users, I have a general question about DataFrames in SparkR.
I am reading a file from Hive, and it comes back as a DataFrame:

    sqlContext <- sparkRHive.init(sc)
    # DF
    sales <- read.df(sqlContext, "hdfs://sample.csv",
                     source = "com.databricks.spark.csv",
                     header = "true", inferSchema = "true")
    registerTempTable(sales, "Sales")

Do I need to create a new DataFrame for every update to the DataFrame (such as adding a new column), or can I update the original sales DataFrame?

    sales1 <- SparkR::sql(sqlContext, "SELECT a.*, 607 AS C1 FROM Sales AS a")

Please help me with this: the original file is only 20 MB, yet it throws an out-of-memory exception on a cluster with a 4 GB master and two 4 GB workers. Also, what is the logic with DataFrames -- do I need to register and drop the temp table after every update?

Thanks,
Vipul

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-DataFrame-Out-of-memory-exception-for-very-small-file-tp25435.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
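For reference, the column-addition step above can also be written without a SQL query, using withColumn. This is only a minimal sketch, assuming SparkR 1.5+ (where lit() is available), a running Spark cluster with the spark-csv package on the classpath, and the illustrative HDFS path from the question:

```r
library(SparkR)

sc <- sparkR.init()
sqlContext <- sparkRHive.init(sc)

# Read the CSV into a DataFrame (path is illustrative).
sales <- read.df(sqlContext, "hdfs://sample.csv",
                 source = "com.databricks.spark.csv",
                 header = "true", inferSchema = "true")

# DataFrames are immutable: withColumn returns a new DataFrame with the
# extra column; the original `sales` object is unchanged. Reassigning a
# name to the result copies no data -- only the lazy lineage grows.
sales2 <- withColumn(sales, "C1", lit(607))

# Only DataFrames queried via sql() need a temp table; re-registering
# under the same name replaces the previous registration, so an explicit
# drop between updates should not be required.
registerTempTable(sales2, "Sales")
```

The sketch does not address the out-of-memory error itself, which with a 20 MB input is more likely a matter of driver/executor memory configuration than of DataFrame size.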