Hi Users,

I have a general question about DataFrames in SparkR.

I am reading a CSV file through a HiveContext, and it comes back as a DataFrame:

# Initialize a HiveContext from the existing SparkContext
sqlContext <- sparkRHive.init(sc)

# Load the CSV from HDFS into a DataFrame, inferring the schema
sales <- read.df(sqlContext, "hdfs://sample.csv", header = "true",
                 source = "com.databricks.spark.csv", inferSchema = "true")

# Register the DataFrame as a temp table so it can be queried with SQL
registerTempTable(sales, "Sales")
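
For reference, these are the standard SparkR calls to inspect the result and rule out a bad load:

# Show the inferred schema and the first few rows
printSchema(sales)
head(sales)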

Do I need to create a new DataFrame for every update, such as adding a new
column, or should I update the original sales DataFrame in place?

# Derive a new DataFrame that adds a constant column via SQL
sales1 <- SparkR::sql(sqlContext, "SELECT a.*, 607 AS C1 FROM Sales AS a")
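
The DataFrame-API equivalent would presumably be something like this (a minimal
sketch, assuming Spark 1.5+ where lit() is available):

# withColumn() returns a new DataFrame with the extra constant column;
# the original sales DataFrame is left unchanged
sales1 <- withColumn(sales, "C1", lit(607))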


Please help me with this, as the original file is only 20 MB but it throws an
out-of-memory exception on a cluster with a 4 GB master and two 4 GB workers.

Also, what is the intended pattern with DataFrames: do I need to register and
drop a temp table after every update?
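
In other words, is a cycle like the following expected after each step (a
hypothetical sketch; Sales1 is just an illustrative table name)?

# Register the derived DataFrame, query it, then drop the temp table
registerTempTable(sales1, "Sales1")
sales2 <- sql(sqlContext, "SELECT * FROM Sales1 WHERE C1 = 607")
dropTempTable(sqlContext, "Sales1")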

Thanks,
Vipul


