>>> Do I need to create a new DataFrame for every update to the DataFrame,
>>> like the addition of a new column, or do I need to update the original
>>> sales DataFrame?

Yes, a DataFrame is immutable; every transformation on a DataFrame produces a
new DataFrame, leaving the original unchanged.
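For example (a minimal sketch against the SparkR 1.5 API; it assumes the
sqlContext and the `sales` DataFrame from your snippet below, and the column
name C1 is just illustrative):

  # withColumn() returns a NEW DataFrame; `sales` itself is untouched
  sales2 <- withColumn(sales, "C1", lit(607))

  printSchema(sales2)  # includes the new column C1
  printSchema(sales)   # unchanged

  # Re-registering a temp table is only needed if you want the NEW
  # DataFrame to be visible to SQL queries:
  registerTempTable(sales2, "Sales2")

If lit() is not available in your build, an expression on an existing column
(e.g. sales$price * 2) works the same way. Note that transformations like
withColumn are lazy and cheap; only actions (count, collect, ...) trigger
computation, so creating many intermediate DataFrames is not by itself a
memory problem.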



On Mon, Nov 23, 2015 at 4:44 PM, Vipul Rai <vipulrai8...@gmail.com> wrote:

> Hello Rui,
>
> Sorry, what I meant was that adding a new column to the original DataFrame
> produces a new DataFrame as the result.
>
> Please check this for more
>
> https://spark.apache.org/docs/1.5.1/api/R/index.html
>
> Check for
> withColumn
>
>
> Thanks,
> Vipul
>
>
> On 23 November 2015 at 12:42, Sun, Rui <rui....@intel.com> wrote:
>
>> Vipul,
>>
>> Not sure if I understand your question. A DataFrame is immutable; you can't
>> update a DataFrame in place.
>>
>> Could you paste some log info for the OOM error?
>>
>> -----Original Message-----
>> From: vipulrai [mailto:vipulrai8...@gmail.com]
>> Sent: Friday, November 20, 2015 12:11 PM
>> To: user@spark.apache.org
>> Subject: SparkR DataFrame, Out of memory exception for very small file.
>>
>> Hi Users,
>>
>> I have a general question regarding DataFrames in SparkR.
>>
>> I am trying to read a file from Hive, and it gets created as a DataFrame.
>>
>> sqlContext <- sparkRHive.init(sc)
>>
>> #DF
>> sales <- read.df(sqlContext, "hdfs://sample.csv", header ='true',
>>                  source = "com.databricks.spark.csv", inferSchema='true')
>>
>> registerTempTable(sales,"Sales")
>>
>> Do I need to create a new DataFrame for every update to the DataFrame, like
>> the addition of a new column, or do I need to update the original sales DataFrame?
>>
>> sales1<- SparkR::sql(sqlContext,"Select a.* , 607 as C1 from Sales as a")
>>
>>
>> Please help me with this, as the original file is only 20 MB but it throws
>> an out-of-memory exception on a cluster with a 4 GB master and two workers
>> of 4 GB each.
>>
>> Also, what is the recommended pattern with DataFrames: do I need to register
>> and drop the temp table after every update?
>>
>> Thanks,
>> Vipul
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-DataFrame-Out-of-memory-exception-for-very-small-file-tp25435.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional
>> commands, e-mail: user-h...@spark.apache.org
>>
>>
>
>
> --
> Regards,
> Vipul Rai
> www.vipulrai.me
> +91-8892598819
> <http://in.linkedin.com/in/vipulrai/>
>



-- 
Best Regards

Jeff Zhang
