If possible, could you share your code? What kind of operations are you doing on the DataFrame?
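
In the meantime, here is a minimal sketch of the pattern I would expect (a sketch only, assuming the Spark 1.5.x SparkR API; the path and column names are placeholders). Since DataFrames are immutable, withColumn simply returns a new DataFrame, so there is no need to register and drop a temp table for every change:

  sqlContext <- sparkRHive.init(sc)   # assumes an existing SparkContext `sc`

  sales <- read.df(sqlContext, "hdfs://sample.csv",
                   source = "com.databricks.spark.csv",
                   header = "true", inferSchema = "true")

  # withColumn returns a new, immutable DataFrame; the SQL/tempTable
  # round trip is not needed just to add a constant column
  sales1 <- withColumn(sales, "C1", lit(607))

  head(sales1)   # pulls the first rows back as a local R data.frame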
On Mon, Nov 23, 2015 at 5:10 PM, Vipul Rai <vipulrai8...@gmail.com> wrote:

> Hi Jeff,
>
> Thanks for the reply, but could you tell me why it is taking so much
> time, and what could be wrong? Also, when I remove the DataFrame from
> memory using rm(), the object is deleted but the memory is not freed.
>
> Also, what about the R functions which are not supported in SparkR,
> like ddply?
>
> And how do I access the nth row of a SparkR DataFrame?
>
> Regards,
> Vipul
>
> On 23 November 2015 at 14:25, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> >>> Do I need to create a new DataFrame for every update to the
>> >>> DataFrame, like the addition of a new column, or do I need to
>> >>> update the original sales DataFrame?
>>
>> Yes, DataFrame is immutable, and every mutation of a DataFrame will
>> produce a new DataFrame.
>>
>> On Mon, Nov 23, 2015 at 4:44 PM, Vipul Rai <vipulrai8...@gmail.com>
>> wrote:
>>
>>> Hello Rui,
>>>
>>> Sorry, what I meant was that adding a new column to the original
>>> DataFrame gives a new DataFrame.
>>>
>>> Please check this for more:
>>>
>>> https://spark.apache.org/docs/1.5.1/api/R/index.html
>>>
>>> Check for withColumn.
>>>
>>> Thanks,
>>> Vipul
>>>
>>> On 23 November 2015 at 12:42, Sun, Rui <rui....@intel.com> wrote:
>>>
>>>> Vipul,
>>>>
>>>> Not sure if I understand your question. DataFrame is immutable. You
>>>> can't update a DataFrame.
>>>>
>>>> Could you paste some log info for the OOM error?
>>>>
>>>> -----Original Message-----
>>>> From: vipulrai [mailto:vipulrai8...@gmail.com]
>>>> Sent: Friday, November 20, 2015 12:11 PM
>>>> To: user@spark.apache.org
>>>> Subject: SparkR DataFrame, Out of memory exception for very small file
>>>>
>>>> Hi Users,
>>>>
>>>> I have a general question regarding DataFrames in SparkR.
>>>>
>>>> I am trying to read a file from Hive, and it gets created as a
>>>> DataFrame:
>>>>
>>>> sqlContext <- sparkRHive.init(sc)
>>>>
>>>> # DF
>>>> sales <- read.df(sqlContext, "hdfs://sample.csv", header = 'true',
>>>>                  source = "com.databricks.spark.csv",
>>>>                  inferSchema = 'true')
>>>>
>>>> registerTempTable(sales, "Sales")
>>>>
>>>> Do I need to create a new DataFrame for every update to the DataFrame,
>>>> like the addition of a new column, or do I need to update the original
>>>> sales DataFrame?
>>>>
>>>> sales1 <- SparkR::sql(sqlContext, "SELECT a.*, 607 AS C1 FROM Sales
>>>> AS a")
>>>>
>>>> Please help me with this, as the original file is only 20MB but it
>>>> throws an out of memory exception on a cluster with a 4GB master and
>>>> two 4GB workers.
>>>>
>>>> Also, what is the logic with DataFrames: do I need to register and
>>>> drop a tempTable after every update?
>>>>
>>>> Thanks,
>>>> Vipul
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-DataFrame-Out-of-memory-exception-for-very-small-file-tp25435.html
>>>> Sent from the Apache Spark User List mailing list archive at
>>>> Nabble.com.
>>>
>>>
>>> --
>>> Regards,
>>> Vipul Rai
>>> www.vipulrai.me
>>> +91-8892598819
>>> <http://in.linkedin.com/in/vipulrai/>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>
>
> --
> Regards,
> Vipul Rai
> www.vipulrai.me
> +91-8892598819
> <http://in.linkedin.com/in/vipulrai/>


--
Best Regards

Jeff Zhang
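
For the questions above that went unanswered, a rough sketch (again assuming the Spark 1.5.x SparkR API; the table and column names `region` and `amount` are hypothetical). groupBy plus agg is the closest SparkR analogue to plyr's ddply; a distributed DataFrame has no positional row index, so one workaround is to collect the first n rows with take(); and rm() only drops the local R handle, so cluster-side memory is released with unpersist() and dropTempTable():

  # ddply-style aggregation: groupBy + agg
  by_region <- agg(groupBy(sales, "region"),
                   total = sum(sales$amount),
                   mean  = avg(sales$amount))
  head(by_region)

  # "nth row": collect the first n rows locally and pick the last one
  # (only sensible for small n)
  n <- 5
  nth_row <- take(sales, n)[n, ]

  # rm(sales) removes only the R-side reference; free Spark memory with:
  unpersist(sales)                     # only if cache()/persist() was used
  dropTempTable(sqlContext, "Sales")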