Hi Jeff,
This is only part of the actual code.
My questions are mentioned in comments near the code.
SALES <- SparkR::sql(hiveContext, "select * from sales")
PRICING <- SparkR::sql(hiveContext, "select * from pricing")
## renaming of columns ##
#sales file#
# Is this right ??? Do we have to
>>> Do I need to create a new DataFrame for every update to the DataFrame,
>>> such as adding a new column, or do I need to update the original sales
>>> DataFrame?
Yes. A DataFrame is immutable, so every operation that changes it produces a
new DataFrame.
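For illustration, here is a minimal sketch of what that looks like, assuming the
Spark 1.5.x SparkR API. The local master, the small sample data, and the column
name amount_with_tax are made up for the example and are not taken from your code:

```r
library(SparkR)

# Assumption: Spark 1.5.x style initialization with a local master.
sc <- sparkR.init(master = "local[1]")
sqlContext <- sparkRSQL.init(sc)

# Hypothetical stand-in for the sales table.
sales <- createDataFrame(sqlContext,
                         data.frame(id = c(1, 2), amount = c(10, 20)))

# withColumn() returns a new DataFrame; `sales` itself is left unchanged.
sales_with_tax <- withColumn(sales, "amount_with_tax", sales$amount * 1.1)

before <- columns(sales)           # column names of the original
after <- columns(sales_with_tax)   # column names of the derived DataFrame

# To "update" the original name, rebind it to the new DataFrame.
sales <- sales_with_tax

sparkR.stop()
```

So there is no need for a separate variable per step; reassigning the result to
the same name is enough if you do not need the old DataFrame any more.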
On Mon, Nov 23, 2015 at 4:44 PM, Vipul Rai
Hello Rui,
Sorry, what I meant was that adding a new column to the original DataFrame
gives a new DataFrame.
Please check the API docs for more details:
https://spark.apache.org/docs/1.5.1/api/R/index.html
Look for
withColumn
Thanks,
Vipul
On 23 November 2015 at 12:42, Sun, Rui
Hi Zeff,
Thanks for the reply, but could you tell me why it is taking so much time?
What could be wrong? Also, when I remove the DataFrame from memory using
rm(), the object is deleted but the memory is not cleared.
Also, what about the R functions that are not supported in SparkR?
Like
If possible, could you share your code? What kind of operation are you
doing on the DataFrame?
On Mon, Nov 23, 2015 at 5:10 PM, Vipul Rai wrote:
> Hi Zeff,
>
> Thanks for the reply, but could you tell me why it is taking so much time?
> What could be wrong? Also, when
Vipul,
I'm not sure I understand your question. A DataFrame is immutable, so you
can't update it in place.
Could you paste some log info for the OOM error?
-----Original Message-----
From: vipulrai [mailto:vipulrai8...@gmail.com]
Sent: Friday, November 20, 2015 12:11 PM
To: user@spark.apache.org