Re: SparkR DataFrame, Out of memory exception for very small file.

2015-11-23 Thread Vipul Rai
Hi Jeff,

This is only part of the actual code.

My questions are in the comments next to the code.

SALES <- SparkR::sql(hiveContext, "select * from sales")
PRICING <- SparkR::sql(hiveContext, "select * from pricing")


## Renaming of columns ##
# Sales file #

# Is this right? Do we have to create a new DF for every column addition
# to the original DF?

# And if we do that, what about the older DFs? Will they also take up
# memory?

names(SALES)[which(names(SALES) == "div_no")] <- "DIV_NO"
names(SALES)[which(names(SALES) == "store_no")] <- "STORE_NO"

# Pricing file #
names(PRICING)[which(names(PRICING) == "price_type_cd")] <- "PRICE_TYPE"
names(PRICING)[which(names(PRICING) == "price_amt")] <- "PRICE_AMT"

registerTempTable(SALES, "sales")
registerTempTable(PRICING, "pricing")

# Merging the sales and pricing tables #
merg_sales_pricing <- SparkR::sql(hiveContext, "select ...")

head(merg_sales_pricing)
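
As a side note, the same renames can be written with withColumnRenamed from
the SparkR 1.5 API (a minimal sketch, with the column names taken from the
snippet above). Each call returns a new DataFrame, and an uncached DataFrame
is only a lazy query plan, so the superseded objects do not themselves hold
the data in memory:

# Sketch: withColumnRenamed returns a new DataFrame (SparkR 1.5 API assumed)
SALES <- withColumnRenamed(SALES, "div_no", "DIV_NO")
SALES <- withColumnRenamed(SALES, "store_no", "STORE_NO")
PRICING <- withColumnRenamed(PRICING, "price_type_cd", "PRICE_TYPE")
PRICING <- withColumnRenamed(PRICING, "price_amt", "PRICE_AMT")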


Thanks,
Vipul



-- 
Regards,
Vipul Rai
www.vipulrai.me
+91-8892598819
<http://in.linkedin.com/in/vipulrai/>


Re: SparkR DataFrame, Out of memory exception for very small file.

2015-11-23 Thread Jeff Zhang
>>> Do I need to create a new DataFrame for every update, like adding a new
>>> column, or can I update the original sales DataFrame in place?

Yes, a DataFrame is immutable, and every mutation of a DataFrame produces a
new DataFrame.
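
To make that concrete, a minimal sketch assuming the SparkR 1.5 API
(withColumn and lit as documented there):

# Transformations return new, lazily evaluated DataFrames
sales1 <- withColumn(sales, "C1", lit(607))  # new DataFrame; sales is unchanged
sales <- sales1    # rebinding the name is fine; an uncached plan holds no data
head(sales1)       # only an action such as head() actually runs the computation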




-- 
Best Regards

Jeff Zhang


Re: SparkR DataFrame, Out of memory exception for very small file.

2015-11-23 Thread Vipul Rai
Hello Rui,

Sorry, what I meant was that adding a new column to the original DataFrame
gives a new DataFrame as the result.

Please check the API docs for more details:

https://spark.apache.org/docs/1.5.1/api/R/index.html

Look for withColumn.


Thanks,
Vipul




Re: SparkR DataFrame, Out of memory exception for very small file.

2015-11-23 Thread Vipul Rai
Hi Jeff,

Thanks for the reply, but could you tell me why it is taking so much time
and what could be wrong? Also, when I remove the DataFrame using rm(), the R
object is deleted but the memory is not freed.

What about the R functions that are not supported in SparkR, like ddply?

And how do I access the nth row of a SparkR DataFrame?
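
For the last two questions, a rough sketch of the usual SparkR 1.5
equivalents (names assumed from the snippets earlier in the thread):
ddply-style summaries map to groupBy plus agg, and a distributed DataFrame
has no direct nth-row access, so the common workaround is to collect the
first n rows and index locally.

# ddply-style aggregation: average PRICE_AMT per PRICE_TYPE
by_type <- agg(groupBy(PRICING, "PRICE_TYPE"), PRICE_AMT = "avg")
head(by_type)

# "nth row": head(df, n) returns the first n rows as a local R data.frame
local10 <- head(merg_sales_pricing, 10)
local10[10, ]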

Regards,
Vipul



Re: SparkR DataFrame, Out of memory exception for very small file.

2015-11-23 Thread Jeff Zhang
If possible, could you share your code? What kind of operation are you
doing on the DataFrame?




-- 
Best Regards

Jeff Zhang


RE: SparkR DataFrame, Out of memory exception for very small file.

2015-11-22 Thread Sun, Rui
Vipul,

Not sure if I understand your question. A DataFrame is immutable; you can't
update a DataFrame in place.

Could you paste some log info for the OOM error?

-----Original Message-----
From: vipulrai [mailto:vipulrai8...@gmail.com] 
Sent: Friday, November 20, 2015 12:11 PM
To: user@spark.apache.org
Subject: SparkR DataFrame, Out of memory exception for very small file.

Hi Users,

I have a general question regarding DataFrames in SparkR.

I am trying to read a file from Hive, and it gets created as a DataFrame.

sqlContext <- sparkRHive.init(sc)

# DF
sales <- read.df(sqlContext, "hdfs://sample.csv", header = 'true',
                 source = "com.databricks.spark.csv", inferSchema = 'true')

registerTempTable(sales, "Sales")

Do I need to create a new DataFrame for every update, like adding a new
column, or can I update the original sales DataFrame in place?

sales1 <- SparkR::sql(sqlContext, "select a.*, 607 as C1 from Sales as a")


Please help me with this: the original file is only 20 MB, but it throws an
out-of-memory exception on a cluster with a 4 GB master and two workers of
4 GB each.

Also, what is the correct pattern with DataFrames: do I need to register and
drop the temp table after every update?

Thanks,
Vipul



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-DataFrame-Out-of-memory-exception-for-very-small-file-tp25435.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
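
On the last question, and on the earlier observation that rm() does not free
memory: rm() only removes the driver-side R reference. A rough sketch of the
explicit-cleanup pattern, assuming the SparkR 1.5 API (cacheTable,
uncacheTable, and dropTempTable as documented there); re-registering after
every update is only needed if later SQL queries should see the new
DataFrame:

registerTempTable(sales, "Sales")
cacheTable(sqlContext, "Sales")      # cache only if "Sales" is queried repeatedly

sales1 <- SparkR::sql(sqlContext, "select a.*, 607 as C1 from Sales as a")
head(sales1)

uncacheTable(sqlContext, "Sales")    # release executor memory held by the cache
dropTempTable(sqlContext, "Sales")   # remove the temp-table registration
rm(sales)                            # finally drop the local R reference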
