BSP realization on Spark
Hi,

We are trying to implement a BSP model in Spark with the help of GraphX. One thing I came across is the Pregel operator in the Graph class. What I fail to understand is how the Master and Workers need to be assigned (as in BSP), and how barrier synchronization happens. The Pregel operator provides a way to define a vertex program, but nothing is mentioned about barrier synchronization.

Any help in this regard is truly appreciated.

Many Thanks,
Ghousia.
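For reference, GraphX does not expose explicit master/worker roles: the Spark driver plays the coordinating role, each Pregel superstep is executed as ordinary RDD operations, and the BSP barrier is implicit because superstep N+1 only starts after the jobs of superstep N have finished. Below is a minimal sketch of the Pregel operator computing single-source shortest paths; the graph data and parameter values are made up for illustration.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

// Single-source shortest paths with Graph.pregel.
// There is no explicit master/worker assignment: each superstep is a set of
// RDD operations, so the barrier is simply "the next superstep starts only
// after these jobs finish".
object PregelSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pregel-sssp").setMaster("local[*]"))

    // Toy weighted graph (made-up data).
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1.0), Edge(2L, 3L, 1.0), Edge(1L, 3L, 5.0)))
    val graph = Graph.fromEdges(edges, defaultValue = Double.PositiveInfinity)

    val sourceId = 1L
    val init = graph.mapVertices((id, _) => if (id == sourceId) 0.0 else Double.PositiveInfinity)

    val sssp = init.pregel(Double.PositiveInfinity)(
      (id, dist, newDist) => math.min(dist, newDist),          // vertex program: keep best distance
      triplet =>                                               // send messages along improving edges
        if (triplet.srcAttr + triplet.attr < triplet.dstAttr)
          Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
        else Iterator.empty,
      (a, b) => math.min(a, b)                                 // merge two messages to one vertex
    )

    sssp.vertices.collect().foreach(println)
    sc.stop()
  }
}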
Re: OutOfMemory Error
Thanks for the answer Akhil. We are currently working around this issue by increasing the number of partitions, and we are persisting the RDDs with DISK_ONLY. But the issue is with heavy computations within an RDD: it would be better if we had the option of spilling intermediate transformation results to local disk (only when memory consumption is high). Is any such option available in Spark? If increasing the number of partitions is the only way, one might still end up with OutOfMemory errors for algorithms whose intermediate results are huge.

On Mon, Aug 18, 2014 at 12:02 PM, Akhil Das wrote:
> Hi Ghousia,
>
> You can try the following:
>
> 1. Increase the heap size
>    <https://spark.apache.org/docs/0.9.0/configuration.html>
> 2. Increase the number of partitions
>    <http://stackoverflow.com/questions/21698443/spark-best-practice-for-retrieving-big-data-from-rdd-to-local-machine>
> 3. Persist the RDD with DISK_ONLY
>    <http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence>
>
> Thanks
> Best Regards
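For what it's worth, a hedged sketch of the two workarounds described above; the partition count and parser are hypothetical placeholders to tune for the actual data.

import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Sketch: more, smaller partitions so each task holds fewer records, plus
// DISK_ONLY persistence so the intermediate RDD stays off the JVM heap.
object PartitionAndPersistSketch {
  // Made-up parser, only to keep the sketch self-contained.
  def parseFeatures(line: String): Array[Double] =
    line.split(",").map(_.toDouble)

  def preprocess(records: RDD[String]): RDD[Array[Double]] =
    records
      .repartition(400)                    // placeholder partition count
      .map(parseFeatures)                  // heavy per-record transformation
      .persist(StorageLevel.DISK_ONLY)     // persist to local disk, not memory
}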
Re: OutOfMemory Error
But this would apply only to operations that have a shuffle phase. It would not help with a simple map operation where a record is mapped to a huge new value, which still results in an OutOfMemory error.

On Mon, Aug 18, 2014 at 12:34 PM, Akhil Das wrote:
> I believe spark.shuffle.memoryFraction is the one you are looking for.
>
> spark.shuffle.memoryFraction: Fraction of the Java heap to use for
> aggregation and cogroups during shuffles, if spark.shuffle.spill is true.
> At any given time, the collective size of all in-memory maps used for
> shuffles is bounded by this limit, beyond which the contents will begin to
> spill to disk. If spills happen often, consider increasing this value at
> the expense of spark.storage.memoryFraction.
>
> You can give it a try.
>
> Thanks
> Best Regards
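If it helps, a minimal sketch of applying that setting with the 1.x-era configuration keys discussed here; the fraction values are placeholders to tune, not recommendations.

import org.apache.spark.{SparkConf, SparkContext}

object ShuffleTuningSketch {
  def main(args: Array[String]): Unit = {
    // Shuffle-side spilling only: these keys govern the in-memory maps used
    // for aggregations and cogroups during shuffles, which is why they do not
    // help a plain map() that produces one huge value per record.
    val conf = new SparkConf()
      .setAppName("oom-tuning")
      .set("spark.shuffle.spill", "true")           // allow shuffle maps to spill to disk
      .set("spark.shuffle.memoryFraction", "0.4")   // raise the shuffle share of the heap...
      .set("spark.storage.memoryFraction", "0.4")   // ...at the expense of cached storage
    val sc = new SparkContext(conf)
    // ... job ...
    sc.stop()
  }
}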
Re: OutOfMemory Error
Hi,

Any further information on this? Do you think it would be useful to have an in-memory buffer that stores the content of the new RDD and, once the buffer reaches a configured threshold, spills its contents to the local disk? That would save us from the OutOfMemory error; see the rough sketch below.

I would appreciate any suggestions in this regard.

Many Thanks,
Ghousia.

On Mon, Aug 18, 2014 at 4:05 PM, Ghousia wrote:
> But this would apply only to operations that have a shuffle phase. It would
> not help with a simple map operation where a record is mapped to a huge new
> value, which still results in an OutOfMemory error.
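To make the proposal concrete, here is a rough sketch of what such a buffer could look like. This is not an existing Spark feature, and all names are hypothetical.

import java.io.{File, FileOutputStream, ObjectOutputStream}
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch of the proposed spillable buffer: records accumulate in
// memory and are written to a local file once a configured threshold is hit.
class SpillableBuffer[T <: Serializable](threshold: Int, spillDir: File) {
  private val buffer = new ArrayBuffer[T]()
  private var spillCount = 0

  def append(record: T): Unit = {
    buffer += record
    if (buffer.size >= threshold) spill()
  }

  private def spill(): Unit = {
    val file = new File(spillDir, s"spill-$spillCount.bin")
    val out = new ObjectOutputStream(new FileOutputStream(file))
    try buffer.foreach(out.writeObject) finally out.close()
    buffer.clear()
    spillCount += 1
  }
}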
Query on Merge Message (Graph: pregel operator)
Hi,

Can someone please clarify a small query on the Graph.pregel operator? According to the documentation of the merge message function, only two inbound messages can be merged into a single value. Is that actually the case, and if so, how can one merge n inbound messages?

Any help is truly appreciated.

Many Thanks,
Ghousia.
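For what it's worth, a small plain-Scala illustration of the semantics as I understand them: GraphX reduces all messages addressed to a vertex with the two-argument mergeMsg function, so n messages collapse to one value by repeated pairwise application; the function should therefore be commutative and associative. The values below are made up.

// No Spark needed for the illustration: a two-argument merge handles n
// inbound messages because it is applied as a reduce over all of them.
val inbound = Seq(3, 1, 4, 1, 5)                    // n messages destined for one vertex
val mergeMsg: (Int, Int) => Int = (a, b) => a + b   // merges exactly two messages
val combined = inbound.reduce(mergeMsg)             // => 14, the result of merging all n
println(combined)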
OutOfMemory Error
Hi,

I am trying to implement machine learning algorithms on Spark. I am working on a 3-node cluster, with each node having 5GB of memory. Whenever I work with a slightly larger number of records, I end up with an OutOfMemory error. The problem is that even when the number of records is only slightly higher, the intermediate result of a transformation is huge, and this causes the OutOfMemory error. To work around this, we are partitioning the data so that each partition has only a few records.

Is there a better way to fix this issue, something like spilling the intermediate data to local disk?

Thanks,
Ghousia.