Re: Hardware requirements

2015-05-04 Thread Akhil Das
Assuming a block size of 128 MB: 500 GB ≈ 500,000 MB, and 500,000 / 128 ≈ 3,900
blocks, hence roughly 3,900 partitions.
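
A quick way to verify what you actually get at runtime (a minimal PySpark
sketch; the HDFS path and app name are hypothetical):

from pyspark import SparkContext

sc = SparkContext(appName="partition-count-check")
# The path below is hypothetical; point it at the 500 GB dataset on HDFS.
rdd = sc.textFile("hdfs:///data/social_network/")
print(rdd.getNumPartitions())  # roughly input size / HDFS block size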

Thanks
Best Regards

On Mon, May 4, 2015 at 2:38 PM, ayan guha  wrote:

> Hi
>
> How do you figure that 500 GB works out to ~3,900 partitions? I am trying to
> do the math. If I assume a 64 MB block size, then 1 GB ≈ 16 blocks and
> 500 GB ≈ 8,000 blocks. If we assume split and block sizes are the same,
> shouldn't we end up with ~8,000 partitions?


Re: Hardware requirements

2015-05-04 Thread ayan guha
Hi

How do you figure that 500 GB works out to ~3,900 partitions? I am trying to
do the math. If I assume a 64 MB block size, then 1 GB ≈ 16 blocks and
500 GB ≈ 8,000 blocks. If we assume split and block sizes are the same,
shouldn't we end up with ~8,000 partitions?
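
For the record, the arithmetic under both block-size assumptions (a quick
sketch; Spark's actual split size also depends on the input format and
configuration):

# One input split (roughly one HDFS block) becomes one partition.
data_mb = 500 * 1024
for block_mb in (64, 128):
    print(block_mb, "MB blocks ->", data_mb // block_mb, "partitions")
# Prints 8192 and 4000; the ~3,900 figure comes from treating 500 GB as 500,000 MB.
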
On 4 May 2015 17:49, "Akhil Das"  wrote:

> 500 GB of data will have nearly 3,900 partitions, and if you can have nearly
> that many cores and around 500 GB of memory, then things will be
> lightning fast. :)
>
> Thanks
> Best Regards
>


Re: Hardware requirements

2015-05-04 Thread Akhil Das
500 GB of data will have nearly 3,900 partitions, and if you can have nearly
that many cores and around 500 GB of memory, then things will be
lightning fast. :)
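
For what it's worth, a sketch of how that sizing might be requested from a
standalone master; the master URL and all numbers here are illustrative
assumptions, not a recommendation:

from pyspark import SparkConf, SparkContext

# Ask the standalone master for a large share of cores and spread roughly
# 500 GB of aggregate executor memory across the workers.
conf = (SparkConf()
        .setMaster("spark://master:7077")        # hypothetical master URL
        .set("spark.cores.max", "120")           # total cores across the cluster
        .set("spark.executor.memory", "48g")     # heap per executor
        .set("spark.default.parallelism", "3900"))
sc = SparkContext(conf=conf)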

Thanks
Best Regards

On Sun, May 3, 2015 at 12:49 PM, sherine ahmed wrote:

> I need to use Spark to load 500 GB of data from Hadoop on a standalone-mode
> cluster. What are the minimum hardware requirements, given that it will be
> used for advanced analysis (social network analysis)?
>


Hardware requirements

2015-05-03 Thread sherine ahmed
I need to use Spark to load 500 GB of data from Hadoop on a standalone-mode
cluster. What are the minimum hardware requirements, given that it will be
used for advanced analysis (social network analysis)?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Hardware-requirements-tp22744.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: General Purpose Spark Cluster Hardware Requirements?

2015-03-08 Thread Cui Lin
No wonder I had out-of-memory issues before…

I doubt we really need such a configuration at the production level…

Best regards,

Cui Lin

From: Krishna Sankar <ksanka...@gmail.com>
Date: Sunday, March 8, 2015 at 3:27 PM
To: Nasir Khan <nasirkhan.onl...@gmail.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: General Purpose Spark Cluster Hardware Requirements?



Re: General Purpose Spark Cluster Hardware Requirements?

2015-03-08 Thread Krishna Sankar
Without knowing the data size, computation & storage requirements ... :

   - Dual 6- or 8-core machines, 256 GB of memory each, 12-15 TB of disk per
     machine. Probably 5-10 machines.
   - Don't go for the most exotic machines; on the other hand, don't go for the
     cheapest ones either.
      - Find a sweet spot with your vendor, i.e. if dual 6-core CPUs are a lot
        cheaper than dual 10-core, go with the less expensive ones. Same with
        disks - maybe 2 TB is a lot cheaper than 3 TB.
   - Decide whether these are going to be storage intensive or compute
     intensive (I assume the latter) and configure accordingly.
   - Make sure you can add storage to the machines, i.e. have free storage
     bays.
      - The other way is to add more machines and buy smaller-specced machines.
   - Unless one has very firm I/O and compute requirements, I have found that
     FLOPS, and things of that nature, do not make that much sense.
      - Think in terms of RAM, CPU and storage - that is what will become the
        initial limitations (see the sketch below this list).
      - Once there are enough production jobs, you can then figure out the
        FLOPS et al.
   - A 10 G network is a better choice, so price in a 24-48 port TOR switch.
      - I am more concerned with the bandwidth between the cluster nodes, for
        shuffles et al.
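
A back-of-the-envelope sketch of how the node spec above might be carved into
executors; all figures here are illustrative assumptions, not recommendations:

# Dual 8-core => 16 cores, 256 GB RAM per node (the spec sketched above).
cores_per_node = 16
ram_gb_per_node = 256

reserved_cores = 1       # leave a core for the OS and daemons
reserved_ram_gb = 16     # leave RAM for the OS and off-heap overhead

cores_per_executor = 5   # a common starting point, not a rule
executors_per_node = (cores_per_node - reserved_cores) // cores_per_executor
mem_per_executor_gb = (ram_gb_per_node - reserved_ram_gb) // executors_per_node

print(executors_per_node, "executors per node,", mem_per_executor_gb, "GB each")
# -> 3 executors per node, 80 GB each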

Cheers




Re: General Purpose Spark Cluster Hardware Requirements?

2015-03-08 Thread Ted Yu
Cycling related bits:
http://search-hadoop.com/m/LgpTk2DLMvc



General Purpose Spark Cluster Hardware Requirements?

2015-03-08 Thread Nasir Khan
Hi, I am going to submit a proposal to my university to set up a standalone
Spark cluster. What hardware should I include in my proposal?

I will be working on classification (Spark MLlib) of data streams (Spark
Streaming).

If somebody can fill in these answers, that would be great! Thanks.

*Cores* = (example: 64 nodes, 1024 cores; your figures) ___?

*Performance* = (example: ~5.12 TFLOPS, ~2 TFLOPS; your figures) ___?

*GPU* = YES/NO ___?

*Fat Node* = YES/NO ___?

*CPU Hrs/Yr* = (example: 2000, 8000; your figures) ___?

*RAM/CPU* = (example: 256 GB; your figures) ___?

*Storage Processing* = (example: 200 TB; your figures) ___?

*Storage Output* = (example: 5 TB, 4 TB HDD/SSD; your figures) ___?

*Most processors today carry out 4 FLOPs per cycle; thus a single-core 2.5
GHz processor has a theoretical performance of 10 billion FLOPS = 10 GFLOPS.
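
For reference, the arithmetic behind that footnote (a quick sketch; the 4
FLOPs-per-cycle and 2.5 GHz figures are just the example values above):

# Theoretical peak = cores * clock (Hz) * FLOPs issued per cycle.
cores = 1
clock_hz = 2.5e9         # 2.5 GHz
flops_per_cycle = 4
print(cores * clock_hz * flops_per_cycle)  # 1e10, i.e. 10 GFLOPS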

Note: I need a *general purpose* cluster, neither very high-end nor very
low-spec. It will not be dedicated to just one project, I guess. You people
already have experience in setting up clusters; that's the reason I posted
it here :)





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/General-Purpose-Spark-Cluster-Hardware-Requirements-tp21963.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
