Re: Hardware requirements
Assume your block size is 128 MB: 500 GB / 128 MB ≈ 3,900 partitions. With a 64 MB block size you would indeed end up with roughly 8,000.

Thanks
Best Regards

On Mon, May 4, 2015 at 2:38 PM, ayan guha wrote:
> How do you figure out 500 GB ~ 3900 partitions? I am trying to do the math.
> If I assume a 64 MB block size then 1 GB ~ 16 blocks and 500 GB ~ 8000 blocks.
> If we assume split and block sizes are the same, shouldn't we end up with 8k
> partitions?
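The arithmetic behind both figures can be checked with a quick sketch (assuming, as in the thread, one Spark input partition per HDFS block, and decimal GB/MB):

```python
# Back-of-the-envelope partition math: HDFS input splits default to the
# block size, and Spark creates one partition per split.
def input_partitions(data_bytes, block_bytes):
    # Ceiling division: a trailing partial block still gets its own partition.
    return -(-data_bytes // block_bytes)

GB, MB = 1000 ** 3, 1000 ** 2

# 128 MB blocks: "nearly 3900 partitions"
print(input_partitions(500 * GB, 128 * MB))  # 3907
# 64 MB blocks: "500 GB ~ 8000 blocks"
print(input_partitions(500 * GB, 64 * MB))   # 7813
```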
Re: Hardware requirements
Hi

How do you figure out 500 GB ~ 3,900 partitions? I am trying to do the math. If I assume a 64 MB block size, then 1 GB ~ 16 blocks and 500 GB ~ 8,000 blocks. If we assume split and block sizes are the same, shouldn't we end up with ~8k partitions?

On 4 May 2015 17:49, "Akhil Das" wrote:
> 500 GB of data will have nearly 3900 partitions, and if you can have nearly
> that many cores and around 500 GB of memory then things will be lightning
> fast. :)
Re: Hardware requirements
500 GB of data will have nearly 3,900 partitions, and if you can have nearly that many cores and around 500 GB of memory, things will be lightning fast. :)

Thanks
Best Regards

On Sun, May 3, 2015 at 12:49 PM, sherine ahmed <sherine.sha...@hotmail.com> wrote:
> I need to use Spark to load 500 GB of data from Hadoop on a standalone-mode
> cluster. What are the minimum hardware requirements, given that it will be
> used for advanced analysis (social network analysis)?
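The "nearly that many cores" intuition is about scheduling: Spark runs one task per core at a time, so a stage with more tasks than cores executes in successive waves. A rough sketch (the 200-core figure is a hypothetical comparison point, not from this thread):

```python
from math import ceil

def waves(num_partitions, total_cores):
    # Each core processes one task at a time; extra tasks queue up
    # and run as additional "waves" of the stage.
    return ceil(num_partitions / total_cores)

print(waves(3900, 3900))  # 1 wave: every partition processed in parallel
print(waves(3900, 200))   # 20 waves on a more modest 200-core cluster
```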
Hardware requirements
I need to use Spark to load 500 GB of data from Hadoop on a standalone-mode cluster. What are the minimum hardware requirements, given that it will be used for advanced analysis (social network analysis)?

--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Hardware-requirements-tp22744.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: General Purpose Spark Cluster Hardware Requirements?
No wonder I had an out-of-memory issue before… I doubt we really need such a configuration at production level…

Best regards,
Cui Lin

From: Krishna Sankar <ksanka...@gmail.com>
Date: Sunday, March 8, 2015 at 3:27 PM
To: Nasir Khan <nasirkhan.onl...@gmail.com>
Cc: user@spark.apache.org
Subject: Re: General Purpose Spark Cluster Hardware Requirements?
Re: General Purpose Spark Cluster Hardware Requirements?
Without knowing the data size, computation & storage requirements...:

- Dual 6- or 8-core machines, 256 GB memory each, 12-15 TB of disk per machine. Probably 5-10 machines.
- Don't go for the most exotic machines; on the other hand, don't go for the cheapest ones either.
- Find a sweet spot with your vendor, i.e. if dual 6-cores are a lot cheaper than dual 10-cores, then go with the less expensive ones. Same with disks: maybe 2 TB is a lot cheaper than 3 TB.
- Decide whether these are going to be storage-intensive or compute-intensive (I assume the latter) and configure accordingly.
- Make sure you can add storage to the machines, i.e. have free storage bays.
- The other way is to add more machines and buy smaller-spec machines.
- Unless one has very firm I/O and compute requirements, I have found that FLOPS, and things of that nature, do not make that much sense.
- Think in terms of RAM, CPU and storage; that is what will become the initial limitation.
- Once there are enough production jobs, you can then figure out the FLOPS et al.
- A 10 G network is a better choice, so price in a 24-48 port TOR switch.
- Be more concerned with the bandwidth between the cluster nodes, for shuffles et al.

Cheers

On Sun, Mar 8, 2015 at 2:29 PM, Nasir Khan wrote:
> HI, I am going to submit a proposal to my University to set up my standalone
> Spark cluster, what hardware should I include in my proposal?
>
> I will be working on classification (Spark MLlib) of data streams (Spark
> Streaming)
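The checklist above can be turned into a small sizing helper. This is an illustrative sketch only: the node spec, 3x replication factor, and 30% headroom are assumptions for the example, not recommendations from this thread.

```python
from dataclasses import dataclass
from math import ceil

@dataclass
class NodeSpec:
    cores: int      # e.g. dual 6-core -> 12
    ram_gb: int     # e.g. 256
    disk_tb: float  # e.g. 12-15 TB raw per machine

def nodes_for_storage(data_tb, spec, replication=3, headroom=0.7):
    # Usable disk per node after reserving ~30% for shuffle spill,
    # logs, and growth; replicated data must fit in the remainder.
    usable_per_node = spec.disk_tb * headroom
    return ceil(data_tb * replication / usable_per_node)

node = NodeSpec(cores=12, ram_gb=256, disk_tb=12)
print(nodes_for_storage(10, node))  # 4 nodes to hold 10 TB at 3x replication
```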
Re: General Purpose Spark Cluster Hardware Requirements?
Cycling related bits from an earlier thread: http://search-hadoop.com/m/LgpTk2DLMvc

On Sun, Mar 8, 2015 at 2:29 PM, Nasir Khan wrote:
> HI, I am going to submit a proposal to my University to set up my standalone
> Spark cluster, what hardware should I include in my proposal?
General Purpose Spark Cluster Hardware Requirements?
HI, I am going to submit a proposal to my university to set up my standalone Spark cluster. What hardware should I include in my proposal?

I will be working on classification (Spark MLlib) of data streams (Spark Streaming).

If somebody can fill in these answers, that would be great! Thanks

*Cores* = (example: 64 nodes, 1024 cores; your figures) ___?
*Performance* = (example: ~5.12 TFLOPS, ~2 TFLOPS; your figures) ___?
*GPU* = YES/NO ___?
*Fat Node* = YES/NO ___?
*CPU Hrs/Yr* = (example: 2000, 8000; your figures) ___?
*RAM/CPU* = (example: 256 GB; your figures) ___?
*Storage Processing* = (example: 200 TB; your figures) ___?
*Storage Output* = (example: 5 TB, 4 TB HDD/SSD; your figures) ___?

*Most processors today carry out 4 FLOPS per cycle; thus a single-core 2.5 GHz processor has a theoretical peak of 10 billion FLOPS = 10 GFLOPS.

Note: I need a *general purpose* cluster, not very high end nor very low spec. It will not be dedicated to just one project, I guess. You people already have experience in setting up clusters; that's the reason I posted it here :)

--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/General-Purpose-Spark-Cluster-Hardware-Requirements-tp21963.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
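The peak-FLOPS note in the question is just a multiplication, which a short sketch makes explicit. Note that real FLOPS/cycle varies by microarchitecture (SIMD width, FMA support); 4 is simply the figure the note assumes.

```python
# Theoretical peak: FLOPS per cycle x clock rate (x number of cores).
def peak_gflops(flops_per_cycle, ghz, cores=1):
    return flops_per_cycle * ghz * cores

print(peak_gflops(4, 2.5))            # 10.0 GFLOPS, single 2.5 GHz core
print(peak_gflops(4, 2.5, cores=12))  # 120.0 GFLOPS, dual 6-core node
```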