Re: how to make a spark cluster ?
I ran a performance check on soc-LiveJournal PageRank between my local machine (8 cores, 16 GB) in local mode and my small cluster (4 nodes, 12 cores, 40 GB), and I found that cluster mode is far faster than local mode, which confuses me:

    no. of iterations | Local mode (in mins) | Cluster mode (in mins)
            1         |        20            |        1
            2         |        31.3          |        1.2
            3         |        39.5          |        1.3
            5         |        56.4          |        1.6
           10         |       117.26         |        2.6

Based on this, I think I might install a Spark standalone cluster on the same machine and, instead of local[no. of cores], set the master to spark://host:7077. Please let me know if I'm wrong somewhere.

On Tue, Apr 21, 2015 at 6:27 PM, Reynold Xin wrote:

> Actually, if you only have one machine, just use Spark local mode.
>
> Just download the Spark tarball, untar it, and set the master to local[N],
> where N = number of cores. You are good to go; there is no job tracker or
> Hadoop to set up.
>
> On Mon, Apr 20, 2015 at 3:21 PM, haihar nahak wrote:
>
>> Thank you :)
>>
>> On Mon, Apr 20, 2015 at 4:46 PM, Jörn Franke wrote:
>>
>>> Hi, if you have just one physical machine, then I would try out Docker
>>> instead of a full VM (a full VM would be a waste of memory and CPU).
>>>
>>> Best regards
>>>
>>> On 20 Apr 2015 at 00:11, "hnahak" wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have a big physical machine with 16 CPUs, 256 GB RAM, and a 20 TB hard
>>>> disk. I just need to know the best way to make a Spark cluster out of it.
>>>>
>>>> If I need to process TBs of data, should I:
>>>> 1. Use only one machine, containing the driver, executor, job tracker,
>>>>    and task tracker?
>>>> 2. Create 4 VMs, each with 4 CPUs and 64 GB RAM?
>>>> 3. Create 8 VMs, each with 2 CPUs and 32 GB RAM?
>>>>
>>>> Please give me your views/suggestions.
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/how-to-make-a-spark-cluster-tp22563.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org

--
{{{H2N}}}-(@:
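For anyone trying the same switch, a minimal sketch of the two master settings, assuming a standalone master and worker are already running on the box (started with sbin/start-master.sh and a worker registered against it); the app name and host are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    // Option A: local mode, driver and executors share one JVM on 8 cores.
    val conf = new SparkConf().setAppName("PageRank").setMaster("local[8]")

    // Option B: standalone mode on the same box, point the app at the running
    // master instead (the standalone master listens on port 7077 by default).
    // val conf = new SparkConf().setAppName("PageRank").setMaster("spark://host:7077")

    val sc = new SparkContext(conf)

Only one SparkContext can exist per JVM, so pick one of the two conf lines per run; everything else in the job stays the same.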
Re: How to send user variables from Spark client to custom InputFormat or RecordReader ?
Thanks. I extracted the Hadoop configuration, set my arbitrary variable on it, and was able to read it inside the InputFormat from the JobContext's configuration.

On Mon, Feb 23, 2015 at 12:04 PM, Tom Vacek wrote:

> SparkConf doesn't pass arbitrary variables through to the input format. You
> can use SparkContext's HadoopRDD and create a JobConf (with whatever
> variables you want), and then grab them out of the JobConf in your
> RecordReader.
>
> On Sun, Feb 22, 2015 at 4:28 PM, hnahak wrote:
>
>> Hi,
>>
>> I have written a custom InputFormat and RecordReader for Spark, and I need
>> to use user variables from the Spark client program.
>>
>> I added them in SparkConf:
>>
>> val sparkConf = new
>>   SparkConf().setAppName(args(0)).set("developer", "MyName")
>>
>> and in the InputFormat class:
>>
>> protected boolean isSplitable(JobContext context, Path filename) {
>>   System.out.println("# Developer "
>>       + context.getConfiguration().get("developer"));
>>   return false;
>> }
>>
>> but it returns *null*. Is there any way I can pass user variables to my
>> custom code?
>>
>> Thanks !!
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-send-user-variables-from-Spark-client-to-custom-InputFormat-or-RecordReader-tp21755.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org

--
{{{H2N}}}-(@:
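For the record, a minimal sketch of the approach that worked here. The input format class, property key, and input path are illustrative; the custom format is assumed to extend TextInputFormat just to keep the example self-contained:

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.JobContext
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical custom input format: reads the user variable back out of
    // the Configuration that Spark hands to the job.
    class MyInputFormat extends TextInputFormat {
      override protected def isSplitable(context: JobContext, filename: Path): Boolean = {
        // "developer" is the arbitrary key set on the driver side below
        println("# Developer " + context.getConfiguration.get("developer"))
        false
      }
    }

    object Driver {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("ConfigDemo"))

        // Set the variable on sc.hadoopConfiguration rather than on SparkConf:
        // plain SparkConf keys are not copied into the Hadoop Configuration,
        // which is why the original code printed null.
        sc.hadoopConfiguration.set("developer", "MyName")

        // newAPIHadoopFile passes sc.hadoopConfiguration to MyInputFormat
        val rdd = sc.newAPIHadoopFile("hdfs:///path/to/input",
          classOf[MyInputFormat], classOf[LongWritable], classOf[Text])
        println(rdd.count())
        sc.stop()
      }
    }

An alternative, if you would rather stay in SparkConf, is to prefix the key with spark.hadoop. (e.g. set("spark.hadoop.developer", "MyName")), since Spark copies spark.hadoop.* entries into the Hadoop Configuration it hands to input formats.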
Re: Posting to the list
I checked it, but I didn't see any mail from the user list. Let me try it one more time.

[image: Inline image 1]

--Harihar

On Mon, Feb 23, 2015 at 11:50 AM, Ted Yu wrote:

> bq. i didnt get any new subscription mail in my inbox.
>
> Have you checked your Spam folder?
>
> Cheers
>
> On Sun, Feb 22, 2015 at 2:36 PM, hnahak wrote:
>
>> I'm also facing the same issue. This is the third time: whenever I post
>> anything, it is never accepted by the community, and at the same time I
>> get a failure mail at my registered mail id.
>>
>> And when I click the "subscribe to this mailing list" link, I don't get
>> any new subscription mail in my inbox.
>>
>> Could anyone please suggest the best way to subscribe my email ID?
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Posting-to-the-list-tp21750p21756.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org

--
{{{H2N}}}-(@: