Spark benchmarking
Hi all, what is the most commonly used tool/product for benchmarking a Spark job?
How to set up a Spark Client node?
I have the following Hadoop/Spark cluster node configuration:
- Nodes 1 and 2 are the ResourceManager and NameNode, respectively.
- Nodes 3, 4, and 5 each run a NodeManager and a DataNode.
- Node 7 is the Spark master, configured to run in yarn-client or yarn-cluster mode.
I have tested it and it works fine. Are there any instructions on how to set up a Spark client node in cluster mode? I am not sure if I am doing it right. Thanks in advance
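A client node typically needs only the Spark binaries plus the cluster's Hadoop/YARN client configuration; the driver itself can then run inside the cluster. A minimal sketch of submitting from such a node in cluster mode (the config path and jar location are illustrative assumptions, not from the original post):

```shell
# On the client node: point Spark at the cluster's Hadoop configuration
# (assumes the Hadoop client configs were copied to /etc/hadoop/conf)
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Submit the bundled SparkPi example in yarn-cluster mode; the driver
# then runs inside the YARN ApplicationMaster, not on the client node
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  lib/spark-examples-*.jar 10
```

With `--master yarn-client` instead, the driver stays on the submitting node, which is usually preferable for interactive use.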
Re: Using spark streaming to load data from Kafka to HDFS
Why not try https://github.com/linkedin/camus - Camus is a Kafka to HDFS pipeline. On Tue, May 5, 2015 at 11:13 PM, Rendy Bambang Junior rendy.b.jun...@gmail.com wrote: Hi all, I am planning to load data from Kafka to HDFS. Is it normal to use Spark Streaming to load data from Kafka to HDFS? What are the concerns with doing this? There is no processing to be done by Spark; it only stores data from Kafka to HDFS, both for storage and for further Spark processing. Rendy
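If the Spark Streaming route is taken anyway, a minimal sketch looks like the following (Spark 1.3+ direct Kafka API; the broker address, topic name, and HDFS output path are illustrative assumptions):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfs")
    // The batch interval controls how often a new set of HDFS files is written
    val ssc = new StreamingContext(conf, Seconds(60))

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // assumed broker
    val topics = Set("events")                                      // assumed topic

    // Direct (receiver-less) stream: one RDD partition per Kafka partition
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Write each batch's message values under a timestamped HDFS directory
    stream.map(_._2).saveAsTextFiles("hdfs:///data/kafka/events")

    ssc.start()
    ssc.awaitTermination()
  }
}
```

One concern with this approach is that every batch produces a new directory of small files, so a short batch interval tends to need a downstream compaction step; a dedicated ingestion tool such as Camus avoids that.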
Spark + Hue
Hi all, Is there any good documentation on how to integrate Spark with Hue 3.7.x? Is the only way to install the Spark Job Server? Thanks in advance for your help
WebUI's Application count doesn't get updated
Hi all,
The application running and completed counts do not get updated; they are always zero. I have run the SparkPi application at least 10 times. Please help.
- *Workers:* 3
- *Cores:* 24 Total, 0 Used
- *Memory:* 43.7 GB Total, 0.0 B Used
- *Applications:* 0 Running, 0 Completed
- *Drivers:* 0 Running, 0 Completed
- *Status:* ALIVE
Re: WebUI's Application count doesn't get updated
Thanks for your reply Andrew. I am running applications directly on the master node. My cluster also contains three worker nodes, all of which are visible on the WebUI.
Spark Master at spark://sanjar-local-machine-1:7077
- *URL:* spark://sanjar-local-machine-1:7077
- *Workers:* 3
- *Cores:* 24 Total, 0 Used
- *Memory:* 43.7 GB Total, 0.0 B Used
- *Applications:* 0 Running, 0 Completed
- *Drivers:* 0 Running, 0 Completed
- *Status:* ALIVE
Workers:
- worker-20140603013834-sanjar-local-machine-2-43334 (sanjar-local-machine-2:43334, http://sanjar-local-machine-2:8081/) - ALIVE - 8 cores (0 Used) - 14.6 GB (0.0 B Used)
- worker-20140603015921-sanjar-local-machine-3-51926 (sanjar-local-machine-3:51926, http://sanjar-local-machine-3:8081/) - ALIVE - 8 cores (0 Used) - 14.6 GB (0.0 B Used)
- worker-20140603020250-sanjar-local-machine-4-43167 (sanjar-local-machine-4:43167, http://sanjar-local-machine-4:8081/) - ALIVE - 8 cores (0 Used) - 14.6 GB (0.0 B Used)
Running Applications: none. Completed Applications: none.
On Tue, Jun 3, 2014 at 2:33 AM, Andrew Ash and...@andrewash.com wrote: Your applications are probably not connecting to your existing cluster and are instead running in local mode. Are you passing the master URL to the SparkPi application? Andrew
Re: WebUI's Application count doesn't get updated
Thanks guys, that fixed my problem. As you might have noticed, I am VERY new to Spark. Building a Spark cluster using LXC has been a challenge.
On Tue, Jun 3, 2014 at 2:49 AM, Akhil Das ak...@sigmoidanalytics.com wrote: As Andrew said, your application is running in local mode instead of on your standalone cluster. You need to pass MASTER=spark://sanjar-local-machine-1:7077 before running your SparkPi example. Thanks, Best Regards
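The fix in this thread can be sketched as a submit command; the master URL is the one shown in the WebUI above, while the jar path and argument are illustrative assumptions:

```shell
# Pass the standalone master URL explicitly so the job attaches to the
# cluster instead of defaulting to local mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://sanjar-local-machine-1:7077 \
  lib/spark-examples-*.jar 100

# With the older run-example script, the MASTER env var does the same:
MASTER=spark://sanjar-local-machine-1:7077 ./bin/run-example SparkPi 100
```

Once the job connects this way, it should appear under Running/Completed Applications on the master's WebUI.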