I start the Spark cluster in Docker with these commands on the host machine:

curl -LO https://raw.githubusercontent.com/bitnami/bitnami-docker-spark/master/docker-compose.yml
# edit docker-compose.yml and expose port 7077 to the host machine
docker-compose up
Then I run the Beam Spark job server on the host machine:

docker run --net=host apache/beam_spark_job_server:latest --spark-master-url=spark://localhost:7077

Lastly, I run this command on the host machine (the pipeline is the stock wordcount example; a minimal Python sketch of it is at the end of this post):

python -m apache_beam.examples.wordcount --input ./a_file_input \
  --output ./counts \
  --runner=PortableRunner --job_endpoint=localhost:8099 --environment_type=LOOPBACK

I get the errors below from the beam_spark_job_server Docker container:

21/07/28 13:41:56 INFO org.apache.beam.runners.spark.SparkJobInvoker: Invoking job BeamApp-root-0728134156-a85a64c5_019bd04e-aabc-4deb-8b12-8c61f173cd5b
21/07/28 13:41:56 INFO org.apache.beam.runners.jobsubmission.JobInvocation: Starting job invocation BeamApp-root-0728134156-a85a64c5_019bd04e-aabc-4deb-8b12-8c61f173cd5b
21/07/28 13:41:56 INFO org.apache.beam.runners.core.construction.resources.PipelineResources: PipelineOptions.filesToStage was not specified. Defaulting to files from the classpath: will stage 6 files. Enable logging at DEBUG level to see which files will be staged.
21/07/28 13:41:57 INFO org.apache.beam.runners.spark.translation.SparkContextFactory: Creating a brand new Spark Context.
21/07/28 13:41:57 WARN org.apache.spark.util.Utils: Your hostname, cometstrike resolves to a loopback address: 127.0.1.1; using 192.168.<real_ip> instead (on interface wlo1)
21/07/28 13:41:57 WARN org.apache.spark.util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
21/07/28 13:41:57 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/07/28 13:42:57 ERROR org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
21/07/28 13:42:57 WARN org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend: Application ID is not initialized yet.
21/07/28 13:42:57 WARN org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint: Drop UnregisterApplication(null) because has not yet connected to master
21/07/28 13:42:57 ERROR org.apache.spark.SparkContext: Error initializing SparkContext.
java.lang.NullPointerException
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:560)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at org.apache.beam.runners.spark.translation.SparkContextFactory.createSparkContext(SparkContextFactory.java:101)
    at org.apache.beam.runners.spark.translation.SparkContextFactory.getSparkContext(SparkContextFactory.java:67)
    at org.apache.beam.runners.spark.SparkPipelineRunner.run(SparkPipelineRunner.java:118)
    at org.apache.beam.runners.jobsubmission.JobInvocation.runPipeline(JobInvocation.java:86)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
21/07/28 13:42:57 ERROR org.apache.spark.scheduler.AsyncEventQueue: Listener AppStatusListener threw an exception
java.lang.NullPointerException
    at org.apache.spark.status.AppStatusListener.onApplicationEnd(AppStatusListener.scala:167)
    at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:57)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
    at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
    at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
    at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82)
21/07/28 13:42:57 ERROR org.apache.beam.runners.jobsubmission.JobInvocation: Error during job invocation BeamApp-root-0728134156-a85a64c5_019bd04e-aabc-4deb-8b12-8c61f173cd5b.
java.lang.NullPointerException
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:560)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at org.apache.beam.runners.spark.translation.SparkContextFactory.createSparkContext(SparkContextFactory.java:101)
    at org.apache.beam.runners.spark.translation.SparkContextFactory.getSparkContext(SparkContextFactory.java:67)
    at org.apache.beam.runners.spark.SparkPipelineRunner.run(SparkPipelineRunner.java:118)
    at org.apache.beam.runners.jobsubmission.JobInvocation.runPipeline(JobInvocation.java:86)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

And this is the error in the Bitnami Spark container:

spark_1 | 21/07/28 13:46:20 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message.
spark_1 | java.io.InvalidClassException: org.apache.spark.deploy.ApplicationDescription; local class incompatible: stream classdesc serialVersionUID = 6543101073799644159, local class serialVersionUID = 1574364215946805297
spark_1 |     at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
spark_1 |     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2003)
spark_1 |     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1850)
spark_1 |     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2160)
spark_1 |     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
spark_1 |     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
spark_1 |     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
spark_1 |     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
spark_1 |     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
spark_1 |     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
spark_1 |     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
spark_1 |     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
spark_1 |     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:109)
spark_1 |     at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$deserialize$2(NettyRpcEnv.scala:299)
spark_1 |     at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
spark_1 |     at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:352)
spark_1 |     at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$deserialize$1(NettyRpcEnv.scala:298)
spark_1 |     at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
spark_1 |     at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:298)
spark_1 |     at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:647)
spark_1 |     at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:698)
spark_1 |     at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:690)
spark_1 |     at org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:255)
spark_1 |     at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:111)
spark_1 |     at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
spark_1 |     at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
spark_1 |     at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
spark_1 |     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
spark_1 |     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
spark_1 |     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
spark_1 |     at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
spark_1 |     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
spark_1 |     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
spark_1 |     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
spark_1 |     at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
spark_1 |     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
spark_1 |     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
spark_1 |     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
spark_1 |     at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
spark_1 |     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
spark_1 |     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
spark_1 |     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
spark_1 |     at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
spark_1 |     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
spark_1 |     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
spark_1 |     at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
spark_1 |     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
spark_1 |     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
spark_1 |     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
spark_1 |     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
spark_1 |     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
spark_1 |     at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
spark_1 |     at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
spark_1 |     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
spark_1 |     at java.lang.Thread.run(Thread.java:748)

Can you advise on how to get this working? Is there a simple example of running Python code + beam_spark_job_server + Spark in Docker?
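For reference, the pipeline I am submitting is just the stock wordcount example that ships with Beam. A minimal sketch of an equivalent pipeline follows; the transform labels and the regex tokenizer are my own simplification, and the options mirror the flags from the command above:

import re

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Same flags as on the command line: submit to the Beam job server on
# localhost:8099, which then talks to the Spark master on localhost:7077.
options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    "--environment_type=LOOPBACK",  # SDK workers run inside this Python process
])

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("./a_file_input")
        | "Split" >> beam.FlatMap(lambda line: re.findall(r"[A-Za-z']+", line))
        | "Count" >> beam.combiners.Count.PerElement()
        | "Format" >> beam.MapTuple(lambda word, count: f"{word}: {count}")
        | "Write" >> beam.io.WriteToText("./counts")
    )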
