Automatic JSON schema inference using Structured Streaming

2018-07-05 Thread SRK
Hi, is there a way that automatic JSON schema inference can be done using Structured Streaming? I do not want to supply a predefined schema and bind it. With Spark Kafka Direct I could do spark.read.json(). I see that this is not supported in Structured Streaming. Thanks!
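
A minimal sketch of a common workaround, assuming the stream is a Kafka topic carrying JSON strings (the broker, topic, and sample path below are illustrative): infer the schema once from a static batch read over sample data, then apply it to the stream with from_json. For file-based streaming sources, setting spark.sql.streaming.schemaInference=true enables inference directly, but Kafka values arrive as raw bytes, so a sampled schema is the usual route.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json

val spark = SparkSession.builder.appName("schema-from-sample").getOrCreate()
import spark.implicits._

// 1. Infer the schema from a bounded batch read over sample data.
val sampleSchema = spark.read.json("/data/sample/events/").schema

// 2. Reuse that schema when parsing the streaming JSON payload.
val parsed = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()
  .select(from_json($"value".cast("string"), sampleSchema).as("data"))
  .select("data.*")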

Re: How to branch a Stream / have multiple Sinks / do multiple Queries on one Stream

2018-07-05 Thread Amiya Mishra
Hi Chandan/Jürgen, I had tried this natively with a single input data frame and multiple sinks: Spark provides a method called awaitAnyTermination() in StreamingQueryManager.scala which provides all the required details to handle the queries processed by Spark. By observing the documentation
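
A minimal sketch of that pattern, with one input stream feeding two independent sink queries (the rate source and output paths are illustrative stand-ins):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("multi-sink").getOrCreate()

// One logical input stream; the rate source is just a placeholder.
val input = spark.readStream.format("rate").load()

// Each sink becomes its own streaming query on the same input.
val toConsole = input.writeStream
  .format("console")
  .start()

val toParquet = input.writeStream
  .format("parquet")
  .option("path", "/tmp/out/parquet")
  .option("checkpointLocation", "/tmp/chk/parquet")
  .start()

// Block until any of the active queries terminates or fails.
spark.streams.awaitAnyTermination()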

Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-05 Thread Chetan Khatri
Prem, sure, thanks for the suggestion. On Wed, Jul 4, 2018 at 8:38 PM, Prem Sure wrote: > try .pipe(.py) on RDD > Thanks, Prem > On Wed, Jul 4, 2018 at 7:59 PM, Chetan Khatri wrote: >> Can someone please suggest me, thanks >> On Tue 3 Jul, 2018, 5:28 PM Chetan Khatri wrote: >>>

Spark 2.3 Kubernetes error

2018-07-05 Thread Mamillapalli, Purna Pradeep
Hello, when I try to set the options below on the spark-submit command against the k8s master, I get the error below in the spark-driver pod logs: --conf spark.executor.extraJavaOptions=" -Dhttps.proxyHost=myhost -Dhttps.proxyPort=8099 -Dhttp.useproxy=true -Dhttps.protocols=TLSv1.2" \ --conf spark.driver.extraJ
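
The actual error is cut off above, but since quoted -D flags often trip over shell escaping, one workaround sketch is to set the same JVM options programmatically via SparkConf instead of on the spark-submit line (the proxy host/port values are the poster's examples):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val proxyOpts = "-Dhttps.proxyHost=myhost -Dhttps.proxyPort=8099 " +
  "-Dhttp.useproxy=true -Dhttps.protocols=TLSv1.2"

val conf = new SparkConf()
  // Executor JVMs are launched after this runs, so this takes effect.
  .set("spark.executor.extraJavaOptions", proxyOpts)
  // The driver JVM is already running at this point; driver options still
  // need to be supplied before launch (e.g. via spark-defaults.conf).
  .set("spark.driver.extraJavaOptions", proxyOpts)

val spark = SparkSession.builder.config(conf).getOrCreate()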

Strange behavior of Spark Masters during rolling update

2018-07-05 Thread bsikander
We have a Spark standalone cluster running on 2.2.1 in HA mode using ZooKeeper. Occasionally, we have a rolling update where first the primary master goes down, then the secondary node, and then the ZooKeeper nodes running on their own VMs. In the image below,

Re: Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-05 Thread Jayant Shekhar
Hello Chetan, We have currently done it with .pipe(.py) as Prem suggested. That passes the RDD as CSV strings to the python script. The python script can either process it line by line, create the result, and return it back, or create something like a Pandas DataFrame for processing and finally write t
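
A minimal sketch of the Scala side of that approach (the script name and column layout are illustrative assumptions; the Python script is expected to read CSV lines from stdin and print CSV lines to stdout):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("pipe-to-python").getOrCreate()
val sc = spark.sparkContext

// Hypothetical input: (id, value) pairs serialized as CSV lines.
val csvLines = sc.parallelize(Seq((1, "a"), (2, "b")))
  .map { case (id, value) => s"$id,$value" }

// transform.py must be available on every executor node, e.g. shipped
// with spark-submit --files transform.py.
val piped = csvLines.pipe("python transform.py")

// Parse the lines the script wrote to stdout back into records.
val parsed = piped.map(_.split(",")).map(fields => (fields(0).toInt, fields(1)))
parsed.collect().foreach(println)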

Fwd: BeakerX 1.0 released

2018-07-05 Thread s...@draves.org
We are pleased to announce the release of BeakerX 1.0. BeakerX is a collection of kernels and extensions to the Jupyter interactive computing environment. It provides JVM support, Spark cluster support, polyglot programming, interactive plots, tables, forms, publishing, and m

unsubscribe

2018-07-05 Thread Peter
unsubscribe

Re: How to branch a Stream / have multiple Sinks / do multiple Queries on one Stream

2018-07-05 Thread Tathagata Das
Hey all, in Spark 2.4.0, there will be a new feature called *foreachBatch* which will expose the output rows of every micro-batch as a DataFrame, on which you can apply a user-defined function. With that, you can reuse existing batch sources for writing results as well as write results to multiple loc
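
A minimal sketch of how foreachBatch is expected to look in 2.4 (the rate source and output paths are illustrative): each micro-batch arrives as a plain DataFrame, so ordinary batch writers can be reused, including writing the same batch to more than one location.

import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder.appName("foreach-batch-demo").getOrCreate()

val stream = spark.readStream.format("rate").load()

val query = stream.writeStream
  .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
    // Cache once, then reuse the micro-batch for several batch-style writes.
    batchDF.persist()
    batchDF.write.mode("append").parquet("/tmp/out/parquet")
    batchDF.write.mode("append").json("/tmp/out/json")
    batchDF.unpersist()
  }
  .option("checkpointLocation", "/tmp/chk/foreachBatch")
  .start()

query.awaitTermination()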