Re: Can I specify watermark using raw sql alone?

2018-07-14 Thread kant kodali
I don't see a withWatermark UDF that I could use in raw SQL. I am currently using Spark 2.3.1.
On Sat, Jul 14, 2018 at 4:19 PM, kant kodali wrote:
> Hi All,
>
> Can I specify a watermark using raw SQL alone? In other words, without using
> .withWatermark from the Dataset API.
>
> Thanks!

Re: Pyspark access to scala/java libraries

2018-07-14 Thread Mohit Jaggi
Trying again… anyone know how to make this work?
> On Jul 9, 2018, at 3:45 PM, Mohit Jaggi wrote:
>
> Folks,
> I am writing some Scala/Java code and want it to be usable from pyspark.
>
> For example:
> class MyStuff(addend: Int) {
>   def myMapFunction(x: Int) = x + addend
> }
>
> I want

Can I specify watermark using raw sql alone?

2018-07-14 Thread kant kodali
Hi All, Can I specify a watermark using raw SQL alone? In other words, without using .withWatermark from the Dataset API. Thanks!

how to decide broadcast join data size

2018-07-14 Thread Selvam Raman
Hi, I could not find a useful formula or documentation to help me decide the broadcast join data size based on the cluster size. Please let me know if there is a rule of thumb for this. For example, cluster size: 20 nodes, 32 GB per node and 8 cores per node. executor-memory =

Re: Do GraphFrames support streaming?

2018-07-14 Thread Jörn Franke
No, the streaming dataframe needs to be written to disk or similar (or an in-memory backend). Then, when the next stream arrives, join them, create the graph, and store the next stream together with the existing stream on disk, etc.
> On 14. Jul 2018, at 17:19, kant kodali wrote:
>
> The question now would

Re: Do GraphFrames support streaming?

2018-07-14 Thread kant kodali
The question now would be: can it be done in a streaming fashion? Are you talking about the union of two streaming dataframes and then constructing a GraphFrame (also during streaming)?
On Sat, Jul 14, 2018 at 8:07 AM, Jörn Franke wrote:
> For your use case one might indeed be able to work simply

Re: Do GraphFrames support streaming?

2018-07-14 Thread Jörn Franke
For your use case one might indeed be able to work simply with incremental graph updates. However, they are not straightforward in Spark. You can union the new data with the existing dataframes that represent your graph and create a new GraphFrame from that. However I am not sure if this will

Re: Do GraphFrames support streaming?

2018-07-14 Thread kant kodali
"You want to update incrementally an existing graph and run incrementally a graph algorithm suitable for this - you have to implement yourself as far as I am aware" I want to update the graph incrementally and want to run some graph queries similar to Cypher like give me all the vertices that are

Spark Shortcut

2018-07-14 Thread Deepu Raj
Hi Team, I am using Spark 2.3 and :paste -raw is not working. When I press Ctrl+D after pasting the code, I get the message "//Exiting paste mode, now interpreting." and then nothing happens. Please help. Thanks, Deepu