Structured Streaming: Row differences, e.g., with Window and lag()

2017-07-19 Thread Karamba
Hi, I am looking for approaches to compare a row with the next one to determine, e.g., differences in event times/timestamps. I have found a couple of solutions that use the Window class, but that does not seem to work on streaming data, such as
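For a batch DataFrame this kind of row-to-row comparison is straightforward with a window specification and lag(); below is a minimal Scala sketch (the device/eventTime columns and the sample rows are invented for illustration, not taken from the thread):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lag, unix_timestamp}

object EventTimeDiff {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("EventTimeDiff").getOrCreate()
    import spark.implicits._

    // Hypothetical batch data: a device id and an event timestamp per row.
    val events = Seq(
      ("sensor-1", "2017-07-19 10:00:00"),
      ("sensor-1", "2017-07-19 10:00:05"),
      ("sensor-1", "2017-07-19 10:00:12")
    ).toDF("device", "eventTime")
      .withColumn("eventTime", col("eventTime").cast("timestamp"))

    // Window over each device ordered by event time; lag() pulls in the previous row.
    val byDevice = Window.partitionBy("device").orderBy("eventTime")

    val withDiff = events
      .withColumn("prevTime", lag(col("eventTime"), 1).over(byDevice))
      .withColumn("diffSeconds",
        unix_timestamp(col("eventTime")) - unix_timestamp(col("prevTime")))

    withDiff.show(false)
    spark.stop()
  }
}

On a streaming DataFrame the .over(byDevice) step is the part that is rejected, since non-time-based window functions are not supported there, which matches the behaviour described in the question.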

Assembly for Kafka >= 0.10.0, Spark 2.2.0, Scala 2.11

2017-01-18 Thread Karamba
dependency: org.apache.spark#spark-streaming-kafka-0-10_2.11_2.11;2.1.0: not found. Where do I find that library? Thanks and best regards, karamba PS: Does anybody know when Python support becomes available in spark-streaming-kafka-0-10 and w
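The doubled _2.11 suffix in the unresolved coordinate usually means the Scala version got appended twice, e.g. by combining sbt's %% operator with an artifact name that already ends in _2.11. A minimal build.sbt sketch for Spark 2.2.0 / Scala 2.11 (the exact versions and the "provided" scoping are assumptions):

// build.sbt -- use either %% with the bare artifact name, or % with the
// "_2.11"-suffixed name, but not both; mixing them produces the
// "spark-streaming-kafka-0-10_2.11_2.11" coordinate seen in the error.
scalaVersion := "2.11.11"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"                 % "2.2.0" % "provided",
  "org.apache.spark" %% "spark-streaming"            % "2.2.0" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.2.0"
)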

Re: [Spark 2.0.2 HDFS]: no data locality

2016-12-28 Thread Karamba
ips, it depends on your networking setup. You might want to try host networking so that the containers share the ip with the host. On Wed, Dec 28, 2016 at 1:46 AM, Karamba <phantom...@web.de> wrote: Hi Sun Rui, thanks for answering!

Re: [Spark 2.0.2 HDFS]: no data locality

2016-12-28 Thread Karamba
spark.cores.max to limit the cores to acquire, which means executors are available on a subset of the cluster nodes? On Dec 27, 2016, at 01:39, Karamba <phantom...@web.de> wrote: Hi, I am running a couple of docker hosts, each with an HDFS and a
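For reference, spark.cores.max caps the total cores a standalone-mode application acquires across the cluster, which is why executors may end up on only some of the workers. A minimal sketch of setting it from application code (values are placeholders; it can equally be passed via --conf on spark-submit), e.g. pasted into spark-shell:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("locality-test")
  .config("spark.cores.max", "8")       // total cores for this app across the cluster
  .config("spark.executor.cores", "2")  // cores per executor (optional)
  .getOrCreate()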

[Spark 2.0.2 HDFS]: no data locality

2016-12-26 Thread Karamba
Hi, I am running a couple of docker hosts, each with an HDFS node and a Spark worker in a Spark standalone cluster. In order to get data-locality awareness, I would like to configure racks for each host, so that a Spark worker container knows from which HDFS node container it should load its data.
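Rack awareness itself is configured on the Hadoop side (via its topology settings), but from the Spark side one way to check whether locality information actually reaches the scheduler is to look at each partition's preferred locations. A minimal Scala sketch, assuming a hypothetical HDFS path that exists on the cluster; it only inspects what Spark sees, it does not configure racks:

import org.apache.spark.sql.SparkSession

object LocalityCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("LocalityCheck").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical path; replace with a real file on the cluster.
    val rdd = sc.textFile("hdfs:///data/sample.txt")

    // For each partition, print the hosts Spark considers "preferred",
    // i.e. where the underlying HDFS blocks are reported to live.
    rdd.partitions.foreach { p =>
      println(s"partition ${p.index} -> ${rdd.preferredLocations(p).mkString(", ")}")
    }

    spark.stop()
  }
}

If the printed hosts are empty or never match the worker hosts, the locality/rack information is not being picked up as intended.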