making query state checkpoint compatible in structured streaming

2018-06-17 Thread puneetloya
Consider there is a Spark query (A) which is dependent on Kafka topics t1 and t2. After running this query in streaming mode, a checkpoint (C1) directory for the query gets created with offsets and sources directories. Now I add a third topic (t3) on which the query is dependent. Now if I
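For reference, a minimal sketch of the setup being described (broker address, sink, and paths are placeholders, not from the original post); adding t3 later only changes the subscribe list, while the checkpoint directory C1 stays the same:

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder.appName("queryA").getOrCreate()

  // Query A reads from topics t1 and t2; adding t3 means changing this to "t1,t2,t3".
  val input = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "t1,t2")
    .load()

  val query = input
    .selectExpr("CAST(value AS STRING) AS value")
    .writeStream
    .format("parquet")
    .option("path", "/data/queryA/out")
    .option("checkpointLocation", "/checkpoints/C1")  // offsets/ and sources/ are created here
    .start()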

Structured streaming checkpointing

2017-12-23 Thread puneetloya
Hi, Is S3 a good storage candidate for structured streaming checkpointing? According to the Structured Streaming guide, it must be HDFS-compliant. https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html I see people using S3 for checkpointing, but is it suitable for the
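For what it's worth, pointing the checkpoint at S3 is only a matter of the path (sketch below; bucket name and sink are placeholders and assume the s3a connector is on the classpath); whether S3's semantics count as "HDFS-compliant" is exactly the open question here:

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder.appName("s3-checkpoint").getOrCreate()

  // A trivial rate source just to have something to checkpoint; the point is the s3a path.
  val df = spark.readStream.format("rate").option("rowsPerSecond", "1").load()

  val query = df.writeStream
    .format("console")
    .option("checkpointLocation", "s3a://my-bucket/checkpoints/queryA")
    .start()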

Spark Streaming Cluster queries

2018-01-27 Thread puneetloya
Hi All, A cluster of one Spark driver and multiple executors (5) is set up with Redis for storing Spark-processed data, and S3 is used for checkpointing. I have a couple of queries about this setup. 1) How to analyze what part of the code executes on the Spark driver and what part of the code executes on the
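On question 1, a rough rule of thumb (sketch below, not specific to the Redis setup above): plan construction, query start-up, and anything outside of closures runs on the driver, while the functions passed into DataFrame/Dataset operations run on the executors.

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.{col, udf}

  val spark = SparkSession.builder.appName("driver-vs-executor").getOrCreate()

  // Building the streaming DataFrame and its plan happens on the driver.
  val input = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()

  // The body of this UDF is shipped to, and executed on, the executors, row by row.
  val toUpper = udf((s: String) => if (s == null) null else s.toUpperCase)

  val shaped = input
    .selectExpr("CAST(value AS STRING) AS v")
    .withColumn("v_upper", toUpper(col("v")))

  // start() is called on the driver; the tasks it schedules run on the executors.
  val query = shaped.writeStream
    .format("console")
    .option("checkpointLocation", "s3a://my-bucket/checkpoints/q1")
    .start()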

mesos cluster dispatcher

2018-01-01 Thread puneetloya
Hi, Would like an opinion on using *mesos cluster dispatcher*. It worked for me on a two-machine Vagrant setup (i.e., Mesos master and slave). Is it better to start the Spark driver using Marathon instead of the dispatcher? The --supervise option can become a pain, as you cannot stop the driver. Please

Re: Structured Streaming on Kubernetes

2018-08-21 Thread puneetloya
Thanks for putting together a comprehensive observation about Spark on Kubernetes. The Mesos Spark deployment has a property called spark.mesos.extra.cores. The property means: * Set the extra number of cores for an executor to advertise. This does not result in more cores allocated. It instead means
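For readers unfamiliar with it, the property is just an ordinary Spark conf entry (sketch below, value illustrative):

  import org.apache.spark.SparkConf

  // Advertise one extra core per executor to Mesos; this does not allocate more cores.
  val conf = new SparkConf()
    .setAppName("mesos-extra-cores")
    .set("spark.mesos.extra.cores", "1")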

[Spark Structued Streaming]: Read kafka offset from a timestamp

2018-11-17 Thread puneetloya
I would like to request a feature for reading data from the Kafka source based on a timestamp, so that if the application needs to process data from a certain time, it is able to do so. I do agree that the checkpoint gives us continuation of the stream processing, but what if I want to
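For anyone landing on this thread later: newer Spark releases (3.0 onwards, well after this post) added a Kafka source option along these lines. A hedged sketch, with broker, topic, partitions, and timestamps purely illustrative:

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder.appName("kafka-from-timestamp").getOrCreate()

  // startingOffsetsByTimestamp maps topic -> (partition -> epoch-millis); each partition
  // starts from the earliest offset whose record timestamp is at or after the given time.
  val fromTime = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "t1")
    .option("startingOffsetsByTimestamp", """{"t1": {"0": 1542412800000, "1": 1542412800000}}""")
    .load()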

K8s for spark-2.2.3

2019-03-12 Thread puneetloya
Hi all, If you are interested in using Kubernetes with Spark 2.2.3, the branch is here: https://github.com/puneetloya/spark/tree/spark-2.2.3-k8s-0.5.0 I just rebased Spark 2.2.3 onto https://github.com/apache-spark-on-k8s/spark/tree/v2.2.0-kubernetes-0.5.0 and have tested it. Thanks, Puneet

Spark 2.4.3 - Structured Streaming - high on Storage Memory

2019-06-15 Thread puneetloya
Hi, Just upgraded Spark from 2.2.3 to 2.4.3. Ran a load test with a week's worth of messages in Kafka. Seeing an odd behavior: why is the storage memory so high? Have run similar workloads with Spark 2.2.3,

Re: Spark 2.4.3 - Structured Streaming - high on Storage Memory

2019-06-16 Thread puneetloya
Just more info on the above post: I have been seeing a lot of these logs: 1) The state for version 15109 (other numbers too) doesn't exist in loadedMaps. Reading snapshot file and delta files if needed...Note that this is normal for the first batch of starting query. 2) KafkaConsumer cache hitting
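In case it helps anyone reproducing this, the two log lines relate to a couple of tunables (sketch below; names and defaults should be double-checked against the 2.4.3 docs):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder
    .appName("state-and-consumer-cache")
    // How many state store versions are kept in memory (loadedMaps) before falling back
    // to reading snapshot and delta files.
    .config("spark.sql.streaming.maxBatchesToRetainInMemory", "2")
    // Soft cap on the number of cached KafkaConsumers per executor.
    .config("spark.sql.kafkaConsumerCache.capacity", "64")
    .getOrCreate()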

Re: Spark 2.4.3 - Structured Streaming - high on Storage Memory

2019-10-14 Thread puneetloya
Hi the amazing Spark team, I was closely following these issues: https://issues.apache.org/jira/browse/SPARK-27648 and then, more recently, this: https://issues.apache.org/jira/browse/SPARK-29055 Looks like all of it is fixed in this pull request: https://github.com/apache/spark/pull/25973 and it was

Spark 2.4.5 - Structured Streaming - Failed Jobs expire from the UI

2020-03-04 Thread puneetloya
Hi, I have been using Spark 2.4.5 for the past month. When a structured streaming query fails, it appears on the UI as a failed job. But after a while these failed jobs expire (disappear) from the UI. Is there a setting which expires failed jobs? I was using Spark 2.2 before this, and I have never
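A guess at what is going on (sketch below, not confirmed in this thread): the Spark UI only retains a bounded history of jobs and stages and evicts the oldest entries beyond these limits, so older failed jobs can drop off. Values are illustrative.

  import org.apache.spark.SparkConf

  // Raise the UI retention limits so failed jobs stay visible longer (defaults are 1000 each).
  val conf = new SparkConf()
    .set("spark.ui.retainedJobs", "5000")
    .set("spark.ui.retainedStages", "5000")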

Re: Spark 2.4.5 - Structured Streaming - Failed Jobs expire from the UI

2020-03-09 Thread puneetloya
Hi, Can I please get some love on this? Thanks, Puneet