We are using Spark 2.3 and would like to know whether it is recommended to
create a custom KairosDBSink by implementing StreamSinkProvider, given that
the interface is marked experimental and unstable?
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
I am planning to send these metrics to our KairosDB. Let me know if there are
any examples that I can take a look at.
In our Spark Structured Streaming job we listen to Kafka and we filter out
some messages which we consider malformed.
Currently we log that information using the LOGGER.
Is there a way to emit some kind of metric each time such a malformed
message is seen in Structured Streaming?
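A minimal sketch of the pattern (all names here are illustrative): count the
malformed messages while filtering them out, so the count can be reported as a
metric instead of only logged. In the actual Spark job the counter would be a
`longAccumulator` created from `SparkContext`, which is visible in the Spark
UI; a plain `AtomicLong` stands in below to keep the example self-contained.

```scala
import java.util.concurrent.atomic.AtomicLong

// Stand-in for spark.sparkContext.longAccumulator("malformedMessages").
val malformed = new AtomicLong(0)

// Placeholder validation; in the real job this is your existing check.
def isWellFormed(msg: String): Boolean = msg.contains("|")

// Filter out malformed messages, incrementing the counter for each one seen.
def filterCounting(msgs: Seq[String]): Seq[String] =
  msgs.filter { m =>
    val ok = isWellFormed(m)
    if (!ok) malformed.incrementAndGet()
    ok
  }
```

The same closure works unchanged inside `Dataset.filter`, with the accumulator
replacing the `AtomicLong`.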
Thanks
Hi all,
Do we have some logs, or some metrics recorded in log files or a metrics
sink, about the number of events that are dropped due to the watermark in
Structured Streaming?
Thanks
Hi Spark Mailing list,
We are looking at pushing the output of a Structured Streaming query
to KairosDB (a time series database).
What would be the recommended way of doing this? Do we implement the *Sink*
trait, or do we use *ForeachWriter*?
At each trigger point if I do a
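One hedged sketch of the `ForeachWriter` route, assuming KairosDB's REST
ingestion endpoint (`POST /api/v1/datapoints`) and an illustrative
`(metric, timestamp, value)` row shape: a writer would accumulate rows in
`process()` and POST one batch per partition in `close()`. The payload
builder is the testable part, shown here standalone:

```scala
// Build the JSON array KairosDB's POST /api/v1/datapoints endpoint accepts,
// one datapoint object per row. The "source" tag and the row shape are
// illustrative assumptions, not part of the original question.
def toKairosJson(rows: Seq[(String, Long, Double)]): String =
  rows.map { case (metric, ts, value) =>
    s"""{"name":"$metric","datapoints":[[$ts,$value]],"tags":{"source":"spark"}}"""
  }.mkString("[", ",", "]")
```

Inside a `ForeachWriter[Row]`, `open(partitionId, version)` would create the
HTTP client, `process(row)` would append to a buffer, and `close(error)` would
send `toKairosJson(buffer)`.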
Hi,
We have a use case where we need to *sessionize* our data and, for each
*session*, emit some *metrics*. We need to handle *repeated sessions* and
*session timeouts*. We have come up with the following code structure and
would like to confirm that we understand all the concepts of *watermark*,
Hi,
We have two *flatMapGroupsWithState* calls in our job and we have two
*withWatermark* calls.
We are getting the event max time, event time and watermark from
*QueryProgressEvent*.
Right now it just returns one *watermark* value.
Does Spark maintain two watermarks or just one?
If one, which one?
Anyone?
Hi,
I read somewhere that with Structured Streaming the checkpoint data is
more readable (JSON-like). Is there any documentation on how to read the
checkpoint data?
If I do `hadoop fs -ls` on the `state` directory I get some encoded data.
Thanks
Girish
Hi
I am trying to explore how I can use a UDAF for my use case.
I have something like this in my Structured Streaming job:
val counts: Dataset[(String, Double)] = events
.withWatermark("timestamp", "30 minutes")
.groupByKey(e => e._2.siteIdentifier + "|" +
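In case it helps frame the question: the aggregation logic can be prototyped
in plain Scala before wiring it into an `Aggregator` (the typed,
`Dataset`-friendly alternative to a UDAF) passed to `agg()` after
`groupByKey`. A self-contained stand-in, with a hypothetical per-key average
as the metric:

```scala
// Average a Double value per composite key. In the streaming job this logic
// would live in an org.apache.spark.sql.expressions.Aggregator (zero /
// reduce / merge / finish); groupBy on a Seq keeps the example runnable here.
def averageByKey(events: Seq[(String, Double)]): Map[String, Double] =
  events.groupBy(_._1).map { case (key, group) =>
    key -> group.map(_._2).sum / group.size
  }
```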
Hi
We are migrating our Spark job from Direct Streaming (DStreams) to Structured
Streaming. We have a batch size of 1 minute.
I am consistently seeing that the Structured Streaming job is always 3-5
minutes behind the Direct Streaming job. Is there some kind of fine-tuning
that will help Structured Streaming?
thanks
Hi
I am still working on implementing my idea, but here is how it goes:
1. Use this library https://github.com/groupon/spark-metrics
2. Have a cron job which periodically curls the /metrics/json endpoint on the
driver and all the other nodes
3. Parse the response and send the data through a telegraf agent
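For step 3, a dependency-free sketch (field names assumed from the
Dropwizard-style JSON that Spark's /metrics/json servlet returns, where each
gauge is an object with a "value" field): pull the numeric gauge values out
of the response with a regex before handing them to the telegraf agent. A
real parser would use a JSON library instead.

```scala
// Matches  "some.metric.name": {"value": 42.0}  pairs in the metrics JSON.
val gauge = """"([^"]+)"\s*:\s*\{\s*"value"\s*:\s*([0-9.Ee+-]+)""".r

// Extract gauge name -> numeric value from a /metrics/json response body.
def extractGauges(json: String): Map[String, Double] =
  gauge.findAllMatchIn(json).map(m => m.group(1) -> m.group(2).toDouble).toMap
```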
Hi,
Currently we are using HDFS for our checkpointing, but we are having issues
maintaining an HDFS cluster.
We tried glusterfs for checkpointing in the past, but in our setup glusterfs
does not work well.
We are evaluating Cassandra for storing the checkpoint data. Has anyone
implemented
Hi,
This *trait* looks very daunting. Is there some blog post or article
which explains how to implement this *trait*?
Thanks
Girish
Hi,
Are there any specific methods to fine-tune our Structured Streaming job, or
is it similar to what is mentioned here for RDDs?
https://spark.apache.org/docs/latest/tuning.html
Thanks
Hi,
We have two Spark streaming jobs, one using DStreams and the other using
Structured Streaming. I have observed that the number of records per
micro-batch (per trigger in the case of Structured Streaming) is not the same
between the two jobs. The Structured Streaming job has higher numbers
compared
We have a Spark Structured Streaming job which runs out of disk quota after
a few days.
The primary reason is that a bunch of empty folders get created in the
/work/tmp directory.
Any idea how to prune them?
Is there a way to dynamically change the value of *maxOffsetsPerTrigger* ?
Hi all,
We are using Spark 2.3 and we are facing a weird issue. The streaming job
works perfectly fine in one environment, but in another environment it
does not.
We suspect that in the low-performing environment the checkpointing of the
state is not as performant as in the other.
Is there any