Hello Team,
I’m using watermark to join two streams as you can see below:
val order_wm = order_details.withWatermark("tstamp_trans", "20 seconds")
val invoice_wm = invoice_details.withWatermark("tstamp_trans", "20 seconds")
// (the original join condition was cut off here; assuming invoice_details
// also carries an s_order_id column)
val join_df = order_wm.join(invoice_wm,
  order_wm.col("s_order_id") === invoice_wm.col("s_order_id"))
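For a stream-stream join, Spark also needs a time constraint between the two event-time columns so it can expire old state; a rough sketch of the usual pattern follows (the column names on the invoice side are assumptions, renamed only to avoid ambiguity):

```scala
import org.apache.spark.sql.functions.expr

val order_wm = order_details.withWatermark("tstamp_trans", "20 seconds")
// Rename the invoice-side columns (hypothetical names) before joining
val invoice_wm = invoice_details
  .withColumnRenamed("s_order_id", "inv_order_id")
  .withColumnRenamed("tstamp_trans", "inv_tstamp")
  .withWatermark("inv_tstamp", "20 seconds")

// Equality on the key plus a time bound lets Spark clean up join state
val join_df = order_wm.join(invoice_wm, expr(
  "s_order_id = inv_order_id AND " +
  "inv_tstamp BETWEEN tstamp_trans AND tstamp_trans + interval 20 seconds"))
```

Without the time bound, both sides' state grows without limit even with watermarks set.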
Jean,
thanks for the pointer, but my question is still about building an OData
server that maps onto REST. This can be a non-trivial effort, and I was
hoping people would jump into the discussion so we can decide how to
execute this in open source rather than as an in-house effort.
Thanks for all the responses.
1) I am not using YARN; I am using Spark Standalone.
2) Yes, I want to be able to kill the whole application.
3) I want to be able to monitor the status of the application, which is
running a batch query and is expected to run for an hour or so; therefore, I
am looking for
Hello,
I just finished registering a UDF where a main jar (Scala code) invokes
another jar (Scala code).
But I have a question: how do I register a UDF when the main jar (Scala
code) invokes Python?
Thanks,
Meng
Hi Kant,
why would you want to kill a batch job at all? It leads to half-written
data on disk, and sometimes other issues. The general practice is to
have exception-handling code.
In case you are running into scenarios where the code is just consuming too
many resources and you are
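The exception-handling practice mentioned above might be sketched roughly like this (the job and cleanup functions are placeholders, not a real API):

```scala
import scala.util.{Try, Success, Failure}

// Hypothetical batch-job wrapper: run the job, and on failure clean up
// any partially written output before surfacing the error
def runBatchSafely(job: () => Long, cleanup: () => Unit): Either[String, Long] =
  Try(job()) match {
    case Success(rowsWritten) => Right(rowsWritten)
    case Failure(e) =>
      cleanup()   // remove half-written files so a retry starts clean
      Left(s"batch failed: ${e.getMessage}")
  }
```

The point is that the job itself decides how to unwind, instead of being killed from the outside mid-write.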
Hi,
the Apache Spark community removed that option in later releases of
Spark, and I am also puzzled by that, because the streaming UI tab was
really helpful. We can still see the graphs in Databricks, but instead of
the Spark UI they are available in their notebooks, I think.
Regards,
Gourav Sengupta
Hi All,
I am trying to read messages from Kafka, deserialize the values using Avro
and then convert the JSON content to a DF.
I would like to see a dataframe like the following for a Kafka message
value like {"a": "1" , "b": "1"}:
+---+---+
|  a|  b|
+---+---+
|  1|  1|
+---+---+
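One common approach is from_json with an explicit schema; this is only a sketch, assuming the Avro payload has already been decoded into a string column named value on a DataFrame jsonDf:

```scala
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{StructType, StructField, StringType}

// Assumed schema for messages like {"a": "1", "b": "1"}
val schema = StructType(Seq(
  StructField("a", StringType),
  StructField("b", StringType)))

// jsonDf is assumed to have a string column "value" holding the JSON text
val parsed = jsonDf
  .select(from_json($"value", schema).as("data"))
  .select("data.a", "data.b")
```

parsed.show() should then print the two-column frame shown above.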
Hi,
I have a use case processing simple ETL-like jobs. The data volume is very
small (less than a few GB) and can fit easily in my running Java application's
memory. I would like to take advantage of the Spark Dataset API, but don't need
any Spark setup (standalone / cluster). Can I embed Spark in
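For what it's worth, a local-mode session needs no cluster at all; a minimal sketch, assuming only that the spark-sql artifact is on the application's classpath:

```scala
import org.apache.spark.sql.SparkSession

// Runs entirely inside the host JVM; local[*] uses all available cores
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("embedded-etl")
  .getOrCreate()

import spark.implicits._
val ds = Seq(("order-1", 10), ("order-2", 20)).toDS()
ds.show()

spark.stop()
```

The whole Dataset API is available this way, at the cost of pulling Spark's (fairly large) dependency tree into the application.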
Hi,
I must run my Spark cluster in standalone mode. I want to know: does Spark
support a capacity scheduler in standalone mode as an option?
Regards
Conner
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
See if https://spark.apache.org/docs/latest/monitoring.html helps.
Essentially, whether you run an app via spark-shell or spark-submit
(local, Spark standalone, YARN, Kubernetes, Mesos), the driver provides a UI
on port 4040.
You can monitor via that UI and via a REST API.
E.g. running
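As a rough sketch of the REST side (assuming the driver UI is reachable at localhost:4040; the host and the application IDs in the response will differ per deployment):

```scala
import scala.io.Source

// Fetch the list of applications from the driver's monitoring REST API
val apps = Source.fromURL("http://localhost:4040/api/v1/applications").mkString
println(apps)   // JSON describing each application (id, name, attempts)
```

The same API exposes per-job, per-stage, and executor endpoints under /api/v1, which is handy for polling a long-running batch query's status programmatically.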