Join happening after watermark time

2018-12-06 Thread Abhijeet Kumar
Hello Team, I’m using watermarks to join two streams, as you can see below:
val order_wm = order_details.withWatermark("tstamp_trans", "20 seconds")
val invoice_wm = invoice_details.withWatermark("tstamp_trans", "20 seconds")
val join_df = order_wm
  .join(invoice_wm, order_wm.col("s_order_id")
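The snippet above is cut off mid-expression, so the full join condition is unknown. A completed sketch of the usual stream-stream join pattern follows; the invoice-side key column and the 20-second range bound are assumptions. The key point is that, besides the equality key, Spark needs an event-time range condition so it knows when join state can be emitted and expired — without one, output can look like it is being delayed past the watermark.

```scala
import org.apache.spark.sql.functions.expr

// Both sides watermarked, as in the original snippet, then aliased so the
// join condition can refer to each side's event-time column.
val order_wm   = order_details.withWatermark("tstamp_trans", "20 seconds").as("o")
val invoice_wm = invoice_details.withWatermark("tstamp_trans", "20 seconds").as("i")

// Hypothetical completion: equality key plus an event-time range condition.
// The invoice-side "s_order_id" column and the 20-second bound are assumptions.
val join_df = order_wm.join(
  invoice_wm,
  expr("""
    o.s_order_id = i.s_order_id AND
    i.tstamp_trans >= o.tstamp_trans AND
    i.tstamp_trans <= o.tstamp_trans + interval 20 seconds
  """)
)
```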

Re: OData compliant API for Spark

2018-12-06 Thread Affan Syed
Jean, thanks for the pointer, but my question remains around building an OData server that maps onto REST. This can be a non-trivial effort, and I was hoping for people to jump into the discussion so we can decide how to execute this in open source rather than as an in-house effort. To

Re: How to track batch jobs in spark ?

2018-12-06 Thread kant kodali
Thanks for all the responses.
1) I am not using YARN; I am using Spark Standalone.
2) Yes, I want to be able to kill the whole application.
3) I want to be able to monitor the status of the application, which is running a batch query and is expected to run for an hour or so; therefore, I am looking for
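For the standalone case, one option — assuming the app was submitted in cluster deploy mode and the master's REST submission gateway is enabled (default port 6066) — is to drive the master's status/kill endpoints directly. The host name and submission id below are placeholders:

```scala
import java.net.{HttpURLConnection, URL}
import scala.io.Source

// Check the status of a submission (host and id are placeholders).
val status = Source.fromURL(
  "http://spark-master:6066/v1/submissions/status/driver-20181206000000-0001").mkString
println(status)

// Kill the whole application by killing its driver (empty-body POST).
val conn = new URL(
  "http://spark-master:6066/v1/submissions/kill/driver-20181206000000-0001")
  .openConnection().asInstanceOf[HttpURLConnection]
conn.setRequestMethod("POST")
println(s"kill returned HTTP ${conn.getResponseCode}")
```

`spark-submit --master spark://spark-master:6066 --kill <submission-id>` (and `--status`) wrap these same endpoints; both apply only to cluster deploy mode on standalone.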

how to register UDF when scala code invoke python

2018-12-06 Thread mengmeng.m...@mathartsys.com
Hello, I just finished registering a UDF from a main jar (Scala code) that invokes another jar (Scala code). But I have a question: how do I register a UDF when the main jar (Scala code) invokes Python? Thanks, Meng
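For reference, the Scala-side registration the message describes can be sketched as follows (names are illustrative); once registered against the shared SparkSession, the UDF is callable via spark.sql from any code that talks to that session:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of Scala-side UDF registration; names are illustrative.
val spark = SparkSession.builder()
  .appName("udf-demo")
  .master("local[*]")
  .getOrCreate()

// Register a Scala function as a SQL UDF under the name "str_len".
spark.udf.register("str_len", (s: String) => if (s == null) 0 else s.length)

// Callable from SQL by anything sharing this session.
val result = spark.sql("SELECT str_len('hello') AS n").first().getInt(0)
println(result)  // prints 5
```

For the reverse direction, PySpark exposes `spark.udf.registerJavaFunction` to register a JVM-implemented UDF by class name for use from Python; as far as I know there is no built-in way to register an arbitrary Python function from a Scala jar without going through PySpark.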

Re: How to track batch jobs in spark ?

2018-12-06 Thread Gourav Sengupta
Hi Kant, why would you want to kill a batch job at all? It leads to half-written data on disk, and sometimes other issues. The general practice is to have exception-handling code. In case you are running into scenarios where the code is just consuming too many resources and you are

Re: How to fix spark streaming missing tab

2018-12-06 Thread Gourav Sengupta
Hi, the Apache Spark community removed that option in later releases of Spark, and even I am confused, because the streaming UI tab was really helpful. We can see the graphs in Databricks, but they are available in their notebooks instead of the Spark UI, I think. Regards, Gourav Sengupta On Thu,

Spark Structured Streaming - DF shows only one column with list of byte array

2018-12-06 Thread salemi
Hi All, I am trying to read messages from Kafka, deserialize the values using Avro and then convert the JSON content to a DF. I would like to see a dataframe like the following for a Kafka message value like {"a": "1" , "b": "1"}: +---+ |a| b |
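A sketch of the last step described above: once the Avro payload has been decoded to a JSON string (in the real pipeline that string would come from a UDF wrapping the Avro deserializer, which is assumed here), `from_json` with an explicit schema turns the single value column into real columns:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructType}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Stand-in for the decoded Avro payload; in the real pipeline this string
// would be produced from the Kafka message value by the Avro deserializer.
val jsonDf = Seq("""{"a": "1", "b": "1"}""").toDF("json")

// Explicit schema for the JSON content.
val schema = new StructType().add("a", StringType).add("b", StringType)

val parsed = jsonDf
  .select(from_json(col("json"), schema).as("data"))
  .select("data.*")    // columns a and b instead of one value column

parsed.show()
```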

Spark Core - Embed in other application

2018-12-06 Thread sparkuser99
Hi, I have a use case to process simple ETL-like jobs. The data volume is very small (less than a few GB) and can easily fit in my running Java application's memory. I would like to take advantage of the Spark Dataset API, but I don't need any Spark setup (standalone / cluster). Can I embed Spark in
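This is possible with local mode; a minimal sketch, assuming only that the spark-sql dependency is on the application's classpath:

```scala
import org.apache.spark.sql.SparkSession

// Spark embedded in an existing JVM application: master("local[*]") runs
// the driver and executors inside this JVM, so no standalone or cluster
// setup is needed.
val spark = SparkSession.builder()
  .appName("embedded-etl")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._
val ds = Seq(("a", 1), ("b", 2), ("c", 3)).toDS()
val big = ds.filter(_._2 > 1).count()   // small ETL-style transformation
println(big)  // prints 2

spark.stop()
```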

In the future, will Spark support capacity scheduler in standalone mode?

2018-12-06 Thread conner
Hi, I must run my Spark cluster in standalone mode. I want to know: will Spark support a capacity scheduler in standalone mode as a choice? Regards Conner -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To

Re: How to track batch jobs in spark ?

2018-12-06 Thread Thakrar, Jayesh
See if https://spark.apache.org/docs/latest/monitoring.html helps. Essentially, whether you are running an app as spark-shell or via spark-submit (local, Spark cluster, YARN, Kubernetes, Mesos), the driver will provide a UI on port 4040. You can monitor via the UI and via a REST API. E.g. running
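The REST API mentioned lives under /api/v1 on the same port as the UI; a minimal sketch of polling it from the JVM, assuming the default driver UI host and port:

```scala
import scala.io.Source

// List applications known to this driver's UI (default port 4040).
val apps = Source.fromURL("http://localhost:4040/api/v1/applications").mkString
println(apps)
```

Per-application detail is available under paths like /api/v1/applications/[app-id]/jobs, which is a convenient way to track a long-running batch query programmatically.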