Re: Data source v2 streaming sinks does not support Update mode

2021-01-12 Thread Jungtaek Lim
Would you mind if I ask for a simple reproducer? It would be nice if you could create a repository on GitHub and push the code, including the build script. Thanks in advance! On Wed, Jan 13, 2021 at 3:46 PM Eric Beabes wrote: > I tried both. First tried 3.0.0. That didn't work so I tried 3.1.0. >

Re: Data source v2 streaming sinks does not support Update mode

2021-01-12 Thread Eric Beabes
I tried both. First tried 3.0.0. That didn't work so I tried 3.1.0. On Wed, Jan 13, 2021 at 11:35 AM Jungtaek Lim wrote: > Which exact Spark version did you use? Did you make sure the version for > Spark and the version for spark-sql-kafka artifact are the same? (I asked > this because you've

Re: Data source v2 streaming sinks does not support Update mode

2021-01-12 Thread Jungtaek Lim
Which exact Spark version did you use? Did you make sure the version for Spark and the version for spark-sql-kafka artifact are the same? (I asked this because you've said you've used Spark 3.0 but spark-sql-kafka dependency pointed to 3.1.0.) On Tue, Jan 12, 2021 at 11:04 PM Eric Beabes wrote:

Re: Customizing K-Means for Anomaly Detection

2021-01-12 Thread Sean Owen
You could fit the k-means pipeline, get the cluster centers, create a Transformer using that info, then create a new PipelineModel including all the original elements and the new Transformer. Does that work? It's not out of the question to expose a new parameter in KMeansModel that lets you also
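
The per-row logic that such a custom Transformer might apply can be sketched in plain Python, without Spark: measure the distance from each point to its nearest cluster center and flag the point when that distance exceeds a threshold. This is only an illustration under stated assumptions, not the approach confirmed in the thread. The function names, the sample centers, and the threshold value are all hypothetical; in PySpark the centers would typically come from `KMeansModel.clusterCenters()`.

```python
import math


def nearest_center_distance(point, centers):
    """Return (index, distance) of the closest center, using Euclidean distance."""
    best_idx, best_dist = -1, math.inf
    for idx, center in enumerate(centers):
        dist = math.dist(point, center)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist


def is_anomaly(point, centers, threshold):
    """Flag a point whose distance to its nearest center exceeds the threshold."""
    _, dist = nearest_center_distance(point, centers)
    return dist > threshold


# Hypothetical values standing in for the centers of a fitted model.
centers = [(0.0, 0.0), (10.0, 10.0)]
threshold = 2.0  # assumed cutoff; in practice it would be tuned on the data

for p in [(1.0, 0.0), (5.0, 5.0), (10.0, 11.0)]:
    print(p, nearest_center_distance(p, centers), is_anomaly(p, centers, threshold))
```

Wrapping this computation in a Transformer and appending it to the fitted stages is what would let the resulting PipelineModel emit an anomaly flag alongside the cluster prediction.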

Customizing K-Means for Anomaly Detection

2021-01-12 Thread Artemis User
First, some background: * We want to use the k-means model for anomaly detection against a multi-dimensional dataset. The current k-means implementation in Spark is designed for clustering, not exactly for anomaly detection. Once a model is trained and pipeline is

Re: [Spark SQL]HiveQL and Spark SQL producing different results

2021-01-12 Thread Terry Kim
Ying, Can you share a query that produces different results? Thanks, Terry On Sun, Jan 10, 2021 at 1:48 PM Ying Zhou wrote: > Hi, > > I run some SQL using both Hive and Spark. Usually we get the same results. > However when a window function is in the script Hive and Spark can produce >

Re: Spark 3.0.1 not connecting with Hive 2.1.1

2021-01-12 Thread Pradyumn Agrawal
Hi Michael, Sure, I will give it a try once more. Regards Pradyumn Agrawal Media.net (India) On Sun, Jan 10, 2021 at 9:35 PM michael.yang wrote: > Hi Pradyumn, > > It seems you did not configure the spark-defaults.conf file well. > Below configurations are needed to use hive 2.1.1 as metastore and >

Re: Understanding Executors UI

2021-01-12 Thread Eric Beabes
I reduced the 'state timeout' from 10 minutes to 2 minutes so that memory would be released quicker & the new numbers for Storage Memory are: 54.7GB out of 598.5GB BUT I still don't trust these numbers. As Amit pointed out, it seems there's a bug in the Spark 2.4 UI. I am requesting 2TB of Memory

Re: Data source v2 streaming sinks does not support Update mode

2021-01-12 Thread Jacek Laskowski
Hi, Can you post the whole message? I'm trying to find what might be causing it. A small reproducible example would help too. Thank you. Regards, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books Follow me on