Re: Multiple sessions in one application?

2018-12-21 Thread Mark Hamstra
On the contrary, it is a common occurrence in a Spark Jobserver-style application with multiple users. On Thu, Dec 20, 2018 at 6:09 PM Jiaan Geng wrote: > This scenario is rare. > When you provide a web server for Spark, maybe you need it.

Re: Custom Metric Sink on Executor Always ClassNotFound

2018-12-21 Thread Thakrar, Jayesh
Just curious - is this HttpSink your own custom sink or a Dropwizard configuration? If it is your own custom code, I would suggest looking at or trying out Dropwizard. See http://spark.apache.org/docs/latest/monitoring.html#metrics and https://metrics.dropwizard.io/4.0.0/ Also, from what I know, the metrics
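As background for the thread above, Spark's metrics system (which is backed by Dropwizard) is configured through conf/metrics.properties. A minimal sketch follows, using one of Spark's built-in sinks; note that a custom sink class must also be present on the executor classpath (e.g. shipped via --jars), otherwise executors raise ClassNotFound exactly as described in the subject line:

```properties
# Sketch of conf/metrics.properties enabling Spark's built-in ConsoleSink
# for all instances ("*"). The class name below is one of Spark's shipped
# sinks; a custom sink would use its own fully qualified class name and
# must be on both driver and executor classpaths.
*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
*.sink.console.period=10
*.sink.console.unit=seconds
```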

Spark 2 - How to order keys in sparse vector (K-means)?

2018-12-21 Thread ddebarbieux
Dear all, I am using Spark 2 in order to cluster data with the K-means algorithm. My input data is flat, and K-means requires sparse vectors with ordered keys. Here is an example of the input and the expected output: [id, key, value] [1, 10, 100] [1, 30, 300] [2, 40, 400] [1, 20, 200] [id,
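The transformation being asked about can be sketched in plain Python (a hypothetical illustration, not the poster's actual pipeline): group the flat (id, key, value) rows by id, then sort each group by key, since pyspark.ml.linalg.Vectors.sparse requires strictly increasing indices. In Spark itself the same grouping could be done with groupByKey followed by a sort, or with sort_array over collected structs.

```python
# Sketch: turn flat (id, key, value) rows into per-id sparse-vector
# triples (size, indices, values) with keys in increasing order.
# The vector size of 50 is an arbitrary assumption for illustration.
from collections import defaultdict

rows = [(1, 10, 100), (1, 30, 300), (2, 40, 400), (1, 20, 200)]

def to_sparse(rows, size=50):
    grouped = defaultdict(list)
    for rid, key, value in rows:
        grouped[rid].append((key, value))
    result = {}
    for rid, pairs in grouped.items():
        pairs.sort()  # order by key, as sparse vectors require
        indices = [k for k, _ in pairs]
        values = [v for _, v in pairs]
        result[rid] = (size, indices, values)
    return result

vectors = to_sparse(rows)
# vectors[1] == (50, [10, 20, 30], [100, 200, 300])
```

Each resulting triple maps directly onto the arguments of Vectors.sparse(size, indices, values).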

Re: Connection issue with AWS S3 from PySpark 2.3.1

2018-12-21 Thread Riccardo Ferrari
Hi Aakash, Can you share how you are adding those jars? Are you using the --packages method? I assume you are running in a cluster, and those dependencies might not have been properly distributed. How are you submitting your app? What kind of resource manager are you using: standalone, YARN, ...? Best,
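For context on the --packages suggestion, a sketch of a submission that distributes the s3a connector and a matching AWS SDK to the whole cluster (the YARN master, the app name, and the version numbers are assumptions; the hadoop-aws version must match the Hadoop build the cluster runs, and hadoop-aws 2.7.x pairs with aws-java-sdk 1.7.4):

```shell
# Sketch: ship the s3a connector and its matching AWS SDK to driver
# and executors via --packages, rather than copying jars by hand.
spark-submit \
  --master yarn \
  --packages org.apache.hadoop:hadoop-aws:2.7.3,com.amazonaws:aws-java-sdk:1.7.4 \
  your_app.py
```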

Re: Connection issue with AWS S3 from PySpark 2.3.1

2018-12-21 Thread Aakash Basu
Any help, anyone? On Fri, Dec 21, 2018 at 2:21 PM Aakash Basu wrote: > Hey Shuporno, > > With the updated config too, I am getting the same error. While trying to > figure that out, I found this link which says I need aws-java-sdk (which I > already have): >

Re: running updates using SPARK

2018-12-21 Thread Gourav Sengupta
Hi Jiaan, Spark does support UPDATE, but only in the version that Databricks has. The question to the community was asking when they are going to support it. Regards, Gourav On Fri, 21 Dec 2018, 03:36 Jiaan Geng wrote: > I think Spark is a calculation engine designed for OLAP or ad-hoc queries. Spark is > not > a

Re: Connection issue with AWS S3 from PySpark 2.3.1

2018-12-21 Thread Aakash Basu
Hey Shuporno, With the updated config too, I am getting the same error. While trying to figure that out, I found this link, which says I need aws-java-sdk (which I already have): https://github.com/amazon-archives/kinesis-storm-spout/issues/8 Now, these are my Java details: java version

Re: Connection issue with AWS S3 from PySpark 2.3.1

2018-12-21 Thread Shuporno Choudhury
Hi, I don't know whether the following configs (that you have tried) are correct: fs.s3a.awsAccessKeyId fs.s3a.awsSecretAccessKey The correct ones are probably: fs.s3a.access.key fs.s3a.secret.key On Fri, 21 Dec 2018 at 13:21, Aakash Basu-2 [via Apache Spark User List] <
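The property names Shuporno suggests are indeed the ones the s3a connector (hadoop-aws) reads; the camel-cased fs.s3a.awsAccessKeyId style resembles the older s3n-era names and is not recognized by s3a. A minimal sketch of how these might be applied from PySpark via the "spark.hadoop." prefix (the credential values and endpoint are placeholders):

```python
# Sketch: s3a credential properties under the names hadoop-aws reads.
# Values are placeholders; never hard-code real credentials.
s3a_conf = {
    "fs.s3a.access.key": "YOUR_ACCESS_KEY",   # placeholder
    "fs.s3a.secret.key": "YOUR_SECRET_KEY",   # placeholder
    "fs.s3a.endpoint": "s3.amazonaws.com",
}

# In PySpark these would typically be passed to SparkConf with the
# "spark.hadoop." prefix, which Spark copies into the Hadoop
# Configuration seen by the s3a filesystem:
spark_conf = {f"spark.hadoop.{k}": v for k, v in s3a_conf.items()}
```

An alternative is setting them directly on the JVM-side Hadoop configuration (spark._jsc.hadoopConfiguration().set(k, v)), though the conf-prefix route survives spark-submit more cleanly.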