Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-21 Thread Fawze Abujaber
It's super amazing. I see it was tested on Spark 2.0.0 and above; what about Spark 1.6, which is still part of Cloudera's main versions? We have a vast number of Spark applications on version 1.6.0. On Thu, Mar 22, 2018 at 6:38 AM, Holden Karau wrote: > Super exciting! I look

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-21 Thread Holden Karau
Super exciting! I look forward to digging through it this weekend. On Wed, Mar 21, 2018 at 9:33 PM ☼ R Nair (रविशंकर नायर) < ravishankar.n...@gmail.com> wrote: > Excellent. You filled a missing link. > > Best, > Passion > > On Wed, Mar 21, 2018 at 11:36 PM, Rohit Karlupia >

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-21 Thread रविशंकर नायर
Excellent. You filled a missing link. Best, Passion On Wed, Mar 21, 2018 at 11:36 PM, Rohit Karlupia wrote: > Hi, > > Happy to announce the availability of Sparklens as open source project. It > helps in understanding the scalability limits of spark applications and > can

Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-21 Thread Rohit Karlupia
Hi, Happy to announce the availability of Sparklens as an open source project. It helps in understanding the scalability limits of Spark applications and can be a useful guide on the path towards tuning applications for lower runtime or cost. Please clone from here:

Is there a mutable dataframe spark structured streaming 2.3.0?

2018-03-21 Thread kant kodali
Hi All, Is there a mutable DataFrame in Spark Structured Streaming 2.3.0? I am currently reading from Kafka, and if I cannot parse the messages that I get from Kafka, I want to write them to, say, a "dead_queue" topic. I wonder what the best way to do this is? Thanks!
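The routing decision described in the question can be sketched in plain Python, independent of Spark. This is only an illustration under assumptions: the thread does not say what format the messages use, so JSON objects are assumed here, and `PARSED_TOPIC`, `route`, and `split` are hypothetical names. In Structured Streaming the same split would more likely be expressed as a parse step followed by a filter on rows that failed to parse, with the failed rows written to a second Kafka sink.

```python
import json

DEAD_TOPIC = "dead_queue"   # topic name taken from the question
PARSED_TOPIC = "parsed"     # hypothetical name for the normal output


def route(raw_value):
    """Return (topic, payload) for one Kafka message value given as bytes.

    Assumption: a valid message is a UTF-8 encoded JSON object.
    Anything else is sent to the dead-letter topic unchanged.
    """
    try:
        record = json.loads(raw_value.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        # Not valid UTF-8 or not valid JSON: keep the original bytes.
        return DEAD_TOPIC, raw_value
    if not isinstance(record, dict):
        # Valid JSON, but not the object shape assumed above.
        return DEAD_TOPIC, raw_value
    return PARSED_TOPIC, record


def split(messages):
    """Partition raw message values into (parsed_records, dead_letters)."""
    parsed, dead = [], []
    for raw in messages:
        topic, payload = route(raw)
        (parsed if topic == PARSED_TOPIC else dead).append(payload)
    return parsed, dead
```

Keeping the original bytes for failed messages means the dead-letter topic can be replayed later once the parser is fixed.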

Re: [Structured Streaming] Application Updates in Production

2018-03-21 Thread Tathagata Das
Why do you want to start the new code in parallel to the old one? Why not stop the old one, and then start the new one? Structured Streaming ensures that all checkpoint information (offsets and state) is future-compatible (as long as the state schema is unchanged), hence new code should be able to

Re: Rest API for Spark2.3 submit on kubernetes(version 1.8.*) cluster

2018-03-21 Thread Gourav Sengupta
Hi Lucas, Thanks a ton for responding. Have you used Livy and Spark in EMR? I am genuinely not sure how adding a spark-submit in EMR is hard; it is just one line of code. I must be missing something here. Regards, Gourav Sengupta On Wed, Mar 21, 2018 at 2:37 PM, lucas.g...@gmail.com

Re: Rest API for Spark2.3 submit on kubernetes(version 1.8.*) cluster

2018-03-21 Thread Josh Goldsborough
Purna, It's a bit tangential to your original question, but heads up that Amazon EKS is in Preview right now: https://aws.amazon.com/eks/ I don't know if it actually allows a nice interface between k8s-hosted Spark & Lambda functions (my suspicion is it won't fix your problem), but might be

[Structured Streaming] Application Updates in Production

2018-03-21 Thread Priyank Shrivastava
I am using Structured Streaming with Spark 2.2. We are using Kafka as our source and are using checkpoints for failure recovery and e2e exactly once guarantees. I would like to get some more information on how to handle updates to the application when there is a change in stateful operations

Re: HadoopDelegationTokenProvider

2018-03-21 Thread Marcelo Vanzin
They should be available in the current user. UserGroupInformation.getCurrentUser().getCredentials() On Wed, Mar 21, 2018 at 7:32 AM, Jorge Machado wrote: > Hey spark group, > > I want to create a Delegation Token Provider for Accumulo I have One > Question: > > How can I get the

Re: Rest API for Spark2.3 submit on kubernetes(version 1.8.*) cluster

2018-03-21 Thread lucas.g...@gmail.com
Speaking from experience, if you're already operating a Kubernetes cluster, getting a Spark workload operating there is nearly an order of magnitude simpler than working with / around EMR. That's not to say EMR is excessively hard, just that Kubernetes is easier; all the steps to getting your

HadoopDelegationTokenProvider

2018-03-21 Thread Jorge Machado
Hey Spark group, I want to create a Delegation Token Provider for Accumulo. I have one question: How can I get the token that I added to the credentials from the executor side? The SecurityManager class is private… Thanks, Jorge Machado

Re: Rest API for Spark2.3 submit on kubernetes(version 1.8.*) cluster

2018-03-21 Thread Gourav Sengupta
Hi, just out of curiosity, but since it is in AWS, is there any specific reason not to use EMR? Or any particular reason to use Kubernetes? Regards, Gourav Sengupta On Wed, Mar 21, 2018 at 2:47 AM, purna pradeep wrote: > Im using kubernetes cluster on AWS to run spark

Re: Rest API for Spark2.3 submit on kubernetes(version 1.8.*) cluster

2018-03-21 Thread purna pradeep
Thanks Yinan, Looks like this is still in alpha. I would like to know if there is any REST interface for Spark 2.3 job submission similar to Spark 2.2, as I need to submit Spark applications to the k8 master based on different events (cron or S3 file-based trigger). On Tue, Mar 20, 2018 at 11:50

Wait for 30 seconds before terminating Spark Streaming

2018-03-21 Thread Aakash Basu
Hi, Using: *Spark 2.3 + Kafka 0.10* How can I wait for 30 seconds after the latest stream data and, if there's no more streaming data, exit gracefully? Is it done by running query.awaitTermination(30), or is it something else? I tried with this, keeping option("startingOffsets", "latest") for both my
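One way to approach the question above is to poll the query's progress and stop it after an idle window, rather than relying on `awaitTermination` with a timeout, which waits for the query to terminate but does not stop it (in PySpark the timeout is in seconds; in the Scala API it is in milliseconds). The following is a minimal sketch, not the thread's answer: `stop_when_idle` is a hypothetical helper, and it assumes a PySpark-style `StreamingQuery` exposing `isActive`, `lastProgress` (a dict with `batchId` and `numInputRows`, or `None` before the first batch), and `stop()`. The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting.

```python
import time


def stop_when_idle(query, idle_seconds=30.0, poll_seconds=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Stop `query` once no new input rows were seen for `idle_seconds`.

    Returns True if this function stopped the query, False if the
    query ended on its own before the idle window elapsed.
    """
    last_activity = clock()
    last_batch_id = None
    while query.isActive:
        progress = query.lastProgress  # None until a first batch is reported
        if progress is not None:
            batch_id = progress.get("batchId")
            rows = progress.get("numInputRows", 0)
            # Only a *new* batch that carried rows counts as activity.
            if batch_id != last_batch_id and rows > 0:
                last_activity = clock()
            last_batch_id = batch_id
        if clock() - last_activity >= idle_seconds:
            query.stop()
            return True
        sleep(poll_seconds)
    return False
```

With a real query this would be called as `stop_when_idle(query, idle_seconds=30)` after `start()`. Note that the idle window also starts counting at call time, so a stream that never receives data is stopped after 30 seconds as well.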

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-21 Thread Aakash Basu
Thanks Chris! On Fri, Mar 16, 2018 at 10:13 PM, Bowden, Chris wrote: > 2. You must decide. If multiple streaming queries are launched in a single > / simple application, only you can dictate if a single failure should cause > the application to exit. If you use