Re: [PySpark] Tagging descriptions

2020-05-12 Thread Rishi Shah
Thanks ZHANG! Please find details below:

# of rows: ~25B; row size would be somewhere around ~3-5MB (it's parquet-formatted
data, so we only need to worry about the columns to be tagged)

Avg length of the text to be parsed: ~300

Unfortunately I don't have sample data or regexes which I can share freely.
However, about the data being parsed: assume these are purchases made online
and we are trying to parse the transaction details, so a purchase made on
Amazon could be tagged to Amazon as well as to other vendors, etc.
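
Roughly, the current approach looks like the sketch below -- vendor names, patterns
and paths are made up here since I can't share the real ones. The question is how
best to distribute this kind of matching at ~10TB:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("txn-tagging").getOrCreate()

# Illustrative vendor patterns only -- not the real ones.
vendor_patterns = {
    "amazon":  r"(?i)\b(amazon|amzn)\b",
    "walmart": r"(?i)\bwal[- ]?mart\b",
}

# Hypothetical input path; only the description column needs to be scanned.
df = spark.read.parquet("s3://bucket/transactions/")

# One tag column per vendor; a single description can match several vendors.
tagged = df.select(
    "*",
    *[F.when(F.col("description").rlike(pattern), F.lit(vendor)).alias("tag_" + vendor)
      for vendor, pattern in vendor_patterns.items()]
)

tagged.write.mode("overwrite").parquet("s3://bucket/transactions_tagged/")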

Appreciate your response!



On Tue, May 12, 2020 at 6:23 AM ZHANG Wei  wrote:

> May I get some requirement details?
>
> Such as:
> 1. The row count and one row data size
> 2. The avg length of text to be parsed by RegEx
> 3. The sample format of text to be parsed
> 4. The sample of current RegEx
>
> --
> Cheers,
> -z
>
> On Mon, 11 May 2020 18:40:49 -0400
> Rishi Shah  wrote:
>
> > Hi All,
> >
> > I have a tagging problem at hand where we currently use regular
> expressions
> > to tag records. Is there a recommended way to distribute & tag? Data is
> > about 10TB large.
> >
> > --
> > Regards,
> >
> > Rishi Shah
>


-- 
Regards,

Rishi Shah


Re: XPATH_INT behavior - XML - Function in Spark

2020-05-12 Thread Chetan Khatri
Thank you for the clarification.
What would you suggest to achieve this use case?

On Tue, May 12, 2020 at 5:35 PM Jeff Evans 
wrote:

> It sounds like you're expecting the XPath expression to evaluate embedded
> Spark SQL expressions?  From the documentation
> , there
> appears to be no reason to expect that to work.
>
> On Tue, May 12, 2020 at 2:09 PM Chetan Khatri 
> wrote:
>
>> Can someone please help.. Thanks in advance.
>>
>> On Mon, May 11, 2020 at 5:29 PM Chetan Khatri <
>> chetan.opensou...@gmail.com> wrote:
>>
>>> Hi Spark Users,
>>>
>>> I want to parse XML coming in the query columns and get the value. I am
>>> using *xpath_int*, which works as per my requirement, but when I embed
>>> column references in the XPath expression inside the Spark SQL query it fails:
>>>
>>> select timesheet_profile_id,
>>> xpath_int(timesheet_profile_code,
>>>   '(/timesheetprofile/weeks/week[td.current_week]/td.day)[1]')
>>>
>>> This fails, whereas hardcoded values work for the same scenario:
>>>
>>> scala> spark.sql("select timesheet_profile_id,
>>> xpath_int(timesheet_profile_code,
>>> '(/timesheetprofile/weeks/week[2]/friday)[1]') from
>>> TIMESHEET_PROFILE_ATT").show(false)
>>>
>>> Anyone has worked on this? Thanks in advance.
>>>
>>> Thanks
>>> - Chetan
>>>
>>>


RE: [Spark SQL][reopen SPARK-16951]:Alternative implementation of NOT IN to Anti-join

2020-05-12 Thread Shuang, Linna1
Hi Talebzadeh,

Thank you for your reply. The background is that we use a common benchmark (here 
TPC-H) to compare different platforms' performance.

Our current solutions are:

  1.  remove Q16 from the test
  2.  rewrite Q16 without using “NOT IN” (see the sketch below)

Neither solution is perfect. For solution 2, which is what the JIRA suggests, the 
common benchmark has already been changed, so the results may be questioned when 
comparing with other platforms. In my understanding, the best solution is to find a 
better way to support “NOT IN” in Spark SQL, instead of suggesting that queries avoid 
“NOT IN”.
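
For reference, a simplified sketch of the solution-2 rewrite we use (not the exact
TPC-H Q16 text; the two forms are only equivalent because s_suppkey is never NULL in
TPC-H data):

// NOT IN is null-aware, so Spark plans it as a broadcast nested loop join.
val notIn = spark.sql("""
  SELECT ps_partkey, ps_suppkey
  FROM   partsupp
  WHERE  ps_suppkey NOT IN (
           SELECT s_suppkey FROM supplier
           WHERE  s_comment LIKE '%Customer%Complaints%')""")

// The NOT EXISTS form can be planned as a regular left anti join instead.
val notExists = spark.sql("""
  SELECT ps_partkey, ps_suppkey
  FROM   partsupp
  WHERE  NOT EXISTS (
           SELECT 1 FROM supplier
           WHERE  s_suppkey = ps_suppkey
             AND  s_comment LIKE '%Customer%Complaints%')""")

notIn.explain()      // BroadcastNestedLoopJoin LeftAnti (null-aware)
notExists.explain()  // hash or sort-merge left anti join on s_suppkey = ps_suppkey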

Thanks,
Linna

From: Mich Talebzadeh 
Sent: Tuesday, May 12, 2020 11:16 PM
To: Shuang, Linna1 
Cc: user@spark.apache.org
Subject: Re: [Spark SQL][reopen SPARK-16951]:Alternative implementation of NOT 
IN to Anti-join

Hi Linna,

Please provide some background on it and your solution. The assumption, as suggested, 
is that there is a solution.

Thanks,


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.




On Tue, 12 May 2020 at 04:47, Shuang, Linna1 
mailto:linna1.shu...@intel.com>> wrote:
Hello,

This JIRA (SPARK-16951) was closed with a resolution of “Won’t Fix” on 23/Feb/17.

But in TPC-H testing we hit a performance issue with Q16, which uses a NOT IN subquery 
that gets translated into a broadcast nested loop join. This one query takes almost half 
the time of all 22 queries: for example, on a 512GB data set the total execution time is 
1400 seconds, of which Q16 alone takes 630 seconds.

TPC-H is a common Spark SQL performance benchmark, so this issue will be hit regularly. 
Is it possible to reopen this JIRA and fix this issue?

Thanks,
Linna



Re: dynamic executor scaling spark on kubernetes client mode

2020-05-12 Thread Steven Stetzler
Oh, thanks for mentioning that, it looks like dynamic allocation on Kubernetes
works in client mode in Spark 3.0.0. I just had to set the following
configurations:

spark.dynamicAllocation.enabled=true

spark.dynamicAllocation.shuffleTracking.enabled=true


to enable dynamic allocation and remove the need for the external shuffle
service (which looks like it is experimental right now). My executor pods
couldn't connect to the external shuffle service when it was enabled. This
seems to be working okay for me.
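
For anyone else trying this, my spark-shell invocation looks roughly like the
following (API server address, image name and executor cap are placeholders, not my
real values):

spark-shell \
  --master k8s://https://<api-server-host>:<port> \
  --conf spark.kubernetes.container.image=<your-spark-3.0.0-image> \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.maxExecutors=10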

Thanks,
Steven


On Tue, May 12, 2020 at 4:42 AM Pradeepta Choudhury <
pradeeptachoudhu...@gmail.com> wrote:

> Hey guys i was able to run dynamic scaling in both cluster and client mode
> . would document and send it over this weekend
>
> On Tue 12 May, 2020, 1:26 PM Roland Johann, 
> wrote:
>
>> Hi all,
>>
>> don’t want to interrupt the conversation but are keen where I can find
>> information regarding dynamic allocation on kubernetes. As far as I know
>> the docs just point to future work.
>>
>> Thanks a lot,
>> Roland
>>
>>
>>
>> Am 12.05.2020 um 09:25 schrieb Steven Stetzler > >:
>>
>> Hi all,
>>
>> I am interested in this as well. My use-case could benefit from dynamic
>> executor scaling but we are restricted to using client mode since we are
>> only using Spark shells.
>>
>> Could anyone help me understand the barriers to getting dynamic executor
>> scaling to work in client mode on Kubernetes?
>>
>> Thanks,
>> Steven
>>
>> On Sat, May 9, 2020 at 9:48 AM Pradeepta Choudhury <
>> pradeeptachoudhu...@gmail.com> wrote:
>>
>>> Hiii ,
>>>
>>> The dynamic executor scalling is working fine for spark on kubernetes
>>> (latest from spark master repository ) in cluster mode . is the dynamic
>>> executor scalling available for client mode ? if yes where can i find the
>>> usage doc for same .
>>> If no is there any PR open for this ?
>>>
>>> Thanks ,
>>> Pradeepta
>>>
>>
>>


Re: XPATH_INT behavior - XML - Function in Spark

2020-05-12 Thread Jeff Evans
It sounds like you're expecting the XPath expression to evaluate embedded
Spark SQL expressions?  From the documentation
, there
appears to be no reason to expect that to work.
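
As far as I can tell, the built-in xpath_* functions also require the path argument to
be a literal, so building the path from other columns with concat() won't help either.
If the week/day values really do come from columns, one workaround might be a plain UDF
that builds and evaluates the XPath per row -- an untested sketch (the current_week and
day column names are assumptions based on your query):

import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathConstants, XPathFactory}

// Untested sketch: evaluate a per-row XPath built from column values.
// Unoptimized -- it creates the parser and XPath factory for every row.
spark.udf.register("xpath_int_dynamic", (xml: String, week: Int, day: String) => {
  val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
  val expr = s"(/timesheetprofile/weeks/week[$week]/$day)[1]"
  XPathFactory.newInstance().newXPath()
    .evaluate(expr, doc, XPathConstants.NUMBER)
    .asInstanceOf[Double]
    .toInt
})

// Column names current_week / day are assumptions from the original query.
spark.sql("""
  SELECT timesheet_profile_id,
         xpath_int_dynamic(timesheet_profile_code, current_week, day)
  FROM   TIMESHEET_PROFILE_ATT""").show(false)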

On Tue, May 12, 2020 at 2:09 PM Chetan Khatri 
wrote:

> Can someone please help.. Thanks in advance.
>
> On Mon, May 11, 2020 at 5:29 PM Chetan Khatri 
> wrote:
>
>> Hi Spark Users,
>>
>> I want to parse XML coming in the query columns and get the value. I am
>> using *xpath_int*, which works as per my requirement, but when I embed
>> column references in the XPath expression inside the Spark SQL query it fails:
>>
>> select timesheet_profile_id,
>> xpath_int(timesheet_profile_code,
>>   '(/timesheetprofile/weeks/week[td.current_week]/td.day)[1]')
>>
>> This fails, whereas hardcoded values work for the same scenario:
>>
>> scala> spark.sql("select timesheet_profile_id,
>> xpath_int(timesheet_profile_code,
>> '(/timesheetprofile/weeks/week[2]/friday)[1]') from
>> TIMESHEET_PROFILE_ATT").show(false)
>>
>> Anyone has worked on this? Thanks in advance.
>>
>> Thanks
>> - Chetan
>>
>>


to_avro/from_avro inserts extra values from Kafka

2020-05-12 Thread Alex Nastetsky
Hi all,

I create a dataframe, convert it to Avro with to_avro and write it to
Kafka.
Then I read it back out with from_avro.
(Not using Schema Registry.)
The problem is that in the result the values come back shifted, landing in every other field.

I expect:
+---------+--------+------+-------+
|firstName|lastName| color|   mood|
+---------+--------+------+-------+
|     Suzy|  Samson|  blue|grimmer|
|      Jim| Johnson|indigo|   grim|
+---------+--------+------+-------+

Instead I get:

+---------+--------+-----+-------+
|firstName|lastName|color|   mood|
+---------+--------+-----+-------+
|         |    Suzy|     | Samson|
|         |     Jim|     |Johnson|
+---------+--------+-----+-------+

Here's what I'm doing --

$ kt admin -createtopic persons-avro-spark9 -topicdetail <(jsonify
=NumPartitions 1 =ReplicationFactor 1)

$ cat person.avsc
{
  "type": "record",
  "name": "Person",
  "namespace": "com.ippontech.kafkatutorials",
  "fields": [
{
  "name": "firstName",
  "type": "string"
},
{
  "name": "lastName",
  "type": "string"
},
{
  "name": "color",
  "type": "string"
},
{
  "name": "mood",
  "type": "string"
}
  ]
}

$ spark-shell --packages
org.apache.spark:spark-avro_2.11:2.4.5,org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.5

scala> :paste
// Entering paste mode (ctrl-D to finish)

import org.apache.spark.sql.avro._
import java.nio.file.Files;
import java.nio.file.Paths;

val topic = "persons-avro-spark9"


// `from_avro` requires Avro schema in JSON string format.
val jsonFormatSchema = new String(Files.readAllBytes(Paths.get("person.avsc")))


val personDF = sc.parallelize(Seq(
("Jim","Johnson","indigo","grim"),
("Suzy","Samson","blue","grimmer")
)).toDF("firstName","lastName","color","mood")

personDF.select(to_avro(struct(personDF.columns.map(column):_*)).alias("value"))
.write
.format("kafka")
.option("kafka.bootstrap.servers", "localhost:9092")
.option("topic",topic)
.option("avroSchema",jsonFormatSchema)
.save()

val stream = spark
.readStream
.format("kafka")
.option("kafka.bootstrap.servers", "localhost:9092")
.option("subscribe", topic)
.option("startingOffsets", "earliest")
.load()
.select(from_avro('value, jsonFormatSchema) as 'person)
.select($"person.firstName",$"person.lastName",$"person.color",$"person.mood")
.writeStream
.format("console")
.start()

// Exiting paste mode, now interpreting.

import org.apache.spark.sql.avro._

import java.nio.file.Files
import java.nio.file.Paths
topic: String = persons-avro-spark9
jsonFormatSchema: String =
{
  "type": "record",
  "name": "Person",
  "namespace": "com.ippontech.kafkatutorials",
  "fields": [
{
  "name": "firstName",
  "type": "string"
},
{
  "name": "lastName",
  "type": "string"
},
{
  "name": "color",
  "type": "string"
},
{
  "name": "mood",
  "type": "string"
}
  ]
}
personDF: org.apache.spark.sql.DataFrame = [firstName: string, lastName:
string ... 2 more fields]
stream: org.apache.spark.sql.streaming.StreamingQuery =
org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@3990c36c

scala> -------------------------------------------
Batch: 0
-------------------------------------------
+---------+--------+-----+-------+
|firstName|lastName|color|   mood|
+---------+--------+-----+-------+
|         |    Suzy|     | Samson|
|         |     Jim|     |Johnson|
+---------+--------+-----+-------+

See the raw bytes:

$ kt consume -topic persons-avro-spark9
{
  "partition": 0,
  "offset": 0,
  "key": null,
  "value":
"\u\u0008Suzy\u\u000cSamson\u\u0008blue\u\u000egrimmer",
  "timestamp": "2020-05-12T17:18:53.858-04:00"
}
{
  "partition": 0,
  "offset": 1,
  "key": null,
  "value":
"\u\u0006Jim\u\u000eJohnson\u\u000cindigo\u\u0008grim",
  "timestamp": "2020-05-12T17:18:53.859-04:00"
}
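
My current guess (untested): because the DataFrame comes from Scala tuples, every
column is nullable, so to_avro derives a writer schema where each field is a
["string","null"] union. Each value is then prefixed with a union-index byte (the
\u0000s above), which the plain-"string" person.avsc used by from_avro reads as an
empty string, shifting every later value over by one field -- exactly the output
above. A quick way to check the schema Spark actually writes with:

import org.apache.spark.sql.avro.SchemaConverters

// The fields toDF() creates from tuples are nullable...
personDF.schema.fields.foreach(f => println(s"${f.name}: nullable=${f.nullable}"))

// ...so the derived Avro writer schema should show union types rather than plain
// strings (SchemaConverters is, as far as I know, a developer API in spark-avro 2.4).
println(SchemaConverters.toAvroType(personDF.schema).toString(true))

If that is the cause, making the fields in person.avsc nullable unions
(["string", "null"]) so the reader matches what was actually written -- or, I believe,
using Spark 3's to_avro(data, jsonFormatSchema) overload to pin the writer schema --
should line the two up.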

Thanks,
Alex.


Dependency management using https in spark on kubernetes

2020-05-12 Thread Pradeepta Choudhury
Hey guys,
I have an external API from which I can download the main jar. When I do a
spark-submit ...all confs... https:api.call.com/somefile.jar, it gives an error
that the file already exists in the tmp directory and the file content doesn't
match. How can I fix this? Do I need to use a Kubernetes init container?


Thanks


Re: XPATH_INT behavior - XML - Function in Spark

2020-05-12 Thread Chetan Khatri
Can someone please help.. Thanks in advance.

On Mon, May 11, 2020 at 5:29 PM Chetan Khatri 
wrote:

> Hi Spark Users,
>
> I want to parse XML coming in the query columns and get the value. I am
> using *xpath_int*, which works as per my requirement, but when I embed
> column references in the XPath expression inside the Spark SQL query it fails:
>
> select timesheet_profile_id,
> xpath_int(timesheet_profile_code,
>   '(/timesheetprofile/weeks/week[td.current_week]/td.day)[1]')
>
> This fails, whereas hardcoded values work for the same scenario:
>
> scala> spark.sql("select timesheet_profile_id,
> xpath_int(timesheet_profile_code,
> '(/timesheetprofile/weeks/week[2]/friday)[1]') from
> TIMESHEET_PROFILE_ATT").show(false)
>
> Anyone has worked on this? Thanks in advance.
>
> Thanks
> - Chetan
>
>


Re: GroupState limits

2020-05-12 Thread Srinivas V
If you are talking about the total number of objects the state can hold, that
depends on the executor memory you have on your cluster, beyond the rest of
the memory required for processing. With the default state store, the state is
kept in executor memory and checkpointed to HDFS, from where it can be recovered
while processing subsequent events.
If you maintain a million objects of 20 bytes each, that is about 20MB,
which is pretty reasonable to maintain in an executor allocated a few GB of
memory. But if you need to store heavy objects, you need to do the math.
There is also a cost in transferring this data back and forth to the HDFS
checkpoint location.
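
If it helps, here is a rough sketch of the kind of stateful query you could use to
measure the per-key state cost on your cluster (the rate source, key cardinality and
state type are just illustrative):

import spark.implicits._
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

case class Event(key: String, value: Long)

// Illustrative source: ~100k distinct keys generated from the built-in rate source.
val events = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "1000")
  .load()
  .select(($"value" % 100000).cast("string").as("key"), $"value")
  .as[Event]

// One Long of state per key; total state ~ distinct keys x per-key state object size.
val counts = events
  .groupByKey(_.key)
  .mapGroupsWithState(GroupStateTimeout.NoTimeout) {
    (key: String, rows: Iterator[Event], state: GroupState[Long]) =>
      val count = state.getOption.getOrElse(0L) + rows.size
      state.update(count)
      (key, count)
  }

counts.writeStream.outputMode("update").format("console").start()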

Regards
Srini

On Tue, May 12, 2020 at 2:48 AM tleilaxu  wrote:

> Hi,
> I am tracking states in my Spark streaming application with
> MapGroupsWithStateFunction described here:
> https://spark.apache.org/docs/2.4.0/api/java/org/apache/spark/sql/streaming/GroupState.html
> Which are the limiting factors on the number of states a job can track at
> the same time? Is it memory? Could be a bounded data structure in the
> internal implementation? Anything else ...
> You might have valuable input on this while I am trying to setup and test
> this.
>
> Thanks,
> Arnold
>


Re: [Spark SQL][reopen SPARK-16951]:Alternative implementation of NOT IN to Anti-join

2020-05-12 Thread Mich Talebzadeh
Hi Linna,

Please provide some background on it and your solution. The assumption, as
suggested, is that there is a solution.

Thanks,

Dr Mich Talebzadeh



LinkedIn
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Tue, 12 May 2020 at 04:47, Shuang, Linna1 
wrote:

> Hello,
>
>
>
> This JIRA (SPARK-16951) was closed with a resolution of “Won’t Fix” on
> 23/Feb/17.
>
>
>
> But in TPC-H testing we hit a performance issue with Q16, which uses a NOT IN
> subquery that gets translated into a broadcast nested loop join. This one query
> takes almost half the time of all 22 queries: for example, on a 512GB data set
> the total execution time is 1400 seconds, of which Q16 alone takes 630 seconds.
>
>
>
> TPC-H is a common Spark SQL performance benchmark, so this issue will be hit
> regularly. Is it possible to reopen this JIRA and fix this issue?
>
>
>
> Thanks,
>
> Linna
>
>
>


Re: dynamic executor scaling spark on kubernetes client mode

2020-05-12 Thread Pradeepta Choudhury
Hey guys, I was able to run dynamic scaling in both cluster and client mode.
I will document it and send it over this weekend.

On Tue 12 May, 2020, 1:26 PM Roland Johann, 
wrote:

> Hi all,
>
> don’t want to interrupt the conversation but are keen where I can find
> information regarding dynamic allocation on kubernetes. As far as I know
> the docs just point to future work.
>
> Thanks a lot,
> Roland
>
>
>
> Am 12.05.2020 um 09:25 schrieb Steven Stetzler  >:
>
> Hi all,
>
> I am interested in this as well. My use-case could benefit from dynamic
> executor scaling but we are restricted to using client mode since we are
> only using Spark shells.
>
> Could anyone help me understand the barriers to getting dynamic executor
> scaling to work in client mode on Kubernetes?
>
> Thanks,
> Steven
>
> On Sat, May 9, 2020 at 9:48 AM Pradeepta Choudhury <
> pradeeptachoudhu...@gmail.com> wrote:
>
>> Hiii ,
>>
>> The dynamic executor scalling is working fine for spark on kubernetes
>> (latest from spark master repository ) in cluster mode . is the dynamic
>> executor scalling available for client mode ? if yes where can i find the
>> usage doc for same .
>> If no is there any PR open for this ?
>>
>> Thanks ,
>> Pradeepta
>>
>
>


Re: [PySpark] Tagging descriptions

2020-05-12 Thread ZHANG Wei
May I get some requirement details?

Such as:
1. The row count and one row data size
2. The avg length of text to be parsed by RegEx
3. The sample format of text to be parsed
4. The sample of current RegEx

-- 
Cheers,
-z

On Mon, 11 May 2020 18:40:49 -0400
Rishi Shah  wrote:

> Hi All,
> 
> I have a tagging problem at hand where we currently use regular expressions
> to tag records. Is there a recommended way to distribute & tag? Data is
> about 10TB large.
> 
> -- 
> Regards,
> 
> Rishi Shah

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: java.lang.OutOfMemoryError Spark Worker

2020-05-12 Thread Hrishikesh Mishra
Configuration:

Driver memory we tried: 2GB / 4GB / 5GB
Executor memory we tried: 4GB / 5GB
We even reduced spark.memory.fraction to 0.2 (we are not using cache)
VM memory: 32 GB, 8 cores
SPARK_WORKER_MEMORY we tried: 30GB / 24GB
SPARK_WORKER_CORES: 32 (because jobs are not CPU bound)
SPARK_WORKER_INSTANCES: 1


What we feel is that there is not enough space for user classes/objects, or that
cleanup for these is not happening frequently enough.
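
One thing we now want to rule out (not confirmed yet): the OOM lines come from the
Worker daemon's own JVM (dispatcher-event-loop / netty handlers), and as far as we
understand its heap is controlled by SPARK_DAEMON_MEMORY (default 1g), not by
SPARK_WORKER_MEMORY, which only caps what executors may use. We plan to try
something like this in spark-env.sh and capture a heap dump to see where the memory
goes:

# Assumed spark-env.sh changes (to investigate, not a confirmed fix): give the
# worker daemon's own JVM more heap and dump the heap when it OOMs.
export SPARK_DAEMON_MEMORY=4g
export SPARK_WORKER_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/grid/1/spark/dumps"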





On Sat, May 9, 2020 at 12:30 AM Amit Sharma  wrote:

> What memory you are assigning per executor. What is the driver memory
> configuration?
>
>
> Thanks
> Amit
>
> On Fri, May 8, 2020 at 12:59 PM Hrishikesh Mishra 
> wrote:
>
>> We submit spark job through spark-submit command, Like below one.
>>
>>
>> sudo /var/lib/pf-spark/bin/spark-submit \
>> --total-executor-cores 30 \
>> --driver-cores 2 \
>> --class com.hrishikesh.mishra.Main\
>> --master spark://XX.XX.XXX.19:6066  \
>> --deploy-mode cluster  \
>> --supervise
>> http://XX.XX.XXX.19:90/jar/fk-runner-framework-1.0-SNAPSHOT.jar
>>
>>
>>
>>
>> We have python http server, where we hosted all jars.
>>
>> The user kill the driver driver-20200508153502-1291 and its visible in
>> log also, but this is not problem. OOM is separate from this.
>>
>> 20/05/08 15:36:55 INFO Worker: Asked to kill driver
>> driver-20200508153502-1291
>>
>> 20/05/08 15:36:55 INFO DriverRunner: Killing driver process!
>>
>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
>> /grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed
>>
>> 20/05/08 15:36:55 INFO CommandUtils: Redirection to
>> /grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed
>>
>> 20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application
>> app-20200508153654-11776 removed, cleanupLocalDirs = true
>>
>> 20/05/08 *15:36:55* INFO Worker: Driver* driver-20200508153502-1291 was
>> killed by user*
>>
>> *20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception
>> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
>> stacktrace] was thrown by a user handler's exceptionCaught() method while
>> handling the following exception:*
>>
>> *java.lang.OutOfMemoryError: Java heap space*
>>
>> *20/05/08 15:43:23 ERROR SparkUncaughtExceptionHandler: Uncaught
>> exception in thread Thread[dispatcher-event-loop-6,5,main]*
>>
>> *java.lang.OutOfMemoryError: Java heap space*
>>
>> *20/05/08 15:43:17 WARN AbstractChannelHandlerContext: An exception
>> 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
>> stacktrace] was thrown by a user handler's exceptionCaught() method while
>> handling the following exception:*
>>
>> *java.lang.OutOfMemoryError: Java heap space*
>>
>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>
>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>
>> 20/05/08 15:43:33 INFO ExecutorRunner: Killing process!
>>
>> 20/05/08 15:43:33 INFO ShutdownHookManager: Shutdown hook called
>>
>> 20/05/08 15:43:33 INFO ShutdownHookManager: Deleting directory
>> /grid/1/spark/local/spark-e045e069-e126-4cff-9512-d36ad30ee922
>>
>>
>> On Fri, May 8, 2020 at 9:27 PM Jacek Laskowski  wrote:
>>
>>> Hi,
>>>
>>> It's been a while since I worked with Spark Standalone, but I'd check
>>> the logs of the workers. How do you spark-submit the app?
>>>
>>> DId you check /grid/1/spark/work/driver-20200508153502-1291 directory?
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> 
>>> https://about.me/JacekLaskowski
>>> "The Internals Of" Online Books 
>>> Follow me on https://twitter.com/jaceklaskowski
>>>
>>> 
>>>
>>>
>>> On Fri, May 8, 2020 at 2:32 PM Hrishikesh Mishra 
>>> wrote:
>>>
 Thanks Jacek for quick response.
 Due to our system constraints, we can't move to Structured Streaming
 now. But definitely YARN can be tried out.

 But my problem is I'm able to figure out where is the issue, Driver,
 Executor, or Worker. Even exceptions are clueless.  Please see the below
 exception, I'm unable to spot the issue for OOM.

 20/05/08 15:36:55 INFO Worker: Asked to kill driver
 driver-20200508153502-1291

 20/05/08 15:36:55 INFO DriverRunner: Killing driver process!

 20/05/08 15:36:55 INFO CommandUtils: Redirection to
 /grid/1/spark/work/driver-20200508153502-1291/stderr closed: Stream closed

 20/05/08 15:36:55 INFO CommandUtils: Redirection to
 /grid/1/spark/work/driver-20200508153502-1291/stdout closed: Stream closed

 20/05/08 15:36:55 INFO ExternalShuffleBlockResolver: Application
 app-20200508153654-11776 removed, cleanupLocalDirs = true

 20/05/08 15:36:55 INFO Worker: Driver driver-20200508153502-1291 was
 killed by user

 *20/05/08 15:43:06 WARN AbstractChannelHandlerContext: An exception
 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full
 stacktrace] was thrown by a user handler's 

unsubscribe

2020-05-12 Thread Kiran B
Thank you,
Kiran,


Re: dynamic executor scaling spark on kubernetes client mode

2020-05-12 Thread Roland Johann
Hi all,

don’t want to interrupt the conversation, but I am keen to find out where I can get 
information regarding dynamic allocation on Kubernetes. As far as I know the 
docs just point to future work.

Thanks a lot,
Roland



> Am 12.05.2020 um 09:25 schrieb Steven Stetzler :
> 
> Hi all,
> 
> I am interested in this as well. My use-case could benefit from dynamic 
> executor scaling but we are restricted to using client mode since we are only 
> using Spark shells.
> 
> Could anyone help me understand the barriers to getting dynamic executor 
> scaling to work in client mode on Kubernetes?
> 
> Thanks,
> Steven
> 
> On Sat, May 9, 2020 at 9:48 AM Pradeepta Choudhury 
> mailto:pradeeptachoudhu...@gmail.com>> wrote:
> Hiii ,
> 
> The dynamic executor scalling is working fine for spark on kubernetes (latest 
> from spark master repository ) in cluster mode . is the dynamic executor 
> scalling available for client mode ? if yes where can i find the usage doc 
> for same .
> If no is there any PR open for this ?
> 
> Thanks ,
> Pradeepta



Re: dynamic executor scaling spark on kubernetes client mode

2020-05-12 Thread Steven Stetzler
Hi all,

I am interested in this as well. My use-case could benefit from dynamic
executor scaling but we are restricted to using client mode since we are
only using Spark shells.

Could anyone help me understand the barriers to getting dynamic executor
scaling to work in client mode on Kubernetes?

Thanks,
Steven

On Sat, May 9, 2020 at 9:48 AM Pradeepta Choudhury <
pradeeptachoudhu...@gmail.com> wrote:

> Hiii ,
>
> The dynamic executor scalling is working fine for spark on kubernetes
> (latest from spark master repository ) in cluster mode . is the dynamic
> executor scalling available for client mode ? if yes where can i find the
> usage doc for same .
> If no is there any PR open for this ?
>
> Thanks ,
> Pradeepta
>