[jira] [Created] (SPARK-42395) The code logic of the configmap max size validation lacks extra content

2023-02-10 Thread Wei Yan (Jira)
Wei Yan created SPARK-42395:
---

 Summary: The code logic of the configmap max size validation lacks extra content
 Key: SPARK-42395
 URL: https://issues.apache.org/jira/browse/SPARK-42395
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.5.0
Reporter: Wei Yan
 Fix For: 3.3.1


In each configmap, Spark adds extra content in a fixed format; this extra 
content of the configmap is as follows:
  spark.kubernetes.namespace: default
  spark.properties: |
    #Java properties built from Kubernetes config map with name: spark-exec-b47b438630eec12d-conf-map
    #Wed Feb 08 20:10:19 CST 2023
    spark.kubernetes.namespace=default

But the max-size validation logic does not account for this extra content, so a 
configmap that passes the check can still be rejected by the API server once the 
generated spark.properties entry is added on top of the measured files.
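A minimal sketch of the idea (illustrative only, not Spark's actual code; the 
object and method names below are hypothetical): serialize the generated 
properties the same way java.util.Properties does, measure the bytes, and 
include that overhead in the size check.

{code:scala}
import java.io.StringWriter
import java.nio.charset.StandardCharsets
import java.util.Properties

object ConfigMapSizeCheck {
  // Measure the bytes of the generated spark.properties entry. Properties.store
  // also emits the comment line and a timestamp line, matching the extra
  // content shown in the description above.
  def propertiesEntrySize(props: Map[String, String]): Long = {
    val javaProps = new Properties()
    props.foreach { case (k, v) => javaProps.setProperty(k, v) }
    val writer = new StringWriter()
    javaProps.store(writer, "Java properties built from Kubernetes config map")
    writer.toString.getBytes(StandardCharsets.UTF_8).length.toLong
  }

  // Validate the user-supplied conf files plus the generated entry together,
  // so the check matches the payload the API server will actually receive.
  def fitsInConfigMap(
      confFilesBytes: Long,
      generatedProps: Map[String, String],
      maxSize: Long): Boolean =
    confFilesBytes + propertiesEntrySize(generatedProps) <= maxSize
}
{code}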






[jira] [Created] (SPARK-42344) The default size of the CONFIG_MAP_MAXSIZE should not be greater than 1048576

2023-02-03 Thread Wei Yan (Jira)
Wei Yan created SPARK-42344:
---

 Summary: The default size of the CONFIG_MAP_MAXSIZE should not be greater than 1048576
 Key: SPARK-42344
 URL: https://issues.apache.org/jira/browse/SPARK-42344
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes, Spark Submit
Affects Versions: 3.3.1
 Environment: Kubernetes: 1.22.0, ETCD: 3.5.0, Spark: 3.3.2
Reporter: Wei Yan
 Fix For: 3.5.0


Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://172.18.123.24:6443/api/v1/namespaces/default/configmaps. Message: ConfigMap "spark-exec-ed9f2c861aa40b48-conf-map" is invalid: []: Too long: must have at most 1048576 bytes. Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=[], message=Too long: must have at most 1048576 bytes, reason=FieldValueTooLong, additionalProperties={})], group=null, kind=ConfigMap, name=spark-exec-ed9f2c861aa40b48-conf-map, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=ConfigMap "spark-exec-ed9f2c861aa40b48-conf-map" is invalid: []: Too long: must have at most 1048576 bytes, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:305)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:644)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:83)
        at io.fabric8.kubernetes.client.dsl.base.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:61)
        at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.setUpExecutorConfigMap(KubernetesClusterSchedulerBackend.scala:88)
        at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.start(KubernetesClusterSchedulerBackend.scala:112)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:222)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:595)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2714)
        at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
        at org.apache.spark.examples.JavaSparkPi.main(JavaSparkPi.java:37)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)






[jira] [Commented] (SPARK-24374) SPIP: Support Barrier Scheduling in Apache Spark

2018-05-25 Thread Wei Yan (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-24374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490307#comment-16490307 ]

Wei Yan commented on SPARK-24374:
-

Thanks [~mengxr] for the initiative and the doc. cc [~leftnoteasy] [~zhz], as we 
may need some support from the YARN side when running in yarn-cluster mode.

> SPIP: Support Barrier Scheduling in Apache Spark
> 
>
> Key: SPARK-24374
> URL: https://issues.apache.org/jira/browse/SPARK-24374
> Project: Spark
>  Issue Type: Epic
>  Components: ML, Spark Core
>Affects Versions: 3.0.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>Priority: Major
>  Labels: SPIP
> Attachments: SPIP_ Support Barrier Scheduling in Apache Spark.pdf
>
>
> (See details in the linked/attached SPIP doc.)
> {quote}
> The proposal here is to add a new scheduling model to Apache Spark so users 
> can properly embed distributed DL training as a Spark stage to simplify the 
> distributed training workflow. For example, Horovod uses MPI to implement 
> all-reduce to accelerate distributed TensorFlow training. The computation 
> model is different from MapReduce used by Spark. In Spark, a task in a stage 
> doesn’t depend on any other tasks in the same stage, and hence it can be 
> scheduled independently. In MPI, all workers start at the same time and pass 
> messages around. To embed this workload in Spark, we need to introduce a new 
> scheduling model, tentatively named “barrier scheduling”, which launches 
> tasks at the same time and provides users enough information and tooling to 
> embed distributed DL training. Spark can also provide an extra layer of fault 
> tolerance in case some tasks failed in the middle, where Spark would abort 
> all tasks and restart the stage.
> {quote}
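
For readers finding this thread later: the SPIP shipped as barrier execution 
mode in Spark 2.4, exposed through RDD.barrier() and BarrierTaskContext. A 
minimal sketch of the resulting API (the doubling computation is just a 
placeholder workload):

{code:scala}
import org.apache.spark.{BarrierTaskContext, SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("barrier-demo"))

// All 4 tasks of this stage launch together (or not at all), and barrier()
// blocks until every task in the stage reaches it, which is the MPI-style
// synchronization point described in the SPIP.
val doubled = sc.parallelize(1 to 100, numSlices = 4)
  .barrier()
  .mapPartitions { iter =>
    val ctx = BarrierTaskContext.get()
    // Addresses of all tasks in the stage, e.g. to bootstrap an
    // all-reduce ring before the real work starts.
    val peers = ctx.getTaskInfos().map(_.address)
    ctx.barrier() // global sync across the stage
    iter.map(_ * 2)
  }
  .collect()
{code}

If any task fails, Spark aborts the whole stage and retries it, which is the 
extra fault-tolerance layer mentioned in the quoted proposal.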


