unsubscribe

2022-01-18 Thread 马殿军
unsubscribe


Ma Dianjun





unsubscribe

2022-01-18 Thread Sasha Kacanski
unsubscribe

-- 
Aleksandar Kacanski - Sasha

*Python Rocks, natural or digital, regardless ...*


Re: Spark on k8s : spark 3.0.1 spark.kubernetes.executor.deleteOnTermination issue

2022-01-18 Thread Pralabh Kumar
Does this property spark.kubernetes.executor.deleteOnTermination check
whether the executor being deleted has shuffle data or not?

On Tue, 18 Jan 2022, 11:20 Pralabh Kumar,  wrote:

> Hi Spark team
>
> We have the cluster-wide property spark.kubernetes.executor.deleteOnTermination
> set to true.
> During a long-running job, some of the executors that hold shuffle data get
> deleted. Because of this, in the subsequent stage we get a lot of Spark
> shuffle fetch failed exceptions.
>
>
> Please let me know whether there is a way to fix this. Note that if I set the
> above property to false, I see no shuffle fetch exceptions.
>
>
> Regards
> Pralabh
>
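
A minimal sketch of setting this property when building a session, for readers
following the thread. Per the Spark on Kubernetes docs,
spark.kubernetes.executor.deleteOnTermination defaults to true and controls
whether the driver deletes executor pods after they terminate; the app name
below is a placeholder:

from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf()
# Keep terminated executor pods instead of deleting them; the thread reports
# that this setting avoids the shuffle fetch failures.
conf.set("spark.kubernetes.executor.deleteOnTermination", "false")

spark = (
    SparkSession.builder
    .appName("shuffle-debug")  # placeholder
    .config(conf=conf)
    .getOrCreate()
)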


Re: [VOTE] Release Spark 3.2.1 (RC1)

2022-01-18 Thread huaxin gao
Hi Bjorn,
Thanks for testing 3.2.1 RC1!
DataFrame.to_pandas_on_spark is deprecated in 3.3.0, not in 3.2.1. That's
why you didn't get any warnings.

Huaxin
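
A minimal sketch of the change, per the commit linked later in this thread
(pandas_api is the 3.3.0 replacement; the JSON path below is a placeholder):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.option("multiline", "true").json("f06.json")  # placeholder path

# Spark >= 3.3.0: pandas_api() replaces the deprecated to_pandas_on_spark();
# on 3.2.x only the old name exists, and it emits no deprecation warning.
try:
    pdf = df.pandas_api()
except AttributeError:  # running on 3.2.x
    pdf = df.to_pandas_on_spark()
pdf.info()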

On Sat, Jan 15, 2022 at 4:12 PM Dongjoon Hyun 
wrote:

> Hi, Bjorn.
>
> It seems that you are confused about my announcement. The test coverage
> announcement is about the `master` branch, which is for the upcoming Apache
> Spark 3.3.0. Java 17 support starts with Apache Spark 3.3; it will not be
> backported to old release branches like Apache Spark 3.2.x/3.1.x/3.0.x.
>
> > 1. If I change the Java version to 17 I get an error, which I did not
> copy. But have you built this with Java 11 or Java 17? I have noticed that we
> test using Java 17, so I was hoping to update Java to version 17.
>
> The Apache Spark community is still actively developing, stabilizing, and
> optimizing Spark on Java 17. For the details, please see the following.
>
> SPARK-33772: Build and Run Spark on Java 17
> SPARK-35781: Support Spark on Apple Silicon on macOS natively on Java 17
> SPARK-37593: Optimize HeapMemoryAllocator to avoid memory waste when using
> G1GC
>
> In short, please don't expect Java 17 with Spark 3.2.x and older versions.
>
> Thanks,
> Dongjoon.
>
>
>
> On Sat, Jan 15, 2022 at 11:19 AM Bjørn Jørgensen 
> wrote:
>
>> 2. Things
>>
>> I changed the Dockerfile from jupyter/docker-stacks to
>> https://github.com/bjornjorgensen/docker-stacks/blob/master/pyspark-notebook/Dockerfile
>> then built, tagged, and pushed it.
>> And I start it with docker-compose like this:
>>
>> version: '2.1'
>> services:
>>   jupyter:
>>     image: bjornjorgensen/spark-notebook:spark-3.2.1RC-1
>>     restart: 'no'
>>     volumes:
>>       - ./notebooks:/home/jovyan/notebooks
>>     ports:
>>       - "8881:"
>>       - "8181:8080"
>>       - "7077:7077"
>>       - "4040:4040"
>>     environment:
>>       NB_UID: ${UID}
>>       NB_GID: ${GID}
>>
>>
>> 1. If I change the Java version to 17 I get an error, which I did not
>> copy. But have you built this with Java 11 or Java 17? I have noticed that we
>> test using Java 17, so I was hoping to update Java to version 17.
>>
>> 2.
>>
>> In a notebook I start Spark with:
>>
>> from pyspark import pandas as ps
>> import re
>> import numpy as np
>> import os
>> #import pandas as pd
>>
>> from pyspark import SparkContext, SparkConf
>> from pyspark.sql import SparkSession
>> from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
>> from pyspark.sql.types import StructType, StructField, StringType, IntegerType
>>
>> os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"
>>
>> def get_spark_session(app_name: str, conf: SparkConf):
>>     conf.setMaster('local[*]')
>>     conf \
>>         .set('spark.driver.memory', '64g') \
>>         .set("fs.s3a.access.key", "minio") \
>>         .set("fs.s3a.secret.key", "KEY") \
>>         .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
>>         .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
>>         .set("spark.hadoop.fs.s3a.path.style.access", "true") \
>>         .set("spark.sql.repl.eagerEval.enabled", "True") \
>>         .set("spark.sql.adaptive.enabled", "True") \
>>         .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>>         .set("spark.sql.repl.eagerEval.maxNumRows", "1")
>>
>>     return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
>>
>> spark = get_spark_session("Falk", SparkConf())
>>
>> Then I run this code
>>
>> f06 = spark.read.option("multiline", "true") \
>>     .json("/home/jovyan/notebooks/falk/data/norm_test/f06.json")
>>
>> pf06 = f06.to_pandas_on_spark()
>>
>> pf06.info()
>>
>>
>>
>> And I did not get any errors or warnings. But according to
>> https://github.com/apache/spark/commit/bc7d55fc1046a55df61fdb380629699e9959fcc6
>>
>> (Spark)DataFrame.to_pandas_on_spark is deprecated.
>>
>> So I expected a deprecation message telling me to switch to pandas_api,
>> which I did not get.
>>
>>
>>
>>
>>
>> On Fri, Jan 14, 2022 at 07:04 huaxin gao  wrote:
>>
>>> The two regressions have been fixed. I will cut RC2 tomorrow late
>>> afternoon.
>>>
>>> Thanks,
>>> Huaxin
>>>
>>> On Wed, Jan 12, 2022 at 9:11 AM huaxin gao 
>>> wrote:
>>>
 Thank you all for testing and voting!

 I will -1 this RC because
 https://issues.apache.org/jira/browse/SPARK-37855 and
 https://issues.apache.org/jira/browse/SPARK-37859 are regressions.
 These are not blockers but I think it's better to fix them in 3.2.1. I will
 prepare for RC2.

 Thanks,
 Huaxin

 On Wed, Jan 12, 2022 at 2:03 AM Kent Yao  wrote:

> +1 (non-binding).
>
> On Wed, Jan 12, 2022 at 16:10 Chao Sun  wrote:
>
>> +1 (non-binding). Thanks Huaxin for driving the release!
>>
>> On Tue, Jan 11, 2022 at 11:56 PM Ruifeng Zheng 
>> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Thanks, ruifeng zheng
>>>