Hi, Bjørn.
It seems my announcement caused some confusion. The test coverage
announcement is about the `master` branch, which is for the upcoming Apache
Spark 3.3.0. Apache Spark 3.3 will start to support Java 17; old
release branches like Apache Spark 3.2.x/3.1.x/3.0.x will not.
> 1. If I change the Java version to 17, I get an error which I did not
copy. But have you built this with Java 11 or Java 17? I have noticed that we
test using Java 17, so I was hoping to update Java to version 17.
The Apache Spark community is still actively developing, stabilizing, and
optimizing Spark on Java 17. For the details, please see the following.
SPARK-33772: Build and Run Spark on Java 17
SPARK-35781: Support Spark on Apple Silicon on macOS natively on Java 17
SPARK-37593: Optimize HeapMemoryAllocator to avoid memory waste when using
G1GC
In short, please don't expect Java 17 to work with Spark 3.2.x and older versions.
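To make the constraint concrete, here is a minimal sketch (not official Spark tooling; the sample banner string is illustrative) of checking the JVM major version before launching Spark 3.2.x, which supports Java 8 and 11 only:

```python
import re

# Sample banner; in practice capture the first line of `java -version 2>&1`,
# e.g. via subprocess.
banner = 'openjdk version "11.0.13" 2021-10-19'

# Extract the major version; old-scheme banners like "1.8.0_312" yield 1,
# which means Java 8.
match = re.search(r'"(\d+)\.', banner)
major = int(match.group(1))
if major == 1:
    major = 8

supported = major in (8, 11)  # Spark 3.2.x; Java 17 support arrives in 3.3.0
print(f"Java {major}: {'supported' if supported else 'not supported'} by Spark 3.2.x")
```

With a Java 17 banner the same check reports the version as unsupported, which matches the error you saw.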
Thanks,
Dongjoon.
On Sat, Jan 15, 2022 at 11:19 AM Bjørn Jørgensen
wrote:
> 2. Things
>
> I did change the dockerfile from jupyter/docker-stacks to
> https://github.com/bjornjorgensen/docker-stacks/blob/master/pyspark-notebook/Dockerfile
> then I build, tag and push.
> And I start it with docker-compose like
>
> version: '2.1'
> services:
>   jupyter:
>     image: bjornjorgensen/spark-notebook:spark-3.2.1RC-1
>     restart: 'no'
>     volumes:
>       - ./notebooks:/home/jovyan/notebooks
>     ports:
>       - "8881:"
>       - "8181:8080"
>       - "7077:7077"
>       - "4040:4040"
>     environment:
>       NB_UID: ${UID}
>       NB_GID: ${GID}
>
>
> 1. If I change the Java version to 17, I get an error which I did not
> copy. But have you built this with Java 11 or Java 17? I have noticed that we
> test using Java 17, so I was hoping to update Java to version 17.
>
> 2.
>
> In a notebook I start spark by
>
> from pyspark import pandas as ps
> import re
> import numpy as np
> import os
> #import pandas as pd
>
> from pyspark import SparkContext, SparkConf
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
> from pyspark.sql.types import StructType, StructField, StringType, IntegerType
>
> os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"
>
> def get_spark_session(app_name: str, conf: SparkConf):
>     conf.setMaster('local[*]')
>     conf \
>         .set('spark.driver.memory', '64g') \
>         .set("fs.s3a.access.key", "minio") \
>         .set("fs.s3a.secret.key", "KEY") \
>         .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
>         .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
>         .set("spark.hadoop.fs.s3a.path.style.access", "true") \
>         .set("spark.sql.repl.eagerEval.enabled", "True") \
>         .set("spark.sql.adaptive.enabled", "True") \
>         .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>         .set("spark.sql.repl.eagerEval.maxNumRows", "1")
>
>     return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
>
> spark = get_spark_session("Falk", SparkConf())
>
> Then I run this code
>
> f06 = spark.read.option("multiline", "true").json("/home/jovyan/notebooks/falk/data/norm_test/f06.json")
>
> pf06 = f06.to_pandas_on_spark()
>
> pf06.info()
>
>
>
> And I did not get any errors or warnings. But according to
> https://github.com/apache/spark/commit/bc7d55fc1046a55df61fdb380629699e9959fcc6
>
> (Spark)DataFrame.to_pandas_on_spark is deprecated.
>
> So I expected a deprecation message telling me to switch to pandas_api,
> which I did not get.
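One possible explanation for the missing message is Python's warning filters, which can silence repeated or non-`__main__` deprecation warnings. A minimal stdlib sketch (the stub function and its message are hypothetical, not pyspark's actual code) showing how to force such warnings to surface:

```python
import warnings

def to_pandas_on_spark_stub():
    # Stand-in for a deprecated method: emits a FutureWarning the way a
    # library might (hypothetical wording, not pyspark's real message).
    warnings.warn(
        "to_pandas_on_spark is deprecated; use pandas_api instead.",
        FutureWarning,
    )
    return "pandas-on-Spark frame"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # surface warnings the filters may hide
    to_pandas_on_spark_stub()

print([str(w.message) for w in caught])
```

Running `warnings.simplefilter("always")` in the notebook before calling `to_pandas_on_spark` may reveal whether the warning is being emitted but filtered out.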
>
>
>
>
>
> On Fri, Jan 14, 2022 at 07:04, huaxin gao wrote:
>
>> The two regressions have been fixed. I will cut RC2 tomorrow late
>> afternoon.
>>
>> Thanks,
>> Huaxin
>>
>> On Wed, Jan 12, 2022 at 9:11 AM huaxin gao
>> wrote:
>>
>>> Thank you all for testing and voting!
>>>
>>> I will -1 this RC because
>>> https://issues.apache.org/jira/browse/SPARK-37855 and
>>> https://issues.apache.org/jira/browse/SPARK-37859 are regressions.
>>> These are not blockers but I think it's better to fix them in 3.2.1. I will
>>> prepare for RC2.
>>>
>>> Thanks,
>>> Huaxin
>>>
>>> On Wed, Jan 12, 2022 at 2:03 AM Kent Yao wrote:
>>>
+1 (non-binding).
On Wed, Jan 12, 2022 at 16:10, Chao Sun wrote:
> +1 (non-binding). Thanks Huaxin for driving the release!
>
> On Tue, Jan 11, 2022 at 11:56 PM Ruifeng Zheng
> wrote:
>
>> +1 (non-binding)
>>
>> Thanks, ruifeng zheng
>>
>> -- Original --
>> *From:* "Cheng Su" ;
>> *Date:* Wed, Jan 12, 2022 02:54 PM
>> *To:* "Qian Sun";"huaxin gao"<
>> huaxin.ga...@gmail.com>;
>> *Cc:* "dev";
>> *Subject:* Re: [VOTE] Release Spark 3.2.1 (RC1)
>>
>> +1 (non-binding). Checked commit history and ran some local tests.
>>
>>
>>
>> Thanks,
>>
>> Cheng Su
>>
>>
>>
>> *From: *Qian Sun