Re: [ANNOUNCE] Apache Spark 3.2.3 released

2022-11-30 Thread Yang,Jie(INF)
Thanks, Chao!

From: Maxim Gekk
Date: Wednesday, November 30, 2022, 19:40
To: Jungtaek Lim
Cc: Wenchen Fan, Chao Sun, dev, user
Subject: Re: [ANNOUNCE] Apache Spark 3.2.3 released

Thank you, Chao!

On Wed, Nov 30, 2022 at 12:42 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
Thanks Chao for driving the release!

On Wed, Nov 30, 2022 at 6:03 PM Wenchen Fan <cloud0...@gmail.com> wrote:
Thanks, Chao!

On Wed, Nov 30, 2022 at 1:33 AM Chao Sun <sunc...@apache.org> wrote:
We are happy to announce the availability of Apache Spark 3.2.3!

Spark 3.2.3 is a maintenance release containing stability fixes. This
release is based on the branch-3.2 maintenance branch of Spark. We strongly
recommend that all 3.2 users upgrade to this stable release.

To download Spark 3.2.3, head over to the download page:
https://spark.apache.org/downloads.html

To view the release notes:
https://spark.apache.org/releases/spark-release-3-2-3.html

We would like to acknowledge all community members for contributing to this
release. This release would not have been possible without you.

Chao

-
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org


Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Yang,Jie(INF)
Thanks Yuming and all developers ~

Yang Jie

From: Maxim Gekk
Date: Wednesday, October 26, 2022, 15:19
To: Hyukjin Kwon
Cc: "L. C. Hsieh", Dongjoon Hyun, Yuming Wang, dev, User
Subject: Re: [ANNOUNCE] Apache Spark 3.3.1 released

Congratulations to everyone on the new release, and thanks to Yuming for his
efforts.

Maxim Gekk

Software Engineer

Databricks, Inc.


On Wed, Oct 26, 2022 at 10:14 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
Thanks, Yuming.

On Wed, 26 Oct 2022 at 16:01, L. C. Hsieh <vii...@gmail.com> wrote:
Thank you for driving the release of Apache Spark 3.3.1, Yuming!

On Tue, Oct 25, 2022 at 11:38 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>
> It's great. Thank you so much, Yuming!
>
> Dongjoon
>
> On Tue, Oct 25, 2022 at 11:23 PM Yuming Wang <wgy...@gmail.com> wrote:
>>
>> We are happy to announce the availability of Apache Spark 3.3.1!
>>
>> Spark 3.3.1 is a maintenance release containing stability fixes. This
>> release is based on the branch-3.3 maintenance branch of Spark. We strongly
>> recommend that all 3.3 users upgrade to this stable release.
>>
>> To download Spark 3.3.1, head over to the download page:
>> https://spark.apache.org/downloads.html
>>
>> To view the release notes:
>> https://spark.apache.org/releases/spark-release-3-3-1.html
>>
>> We would like to acknowledge all community members for contributing to this
>> release. This release would not have been possible without you.
>>
>>

-
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org


Re: [Java 17] --add-exports required?

2022-06-23 Thread Yang,Jie(INF)
So the above issue occurs when building and testing a Maven project that uses Spark 3.3.0
and Java 17, rather than when testing the spark-3.3 source code itself?

If so, you may need to add the following Java options to the `argLine` of
`maven-surefire-plugin` for Java 17:

```
--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED
--add-opens=java.base/java.io=ALL-UNNAMED
--add-opens=java.base/java.net=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED
--add-opens=java.base/java.util=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
--add-opens=java.base/sun.nio.cs=ALL-UNNAMED
--add-opens=java.base/sun.security.action=ALL-UNNAMED
--add-opens=java.base/sun.util.calendar=ALL-UNNAMED
```

These options are what we use to pass all Spark UTs; you may not need all of them.
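
For illustration, a minimal `maven-surefire-plugin` configuration that passes a few of these flags through `argLine` might look like the sketch below (the 3.0.0-M7 version and the trimmed flag list are illustrative only; adjust both to your project):

```
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>3.0.0-M7</version>
  <configuration>
    <!-- Pass the Java 17 add-opens flags listed above to the forked test JVM.
         Only a subset is shown here; append the rest as your tests require. -->
    <argLine>
      --add-opens=java.base/java.lang=ALL-UNNAMED
      --add-opens=java.base/java.nio=ALL-UNNAMED
      --add-opens=java.base/java.util=ALL-UNNAMED
      --add-opens=java.base/sun.nio.ch=ALL-UNNAMED
    </argLine>
  </configuration>
</plugin>
```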

However, these options do not need to be added explicitly when using spark-shell, spark-sql,
or spark-submit, though you may need to add others as needed for Java 17.

Perhaps some instructions should be added to the documentation.

Yang Jie





From: Greg Kopff
Date: Thursday, June 23, 2022, 14:11
To: "Yang,Jie(INF)"
Cc: "user@spark.apache.org"
Subject: Re: [Java 17] --add-exports required?

Hi.

I am running on macOS 12.4, using an ‘Adoptium’ JDK from
https://adoptium.net/download. The version details are:

$ java -version
openjdk version "17.0.3" 2022-04-19
OpenJDK Runtime Environment Temurin-17.0.3+7 (build 17.0.3+7)
OpenJDK 64-Bit Server VM Temurin-17.0.3+7 (build 17.0.3+7, mixed mode, sharing)
I have attached an example maven project which demonstrates the error.



If you run 'mvn clean test' it should fail with:

[ERROR] ExampleTest  Time elapsed: 1.194 s  <<< ERROR!
java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in 
unnamed module @0x41a962cf) cannot access class sun.nio.ch.DirectBuffer (in 
module java.base) because module java.base does not export 
sun.nio.ch to unnamed module @0x41a962cf

Some of the diagnostic output from running with Maven with the -X flag is:

Apache Maven 3.8.6 (84538c9988a25aec085021c365c560670ad80f63)
Maven home: /usr/local/apache-maven/apache-maven-3.8.6
Java version: 17.0.3, vendor: Eclipse Adoptium, runtime: 
/Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home
Default locale: en_AU, platform encoding: UTF-8
OS name: "mac os x", version: "12.4", arch: "x86_64", family: "mac"

[DEBUG] boot(compact) classpath:  surefire-booter-3.0.0-M7.jar  
surefire-api-3.0.0-M7.jar  surefire-logger-api-3.0.0-M7.jar  
surefire-shared-utils-3.0.0-M7.jar  surefire-extensions-spi-3.0.0-M7.jar  
test-classes  classes  junit-4.13.2.jar  hamcrest-core-1.3.jar  
hamcrest-all-1.3.jar  spark-core_2.12-3.3.0.jar  avro-1.11.0.jar  
jackson-core-2.12.5.jar  commons-compress-1.21.jar  avro-mapred-1.11.0.jar  
avro-ipc-1.11.0.jar  xz-1.9.jar  chill_2.12-0.10.0.jar  kryo-shaded-4.0.2.jar  
minlog-1.3.0.jar  objenesis-2.5.1.jar  chill-java-0.10.0.jar  
xbean-asm9-shaded-4.20.jar  hadoop-client-api-3.3.2.jar  
hadoop-client-runtime-3.3.2.jar  commons-logging-1.1.3.jar  
spark-launcher_2.12-3.3.0.jar  spark-kvstore_2.12-3.3.0.jar  
leveldbjni-all-1.8.jar  jackson-annotations-2.13.3.jar  
spark-network-common_2.12-3.3.0.jar  tink-1.6.1.jar  gson-2.8.6.jar  
spark-network-shuffle_2.12-3.3.0.jar  spark-unsafe_2.12-3.3.0.jar  
activation-1.1.1.jar  curator-recipes-2.13.0.jar  curator-framework-2.13.0.jar  
curator-client-2.13.0.jar  guava-16.0.1.jar  zookeeper-3.6.2.jar  
commons-lang-2.6.jar  zookeeper-jute-3.6.2.jar  audience-annotations-0.5.0.jar  
jakarta.servlet-api-4.0.3.jar  commons-codec-1.15.jar  commons-lang3-3.12.0.jar 
 commons-math3-3.6.1.jar  commons-text-1.9.jar  commons-io-2.11.0.jar  
commons-collections-3.2.2.jar  commons-collections4-4.4.jar  jsr305-3.0.0.jar  
slf4j-api-1.7.32.jar  jul-to-slf4j-1.7.32.jar  jcl-over-slf4j-1.7.32.jar  
log4j-slf4j-impl-2.17.2.jar  log4j-api-2.17.2.jar  log4j-core-2.17.2.jar  
log4j-1.2-api-2.17.2.jar  compress-lzf-1.1.jar  snappy-java-1.1.8.4.jar  
lz4-java-1.8.0.jar  zstd-jni-1.5.2-1.jar  RoaringBitmap-0.9.25.jar  
shims-0.9.25.jar  scala-xml_2.12-1.2.0.jar  scala-library-2.12.15.jar  
scala-reflect-2.12.15.jar  json4s-jackson_2.12-3.7.0-M11.jar  
json4s-core_2.12-3.7.0-M11.jar  json4s-ast_2.12-3.7.0-M11.jar  
json4s-scalap_2.12-3.7.0-M11.jar  jersey-client-2.34.jar  
jakarta.ws.rs-api-2.1.6.jar  jakarta.inject-2.6.1.jar  
jersey-common-2.34.jar  jakarta.annotation-api-1.3.5.jar  
osgi-resource-locator-1.0.3.jar  jersey-server-2.34.jar  
jakarta.validation-api-2.0.2.jar  jersey-container-servlet-2.34.jar  
jersey-container-servlet-core-2.34.jar  jersey-hk2-2.34.jar  
hk2-locator-2.6.1.jar  aopall

Re: [Java 17] --add-exports required?

2022-06-22 Thread Yang,Jie(INF)
Hi, Greg

"--add-exports java.base/sun.nio.ch=ALL-UNNAMED " does not need to be added 
when SPARK-33772 is completed, so in order to answer your question, I need more 
details for testing:
1.  Where can I download Java 17 (Temurin-17+35)?
2.  What test commands do you use?

Yang Jie

On 2022/6/23 12:54, "Greg Kopff" wrote:

Hi.

According to the release notes[1], and specifically the ticket Build and 
Run Spark on Java 17 (SPARK-33772)[2], Spark now supports running on Java 17.

However, using Java 17 (Temurin-17+35) with Maven (3.8.6) and 
maven-surefire-plugin (3.0.0-M7), when running a unit test that uses Spark 
(3.3.0), it fails with:

java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ 
(in unnamed module @0x1e7ba8d9) cannot access class sun.nio.ch.DirectBuffer (in 
module java.base) because module java.base does not export sun.nio.ch to 
unnamed module @0x1e7ba8d9

The full stack is:

java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ 
(in unnamed module @0x1e7ba8d9) cannot access class sun.nio.ch.DirectBuffer (in 
module java.base) because module java.base does not export sun.nio.ch to 
unnamed module @0x1e7ba8d9
  at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
  at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
  at 
org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:114)
  at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:353)
  at 
org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:290)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:339)
  at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:194)
  at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:279)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:464)
  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
  at 
org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
  at scala.Option.getOrElse(Option.scala:189)
  at 
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
  […]

There is a recent StackOverflow question "Java 17 solution for Spark - 
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.spark.storage.StorageUtils"[3], which was asked only 2 months ago, 
but this predated the Spark 3.3.0 release, and thus predated official support 
for Java 17.  The solution proposed there results in us adding this 
configuration to the Surefire plugin:


<argLine>
  --add-exports java.base/sun.nio.ch=ALL-UNNAMED
</argLine>

And, yes, this works.

Now, I understand what this flag achieves … without it the JVM module
system won't allow Spark to use the sun.nio.ch.DirectBuffer class. My question
is whether the requirement to add this flag is currently documented somewhere. I
couldn't find it, and it's likely to start affecting people when they switch to
Java 17. Right now the web is mostly full of suggestions to use an earlier
version of Java.

Cheers,

—
Greg.


[1]: https://spark.apache.org/releases/spark-release-3-3-0.html
[2]: https://issues.apache.org/jira/browse/SPARK-33772
[3]: https://stackoverflow.com/questions/72230174
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org




Re: Spark Parquet write OOM

2022-03-01 Thread Yang,Jie(INF)
This is a DirectByteBuffer OOM, so plan 2 may not work. We can increase the
DirectByteBuffer capacity by configuring `-XX:MaxDirectMemorySize`, which is a Java option.

However, we had better check how much memory actually needs to be allocated, because
`-XX:MaxDirectMemorySize` defaults to the same capacity as `-Xmx`.
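
For example, a minimal sketch of passing this option to both the driver and the executors at submit time (the 2g size, main class, and jar name are placeholders only, not recommendations):

```
# Placeholders: the 2g size, main class, and application jar are illustrative only.
spark-submit \
  --class com.example.YourApp \
  --conf "spark.driver.extraJavaOptions=-XX:MaxDirectMemorySize=2g" \
  --conf "spark.executor.extraJavaOptions=-XX:MaxDirectMemorySize=2g" \
  your-app.jar
```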


From: Anil Dasari
Date: Wednesday, March 2, 2022, 09:45
To: "user@spark.apache.org"
Subject: Spark Parquet write OOM

Hello everyone,

We are writing a Spark DataFrame to S3 in Parquet format and it is failing with the
exception below.

I wanted to try the following to avoid the OOM:

  1.  Increase the default SQL shuffle partitions to reduce the load on Parquet
writer tasks, and
  2.  Increase user memory (reduce the memory fraction) to leave more memory for
other data structures, assuming the Parquet writer uses user memory.

I am not sure if these will fix the OOM issue, so I wanted to reach out to the community
for any suggestions. Please let me know.
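
For reference, plans 1 and 2 above roughly correspond to settings like the following at submit time (a sketch only; the values shown are placeholders, not recommendations):

```
# Placeholders: the partition count, memory fraction, main class, and jar are illustrative only.
spark-submit \
  --conf spark.sql.shuffle.partitions=400 \
  --conf spark.memory.fraction=0.5 \
  --class com.example.YourApp \
  your-app.jar
```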

Exception:

org.apache.spark.SparkException: Task failed while writing rows.
 at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:257)
 at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
 at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
 at org.apache.spark.scheduler.Task.run(Task.scala:123)
 at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
 at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
 at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.OutOfMemoryError
 at sun.misc.Unsafe.allocateMemory(Native Method)
 at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:127)
 at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
 at 
org.apache.parquet.hadoop.codec.SnappyCompressor.setInput(SnappyCompressor.java:97)
 at 
org.apache.parquet.hadoop.codec.NonBlockedCompressorStream.write(NonBlockedCompressorStream.java:48)
 at 
org.apache.parquet.bytes.CapacityByteArrayOutputStream.writeToOutput(CapacityByteArrayOutputStream.java:227)
 at 
org.apache.parquet.bytes.CapacityByteArrayOutputStream.writeTo(CapacityByteArrayOutputStream.java:247)
 at 
org.apache.parquet.bytes.BytesInput$CapacityBAOSBytesInput.writeAllTo(BytesInput.java:405)
 at 
org.apache.parquet.bytes.BytesInput$SequenceBytesIn.writeAllTo(BytesInput.java:296)
 at 
org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.compress(CodecFactory.java:164)
 at 
org.apache.parquet.hadoop.ColumnChunkPageWriteStore$ColumnChunkPageWriter.writePage(ColumnChunkPageWriteStore.java:95)
 at 
org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:147)
 at 
org.apache.parquet.column.impl.ColumnWriterV1.flush(ColumnWriterV1.java:235)
 at 
org.apache.parquet.column.impl.ColumnWriteStoreV1.flush(ColumnWriteStoreV1.java:122)
 at 
org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:172)
 at 
org.apache.parquet.hadoop.InternalParquetRecordWriter.checkBlockSizeReached(InternalParquetRecordWriter.java:148)
 at 
org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:130)
 at 
org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:182)
 at 
org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:44)
 at 
org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.write(ParquetOutputWriter.scala:40)
 at 
org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:137)
 at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:245)
 at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242)
 at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1439)
 at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:248)
 ... 10 more
 Suppressed: 

Re: Log4J 2 Support

2021-11-10 Thread Yang,Jie(INF)
It may be more feasible to replace the current slf4j + log4j with the log4j2 API. Some
projects that Spark relies on may also use log4j at the code level, such as EventCounter
and ContainerLogAppender in Hadoop, so directly removing the dependency on log4j may
break some code-level dependencies.



From: Stephen Coy
Date: Wednesday, November 10, 2021, 07:16
To: Sean Owen
Cc: User, Ajay Kumar
Subject: Re: Log4J 2 Support

Hi Sean,

I have had a more detailed look at what Spark is doing with the log4j APIs, and at
this point I suspect that a log4j 2.x migration might be more appropriate at the
code level.

That still does not solve the libraries issue though. That would need more 
investigation.

I could be tempted to tackle it if there is enough interest.

Cheers,

Steve C


On 10 Nov 2021, at 9:42 am, Sean Owen <sro...@gmail.com> wrote:

Yep that's what I tried, roughly - there is an old jira about it. The issue is 
that Spark does need to configure some concrete logging framework in a few 
cases, as do other libs, and that isn't what the shims cover. Could be possible 
now or with more cleverness but the simple thing didn't work out IIRC.
On Tue, Nov 9, 2021, 4:32 PM Stephen Coy <s...@infomedia.com.au> wrote:
Hi there,

It’s true that the preponderance of log4j 1.2.x in many existing live projects 
is kind of a pain in the butt.

But there is a solution.

1. Migrate all Spark code to use slf4j APIs;

2. Exclude log4j 1.2.x from any dependencies sucking it in;

3. Include the log4j-over-slf4j bridge jar and slf4j-api jars;

4. Choose your favourite modern logging implementation and add it as a
"runtime" dependency together with its slf4j binding jar (if needed).

In fact in the short term you can replace steps 1 and 2 with "remove the log4j 
1.2.17 jar from the distribution" and it should still work.
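
For illustration, a hedged Maven sketch of steps 2-4 above (the Spark artifact, the version numbers, and the choice of Logback as the backend are assumptions, not project recommendations):

```
<dependencies>
  <!-- Step 2: exclude the transitive log4j 1.2.x (spark-core shown as an example) -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.2.0</version>
    <exclusions>
      <exclusion>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

  <!-- Step 3: the log4j-over-slf4j bridge plus the slf4j API -->
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>log4j-over-slf4j</artifactId>
    <version>1.7.32</version>
  </dependency>
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>1.7.32</version>
  </dependency>

  <!-- Step 4: a modern backend as a runtime dependency (Logback, which binds to slf4j directly) -->
  <dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-classic</artifactId>
    <version>1.2.7</version>
    <scope>runtime</scope>
  </dependency>
</dependencies>
```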

The slf4j project also includes a commons-logging shim for capturing its output 
too.

FWIW, the slf4j project is run by one of the original log4j developers.

Cheers,

Steve C



On 9 Nov 2021, at 11:11 pm, Sean Owen <sro...@gmail.com> wrote:

No plans that I know of. It's not that Spark uses it so much as its 
dependencies. I tried and failed to upgrade it a couple years ago. You are
welcome to try, and open a PR if successful.

On Tue, Nov 9, 2021 at 6:09 AM Ajay Kumar <ajay.praja...@gmail.com> wrote:
Hi Team,
We wanted to send Spark executor logs to a centralized logging server over a TCP
socket. I see that the Spark log4j version is very old (1.2.17) and it does not
support JSON logs over TCP sockets in containers.
I wanted to know what the plan is for upgrading the log4j version to log4j2.
Thanks in advance.
Regards,
Ajay
