Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-08 Thread Yuanjian Li
@Dongjoon Hyun: Thank you for reporting this and
for your prompt response.

The vote has failed. I'll cut RC5 tonight, PST time.

Dongjoon Hyun wrote on Fri, Sep 8, 2023 at 15:57:

> Sorry, but I'm -1 because there is a late-arriving correctness patch,
> although it's not a regression.
>
> - https://issues.apache.org/jira/browse/SPARK-44805
> "Data lost after union using
> spark.sql.parquet.enableNestedColumnVectorizedReader=true"
>
> - https://github.com/apache/spark/pull/42850
> -
> https://github.com/apache/spark/commit/b2b2ba97d3003d25d159943ab8a4bf50e421fdab
> (branch-3.5)
>
> Dongjoon.
>
>
>
> On Wed, Sep 6, 2023 at 8:11 AM Yuanjian Li 
> wrote:
>
> Please vote on releasing the following candidate (RC4) as Apache Spark
> version 3.5.0.
>
>
>
> The vote is open until 11:59pm Pacific time *Sep 8th* and passes if a
> majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
>
>
> [ ] +1 Release this package as Apache Spark 3.5.0
>
> [ ] -1 Do not release this package because ...
>
>
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
>
>
> The tag to be voted on is v3.5.0-rc4 (commit
> c2939589a29dd0d6a2d3d31a8d833877a37ee02a):
>
> https://github.com/apache/spark/tree/v3.5.0-rc4
>
>
>
> The release files, including signatures, digests, etc. can be found at:
>
> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc4-bin/
>
>
>
> Signatures used for Spark RCs can be found in this file:
>
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
>
>
> The staging repository for this release can be found at:
>
> https://repository.apache.org/content/repositories/orgapachespark-1448
>
>
>
> The documentation corresponding to this release can be found at:
>
> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc4-docs/
>
>
>
> The list of bug fixes going into 3.5.0 can be found at the following
> URL:
>
> https://issues.apache.org/jira/projects/SPARK/versions/12352848
>
>
>
> This release uses the release script from the tag v3.5.0-rc4.
>
>
>
> FAQ
>
>
>
> =
>
> How can I help test this release?
>
> =
>
> If you are a Spark user, you can help us test this release by taking
>
> an existing Spark workload and running it on this release candidate, then
>
> reporting any regressions.
>
>
>
> If you're working in PySpark you can set up a virtual env and install
>
> the current RC and see if anything important breaks; in Java/Scala,
>
> you can add the staging repository to your project's resolvers and test
>
> with the RC (make sure to clean up the artifact cache before/after so
>
> you don't end up building with an out-of-date RC going forward).
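As a concrete sketch of the PySpark check described above (the tarball filename under the -bin/ directory is an assumption; confirm it against the actual listing):

```shell
# Fresh virtual env so the RC doesn't pollute an existing install.
python -m venv spark-350-rc4
. spark-350-rc4/bin/activate

# Install the RC's PySpark tarball (assumed filename -- check v3.5.0-rc4-bin/).
pip install "https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc4-bin/pyspark-3.5.0.tar.gz"

# Smoke test: a trivial local job should complete without errors.
python -c "from pyspark.sql import SparkSession; \
s = SparkSession.builder.master('local[2]').getOrCreate(); \
print(s.range(10).count()); s.stop()"
```

For Java/Scala, the staging repository URL above can be added as a resolver (e.g. an sbt `resolvers +=` entry or a Maven `<repository>` block) before building against 3.5.0.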
>
>
>
> ===
>
> What should happen to JIRA tickets still targeting 3.5.0?
>
> ===
>
> The current list of open tickets targeted at 3.5.0 can be found at:
>
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.5.0
>
>
>
> Committers should look at those and triage. Extremely important bug
>
> fixes, documentation, and API tweaks that impact compatibility should
>
> be worked on immediately. Everything else please retarget to an
>
> appropriate release.
>
>
>
> ==
>
> But my bug isn't fixed?
>
> ==
>
> In order to make timely releases, we will typically not hold the
>
> release unless the bug in question is a regression from the previous
>
> release. That being said, if there is something which is a regression
>
> that has not been correctly targeted please ping me or a committer to
>
> help target the issue.
>
>
>
> Thanks,
>
> Yuanjian Li
>
>
>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>


Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-08 Thread Xinrong Meng
+1

Thank you for driving the release!

On Fri, Sep 8, 2023 at 10:12 AM Jungtaek Lim 
wrote:

> +1 (non-binding)
>
> Thanks for driving this release!
>
> On Fri, Sep 8, 2023 at 11:29 AM Holden Karau  wrote:
>
>> +1 pip installing seems to function :)
>>
>> On Thu, Sep 7, 2023 at 7:22 PM Yuming Wang  wrote:
>>
>>> +1.
>>>
>>> On Thu, Sep 7, 2023 at 10:33 PM yangjie01 
>>> wrote:
>>>
 +1



 *From:* Gengliang Wang
 *Date:* Thursday, September 7, 2023, 12:53
 *To:* Yuanjian Li
 *Cc:* Xiao Li, "her...@databricks.com.invalid", Spark dev list
 *Subject:* Re: [VOTE] Release Apache Spark 3.5.0 (RC4)



 +1



 On Wed, Sep 6, 2023 at 9:46 PM Yuanjian Li 
 wrote:

 +1 (non-binding)

 Xiao Li wrote on Wed, Sep 6, 2023 at 15:27:

 +1



 Xiao



 Herman van Hovell wrote on Wed, Sep 6, 2023 at 22:08:

 Tested connect, and everything looks good.



 +1





Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-08 Thread Dongjoon Hyun
Sorry, but I'm -1 because there is a late-arriving correctness patch,
although it's not a regression.

- https://issues.apache.org/jira/browse/SPARK-44805
"Data lost after union using
spark.sql.parquet.enableNestedColumnVectorizedReader=true"

- https://github.com/apache/spark/pull/42850
-
https://github.com/apache/spark/commit/b2b2ba97d3003d25d159943ab8a4bf50e421fdab
(branch-3.5)

Dongjoon.
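
For reference when re-testing on a later RC, a hedged sketch of exercising the config named in SPARK-44805 (the schema, paths, and data are illustrative, not the reproducer from the JIRA; requires a local PySpark install):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").getOrCreate()

# The config implicated in SPARK-44805 (nested-column vectorized Parquet reads).
spark.conf.set("spark.sql.parquet.enableNestedColumnVectorizedReader", "true")

# Illustrative nested-struct data written to Parquet, then unioned --
# the general shape of query the report describes.
df = spark.createDataFrame([((1, "a"),)], ["s"])
df.write.mode("overwrite").parquet("/tmp/spark44805_t1")
df.write.mode("overwrite").parquet("/tmp/spark44805_t2")

a = spark.read.parquet("/tmp/spark44805_t1")
b = spark.read.parquet("/tmp/spark44805_t2")

# On a build containing the fix, no rows should be lost by the union.
assert a.union(b).count() == 2
spark.stop()
```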





Re: [VOTE] Release Apache Spark 3.5.0 (RC4)

2023-09-08 Thread Jungtaek Lim
+1 (non-binding)

Thanks for driving this release!

On Fri, Sep 8, 2023 at 11:29 AM Holden Karau  wrote:

> +1 pip installing seems to function :)
>
> On Thu, Sep 7, 2023 at 7:22 PM Yuming Wang  wrote:
>
>> +1.
>>
>> On Thu, Sep 7, 2023 at 10:33 PM yangjie01 
>> wrote:
>>
>>> +1
>>>
>>>
>>>
>>> *From:* Gengliang Wang
>>> *Date:* Thursday, September 7, 2023, 12:53
>>> *To:* Yuanjian Li
>>> *Cc:* Xiao Li, "her...@databricks.com.invalid", Spark dev list
>>> *Subject:* Re: [VOTE] Release Apache Spark 3.5.0 (RC4)
>>>
>>>
>>>
>>> +1
>>>
>>>
>>>
>>> On Wed, Sep 6, 2023 at 9:46 PM Yuanjian Li 
>>> wrote:
>>>
>>> +1 (non-binding)
>>>
>>> Xiao Li wrote on Wed, Sep 6, 2023 at 15:27:
>>>
>>> +1
>>>
>>>
>>>
>>> Xiao
>>>
>>>
>>>
>>> Herman van Hovell wrote on Wed, Sep 6, 2023 at 22:08:
>>>
>>> Tested connect, and everything looks good.
>>>
>>>
>>>
>>> +1
>>>
>>>
>>>


Re: Elasticsearch support for Spark 3.x

2023-09-08 Thread Dipayan Dev
@Alfie Davidson: Awesome, it worked with
"org.elasticsearch.spark.sql".
But as soon as I switched to elasticsearch-spark-20_2.12, "es" also
worked.

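For anyone hitting the same error, a hedged sketch of the Spark 3 write path (the cluster address, index name, and --packages coordinate are assumptions for illustration; this needs a live Elasticsearch cluster to actually run):

```python
from pyspark.sql import SparkSession

# Assumes spark-submit/pyspark was launched with the Spark 3 connector, e.g.
#   --packages org.elasticsearch:elasticsearch-spark-30_2.12:7.12.0
spark = SparkSession.builder.appName("es-write-demo").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

elastic_options = {
    "es.nodes": "localhost",  # assumed Elasticsearch host
    "es.port": "9200",        # assumed port
}

# With elasticsearch-spark-30, use the full data source class name rather
# than the "es" shorthand that worked with the Spark 2 connector:
(df.write
    .format("org.elasticsearch.spark.sql")
    .mode("overwrite")
    .options(**elastic_options)
    .save("index_name"))
```

The thread above suggests the short name "es" resolves with the elasticsearch-spark-20 artifact but not with this version of elasticsearch-spark-30, so the fully qualified format name is the safer choice.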

On Fri, Sep 8, 2023 at 12:45 PM Dipayan Dev  wrote:

>
> Let me try that and get back. Just wondering, is there a change in the
> way we pass the format in the connector from Spark 2 to 3?
>
>
> On Fri, 8 Sep 2023 at 12:35 PM, Alfie Davidson 
> wrote:
>
>> I am pretty certain you need to change the write.format from “es” to
>> “org.elasticsearch.spark.sql”
>>
>> Sent from my iPhone
>>
>> On 8 Sep 2023, at 03:10, Dipayan Dev  wrote:
>>
>> 
>>
>> ++ Dev
>>
>> On Thu, 7 Sep 2023 at 10:22 PM, Dipayan Dev 
>> wrote:
>>
>>> Hi,
>>>
>>> Can you please elaborate your last response? I don’t have any external
>>> dependencies added, and just updated the Spark version as mentioned below.
>>>
>>> Can someone help me with this?
>>>
>>> On Fri, 1 Sep 2023 at 5:58 PM, Koert Kuipers  wrote:
>>>
 could the provided scope be the issue?

 On Sun, Aug 27, 2023 at 2:58 PM Dipayan Dev 
 wrote:

> Using the following dependency for Spark 3 in POM file (My Scala
> version is 2.12.14)
>
>
>
>
>
>
> <dependency>
>     <groupId>org.elasticsearch</groupId>
>     <artifactId>elasticsearch-spark-30_2.12</artifactId>
>     <version>7.12.0</version>
>     <scope>provided</scope>
> </dependency>
>
>
> The code throws error at this line :
> df.write.format("es").mode("overwrite").options(elasticOptions).save("index_name")
> The same code is working with Spark 2.4.0 and the following dependency
>
>
>
>
>
> <dependency>
>     <groupId>org.elasticsearch</groupId>
>     <artifactId>elasticsearch-spark-20_2.12</artifactId>
>     <version>7.12.0</version>
> </dependency>
>
>
> On Mon, 28 Aug 2023 at 12:17 AM, Holden Karau 
> wrote:
>
>> What’s the version of the ES connector you are using?
>>
>> On Sat, Aug 26, 2023 at 10:17 AM Dipayan Dev 
>> wrote:
>>
>>> Hi All,
>>>
>>> We're using Spark 2.4.x to write a DataFrame into the Elasticsearch
>>> index.
>>> As we're upgrading to Spark 3.3.0, it throws the error
>>> Caused by: java.lang.ClassNotFoundException: es.DefaultSource
>>> at
>>> java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
>>> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
>>> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
>>>
>>> Looking at a few responses on Stack Overflow, it seems this is not yet
>>> supported by Elasticsearch-hadoop.
>>>
>>> Does anyone have experience with this? Or faced/resolved this issue
>>> in Spark 3?
>>>
>>> Thanks in advance!
>>>
>>> Regards
>>> Dipayan
>>>
>
 CONFIDENTIALITY NOTICE: This electronic communication and any files
 transmitted with it are confidential, privileged and intended solely for
 the use of the individual or entity to whom they are addressed. If you are
 not the intended recipient, you are hereby notified that any disclosure,
 copying, distribution (electronic or otherwise) or forwarding of, or the
 taking of any action in reliance on the contents of this transmission is
 strictly prohibited. Please notify the sender immediately by e-mail if you
 have received this email by mistake and delete this email from your system.

 Is it necessary to print this email? If you care about the environment
 like we do, please refrain from printing emails. It helps to keep the
 environment forested and litter-free.
>>>
>>>


Re: Elasticsearch support for Spark 3.x

2023-09-08 Thread Dipayan Dev
Let me try that and get back. Just wondering, is there a change in the way
we pass the format in the connector from Spark 2 to 3?




Re: Elasticsearch support for Spark 3.x

2023-09-08 Thread Alfie Davidson
I am pretty certain you need to change the write.format from "es" to
"org.elasticsearch.spark.sql"

Sent from my iPhone



