Re: Which version of spark version supports parquet version 2 ?

2024-04-18 Thread Bjørn Jørgensen
On Mon, 15 Apr 2024 at 21:33, Prem Sahoo wrote:
> Thank you so much for the info! But do we have any release

Re: External Spark shuffle service for k8s

2024-04-11 Thread Bjørn Jørgensen
directories are explicitly specified then a default directory is created and configured appropriately. emptyDir volumes use the ephemeral storage feature of Kubernetes and do not persist beyond the life of the pod.

On Thu, 11 Apr 2024 at 10:29, Bjørn Jørgensen wrote:
> " In the end for my usecase

Re: External Spark shuffle service for k8s

2024-04-11 Thread Bjørn Jørgensen
You can make a PVC on K8s, call it 300GB.

Make a folder in your Dockerfile:

WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir

Start Spark with these settings added:

.config("spark.kubernetes.driver.volumes.persistentVolumeClaim.300gb.options.claimName", "300gb") \
.config("spark.kubernetes.driver.volumes.persistentVolumeClaim.300gb.mount.path", "/opt/spark/work-dir") \
.config("spark.kubernetes.driver.volumes.persistentVolumeClaim.300gb.mount.readOnly", "False") \
.config("spark.kubernetes.executor.volumes.persistentVolumeClaim.300gb.options.claimName", "300gb") \
.config("spark.kubernetes.executor.volumes.persistentVolumeClaim.300gb.mount.path", "/opt/spark/work-dir") \
.config("spark.kubernetes.executor.volumes.persistentVolumeClaim.300gb.mount.readOnly", "False") \
.config("spark.local.dir", "/opt/spark/work-dir")

On Sat, 6 Apr 2024 at 15:45, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> I have seen some older references for shuffle service for k8s, although it is not clear they are talking about a generic shuffle service for k8s.
>
> Anyhow with the advent of genai and the need to allow for a larger volume of data, I was wondering if there has been any more work on this matter. Specifically larger and scalable file systems like HDFS, GCS, S3 etc. offer significantly larger storage capacity than local disks on individual worker nodes in a k8s cluster, thus allowing handling much larger datasets more efficiently. Also the degree of parallelism and fault tolerance with these file systems come into it. I will be interested in hearing more about any progress on this.
>
> Thanks
>
> Mich Talebzadeh,
> Technologist | Solutions Architect | Data Engineer | Generative AI
> London
> United Kingdom
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> Disclaimer: The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice, quote "one test result is worth one-thousand expert opinions (Werner Von Braun)".
>
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org

--
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge

+47 480 94 297
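The PVC settings in the message above repeat the same three keys for the driver and the executor, so they can be generated rather than typed out. The following is a minimal sketch, not part of the original thread: the helper name `pvc_volume_conf` is hypothetical, and it only builds a plain dict of the configuration keys that appear in the message, without starting Spark.

```python
from typing import Dict


def pvc_volume_conf(volume_name: str, claim_name: str, mount_path: str,
                    read_only: bool = False) -> Dict[str, str]:
    """Hypothetical helper: build the driver and executor PVC settings as a dict."""
    conf: Dict[str, str] = {}
    for role in ("driver", "executor"):
        # Same key layout as in the message, once per role.
        prefix = (
            f"spark.kubernetes.{role}.volumes."
            f"persistentVolumeClaim.{volume_name}"
        )
        conf[f"{prefix}.options.claimName"] = claim_name
        conf[f"{prefix}.mount.path"] = mount_path
        conf[f"{prefix}.mount.readOnly"] = str(read_only).lower()
    # Use the mounted volume as Spark's local scratch directory.
    conf["spark.local.dir"] = mount_path
    return conf


conf = pvc_volume_conf("300gb", "300gb", "/opt/spark/work-dir")
for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

Each key/value pair can then be handed to `SparkSession.builder.config(key, value)` in a loop, which should be equivalent to the chained `.config(...)` calls quoted above.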

Re: External Spark shuffle service for k8s

2024-04-06 Thread Bjørn Jørgensen

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Bjørn Jørgensen

Re: When and how does Spark use metastore statistics?

2023-12-26 Thread Bjørn Jørgensen
Tell me more about spark.sql.cbo.strategy

On Tue, 12 Dec 2023 at 00:25, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
> Where exactly are you getting this information from?
>
> As far as I can tell, spark.sql.cbo.enabled has defaulted to false since it was introduced 7 years ago

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-10 Thread Bjørn Jørgensen
> This move has gained wide industry adoption and contributions from the community. In a mere year, the Flink operator has garnered more than 600 stars and has attracted contributions from over 80 contributors. This showcases the level of community interest and collaborative momentum that can be achieved in similar scenarios.
>
> More details can be found at SPIP doc: Spark Kubernetes Operator
> https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE
>
> Thanks,
> --
> *Zhou JIANG*

> --
> Ryan Blue
> Tabular

Re: [Reminder] Spark 3.5 RC Cut

2023-08-02 Thread Bjørn Jørgensen
@Dongjoon Hyun FYI [image: image.png] We had better ask common-...@hadoop.apache.org.

On Wed, 2 Aug 2023 at 18:03, Dongjoon Hyun wrote:
> Oh, I got it, Emil and Bjorn.
>
> Dongjoon.
>
> On Wed, Aug 2, 2023 at 12:32 AM Bjørn Jørgensen wrote:
>> "*As far as

Re: [Reminder] Spark 3.5 RC Cut

2023-08-02 Thread Bjørn Jørgensen
> Date             Event
> July 17th 2023   Code freeze. Release branch cut.
> Late July 2023   QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged.
> August 2023      Release candidates (RC), voting, etc. until final release passes
>
> Best,
> Yuanjian
>
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: Spark 3.4.0 and 3.4.1 and Java version in Dockerfile

2023-07-22 Thread Bjørn Jørgensen
https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may a

Re: [VOTE][RESULT] Release Plan for Apache Spark 4.0.0 (June 2024)

2023-06-20 Thread Bjørn Jørgensen
> I'm looking forward to seeing the upcoming detailed discussions including the following
> - Apache Spark 4.0.0 Preview (and Dates)
> - Apache Spark 4.0.0 Items
> - Apache Spark 4.0.0 Plan Adjustment

Re: Apache Spark 3.5.0 Expectations (?)

2023-05-31 Thread Bjørn Jørgensen
@Cheng Pan https://issues.apache.org/jira/browse/HIVE-22126

On Wed, 31 May 2023 at 03:58, Cheng Pan wrote:
> @Bjørn Jørgensen
>
> I did some investigation on upgrading Guava after Spark dropped Hadoop2 support, but unfortunately, Hive still depends on it, the worse thing i

Re: Apache Spark 3.5.0 Expectations (?)

2023-05-30 Thread Bjørn Jørgensen
> 71
> SPARK-43394 Upgrade to Maven 3.8.8
> SPARK-43436 Upgrade to RocksDbjni 8.1.1.1
> SPARK-43446 Upgrade to Apache Arrow 12.0.0
> SPARK-43447 Support R 4.3.0
> SPARK-43489 Remove protobuf 2.5.0
> SPARK-43519 Bump Parquet to 1.13.1
> SPARK-43581 Upgrade kubernetes-client to 6.6.2
> SPARK-43588 Upgrade to ASM 9.5
> SPARK-43600 Update K8s doc to recommend K8s 1.24+
> SPARK-43738 Upgrade to DropWizard Metrics 4.2.18
> SPARK-43831 Build and Run Spark on Java 21
> SPARK-43832 Upgrade to Scala 2.12.18
> SPARK-43836 Make Scala 2.13 as default in Spark 3.5
> SPARK-43842 Upgrade gcs-connector to 2.2.14
> SPARK-43844 Update to ORC 1.9.0
> UMBRELLA: Add SQL functions into Scala, Python and R API
>
> Thanks,
> Dongjoon.
>
> PS. The above is not a list of release blockers. Instead, it could be a nice-to-have from someone's perspective.

Re: Slack for Spark Community: Merging various threads

2023-04-07 Thread Bjørn Jørgensen
>> *Developer* discussions should still happen on email, JIRA and GitHub and be async-friendly (72-hour rule) to fit the ASF’s development model.
>>
>> Are there any other questions?
>>
>> Dongjoon.
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

Re: Slack for PySpark users

2023-04-04 Thread Bjørn Jørgensen
> been suggested as well so those who like investigative search can agree and come up with a freebie one.

Re: Slack for PySpark users

2023-03-30 Thread Bjørn Jørgensen
> On Tue, 28 Mar 2023 at 03:52, asma zgolli wrote:
>
> +1 good idea, I'd like to join as well.
>
> On Tue, 28 Mar 2023 at 04:09, Winston Lai wrote:
>
> Please let us know when the channel is created. I'd like to join :)
>
> Thank You & Best Regards
> Winston Lai
> --
> *From:* Denny Lee
> *Sent:* Tuesday, March 28, 2023 9:43:08 AM
> *To:* Hyukjin Kwon
> *Cc:* keen; u...@spark.apache.org
> *Subject:* Re: Slack for PySpark users
>
> +1 I think this is a great idea!
>
> On Mon, Mar 27, 2023 at 6:24 PM Hyukjin Kwon wrote:
>
> Yeah, actually I think we had better have a Slack channel so we can easily discuss with users and developers.
>
> On Tue, 28 Mar 2023 at 03:08, keen wrote:
>
> Hi all,
> I really like *Slack* as a communication channel for a tech community.
> There is a Slack workspace for *delta lake users* (https://go.delta.io/slack) that I enjoy a lot.
> I was wondering if there is something similar for PySpark users.
>
> If not, would there be anything wrong with creating a new Slack workspace for PySpark users (when explicitly mentioning that this is *not* officially part of Apache Spark)?
>
> Cheers
> Martin
>
> --
> Asma ZGOLLI
>
> Ph.D. in Big Data - Applied Machine Learning

Re: Topics for Spark online classes & webinars

2023-03-28 Thread Bjørn Jørgensen
t;>>view my Linkedin profile >>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> >>>>> >>>>> >>>>> https://en.everybodywiki.com/Mich_Talebzadeh >>>>> >>>>> >>>>> >>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for >>>>> any loss, damage or destruction of data or any other property which may >>>>> arise from relying on this email's technical content is explicitly >>>>> disclaimed. The author will in no case be liable for any monetary damages >>>>> arising from such loss, damage or destruction. >>>>> >>>>> >>>>> >>>>> >>>>> On Mon, 13 Mar 2023 at 16:29, asma zgolli >>>>> wrote: >>>>> >>>>> Hello Mich, >>>>> >>>>> Can you please provide the link for the confluence page? >>>>> >>>>> Many thanks >>>>> Asma >>>>> Ph.D. in Big Data - Applied Machine Learning >>>>> >>>>> Le lun. 13 mars 2023 à 17:21, Mich Talebzadeh < >>>>> mich.talebza...@gmail.com> a écrit : >>>>> >>>>> Apologies I missed the list. >>>>> >>>>> To move forward I selected these topics from the thread "Online >>>>> classes for spark topics". >>>>> >>>>> To take this further I propose a confluence page to be seup. >>>>> >>>>> >>>>>1. Spark UI >>>>>2. Dynamic allocation >>>>>3. Tuning of jobs >>>>>4. Collecting spark metrics for monitoring and alerting >>>>>5. For those who prefer to use Pandas API on Spark since the >>>>>release of Spark 3.2, What are some important notes for those users? >>>>> For >>>>>example, what are the additional factors affecting the Spark >>>>> performance >>>>>using Pandas API on Spark? How to tune them in addition to the >>>>> conventional >>>>>Spark tuning methods applied to Spark SQL users. >>>>>6. Spark internals and/or comparing spark 3 and 2 >>>>>7. Spark Streaming & Spark Structured Streaming >>>>>8. Spark on notebooks >>>>>9. Spark on serverless (for example Spark on Google Cloud) >>>>>10. 
Spark on k8s >>>>> >>>>> Opinions and how to is welcome >>>>> >>>>> >>>>>view my Linkedin profile >>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> >>>>> >>>>> >>>>> https://en.everybodywiki.com/Mich_Talebzadeh >>>>> >>>>> >>>>> >>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for >>>>> any loss, damage or destruction of data or any other property which may >>>>> arise from relying on this email's technical content is explicitly >>>>> disclaimed. The author will in no case be liable for any monetary damages >>>>> arising from such loss, damage or destruction. >>>>> >>>>> >>>>> >>>>> >>>>> On Mon, 13 Mar 2023 at 16:16, Mich Talebzadeh < >>>>> mich.talebza...@gmail.com> wrote: >>>>> >>>>> Hi guys >>>>> >>>>> To move forward I selected these topics from the thread "Online >>>>> classes for spark topics". >>>>> >>>>> To take this further I propose a confluence page to be seup. >>>>> >>>>> Opinions and how to is welcome >>>>> >>>>> Cheers >>>>> >>>>> >>>>> >>>>>view my Linkedin profile >>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> >>>>> >>>>> >>>>> https://en.everybodywiki.com/Mich_Talebzadeh >>>>> >>>>> >>>>> >>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for >>>>> any loss, damage or destruction of data or any other property which may >>>>> arise from relying on this email's technical content is explicitly >>>>> disclaimed. The author will in no case be liable for any monetary damages >>>>> arising from such loss, damage or destruction. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >> >> -- >> Asma ZGOLLI >> >> PhD in Big Data - Applied Machine Learning >> Email : zgollia...@gmail.com >> Tel : (+49) 015777685768 >> Skype : asma_zgolli >> > -- Bjørn Jørgensen Vestre Aspehaug 4, 6010 Ålesund Norge +47 480 94 297

Failed to build master google protobuf protoc-3.22.0-linux-x86_64.exe

2023-03-16 Thread Bjørn Jørgensen
? https://github.com/bjornjorgensen/jupyter-spark-master-docker/actions/runs/4434813547/jobs/7782658831#step:7:37761
Should we contact Google about this?

Re: Topics for Spark online classes & webinars

2023-03-15 Thread Bjørn Jørgensen
> using Pandas API on Spark? How to tune them in addition to the conventional Spark tuning methods applied to Spark SQL users.
> 6. Spark internals and/or comparing spark 3 and 2

Re: [VOTE] Release Apache Spark 3.4.0 (RC4)

2023-03-12 Thread Bjørn Jørgensen
ent-jvm_2.12

On Sat, 11 Mar 2023 at 13:43, yangjie01 wrote:
> Can you test `./build/mvn clean package -Phive`? Thanks
>
> *From:* Bjørn Jørgensen
> *Date:* Saturday, 11 March 2023, 20:33
> *To:* Xinrong Meng
> *Cc:* beliefer, dev
> *Subject:* Re: Re: [V

Re: Re: [VOTE] Release Apache Spark 3.4.0 (RC4)

2023-03-11 Thread Bjørn Jørgensen
> Version/s" = 3.4.0
>
> Committers should look at those and triage. Extremely important bug fixes, documentation, and API tweaks that impact compatibility should be worked on immediately. Everything else please retarget to an appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the release unless the bug in question is a regression from the previous release. That being said, if there is something which is a regression that has not been correctly targeted please ping me or a committer to help target the issue.
>
> Thanks,
> Xinrong Meng

Re: spark executor pod has same memory value for request and limit

2023-03-10 Thread Bjørn Jørgensen
> hat reason, spark.executor.memory is assigned to requests.memory and limits.memory like the following
>
> Limits:
>   memory: 5734Mi
> Requests:
>   cpu: 4
>   memory: 5734Mi
>
> Is there any special reason to not have spark.kubernetes.executor.request.memory parameter?
> and can I use spark.kubernetes.executor.podTemplateFile parameter to set smaller memory request than the memory limit in pod template file?
>
> Limits:
>   memory: 5734Mi
> Requests:
>   cpu: 4
>   memory: 1024Mi
>
> Thanks
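One way to see where a figure such as 5734Mi can come from is to reproduce the arithmetic Spark applies to the executor pod: the executor memory plus an overhead that is the larger of a factor times that memory and a 384 MiB floor. The sketch below is only an illustration under stated assumptions: the thread does not say which `spark.executor.memory` was used, so the 4096 MiB input and the 0.4 overhead factor (the default Spark applies to non-JVM jobs such as PySpark) are guesses that happen to match the quoted value, and the function name is hypothetical.

```python
MIN_OVERHEAD_MIB = 384  # floor Spark applies to the memory overhead


def executor_pod_memory_mib(executor_memory_mib: int, overhead_factor: float) -> int:
    """Hypothetical helper: executor memory plus overhead, in MiB."""
    # Overhead is a fraction of executor memory, but never below the floor.
    overhead = max(int(executor_memory_mib * overhead_factor), MIN_OVERHEAD_MIB)
    return executor_memory_mib + overhead


# Assumed inputs: 4g executor memory, non-JVM overhead factor of 0.4.
print(executor_pod_memory_mib(4096, 0.4))
# Same executor memory with the JVM default factor of 0.1.
print(executor_pod_memory_mib(4096, 0.1))
```

Under these assumptions the same total is used for both the memory request and the memory limit, which is the behaviour the question describes.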

Re: [VOTE] Release Apache Spark 3.4.0 (RC1)

2023-02-22 Thread Bjørn Jørgensen
> s release candidate, then reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install the current RC and see if anything important breaks, in the Java/Scala you can add the staging repository to your projects resolvers and test with the RC (make sure to clean up the artifact cache before/after so you don't end up building with a out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.4.0?
> ===
> The current list of open tickets targeted at 3.4.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target Version/s" = 3.4.0
>
> Committers should look at those and triage. Extremely important bug fixes, documentation, and API tweaks that impact compatibility should be worked on immediately. Everything else please retarget to an appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the release unless the bug in question is a regression from the previous release. That being said, if there is something which is a regression that has not been correctly targeted please ping me or a committer to help target the issue.
>
> Thanks,
> Xinrong Meng

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-13 Thread Bjørn Jørgensen
There is a fix for python 3.11 https://github.com/apache/spark/pull/38987 We should have this in more branches.

On Mon, 13 Feb 2023 at 09:39, Bjørn Jørgensen wrote:
> On manjaro it is Python 3.10.9
>
> On ubuntu it is Python 3.11.1
>
> On Mon, 13 Feb 2023 at 03:24, yangjie

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-13 Thread Bjørn Jørgensen
> thon 3.10, they will succeed
>
> YangJie
>
> *From:* Bjørn Jørgensen
> *Date:* Monday, 13 February 2023, 05:09
> *To:* Sean Owen
> *Cc:* "L. C. Hsieh", Spark dev list <dev@spark.apache.org>
> *Subject:* Re: [VOTE] Release Spar

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-12 Thread Bjørn Jørgensen
Server VM (build 17.0.6+10, mixed mode) :) So I'm +1

On Sun, 12 Feb 2023 at 12:53, Bjørn Jørgensen wrote:
> I use ubuntu rolling
> $ java -version
> openjdk version "17.0.6" 2023-01-17
> OpenJDK Runtime Environment (build 17.0.6+10-Ubuntu-0ubuntu1)
> OpenJDK 64-Bi

Re: [VOTE] Release Spark 3.3.2 (RC1)

2023-02-12 Thread Bjørn Jørgensen
> ** FAILED ***") in the log file.
>
> Maybe it is due to the dev env? What dev env are you using to run the test?
>
> On Sat, Feb 11, 2023 at 8:58 AM Bjørn Jørgensen wrote:
>> ./build/mvn clean p

Re: Building Spark to run PySpark Tests?

2023-01-18 Thread Bjørn Jørgensen
> ime Environment Homebrew (build 11.0.17+0)
> OpenJDK 64-Bit Server VM Homebrew (build 11.0.17+0, mixed mode)
> ```
>
> Alternatively, I've also tried simply building Spark and using a python=3.9 venv and installing the requirements from `pip install -r dev/requirements.txt` and using that as the interpreter to run tests. However, I was running into some failing pandas test which to me seemed like it was coming from a pandas version difference as `requirements.txt` didn't specify a version.
>
> I suppose I have a couple of questions in regards to this:
> 1. Am I missing a build step to build Spark and run PySpark unit tests?
> 2. Where could I find whether an upstream test is failing for a specific release?
> 3. Would it be possible to configure the `run-tests` script to run all tests regardless of test failures?

Re: [Suggest] Add geo function to core

2023-01-17 Thread Bjørn Jørgensen
> It is possible to write some code in an implementation-independent way using GeoAPI interfaces, which aim to do what JDBC interfaces do for databases. Apache SIS and PROJ-JNI are implementations of GeoAPI interfaces, so by using those interfaces you can let users choose among those two implementations. I think that GeoAPI wrappers could easily be contributed to PROJ4J as well if there is a desire for that.
>
> Regarding Geohash, if we are talking about the algorithm described at https://en.wikipedia.org/wiki/Geohash, then SIS already supports it. SIS supports also the Military Grid Reference System (MGRS), which can be seen as another kind of geohash with better characteristics.
>
> Regards,
>
> Martin

Re: [VOTE][RESULT] Release Spark 3.2.3, RC1

2022-11-26 Thread Bjørn Jørgensen
How are things going with this release? I can't find it on https://spark.apache.org/downloads.html

On Fri, 18 Nov 2022 at 19:38, Chao Sun wrote:
> CORRECTED:
>
> The vote passes with 12 +1s (6 binding +1s).
> Thanks to all who helped with the release!
>
> (* = binding)
> +1:
> - Dongjoon Hyun (*)
> - L.

Re: Contributor privilege

2022-11-11 Thread Bjørn Jørgensen
> e user id is: dengziming
>
> --
> Best,
> Ziming

Upgrade guava to 31.1-jre and remove hadoop2

2022-11-06 Thread Bjørn Jørgensen
Hi, has anyone tried to upgrade Guava now that we have stopped supporting Hadoop 2? And is there a plan for removing the Hadoop 2 code from the code base?

Re: Issue with SparkContext

2022-09-20 Thread Bjørn Jørgensen
> JavaError while running SparkContext.
> Can you please help me to resolve this issue.

Re: Jupyter notebook on Dataproc versus GKE

2022-09-14 Thread Bjørn Jørgensen
> r scheduling.
>
> On Tue, Sep 6, 2022 at 10:01 AM Mich Talebzadeh wrote:
>> Thank you all.
>>
>> Has anyone used Argo for k8s scheduler by any chance?
>>
>> On Tue, 6 Sep 2022 at 13:41, Bjørn Jørgensen wrote:
>>> "*Jupyt

Re: Time for Spark 3.3.1 release?

2022-09-14 Thread Bjørn Jørgensen
At least we should upgrade hadoop to the latest version https://hadoop.apache.org/release/2.10.2.html Are there some special reasons why we have a hadoop version that is 7 years old?

On Wed, 14 Sep 2022 at 20:25, Dongjoon Hyun wrote:
> Ya, +1 for Sean's comment.
>
> In addition, all Apache Spark's

Re: Jupyter notebook on Dataproc versus GKE

2022-09-06 Thread Bjørn Jørgensen
> On Mon, 5 Sept 2022 at 20:58, Bjørn Jørgensen wrote:

Re: Jupyter notebook on Dataproc versus GKE

2022-09-05 Thread Bjørn Jørgensen

Re: Welcoming three new PMC members

2022-08-10 Thread Bjørn Jørgensen
Congratulations :)

On Tue, 9 Aug 2022 at 18:40, Xiao Li wrote:
> Hi all,
>
> The Spark PMC recently voted to add three new PMC members. Join me in welcoming them to their new roles!
>
> New PMC members: Huaxin Gao, Gengliang Wang and Maxim Gekk
>
> The Spark PMC

Re: Welcome Xinrong Meng as a Spark committer

2022-08-10 Thread Bjørn Jørgensen
> nthusiastically. Please join me in welcoming Xinrong!

Re: Apache Spark 3.2.2 Release?

2022-07-06 Thread Bjørn Jørgensen
+1

On Wed, 6 Jul 2022 at 23:05, Hyukjin Kwon wrote:
> Yeah +1
>
> On Thu, Jul 7, 2022 at 5:40 AM Dongjoon Hyun wrote:
>> Hi, All.
>>
>> Since Apache Spark 3.2.1 tag creation (Jan 19), new 197 patches including 11 correctness patches arrived at branch-3.2.
>>
>> Shall we make a new release,

Re: The draft of the Spark 3.3.0 release notes

2022-06-03 Thread Bjørn Jørgensen

Re: Introducing "Pandas API on Spark" component in JIRA, and use "PS" PR title component

2022-05-18 Thread Bjørn Jørgensen
>> (pandas-on-Spark) in PR titles? We already use "ps" in many places when we: import pyspark.pandas as ps.
>> This is similar to "Structured Streaming" in JIRA, and "SS" in PR title.
>>
>> I think it'd be easier to track the changes here with that.
>> Currently it's a bit difficult to identify it from pure PySpark changes.
>
> --
> Best regards,
> Maciej Szymkiewicz
>
> Web: https://zero323.net
> PGP: A30CEF0C31A501EC

Re: Issue on Spark on K8s with Proxy user on Kerberized HDFS : Spark-25355

2022-05-03 Thread Bjørn Jørgensen
> cessorImpl.newInstance(Unknown Source)
>     at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
>     at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1501)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1443)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1353)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>     at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
>     at
>
> On debugging deeper, we found that the proxy user doesn't have access to delegation tokens in the case of K8s. SparkSubmit.submit explicitly creates the proxy user, and this user doesn't have a delegation token.
>
> Please help me with the same.
>
> Regards
>
> Pralabh Kumar

Re: Tools for regression testing

2022-03-24 Thread Bjørn Jørgensen

Re: Tools for regression testing

2022-03-24 Thread Bjørn Jørgensen

Re: One click to run Spark on Kubernetes

2022-02-23 Thread Bjørn Jørgensen
> On Wed, 23 Feb 2022 at 04:06, bo yang wrote:
>
>> Hi Spark Community,
>>
>> We built an open source tool to deploy and run Spark on Kubernetes with a one click command. For example, on AWS, it could automatically create an EKS cluster, node group, NGINX ingress, and Spark Operator. Then you will be able to use curl or a CLI tool to submit Spark application. After the deployment, you could also install Uber Remote Shuffle Service to enable Dynamic Allocation on Kubernetes.
>>
>> Anyone interested in using or working together on such a tool?
>>
>> Thanks,
>> Bo

Re: [ANNOUNCE] Apache Spark 3.1.3 released + Docker images

2022-02-22 Thread Bjørn Jørgensen
> We are happy to announce the availability of Spark 3.1.3!
>
> Spark 3.1.3 is a maintenance release containing stability fixes. This release is based on the branch-3.1 maintenance branch of Spark. We strongly recommend all 3.1 users to upgrade to this stable release.
>
> To download Spark 3.1.3, head over to the download page:
> https://spark.apache.org/downloads.html
>
> To view the release notes:
> https://spark.apache.org/releases/spark-release-3-1-3.html
>
> We would like to acknowledge all community members for contributing to this release. This release would not have been possible without you.
>
> *New Dockerhub magic in this release:*
>
> We've also started publishing docker containers to the Apache Dockerhub, these contain non-ASF artifacts that are subject to different license terms than the Spark release. The docker containers are built for Linux x86 and ARM64 since that's what I have access to (thanks to NV for the ARM64 machines).
>
> You can get them from https://hub.docker.com/apache/spark (and spark-r and spark-py) :)
> (And version 3.2.1 is also now published on Dockerhub).
>
> Holden

Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Bjørn Jørgensen
Ok, but deleting users' data without them knowing it is never a good idea. That's why I give this RC -1.

On Sat, 22 Jan 2022 at 00:16, Sean Owen wrote:
> (Bjorn - unless this is a regression, it would not block a release, even if it's a bug)
>
> On Fri, Jan 21, 2022 at 5:09 PM Bjø

Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Bjørn Jørgensen
> ion?
> here we're trying to figure out whether there are critical bugs introduced in 3.2.1 vs 3.2.0)
>
> On Fri, Jan 21, 2022 at 1:58 PM Bjørn Jørgensen wrote:
>> Hi, I am wondering if it's a bug or not.
>>
>> I do have a lot of json files, where they

Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Bjørn Jørgensen
> y should be worked on immediately. Everything else please retarget to an appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the release unless the bug in question is a regression from the previous release. That being said, if there is something which is a regression that has not been correctly targeted please ping me or a committer to help target the issue.

Re: [VOTE] Release Spark 3.2.1 (RC1)

2022-01-15 Thread Bjørn Jørgensen
> /dist/dev/spark/v3.2.1-rc1-docs/
>
> The list of bug fixes going into 3.2.1 can be found at the following URL:
> https://s.apache.org/7tzik
>
> This release is using the release script of the t