Re: [ANNOUNCE] Apache Spark 3.5.1 released
Excellent work, congratulations!

On Wed, Feb 28, 2024 at 10:12 PM Dongjoon Hyun wrote:

> Congratulations!
>
> Bests,
> Dongjoon.
>
> On Wed, Feb 28, 2024 at 11:43 AM beliefer wrote:
>
>> Congratulations!
>>
>> At 2024-02-28 17:43:25, "Jungtaek Lim" wrote:
>>
>> Hi everyone,
>>
>> We are happy to announce the availability of Spark 3.5.1!
>>
>> Spark 3.5.1 is a maintenance release containing stability fixes. This
>> release is based on the branch-3.5 maintenance branch of Spark. We
>> strongly recommend all 3.5 users upgrade to this stable release.
>>
>> To download Spark 3.5.1, head over to the download page:
>> https://spark.apache.org/downloads.html
>>
>> To view the release notes:
>> https://spark.apache.org/releases/spark-release-3-5-1.html
>>
>> We would like to acknowledge all community members for contributing to
>> this release. This release would not have been possible without you.
>>
>> Jungtaek Lim
>>
>> ps. Yikun is helping us with releasing the official Docker image for
>> Spark 3.5.1 (thanks, Yikun!). It may take some time to become generally
>> available.

--
John Zhuge
Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow
Congratulations! Excellent work!

On Tue, Feb 13, 2024 at 8:04 PM Yufei Gu wrote:

> Absolutely thrilled to see the project going open source! Huge congrats to
> Chao and the entire team on this milestone!
>
> Yufei
>
> On Tue, Feb 13, 2024 at 12:43 PM Chao Sun wrote:
>
>> Hi all,
>>
>> We are very happy to announce that Project Comet, a plugin that
>> accelerates Spark query execution by leveraging DataFusion and Arrow,
>> has now been open sourced under the Apache Arrow umbrella. Please
>> check the project repo
>> https://github.com/apache/arrow-datafusion-comet for more details if
>> you are interested. We'd love to collaborate with people from the open
>> source community who share similar goals.
>>
>> Thanks,
>> Chao

--
John Zhuge
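[Editor's note] For anyone who wants to kick the tires, below is a hedged sketch of wiring Comet into a PySpark session. The jar and data paths are hypothetical, and the extension class and `spark.comet.*` keys follow the project README around announcement time; verify the current names against the repo before relying on them.

```python
# A hedged sketch of enabling Comet in PySpark. Paths are hypothetical;
# the extension class and spark.comet.* keys follow the project README
# at the time of the announcement and may have changed since.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("comet-demo")
    .config("spark.jars", "/path/to/comet-spark.jar")
    .config("spark.sql.extensions", "org.apache.comet.CometSparkSessionExtensions")
    .config("spark.comet.enabled", "true")
    .config("spark.comet.exec.enabled", "true")
    .getOrCreate()
)

# If Comet takes over an operator, it shows up in the physical plan.
spark.read.parquet("/path/to/data.parquet").explain()
```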
Re: Spark on Kubernetes scheduler variety
>>>> […]tch Scheduling.
>>>> <https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md>
>>>>
>>>> What is not very clear is the degree of progress of these projects. You
>>>> may be kind enough to elaborate on the KPIs for each of these projects
>>>> and where you think your contributions are going to be.
>>>>
>>>> HTH,
>>>>
>>>> Mich
>>>>
>>>> View my LinkedIn profile
>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>
>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>>> any loss, damage or destruction of data or any other property which may
>>>> arise from relying on this email's technical content is explicitly
>>>> disclaimed. The author will in no case be liable for any monetary
>>>> damages arising from such loss, damage or destruction.
>>>>
>>>> On Fri, 18 Jun 2021 at 00:44, Holden Karau wrote:
>>>>
>>>>> Hi Folks,
>>>>>
>>>>> I'm continuing my adventures to make Spark on containers party and I
>>>>> was wondering if folks have experience with the different batch
>>>>> scheduler options that they prefer? I was thinking that, to better
>>>>> support dynamic allocation, it might make sense for us to support
>>>>> using different schedulers, and I wanted to see if there are any that
>>>>> the community is more interested in.
>>>>>
>>>>> I know that one of the Spark on Kube operators supports
>>>>> volcano/kube-batch, so I was thinking that might be a place I start
>>>>> exploring, but I also want to be open to other schedulers that folks
>>>>> might be interested in.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Holden :)
>>>>>
>>>>> --
>>>>> Twitter: https://twitter.com/holdenkarau
>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>> https://amzn.to/2MaRAG9
>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

--
John Zhuge
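[Editor's note] A postscript for readers finding this 2021 thread later: pluggable Kubernetes schedulers did land in Spark. Spark 3.3.0 added the spark.kubernetes.scheduler.name property, letting driver and executor pods be handed to a batch scheduler such as Volcano. A minimal sketch, assuming Spark 3.3.0+ and a Volcano deployment in the cluster; the master URL, image, and namespace below are hypothetical:

```python
# A minimal sketch of running Spark on a custom Kubernetes batch
# scheduler. Assumes Spark 3.3.0+ (which introduced
# spark.kubernetes.scheduler.name) and Volcano installed in the cluster;
# the URL, image, and namespace are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("k8s://https://kubernetes.example.com:6443")
    .appName("volcano-demo")
    .config("spark.kubernetes.container.image", "example/spark:3.3.0")
    .config("spark.kubernetes.namespace", "spark-jobs")
    .config("spark.kubernetes.scheduler.name", "volcano")
    .getOrCreate()
)
```

Gang scheduling is the main draw here: a batch scheduler can hold back the whole pod group until the cluster can fit it, which plays much better with dynamic allocation than the default one-pod-at-a-time placement.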
Re: Timestamp Difference/operations
Yeah, the "-" operator does not seem to be supported; however, you can use the "datediff" function:

In [9]: select datediff(CAST('2000-02-01 12:34:34' AS TIMESTAMP), CAST('2000-01-01 00:00:00' AS TIMESTAMP))
Out[9]:
+----------------------------------------------------------------------------------------------------------------------+
| datediff(CAST(CAST(2000-02-01 12:34:34 AS TIMESTAMP) AS DATE), CAST(CAST(2000-01-01 00:00:00 AS TIMESTAMP) AS DATE)) |
+----------------------------------------------------------------------------------------------------------------------+
| 31                                                                                                                   |
+----------------------------------------------------------------------------------------------------------------------+

In [10]: select datediff('2000-02-01 12:34:34', '2000-01-01 00:00:00')
Out[10]:
+--------------------------------------------------------------------------------+
| datediff(CAST(2000-02-01 12:34:34 AS DATE), CAST(2000-01-01 00:00:00 AS DATE)) |
+--------------------------------------------------------------------------------+
| 31                                                                             |
+--------------------------------------------------------------------------------+

In [11]: select datediff(timestamp '2000-02-01 12:34:34', timestamp '2000-01-01 00:00:00')
Out[11]:
+--------------------------------------------------------------------------------------------------------------+
| datediff(CAST(TIMESTAMP('2000-02-01 12:34:34.0') AS DATE), CAST(TIMESTAMP('2000-01-01 00:00:00.0') AS DATE)) |
+--------------------------------------------------------------------------------------------------------------+
| 31                                                                                                           |
+--------------------------------------------------------------------------------------------------------------+

On Fri, Oct 12, 2018 at 7:01 AM Paras Agarwal wrote:

> Hello Spark Community,
>
> Currently in Hive we can do operations on timestamps like:
> CAST('2000-01-01 12:34:34' AS TIMESTAMP) - CAST('2000-01-01 00:00:00' AS TIMESTAMP)
>
> This does not seem to be supported in Spark. Is there any way to do it?
>
> Kindly provide some insight on this.
>
> Paras
> 9130006036

--
John
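[Editor's note] Note that datediff only returns whole days, while the Hive-style "-" expression yields a full interval. If finer granularity is needed, one workaround is subtracting epoch seconds via unix_timestamp (available since Spark 1.5). A minimal PySpark sketch:

```python
# datediff() only returns whole days; for finer granularity, subtract
# epoch seconds. unix_timestamp(ts) converts a timestamp to seconds
# since the epoch, so the difference below is in seconds.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("timestamp-diff").getOrCreate()

spark.sql("""
    SELECT unix_timestamp(CAST('2000-02-01 12:34:34' AS TIMESTAMP))
         - unix_timestamp(CAST('2000-01-01 00:00:00' AS TIMESTAMP))
           AS diff_seconds
""").show()
# +------------+
# |diff_seconds|
# +------------+
# |     2723674|
# +------------+
# 2,723,674 s = 31 days plus 12:34:34, matching the datediff result above.
```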
Re: Handle BlockMissingException in pyspark
BlockMissingException typically indicates the HDFS file is corrupted. This might be an HDFS issue, so the Hadoop mailing list is a better bet: u...@hadoop.apache.org.

Capture the full stack trace from the executor log. If the file still exists, run `hdfs fsck -blockId blk_1233169822_159765693` to determine whether the block is corrupted. If it is not corrupted, could there be excessive (thousands of) concurrent reads on the block? Which Hadoop version? Which Spark version?

On Mon, Aug 6, 2018 at 2:21 AM Divay Jindal wrote:

> Hi,
>
> I am running pyspark in a dockerized Jupyter environment, and I am
> constantly getting this error:
>
> ```
> Py4JJavaError: An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.runJob.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 33
> in stage 25.0 failed 1 times, most recent failure: Lost task 33.0 in stage
> 25.0 (TID 35067, localhost, executor driver)
> : org.apache.hadoop.hdfs.BlockMissingException
> : Could not obtain block:
> BP-1742911633-10.225.201.50-1479296658503:blk_1233169822_159765693
> ```
>
> Please can anyone help me with how to handle such an exception in pyspark?
>
> --
> Best Regards
> *Divay Jindal*

--
John
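[Editor's note] If the job just needs to make progress past a few bad blocks, rather than fixing the HDFS side, one hedged option is Spark's ignore-corrupt-files settings. This silently drops unreadable data, so it is a workaround, not a fix. A sketch (the input path is hypothetical):

```python
# A hedged workaround, not a fix for the underlying HDFS corruption:
# ask Spark to skip files it cannot read instead of failing the stage.
# spark.sql.files.ignoreCorruptFiles covers DataFrame file sources;
# spark.files.ignoreCorruptFiles covers RDD-based inputs.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("skip-corrupt-blocks")
    .config("spark.sql.files.ignoreCorruptFiles", "true")
    .config("spark.files.ignoreCorruptFiles", "true")
    .getOrCreate()
)

# Hypothetical path: reads proceed past unreadable splits with a warning
# in the logs rather than a stage failure.
df = spark.read.parquet("hdfs:///data/events/")
print(df.count())
```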
Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?
Sounds good. Should we add another paragraph after this one in configuration.md to explain the executor environment as well? I will be happy to upload a simple patch.

> Note: When running Spark on YARN in cluster mode, environment variables
> need to be set using the spark.yarn.appMasterEnv.[EnvironmentVariableName]
> property in your conf/spark-defaults.conf file. Environment variables
> that are set in spark-env.sh will not be reflected in the YARN
> Application Master process in cluster mode. See the YARN-related Spark
> Properties
> <https://github.com/apache/spark/blob/master/docs/running-on-yarn.html#spark-properties>
> for more information.

Something like:

Note: When running Spark on YARN, environment variables for the executors
need to be set using the spark.yarn.executorEnv.[EnvironmentVariableName]
property in your conf/spark-defaults.conf file or on the command line.
Environment variables that are set in spark-env.sh will not be reflected
in the executor process.

On Wed, Jan 3, 2018 at 7:53 PM, Marcelo Vanzin <van...@cloudera.com> wrote:

> Because spark-env.sh is something that makes sense only on the gateway
> machine (where the app is being submitted from).
>
> On Wed, Jan 3, 2018 at 6:46 PM, John Zhuge <john.zh...@gmail.com> wrote:
> > Thanks Jacek and Marcelo!
> >
> > Any reason it is not sourced? Any security consideration?
> >
> > On Wed, Jan 3, 2018 at 9:59 AM, Marcelo Vanzin <van...@cloudera.com> wrote:
> >>
> >> On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge <jzh...@apache.org> wrote:
> >> > I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster.
> >> > Is spark-env.sh sourced when starting the Spark AM container or the
> >> > executor container?
> >>
> >> No, it's not.
> >>
> >> --
> >> Marcelo
> >
> > --
> > John
>
> --
> Marcelo

--
John
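[Editor's note] To make the two properties concrete, a minimal sketch; the property prefixes are the real Spark-on-YARN settings, while MY_SETTING and its values are hypothetical:

```python
# A minimal sketch of setting environment variables for the YARN AM and
# the executors via Spark properties instead of spark-env.sh.
# MY_SETTING and its values are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("yarn-env-demo")
    .config("spark.yarn.appMasterEnv.MY_SETTING", "am-value")       # AM, cluster mode
    .config("spark.yarn.executorEnv.MY_SETTING", "executor-value")  # every executor
    .getOrCreate()
)
```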
Re: Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?
Thanks Jacek and Marcelo!

Any reason it is not sourced? Any security consideration?

On Wed, Jan 3, 2018 at 9:59 AM, Marcelo Vanzin <van...@cloudera.com> wrote:

> On Tue, Jan 2, 2018 at 10:57 PM, John Zhuge <jzh...@apache.org> wrote:
> > I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster.
> > Is spark-env.sh sourced when starting the Spark AM container or the
> > executor container?
>
> No, it's not.
>
> --
> Marcelo

--
John
Is spark-env.sh sourced by Application Master and Executor for Spark on YARN?
Hi,

I am running Spark 2.0.0 and 2.1.1 on YARN in a Hadoop 2.7.3 cluster. Is spark-env.sh sourced when starting the Spark AM container or the executor container?

Saw this paragraph on https://github.com/apache/spark/blob/master/docs/configuration.md:

> Note: When running Spark on YARN in cluster mode, environment variables
> need to be set using the spark.yarn.appMasterEnv.[EnvironmentVariableName]
> property in your conf/spark-defaults.conf file. Environment variables that
> are set in spark-env.sh will not be reflected in the YARN Application
> Master process in cluster mode. See the YARN-related Spark Properties
> <https://github.com/apache/spark/blob/master/docs/running-on-yarn.html#spark-properties>
> for more information.

Does it mean spark-env.sh will not be sourced when starting the AM in cluster mode? Does this paragraph apply to the executor as well?

Thanks,

--
John Zhuge
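[Editor's note] One way to settle this empirically is to collect the environment a task actually sees. A small sketch; MY_SETTING is a hypothetical variable name:

```python
# A quick check of the executor environment: run a single one-partition
# task and bring back its os.environ. MY_SETTING is hypothetical; it
# prints None here unless passed via spark.yarn.executorEnv.MY_SETTING,
# even if exported in spark-env.sh on the gateway machine.
import os
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("env-check").getOrCreate()
sc = spark.sparkContext

executor_env = sc.parallelize([0], 1).map(lambda _: dict(os.environ)).first()
print(executor_env.get("MY_SETTING"))
```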