Concurrency and Thread Issues: If there are too many concurrent connections or
thread limitations, it could result in failed connections. Adjust
*spark.shuffle.io.clientThreads*
- It might be prudent to do the same with *spark.shuffle.io.serverThreads*
- Check how stable your environment is. Observe any issues reported in the
Spark UI
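If thread starvation is suspected, these can be raised in spark-defaults.conf; the figures below are illustrative placeholders, not recommendations:

```properties
# spark-defaults.conf - shuffle network thread pools (example values, tune to workload)
spark.shuffle.io.serverThreads  64
spark.shuffle.io.clientThreads  64
```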
HTH
Mich Talebzadeh,
Solutions Architect/Engineering Lead
Hi,
These two threads that you sent seem to be duplicates of each other?
Anyhow I trust that you are familiar with the concept of shuffle in Spark.
Spark Shuffle is an expensive operation since it involves the following:
- Disk I/O
- Data serialization and deserialization
- Network I/O
I want to learn differences among below thread configurations.
spark.shuffle.io.serverThreads
spark.shuffle.io.clientThreads
spark.shuffle.io.threads
spark.rpc.io.serverThreads
spark.rpc.io.clientThreads
spark.rpc.io.threads
Thanks.
Thanks,
Sankavi
From: Bjørn Jørgensen
Sent: Monday, August 14, 2023 6:11 PM
To: Sankavi Nagalingam
Cc: user@spark.apache.org; Vijaya Kumar Mathupaiyan
Subject: [EXT MSG] Re: Spark Vulnerabilities
Yes, it sounds like it. So the broadcast DF size seems to be between 1 and
4GB. So I suggest that you leave it as it is.
I have not used the standalone mode since spark-2.4.3 so I may be missing a
fair bit of context here. I am sure there are others like you that are
still using it!
HTH
Mich
Hi Mich,
Here are my config values from spark-defaults.conf:
spark.eventLog.enabled true
spark.eventLog.dir hdfs://10.0.50.1:8020/spark-logs
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.fs.logDirectory hdfs://10.0.50.1:8020/spark-logs
Hello Patrick,
As a matter of interest what parameters and their respective values do
you use in spark-submit. I assume it is running in YARN mode.
HTH
Mich Talebzadeh,
Solutions Architect/Engineering Lead
London
United Kingdom
view my Linkedin profile
<https://www.linkedin.com/in/m
Hi Mich,
Yes, that's the sequence of events. I think the big breakthrough is that
(for now at least) Spark is throwing errors instead of the queries hanging.
Which is a big step forward. I can at least troubleshoot issues if I know
what they are.
When I reflect on the issues I faced an
Hi Patrick,
Glad that you have managed to sort this problem out. Hopefully it will go
away for good.
Still, we are in the dark about how this problem goes away and comes back :(
As I recall, the chronology of events was as follows:
1. The Issue with hanging Spark job reported
2
Hi Everyone,
I just wanted to follow up on this issue. This issue has continued since
our last correspondence. Today I had a query hang and couldn't resolve the
issue. I decided to upgrade my Spark install from 3.4.0 to 3.4.1. After
doing so, instead of the query hanging, I got an error me
For the Guava case, you may be interested in
https://github.com/apache/spark/pull/42493
Thanks,
Cheng Pan
Yeah, we generally don't respond to "look at the output of my static
analyzer".
Some of these are already addressed in a later version.
Some don't affect Spark.
Some are possibly an issue but hard to change without breaking lots of
things - they are really issues with upstrea
I have added links to the github PR. Or comment for those that I have not
seen before.
Apache Spark has very many dependencies, some can easily be upgraded while
others are very hard to fix.
Please feel free to open a PR if you wanna help.
On Mon, 14 Aug 2023 at 14:06, Sankavi Nagalingam wrote:
Hi Team,
We could see there are many dependent vulnerabilities present in the latest
spark-core:3.4.1.jar. PFA
Could you please let us know when the fix version will be available for
users.
Thanks,
Sankavi
tions
suggest it might be a Java incompatibility issue. Since I didn't want to
downgrade or install an additional Java version, I attempted to use the
latest alpha as well. This appears to have worked, although I couldn't
figure out how to get it to use the metastore_db from Spark.
After turni
to migrate to Delta Lake and see if that solves the issue.
Thanks again for your feedback.
Patrick
On Fri, Aug 11, 2023 at 10:09 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
Hi Patrick,
There is not anything wrong with Hive on-premise; it is the best data
warehouse there is.
Hive handles both ORC and Parquet formats well. They are both columnar
implementations of the relational model. What you are seeing is the Spark API
to Hive, which prefers Parquet. I found this out a few years ago.
From your point of view I suggest you stick to parquet format with Hive
specific t
Thanks for the reply Stephen and Mich.
Stephen, you're right, it feels like Spark is waiting for something, but
I'm not sure what. I'm the only user on the cluster and there are plenty of
resources (+60 cores, +250GB RAM). I even tried restarting Hadoop, Spark
and the host serve
Steve may have a valid point. You raised an issue with concurrent writes
before, if I recall correctly. This limitation may be due to the Hive
metastore: by default Spark uses Apache Derby for its database persistence.
However, it is limited to only one Spark session at any time for the purposes
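A common way around the single-session Derby limit is to point Spark at a shared Hive metastore service; a minimal sketch, assuming a metastore is already running (host and port are hypothetical):

```properties
# spark-defaults.conf - use a networked Hive metastore instead of embedded Derby
spark.sql.catalogImplementation  hive
spark.hadoop.hive.metastore.uris thrift://metastore-host:9083
```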
Hi Kezhi,
Yes, you no longer need to start a master to make the client work. Please
see the quickstart.
https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_connect.html
You can think of Spark Connect as an API on top of Master so workers can be
added to the cluster same
Hi Patrick,
When this has happened to me in the past (admittedly via spark-submit) it has
been because another job was still running and had already claimed some of the
resources (cores and memory).
I think this can also happen if your configuration tries to claim resources
that will never be
Hi Mich,
I don't believe Hive is installed. I set up this cluster from scratch. I
installed Hadoop and Spark by downloading them from their project websites.
If Hive isn't bundled with Hadoop or Spark, I don't believe I have it. I'm
running the Thrift server distributed
Hi Mark,
I created a spark3.4.1 docker file. Details from
spark-py-3.4.1-scala_2.12-11-jre-slim-buster
<https://hub.docker.com/repository/docker/michtalebzadeh/spark_dockerfiles/tags?page=1&ordering=last_updated>
Pull instructions are given:
docker pull michtalebzadeh/spark_dockerfile
Hi Mich,
Thanks for the reply. Unfortunately I don't have Hive set up on my cluster.
I can explore this if there are no other ways to troubleshoot.
I'm using beeline to run commands against the Thrift server. Here's the
command I use:
~/spark/bin/beeline -u jdbc:hive2://10.0.50.1:1 -n hadoop -f command.sql
Thanks
Hello,
I'm attempting to run a query on Spark 3.4.0 through the Spark
ThriftServer. The cluster has 64 cores, 250GB RAM, and operates in
standalone mode using HDFS for storage.
The query is as follows:
SELECT ME.*, MB.BenefitID
FROM MemberEnrollment ME
JOIN MemberBenefits MB
ON
Hi,
I'm recently learning Spark Connect but have some questions regarding the
connect server's relation with master or workers: so when I'm using the
connect server, I don't have to start a master alone side to make clients
work. Is the connect server simply using "local[
Hi Mark,
you can build it yourself, no big deal :)
REPOSITORY         TAG                                              IMAGE ID       CREATED        SIZE
sparkpy/spark-py   3.4.1-scala_2.12-11-jre-slim-buster-Dockerfile   a876102b2206   1 second ago
Hello,
I noticed that the apache/spark-py image for Spark's 3.4.1 release is not
available (apache/spark@3.4.1 is available). Would it be possible to get
the 3.4.1 release build for the apache/spark-py image published?
Thanks,
Mark
From: Mich Talebzadeh
Sent: Tuesday, August 8, 2023 4:43 PM
To: user @spark
Subject: [EXTERNAL] Use of ML in certain aspects of Spark to improve the
performance
I am currently pondering and sharing my thoughts openly. Given our reliance
on gathered statistics, it prompts the question of whether we could
integrate specific machine learning components into Spark Structured
Streaming. Consider a scenario where we aim to adjust configuration values
on the fly
Hi,
I would like to share experience on spark 3.4.1 running on k8s autopilot or
some refer to it as serverless.
My current experience is on Google GKE autopilot
<https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview>.
So essentially you specify the name and region a
pp4 has one row, I'm guessing - containing an array of 10 images. You want
10 rows of 1 image each.
But, just don't do this. Pass the bytes of the image as an array,
along with width/height/channels, and reshape it on use. It's just easier.
That is how the Spark image representati
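Sean's suggestion - ship the raw pixel data flat, together with its dimensions, and reshape on use - can be sketched with plain NumPy (all names below are illustrative):

```python
import numpy as np

# Pretend this is one image destined for a Spark DataFrame row: store a
# flat array of ints plus its dimensions instead of deeply nested arrays.
image = np.zeros((500, 333, 3), dtype=np.uint8)
row = {
    "data": image.ravel().tolist(),  # flat list, simple to serialize
    "height": 500,
    "width": 333,
    "channels": 3,
}

# On the consuming side, rebuild the original shape from the metadata.
restored = np.array(row["data"], dtype=np.uint8).reshape(
    row["height"], row["width"], row["channels"]
)
assert restored.shape == (500, 333, 3)
```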
Hello Adrian,
here is the snippet
import tensorflow_datasets as tfds
(ds_train, ds_test), ds_info = tfds.load(
    dataset_name, data_dir='', split=["train", "test"],
    with_info=True, as_supervised=True
)
schema = StructType([
    StructField("image", ArrayType(ArrayType(ArrayType(Integer
(Adding my manager Eugene Kim who will cover me as I plan to be out of the
office soon)
Hi Kent and Sean,
Nice to meet you. I am working on the OSS legal aspects with Pavan who is
planning to make the contribution request to the Spark project. I saw that
Sean mentioned in his email that the
Hello,
can you also please show us how you created the pandas dataframe? I mean,
how you added the actual data into the dataframe. It would help us for
reproducing the error.
Thank you,
Pop-Tifrea Adrian
On Mon, Jul 31, 2023 at 5:03 AM second_co...@yahoo.com <
second_co...@yahoo.com> wrote:
Hi,
I am new to Spark and looking for help regarding the session windowing
<https://spark.apache.org/docs/3.4.1/structured-streaming-programming-guide.html#types-of-time-windows>
in Spark. I want to create session windows on a user activity stream with a
gap duration of `x` minutes and als
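For what it's worth, the grouping rule behind session windows - a session closes once consecutive events are more than the gap apart - can be sketched in plain Python (a conceptual illustration, not the PySpark `session_window` API itself):

```python
from datetime import datetime, timedelta

def sessionize(timestamps, gap_minutes):
    """Group event times into sessions separated by more than the gap."""
    gap = timedelta(minutes=gap_minutes)
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # within gap: extend current session
        else:
            sessions.append([ts])     # gap exceeded: start a new session
    return sessions

events = [datetime(2023, 8, 1, 10, 0),
          datetime(2023, 8, 1, 10, 3),
          datetime(2023, 8, 1, 10, 20)]  # 17 min after the previous event
print(len(sessionize(events, gap_minutes=5)))  # 2 sessions
```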
i changed to
ArrayType(ArrayType(ArrayType(IntegerType( , still get same error
Thank you for responding
On Thursday, July 27, 2023 at 06:58:09 PM GMT+8, Adrian Pop-Tifrea
wrote:
ok so as expected the underlying database is Hive. Hive uses hdfs storage.
You said you encountered limitations on concurrent writes. The order and
limitations are introduced by Hive metastore so to speak. Since this is all
happening through Spark, by default implementation of the Hive metastore
that
will work better in different use cases according to the writing pattern,
type of queries, data characteristics, etc.
*Pol Santamaria*
On Sat, Jul 29, 2023 at 4:28 PM Mich Talebzadeh
wrote:
It is not Spark SQL that throws the error. It is the underlying Database or
layer that throws the error.
Spark acts as an ETL tool. What is the underlying DB where the table
resides? Is concurrency supported? Please send the error to this list.
HTH
Mich Talebzadeh,
Solutions Architect
Hello,
I'm building an application on Spark SQL. The cluster is set up in
standalone mode with HDFS as storage. The only Spark application running is
the Spark Thrift Server using FAIR scheduling mode. Queries are submitted
to Thrift Server using beeline.
I have multiple queries that insert
Spark on tin boxes like Google Dataproc or AWS EC2 often utilise YARN
resource manager. YARN is the most widely used resource manager not just
for Spark but for other artefacts as well. On-premise YARN is used
extensively. In Cloud it is also used widely in Infrastructure as a Service
such as
Hi all,
I am studying the performance difference of Spark when performing a JOIN on
serverless (K8s) and serverful (traditional server) environments.
In my experiments, Spark on K8s tends to run slower than the serverful setup.
From studying the architecture, I know that Spark runs
Hello,
when you said your pandas Dataframe has 10 rows, does that mean it contains
10 images? Because if that's the case, then you'd want to only use 3 layers
of ArrayType when you define the schema.
Best regards,
Adrian
On Thu, Jul 27, 2023, 11:04 second_co...@yahoo.com.INVALID
wrote:
I have a pandas dataframe with column 'image' using numpy.ndarray; the shape is
(500, 333, 3) per image. My pandas dataframe has 10 rows, thus the shape is
(10, 500, 333, 3).
When using spark.createDataFrame(pandas_dataframe, schema), I need to specify
the schema:
schema = StructType([
StructField(
There is no such method in Spark. I think that's some EMR-specific
modification.
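A portable alternative on vanilla PySpark is to inspect the driver's Python environment with the standard library (note this lists the local interpreter's packages only, not anything on the executors):

```python
from importlib.metadata import distributions

def installed_packages():
    """Return sorted 'name==version' strings for the current interpreter."""
    return sorted(
        f"{d.metadata['Name']}=={d.version}" for d in distributions()
    )

# Print the first few installed packages as a sanity check.
for pkg in installed_packages()[:5]:
    print(pkg)
```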
On Wed, Jul 26, 2023 at 11:06 PM second_co...@yahoo.com.INVALID
wrote:
I ran the following code
spark.sparkContext.list_packages()
on spark 3.4.1 and i get below error
An error was encountered:
AttributeError
Traceback (most recent call last):
  File "/tmp/spark-3d66c08a-08a3-4d4e-9fdf-45853f65e03d/shell_wrapper.py", line 113, in exec
    self._exec
ributed to the project is assumed to have been licensed per above already.
It might be wise to review the CCLA with Twilio and consider establishing
that to govern contributions.
On Mon, Jul 24, 2023 at 6:10 PM Pavan Kotikalapudi <pkotikalap...@twilio.com.invalid> wrote:
Hi Spark Dev,
My name is Pavan Kotikalapudi, I work at Twilio.
I am looking to contribute to this spark issue
https://issues.apache.org/jira/browse/SPARK-24815.
There is a clause from the company's OSS saying
- The proposed contribution is about 100 lines of code modification in the
Personally, I have not done it myself.
CCing the Spark user group in case someone there has tried it.
HTH
Mich Talebzadeh,
Solutions Architect/Engineering Lead
Palantir Technologies Limited
London
United Kingdom
view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-p
This is the downloaded docker?
Try this with the added configuration options as below
/opt/spark/sbin/start-connect-server.sh --conf spark.driver.extraJavaOptions="-Divy.cache.dir=/tmp -Divy.home=/tmp" --packages org.apache.spark:spark-connect_2.12:3.4.1
And you will get
Hello,
I am trying to launch Spark connect on Docker Image
❯ docker run -it apache/spark:3.4.1-scala2.12-java11-r-ubuntu /bin/bash
spark@aa0a670f7433:/opt/spark/work-dir$
/opt/spark/sbin/start-connect-server.sh --packages
org.apache.spark:spark-connect_2.12:3.4.1
starting
this link might help
https://stackoverflow.com/questions/46929351/spark-reading-orc-file-in-driver-not-in-executors
Mich Talebzadeh,
Solutions Architect/Engineering Lead
Palantir Technologies Limited
London
United Kingdom
view my Linkedin profile
<https://www.linkedin.com/in/mich-talebza
partition updates (insert overwrite) daily for the last 30 days (partitions).
The ETL inside the staging directories is completed in hardly 5 minutes, but
then renaming takes a lot of time as it deletes and copies the partitions.
My issue is something related to this -
https://groups.google.com/g/cloud-dataproc-discuss/c/neMyhytlfyg?pli=1
With Best Regards,
Dipayan Dev
On Wed, Jul 19, 2023 at 12:06 AM Mich Talebzadeh wrote:
Spark has no role in creating that hive staging directory. That directory
belongs to Hive, and Spark simply does ETL there, loading to the Hive
managed table in your case, which ends up in the staging directory.
I suggest that you review your design and use an external hive table with
explicit location
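The external-table suggestion can be sketched in Spark SQL roughly as follows (table name, columns, and bucket path are all hypothetical):

```sql
-- External table: Hive tracks metadata only; the data stays at the explicit
-- LOCATION rather than being moved under the Hive-managed warehouse path.
CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
  id BIGINT,
  amount DOUBLE
)
PARTITIONED BY (ds STRING)
STORED AS ORC
LOCATION 'gs://my-bucket/warehouse/sales_ext';
```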
It does help performance but not significantly.
I am just wondering, once Spark creates that staging directory along with
the SUCCESS file, can we just do a gsutil rsync command and move these
files to original directory? Anyone tried this approach or foresee any
concern?
On Mon, 17 Jul 2023
++ DEV community
On Mon, Jul 17, 2023 at 4:14 PM Varun Shah
wrote:
Hi Team,
I am still looking for guidance here. I would really appreciate anything that
points me in the right direction.
On Mon, Jul 17, 2023, 16:14 Varun Shah wrote:
FileOutputCommitter v2 is supported in GCS, but the rename is a metadata
copy-and-delete operation in GCS, so if there are many files it will take a
long time to perform this step. One workaround is to create a smaller number
of larger files, if that is possible from Spark; if not, those configurations
allow for configuring the threadpool which does the metadata copy.
It does support it - it doesn't error out for me at least. But it took around
4 hours to finish the job.
Interestingly, it took only 10 minutes to write the output to the staging
directory and the rest of the time was spent renaming the objects. That's the
concern.
Looks like a known issue in how Spark behaves with GCS, but I am not getting
any workaround for this.
Thanks Jay,
I will try that option.
Any insight on the file committer algorithms?
I tried the v2 algorithm but it's not enhancing the runtime. What's the best
practice in Dataproc for dynamic updates in Spark?
On Mon, 17 Jul 2023 at 7:05 PM, Jay wrote:
You can try increasing fs.gs.batch.threads and fs.gs.max.requests.per.batch.
The definitions for these flags are available here -
https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/master/gcs/CONFIGURATION.md
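For reference, a sketch of how those connector settings might be supplied through the Spark configuration; the values are placeholders to tune, with the property names as documented in the CONFIGURATION.md linked above:

```properties
# Hadoop GCS connector - batched metadata operations (example values only)
spark.hadoop.fs.gs.batch.threads            16
spark.hadoop.fs.gs.max.requests.per.batch   30
```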
Resending this message with a proper Subject line
Hi Spark Community,
I am trying to set up my forked apache/spark project locally for my 1st
Open Source Contribution, by building and creating a package as mentioned here
under Running Individual Tests
<https://spark.apache.org/develo
No, I am using Spark 2.4 to update the GCS partitions. I have a managed
Hive table on top of this.
[image: image.png]
When I do a dynamic partition update of Spark, it creates the new file in a
Staging area as shown here.
But the GCS blob renaming takes a lot of time. I have a partition based on
So you are using GCP and your Hive is installed on Dataproc which happens
to run your Spark as well. Is that correct?
What version of Hive are you using?
HTH
Mich Talebzadeh,
Solutions Architect/Engineering Lead
Palantir Technologies Limited
London
United Kingdom
view my Linkedin profile
Hi All,
Of late, I have encountered the issue where I have to overwrite a lot of
partitions of the Hive table through Spark. It looks like writing to
hive_staging_directory takes 25% of the total time, whereas 75% or more
time goes in moving the ORC files from staging directory to the final
Note the MLlib-specific contribution guidelines section in particular.
https://spark.apache.org/contributing.html
Since you are looking for something to start with, take a look at this Jira
query for starter issues.
https://issues.apache.org/jira/browse/SPARK-38719?jql=project%20%3D%20SPARK%20AND%20labels%20%3D%20%22starter%22%20AND%20status%20%3D%20Open
Cheers,
Brian
On Sun, Jul 16, 2023 at 8:49 AM Dipayan Dev wrote:
Hi Spark Community,
A very good morning to you.
I have been using Spark for the last few years and am new to the community.
I am very much interested in becoming a contributor.
I am looking to contribute to Spark MLlib. Can anyone please suggest how to
start contributing to any new MLlib feature? Is
Hey Spark Community,
Our Jupyterhub/Jupyterlab (with spark client) runs behind two layers of
HAProxy and the Yarn cluster runs remotely. We want to use deploy mode
'client' so that we can capture the output of any spark sql query in
jupyterlab. I'm aware of other technologies like
Gentle reminder on this.
On Sat, Jul 8, 2023 at 7:59 PM Surya Soma wrote:
Well, in that case, you may want to make sure your Spark server is running
properly and that you can access the Spark UI in your browser. If you don't
own the Spark cluster, contact your Spark admin.
On 7/12/23 1:56 PM, timi ayoade wrote:
I can't even connect to the spark UI
From: timi ayoade
Sent: Wednesday, July 12, 2023 6:11 AM
To: user@spark.apache.org
Subject: [EXTERNAL] Spark Not Connecting
Hi Apache Spark community, I am a Data Engineer. I have been using Apache
Spark for some time now. I recently tried to use it but I have been getting
some errors. I have tried debugging them but to no avail. The screenshot is
attached below. I will be glad if responded to. Thanks
Are you using Spark 3.4?
Under directory $SPARK_HOME get a list of jar files for hive and hadoop.
This one is for version 3.4.0
/opt/spark/jars> ltr *hive* *hadoop*
-rw-r--r--. 1 hduser hadoop 717820 Apr 7 03:43 spark-hive_2.12-3.4.0.jar
-rw-r--r--. 1 hduser hadoop 563632 Apr 7 03:43
sp
Hi all,
We made some changes to hive which require changes to the hive jars that
Spark is bundled with. Since Spark 3.3.1 comes bundled with Hive 2.3.9
jars, we built our changes in Hive 2.3.9 and put the necessary jars under
$SPARK_HOME/jars (replacing the original jars that were there
Hello,
I am trying to publish custom metrics using Spark CustomMetric API as
supported since spark 3.2 https://github.com/apache/spark/pull/31476,
https://spark.apache.org/docs/3.2.0/api/java/org/apache/spark/sql/connector/metric/CustomMetric.html
I have created a custom metric implementing
Hello everyone,
I’m really sorry to use this mailing list, but it seems impossible to report
a strange behaviour that is happening in the Spark UI otherwise. I’m also
sending the link to the Stack Overflow question here:
https://stackoverflow.com/questions/76632692/spark-ui-executors-tab-its-empty
I’m