SPARK-33615
Best,
Meikel
From: Mich Talebzadeh
Sent: Saturday, 4 December 2021 18:36
To: Bode, Meikel, NMA-CFD
Cc: dev ; user@spark.apache.org
Subject: Re: Conda Python Env in K8S
Hi Meikel
In the past I tried with
--py-files hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/
these options exist and I want to understand what the issue is...
Any hints on that?
Best,
Meikel
From: Mich Talebzadeh
Sent: Friday, 3 December 2021 13:27
To: Bode, Meikel, NMA-CFD
Cc: dev ; user@spark.apache.org
Subject: Re: Conda Python Env in K8S
Build the Python packages into the Docker image.
Hello,
I am trying to run spark jobs using Spark Kubernetes Operator.
But when I try to bundle a conda Python environment using the following
resource description, the Python interpreter is only unpacked on the driver and
not on the executors.
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: Spark
Can we add Python dependencies the way we can add Maven coordinates, so that we
can run something like pip install, or download from the PyPI index?
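One route the Spark docs describe for exactly this (and which SPARK-33615 made
work on K8S in Spark 3.1+) is packing the conda environment with conda-pack and
shipping it via spark.archives, which is unpacked on the driver and on the
executors. A minimal sketch, assuming pyspark_conda_env.tar.gz was built
beforehand with conda pack; the names are placeholders:

import os
from pyspark.sql import SparkSession

# Sketch of the documented conda-pack route (Spark 3.1+). The archive is
# unpacked on the driver and the executors under the alias after the '#'.
os.environ["PYSPARK_PYTHON"] = "./environment/bin/python"
spark = (SparkSession.builder
         .config("spark.archives", "pyspark_conda_env.tar.gz#environment")
         .getOrCreate())

With the Spark operator, the same key should be settable under spec.sparkConf,
though I have not verified that against the resource description above.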
From: Mich Talebzadeh
Sent: Wednesday, 24 November 2021 18:28
Cc: user@spark.apache.org
Subject: Re: [issue] not able to add external libs to pyspark job while using
Any help on these issues would be very appreciated!
Many thanks,
Meikel Bode
From: Bode, Meikel, NMA-CFD
Sent: Wednesday, 10 November 2021 08:23
To: user ; dev
Subject: HiveThrift2 ACID Transactions?
Hi all,
We want to apply INSERT, UPDATE, and DELETE operations on tables based on
parquet or ORC files served by thrift2.
Actually it's unclear whether we can enable them, and where.
At the moment, when we execute UPDATE or DELETE operations, they are blocked.
Anyone out there who uses ACI
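For context (not from this thread): open-source Spark SQL has no Hive-style
ACID support for plain parquet/ORC tables, which would explain the blocked
statements; UPDATE and DELETE are typically added by a table format such as
Delta Lake. A hedged sketch, assuming the delta-spark package is on the
classpath; the table name and data are invented:

from pyspark.sql import SparkSession

# Enable the Delta Lake SQL extension and catalog (delta-spark required).
spark = (SparkSession.builder
         .config("spark.sql.extensions",
                 "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

spark.sql("CREATE TABLE sales (id INT, qty INT) USING delta")
spark.sql("UPDATE sales SET qty = 0 WHERE id = 1")  # supported on Delta tables
spark.sql("DELETE FROM sales WHERE id = 2")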
Hi all,
I am trying to get Thrift2 on Spark 3.1.2 running on K8S, with one executor for
the moment. Submission works so far, but it fails on the executor side during
initialization.
The issue seems to be related to access restrictions on certain directories...
but I am not sure. Please see the errors marked in yellow.
Many thanks! 😊
From: Gengliang Wang
Sent: Tuesday, 19 October 2021 16:16
To: dev ; user
Subject: [ANNOUNCE] Apache Spark 3.2.0
Hi all,
Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous
contribution from the open-source community, this release managed to resolve in
ex
Hi,
thx. Great work. Will test it 😊
Best,
Meikel Bode
From: Kidong Lee
Sent: Friday, 10 September 2021 01:39
To: user@spark.apache.org
Subject: spark thrift server as hive on spark running on kubernetes, and more.
Hi,
Recently, I have open-sourced a tool called
DataRoaster (https://github.c
On EKS...
From: Mich Talebzadeh
Sent: Thursday, 12 August 2021 15:47
To: Bode, Meikel, NMA-CFD
Cc: user@spark.apache.org
Subject: Re: K8S submit client vs. cluster
Ok
As I see it, with PySpark, even if it is submitted in cluster mode, it will be
converted to client mode anyway.
Are you running
Hi Mich,
All PySpark.
Best,
Meikel
From: Mich Talebzadeh
Sent: Thursday, 12 August 2021 13:41
To: Bode, Meikel, NMA-CFD
Cc: user@spark.apache.org
Subject: Re: K8S submit client vs. cluster
Is this Spark or PySpark?
Hi all,
If we schedule a spark job on k8s, how are volume mappings handled?
In client mode I would expect that the driver's volumes have to be mapped
manually in the pod template. Executor volumes are attached dynamically based
on submit parameters. Right...?
In cluster mode I would expect that volumes
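For what it's worth, Spark on K8S documents declarative volume options for both
driver and executor pods, so in cluster mode no manual pod edit should be
needed. A minimal sketch of those conf keys, shown as builder config for
brevity (normally passed as --conf at submit time); the PVC name and mount path
are invented:

from pyspark.sql import SparkSession

# Sketch of the documented spark.kubernetes.*.volumes.* options; in cluster
# mode the driver's volumes can be declared the same way as the executors'.
drv = "spark.kubernetes.driver.volumes.persistentVolumeClaim.data"
exe = "spark.kubernetes.executor.volumes.persistentVolumeClaim.data"
spark = (SparkSession.builder
         .config(drv + ".mount.path", "/opt/data")
         .config(drv + ".options.claimName", "spark-data-pvc")
         .config(exe + ".mount.path", "/opt/data")
         .config(exe + ".options.claimName", "spark-data-pvc")
         .getOrCreate())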
Hi folks,
Maybe not the right audience, but maybe you have come across such a requirement.
Is it possible to define a parquet schema that contains technical column names
and, per column, a list of translations of that name into different languages?
To give an example:
Technical: "custnr" would transla
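One hedged possibility: Spark lets a StructField carry arbitrary metadata, and
it writes its full schema JSON (metadata included) into the parquet footer, so
the translations should survive a round trip. A sketch using the custnr example
from above; the translation labels and path are invented:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()

# Keep the technical name as the column name; carry translations as metadata.
schema = StructType([
    StructField("custnr", StringType(), True,
                metadata={"translations": {"en": "customer number",
                                           "de": "Kundennummer"}}),
])
df = spark.createDataFrame([("C-1001",)], schema)
df.write.mode("overwrite").parquet("/tmp/customers")

# Spark reads the schema (incl. metadata) back from the parquet footer.
print(spark.read.parquet("/tmp/customers").schema["custnr"].metadata)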
Hi all,
My df looks like follows:
Situation:
MainKey, SubKey, Val1, Val2, Val3, ...
1, 2, a, null, c
1, 2, null, null, c
1, 3, null, b, null
1, 3, a, null, c
Desired outcome:
1, 2, a, b, c
1, 2, a, b, c
1, 3, a, b, c
1, 3, a, b, c
How could I populate/synchronize empty cells of all records wi
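The question is cut off above, but given the desired outcome, a window
aggregate per MainKey would do it. A sketch; note it partitions by MainKey
alone, since the desired rows for SubKey 2 pick up the "b" that only exists
under SubKey 3:

from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, 2, "a", None, "c"),
     (1, 2, None, None, "c"),
     (1, 3, None, "b", None),
     (1, 3, "a", None, "c")],
    ["MainKey", "SubKey", "Val1", "Val2", "Val3"])

# max() ignores nulls, so each ValN gets the (here unique) non-null value
# observed anywhere within the same MainKey partition.
w = Window.partitionBy("MainKey")
filled = df.select(
    "MainKey", "SubKey",
    *[F.max(c).over(w).alias(c) for c in ("Val1", "Val2", "Val3")])
filled.show()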
Hi Kidong Lee,
Thank you for your email. Actually I came across your blog and it seems to be
very complete.
As you write, it's not easy to bring Spark Thrift2 to K8S, and because you had
to write your own wrapper, I have the impression that it is not really
officially supported, despite the fact
Hi all,
We are migrating to K8S and I wonder whether there are already "good practices"
for running thrift2 on K8S?
Best,
Meikel
Hi all,
when broadcasting a large dict containing several million entries to executors,
what exactly happens when calling bc_var.value within a UDF like:
..
d = bc_var.value
..
Does d receive a copy of the dict inside value, or is this handled like a
pointer?
Thanks,
Meikel
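As far as I understand PySpark's Broadcast implementation, the answer is "like
a pointer": the dict is deserialized at most once per Python worker process, on
first access, and .value then returns a reference to that cached object, not a
per-call copy. A small sketch; the lookup data and UDF name are invented:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

# Stand-in for the multi-million-entry dict from the question.
lookup = {i: f"name_{i}" for i in range(1_000_000)}
bc_var = spark.sparkContext.broadcast(lookup)

@F.udf(StringType())
def resolve(key):
    # First access per Python worker deserializes and caches the dict;
    # d is a reference to that cached object, not a fresh copy.
    d = bc_var.value
    return d.get(key)

spark.range(10).withColumn("name", resolve("id")).show()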
e root by
indicating:
child, lvl-0-parent
inquiry1, null
inquiry2, null
order3, null
Actually that's what I implemented with the recursive UDF I put into the
initial post.
Thank you for any hints on that issue! Any hints on the UDF solution are also
very welcome.
Thx and best,
Meikel
From: Bod
Hi all,
I implemented a recursive UDF that tries to find a document number in a long
list of predecessor documents. This can be a multi-level hierarchy:
C is the successor of B, which is the successor of A (but many more levels are
possible).
As input to that UDF I prepare a dict that contains the complete doc
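The original UDF is not shown in full here, but an iterative walk over a
broadcast predecessor map avoids Python's recursion limit and tolerates dirty
data. A hedged sketch; the map and document names are invented stand-ins:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

# Hypothetical predecessor map: doc -> direct predecessor, None at the root.
pred_map = {"C": "B", "B": "A", "A": None}
bc_pred = spark.sparkContext.broadcast(pred_map)

@F.udf(StringType())
def find_root(doc):
    d, seen = bc_pred.value, set()
    while d.get(doc) is not None:
        if doc in seen:      # cycle guard for dirty hierarchies
            return None
        seen.add(doc)
        doc = d[doc]
    return doc

spark.createDataFrame([("C",), ("A",)], ["doc"]) \
     .withColumn("root", find_root("doc")).show()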
Congrats!
From: Hyukjin Kwon
Sent: Wednesday, 3 March 2021 02:41
To: user @spark ; dev
Subject: [ANNOUNCE] Announcing Apache Spark 3.1.1
We are excited to announce Spark 3.1.1 today.
Apache Spark 3.1.1 is the second release of the 3.x line. This release adds
Python type annotations and Pytho
Hi Sean.
You are right. So we are using Docker images for our Spark cluster. The
generation of the worker image did not succeed and therefore the old 3.0.1
image was still in use.
Thanks,
Best,
Meikel
From: Sean Owen
Sent: Friday, 26 February 2021 10:29
To: Bode, Meikel, NMA-CFD
Cc: user
Hi All,
After changing to 3.0.2, I am facing the following issue. Thanks for any hints.
Best,
Meikel
df = self.spark.read.json(path_in)
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 300,
in json
File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_
Hi all,
I process a lot of JSON files of different sizes. All files share the same
overall structure. I have no issues with files of around 150-300 MB.
Another file of around 530 MB now causes errors when I apply selectExpr on the
resulting DF after reading the file:
AnalysisException: canno
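The exception text is cut off, but an AnalysisException on selectExpr after
reading JSON often means a referenced column was not part of the schema
inferred for that particular file. One hedged workaround is to pin an explicit
schema at read time instead of relying on per-file inference; the paths below
are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Infer the schema once from a known-good file, then pin it, so a file with
# sparse or missing fields cannot yield a narrower inferred schema.
schema = spark.read.json("/data/known_good_sample.json").schema
df = spark.read.schema(schema).json("/data/file_530mb.json")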