Hi,
In Spark 2.3+ I saw that pyarrow is being used in a number of places in
Spark, and I was trying to understand the benefit it provides in terms of
serialization / deserialization.
I understand that the new pandas-udf works only if pyarrow is installed.
But what about the plain old PythonUDF whi
Thank you for volunteering as the 2.3.4 release manager, Kazuaki!
It's great to see a new release manager in advance. :D
Thank you for the reply, Stavros.
In addition to that issue, I'm also monitoring some other K8s issues and
PRs.
But I'm not sure we can have that, because some PRs seem to fail at
bui
Not currently in Spark.
However, there are systems out there that can share a DataFrame between
languages on top of Spark. They do not call the Python UDF directly, but you can
pass the DataFrame to Python and then apply .map(UDF) that way.
From: Fiske, Danny
Sent: Mo
Thanks, Keith. We have set SPARK_WORKER_INSTANCES=8. So that means we
are running 8 workers on a single machine with 1 thread each, and this gives
us 8 threads?
Is there a preference for running 1 worker with 8 threads inside it? These
are dual-CPU machines, so I believe we need at least 2 workers in
Your module 'feature' isn't available to the yarn workers, so you'll need
to either install it on them if you have access, or else upload to the
workers at runtime using --py-files or similar.
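As a rough illustration of the --py-files route, the sketch below zips a local package and assembles a spark-submit command that ships the archive with the job. The helper names (zip_package, build_submit_command), the script name job.py, and the archive name are hypothetical and chosen only for this example; the only thing taken from the thread is that the missing module is called 'feature'.

```python
import os
import zipfile


def zip_package(package_dir, zip_path):
    """Zip the .py files of a package so that its top-level directory name
    (e.g. 'feature') is preserved inside the archive and stays importable."""
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in sorted(files):
                if name.endswith(".py"):
                    full = os.path.join(root, name)
                    # Store paths relative to the parent directory,
                    # e.g. feature/__init__.py
                    zf.write(full, os.path.relpath(full, parent))
    return zip_path


def build_submit_command(script, py_files, master="yarn"):
    """Assemble a spark-submit argument list (hypothetical helper).
    --py-files takes a comma-separated list of .zip, .egg or .py files."""
    return [
        "spark-submit",
        "--master", master,
        "--py-files", ",".join(py_files),
        script,
    ]
```

For example, build_submit_command("job.py", [zip_package("feature", "feature.zip")]) produces an argument list that could be passed to subprocess.run. From inside a running job, SparkContext.addPyFile is the usual alternative for shipping a dependency.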
On Tue, Jul 16, 2019 at 7:16 AM zenglong chen
wrote:
> Hi all,
> When I run a Python script
Hi all,
When I run a Python script with spark-submit, it works well in
local[*] mode, but not in standalone mode or YARN mode. The error is shown below:
Caused by: org.apache.spark.api.python.PythonException: Traceback (most
recent call last):
File "/usr/local/lib/python2.7/dist-packages/pyspa
Thank you Dongjoon for being a release manager.
If the assumed dates are OK, I would like to volunteer as the 2.3.4
release manager.
Best Regards,
Kazuaki Ishizaki,
From: Dongjoon Hyun
To: dev , "user @spark" ,
Apache Spark PMC
Date: 2019/07/13 07:18
Subject:[EXTERNAL] Re:
Hi Dongjoon,
Should we also consider fixing
https://issues.apache.org/jira/browse/SPARK-27812 before the cut?
Best,
Stavros
On Mon, Jul 15, 2019 at 7:04 PM Dongjoon Hyun
wrote:
> Hi, Apache Spark PMC members.
>
> Can we cut Apache Spark 2.4.4 next Monday (22nd July)?
>
> Bests,
> Dongjoon.
>
>
Hi,
I have long-running Spark Streaming jobs.
The event log directories are getting filled with .inprogress files.
Is there a fix or workaround for Spark Streaming?
There is also a JIRA issue raised for the same problem by another reporter:
https://issues.apache.org/jira/browse/SPARK-22783
--
Raman Gugnani
85888