date:20190716

Usage of PyArrow in Spark

2019-07-16 Thread Abdeali Kothari

Hi, In Spark 2.3+ I saw that pyarrow was being used in a bunch of places in Spark, and I was trying to understand the benefit it provides in terms of serialization / deserialization. I understand that the new pandas-udf works only if pyarrow is installed. But what about the plain old PythonUDF whi

Re: Re: Release Apache Spark 2.4.4 before 3.0.0

2019-07-16 Thread Dongjoon Hyun

Thank you for volunteering as the 2.3.4 release manager, Kazuaki! It's great to see a new release manager in advance. :D Thank you for the reply, Stavros. In addition to that issue, I'm also monitoring some other K8s issues and PRs. But I'm not sure we can have that, because some PRs seem to fail at bui

Re: [PySpark] [SparkR] Is it possible to invoke a PySpark function with a SparkR DataFrame?

2019-07-16 Thread Felix Cheung

Not currently in Spark. However, there are systems out there that can share DataFrame between languages on top of Spark - it’s not calling the python UDF directly but you can pass the DataFrame to python and then .map(UDF) that way. From: Fiske, Danny Sent: Mo

Re: Sorting tuples with byte key and byte value

2019-07-16 Thread Supun Kamburugamuve

Thanks, Keith. We have set SPARK_WORKER_INSTANCES=8. So does that mean we are running 8 workers on a single machine with 1 thread each, and this gives the 8 threads? Is there a preference for running 1 worker with 8 threads inside it? These are dual-CPU machines, so I believe we need at least 2 workers in

Re: spark python script importError problem

2019-07-16 Thread Patrick McCarthy

Your module 'feature' isn't available to the YARN workers, so you'll need to either install it on them if you have access, or else upload it to the workers at runtime using --py-files or similar. On Tue, Jul 16, 2019 at 7:16 AM zenglong chen wrote: > Hi all, > When I run a Python script
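The reply above suggests shipping the missing module to the workers. As a rough illustration only, the sketch below uses the Python standard library alone (no Spark) to show the underlying mechanism: a module packed into a zip archive becomes importable once the archive is on the import path. The contents of `feature.py`, the `extract` function, and the archive name `deps.zip` are hypothetical, since the thread does not show the real module.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Hypothetical stand-in for the poster's 'feature' module; the real
# contents are not shown in the thread.
MODULE_SOURCE = "def extract(x):\n    return x * 2\n"


def build_archive(directory):
    """Write feature.py into deps.zip and return the archive path."""
    archive_path = os.path.join(directory, "deps.zip")
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr("feature.py", MODULE_SOURCE)
    return archive_path


workdir = tempfile.mkdtemp()
archive_path = build_archive(workdir)

# Placing the zip on sys.path makes the modules inside it importable.
sys.path.insert(0, archive_path)
importlib.invalidate_caches()
feature = importlib.import_module("feature")

print(feature.extract(21))
```

An archive built this way could then be passed along the lines of `spark-submit --py-files deps.zip script.py`, where `script.py` is likewise a placeholder name; whether that resolves the poster's specific error is not confirmed in the thread.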

spark python script importError problem

2019-07-16 Thread zenglong chen

Hi all, When I run a Python script with spark-submit, it works well in local[*] mode, but not in standalone mode or YARN mode. The error is like below: Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/usr/local/lib/python2.7/dist-packages/pyspa

Re: Re: Release Apache Spark 2.4.4 before 3.0.0

2019-07-16 Thread Kazuaki Ishizaki

Thank you, Dongjoon, for being a release manager. If the assumed dates are OK, I would like to volunteer as a 2.3.4 release manager. Best Regards, Kazuaki Ishizaki, From: Dongjoon Hyun To: dev , "user @spark" , Apache Spark PMC Date: 2019/07/13 07:18 Subject: [EXTERNAL] Re:

Re: Release Apache Spark 2.4.4 before 3.0.0

2019-07-16 Thread Stavros Kontopoulos

Hi Dongjoon, Should we also consider fixing https://issues.apache.org/jira/browse/SPARK-27812 before the cut? Best, Stavros On Mon, Jul 15, 2019 at 7:04 PM Dongjoon Hyun wrote: > Hi, Apache Spark PMC members. > > Can we cut Apache Spark 2.4.4 next Monday (22nd July)? > > Bests, > Dongjoon. > >

event log directory(spark-history) filled by large .inprogress files for spark streaming applications

2019-07-16 Thread raman gugnani

Hi, I have long-running Spark streaming jobs. The event log directories are getting filled with .inprogress files. Is there a fix or workaround for Spark streaming? There is also a JIRA raised for the same issue by another reporter: https://issues.apache.org/jira/browse/SPARK-22783 -- Raman Gugnani 85888

9 matches
