Toy,
I suggest you partition your data by date and use the foreachPartition
function, using each partition's date as the bucket location. This would
require you to define a custom hash partitioner function, but that is not too
difficult.
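For example, a rough PySpark sketch of the idea (the bucket path, column names,
and the upload step here are just placeholders, not from your code):

    import zlib
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Toy stand-in for your dated data; swap in your real source.
    df = spark.createDataFrame(
        [("2018-03-01", "a"), ("2018-03-02", "b"), ("2018-03-01", "c")],
        ["date", "payload"],
    )

    NUM_PARTITIONS = 8

    def date_hash(date_str):
        # Deterministic hash so a given date always lands in the same partition.
        return zlib.crc32(date_str.encode("utf-8"))

    def upload_partition(rows):
        # Placeholder for the real bucket write (e.g. one object put per row).
        for date, row in rows:
            print(f"s3://my-bucket/{date}/", row)

    (df.rdd
       .map(lambda row: (row["date"], row))
       .partitionBy(NUM_PARTITIONS, date_hash)
       .foreachPartition(upload_partition))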
--
Michael Mansour
Data Scientist
Symantec
From: Toy
pass it into the function. This alleviates the need to write debugging code,
etc. I find this model useful and a bit faster, but it does not offer the
step-through capability.
Best of luck!
M
--
Michael Mansour
Data Scientist
Symantec CASB
From: Vitaliy Pisarev
Date: Sunday, March 11, 2018 at 8
Please expand on what you're trying to achieve here.
--
Michael Mansour
Data Scientist
Symantec CASB
On 4/28/18, 8:41 AM, "klrmowse" wrote:
I am currently trying to find a workaround for the Spark application I am
working on so that it does not have to use .collect()
There were recently some fantastic talks about this at the SparkSummit
conference in San Francisco. I suggest you check out the SparkSummit YouTube
channel after May 9th for a deep dive into this topic.
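For what it's worth, one common pattern (just a sketch, not necessarily what
the talks cover) is to keep the results distributed, writing them out or
streaming them with toLocalIterator() instead of collecting everything to the
driver:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(1_000_000).withColumn("squared", col("id") * col("id"))

    # Keep the result distributed instead of df.collect():
    df.write.mode("overwrite").parquet("/tmp/squares")

    # Or, if the driver really must see rows, stream them a partition at a time:
    for row in df.toLocalIterator():
        if row.id >= 5:
            break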
From: rajat kumar
Date: Monday, April 29, 2019 at 9:34 AM
To: "user@spark.apache.org"
Subj
expression” tool, and pass them through the function in the expression evaluator.
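For example, a rough sketch of what I mean (the function and sample value are
just made up for illustration): keep the logic in a plain Python function, try
it with a sample input in the expression evaluator or a REPL, and only then
wrap it as a UDF.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    # Plain function: easy to call directly with a sample value.
    def normalize_domain(url):
        return url.split("://", 1)[-1].split("/", 1)[0].lower()

    # In the expression evaluator / REPL:
    #   normalize_domain("https://Example.COM/path")  ->  "example.com"

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("https://Example.COM/path",)], ["url"])
    df.withColumn("domain", udf(normalize_domain, StringType())("url")).show()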
Hope this helps.
--
Michael Mansour
Data Scientist
Symantec Cloud Security
From: Pavel Klemenkov
Date: Wednesday, May 10, 2017 at 10:43 AM
To: "user@spark.apache.org"
Subject: [EXT] Re: [
Hi all,
I’m poking around the pyspark.Broadcast module, and I notice that one can pass
in a `pickle_registry` and a `path`. The documentation does not outline the
pickle registry’s use, and I’m curious how to use it and whether there are any
advantages to it.
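For context, the baseline I’m comparing against is the usual
SparkContext.broadcast() entry point, which builds the Broadcast object without
the caller supplying those constructor arguments (rough sketch with a made-up
lookup table):

    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()

    # Usual entry point: SparkContext.broadcast() constructs the Broadcast
    # object itself; the caller never passes pickle_registry or path here.
    lookup = sc.broadcast({"a": 1, "b": 2})

    rdd = sc.parallelize(["a", "b", "a"])
    print(rdd.map(lambda k: lookup.value[k]).collect())  # [1, 2, 1]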
Thanks,
Michael Mansour