Re: foreachPartition in Spark Java API

2017-05-30 Thread Anton Kravchenko
// ForEachPartFunction.java:
import org.apache.spark.api.java.function.ForeachPartitionFunction;
import org.apache.spark.sql.Row;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ForEachPartFunction implements ForeachPartitionFunction {
    public void call
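The snippet above is cut off mid-definition; the missing piece is a call(Iterator<Row>) method that drains one partition's rows. Here is a hedged, Spark-free sketch of that per-partition pattern — ForEachPartSketch, processPartition, and the String rows are stand-ins invented for illustration, since a runnable example here cannot depend on Spark:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ForEachPartSketch {
    // Same shape as ForeachPartitionFunction<Row>.call(Iterator<Row>):
    // do per-partition setup once, then drain the iterator row by row.
    static int processPartition(Iterator<String> rows) {
        // e.g. open one DB connection here: once per partition, not per row
        int processed = 0;
        while (rows.hasNext()) {
            rows.next();   // stand-in for real per-row work
            processed++;
        }
        // e.g. close the connection here
        return processed;
    }

    public static void main(String[] args) {
        List<String> onePartition = Arrays.asList("a,1", "b,2", "c,3");
        System.out.println(processPartition(onePartition.iterator())); // prints 3
    }
}
```

The point of this shape is amortization: anything expensive (connections, buffers) is created once per partition instead of once per row.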

Re: foreachPartition in Spark Java API

2017-05-30 Thread Anton Kravchenko
Ok, there are at least two ways to do it:

    Dataset<Row> df = spark.read().csv("file:///C:/input_data/*.csv");
    df.foreachPartition(new ForEachPartFunction());
    df.toJavaRDD().foreachPartition(new Void_java_func());

where ForEachPartFunction and Void_java_func are defined below: // ForEachPartFunction.java:
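Both routes boil down to the same contract: hand Spark a function that consumes one partition's Iterator and returns nothing (Dataset.foreachPartition takes a ForeachPartitionFunction<T>, the JavaRDD route takes a VoidFunction over an Iterator). A hedged, Spark-free sketch of that shared shape — the interface stub and the TwoWaysSketch/drainAll names are invented here for illustration:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class TwoWaysSketch {
    // Local stand-in for Spark's ForeachPartitionFunction<T>:
    // consume one partition's iterator, return nothing.
    interface ForeachPartitionFunction<T> { void call(Iterator<T> it); }

    // Drives the function over every "partition", the way
    // foreachPartition would on the executors.
    static <T> void foreachPartition(List<List<T>> partitions,
                                     ForeachPartitionFunction<T> f) {
        for (List<T> p : partitions) {
            f.call(p.iterator());
        }
    }

    static String drainAll(List<List<String>> partitions) {
        StringBuilder seen = new StringBuilder();
        foreachPartition(partitions, it -> {
            while (it.hasNext()) seen.append(it.next());
        });
        return seen.toString();
    }

    public static void main(String[] args) {
        List<List<String>> partitions =
                Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c"));
        System.out.println(drainAll(partitions)); // prints abc
    }
}
```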

Re: Foreachpartition in spark streaming

2017-03-20 Thread Ryan
foreachPartition is an action, but it runs on each worker, which means you won't see its output on the driver. mapPartitions is a transformation, which is lazy and does nothing until an action is called. Which one is better depends on the specific use case. To output something (like a print on a single machine) you could r
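The lazy-transformation-versus-eager-action distinction can be illustrated without Spark using java.util.stream, whose map() is likewise deferred until a terminal operation runs. This is an analogy, not Spark itself; LazyVsAction and callCounts are names invented for this sketch:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyVsAction {
    // Returns {map calls before the terminal op, map calls after}:
    // map() is lazy (like mapPartitions) and only runs when a
    // terminal operation (like an action) forces it.
    static int[] callCounts(List<Integer> input) {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> mapped = input.stream()
                .map(x -> { calls.incrementAndGet(); return x * 2; });
        int before = calls.get();            // still 0: nothing has run yet
        mapped.collect(Collectors.toList()); // terminal op triggers the work
        return new int[] { before, calls.get() };
    }

    public static void main(String[] args) {
        int[] counts = callCounts(Arrays.asList(1, 2, 3));
        System.out.println(counts[0] + " then " + counts[1]); // prints 0 then 3
    }
}
```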

Re: foreachPartition

2015-10-30 Thread Alex Nastetsky
Ahh, makes sense. Knew it was going to be something simple. Thanks. On Fri, Oct 30, 2015 at 7:45 PM, Mark Hamstra wrote: > The closure is sent to and executed on an Executor, so you need to be looking > at the stdout of the Executors, not on the Driver. > > On Fri, Oct 30, 2015 at 4:42 PM, Alex Nas

Re: foreachPartition

2015-10-30 Thread Mark Hamstra
The closure is sent to and executed on an Executor, so you need to be looking at the stdout of the Executors, not on the Driver. On Fri, Oct 30, 2015 at 4:42 PM, Alex Nastetsky < alex.nastet...@vervemobile.com> wrote: > I'm just trying to do some operation inside foreachPartition, but I can't > even
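The same point can be shown in a single JVM with a worker thread standing in for an Executor: a closure's side effects happen where the closure runs, not where it was defined. WhereClosureRuns and the "executor-0" thread name are invented for this hedged sketch:

```java
public class WhereClosureRuns {
    // The closure is handed to a worker thread ("executor-0"); its
    // output is attributed to that thread, not to main (the "driver").
    static String whereItRan() {
        StringBuilder log = new StringBuilder();
        Runnable closure = () ->
            log.append("ran on: ").append(Thread.currentThread().getName());
        Thread executor = new Thread(closure, "executor-0");
        executor.start();
        try {
            executor.join(); // wait for the "executor" to finish
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(whereItRan()); // prints ran on: executor-0
    }
}
```

In real Spark the gap is wider still: the Executor is a separate JVM on another machine, so its stdout lands in that machine's executor logs.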

Re: foreachPartition and task status

2014-10-14 Thread Salman Haq
On Tue, Oct 14, 2014 at 12:42 PM, Sean McNamara wrote: > Are you using spark streaming? > > No, not at this time.

Re: foreachPartition and task status

2014-10-14 Thread Sean McNamara
Are you using spark streaming? On Oct 14, 2014, at 10:35 AM, Salman Haq wrote: > Hi, > > In my application, I am successfully using foreachPartition to write large > amounts of data into a Cassandra database. > > What is the recommended practice if the application wants to know that the > ta

Re: foreachPartition: write to multiple files

2014-10-08 Thread david
Hi, I finally found a solution after reading the post: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-split-RDD-by-key-and-save-to-different-path-td11887.html#a11983