Hi! I put a quick summary into the wiki for future reference:
https://cwiki.apache.org/confluence/display/FLINK/Variables+Closures+vs.+Broadcast+Variables

Greetings,
Stephan

On Mon, Apr 27, 2015 at 11:10 AM, Stephan Ewen <se...@apache.org> wrote:

> Adding to Fabian's and Sebastian's answers:
>
>
> Variable in Closure (global variable)
> ------------------------------------------------------
>  - Happens when you reference some variable in the program from a
> function. The variable becomes part of the Function's closure.
>  - The variable is distributed with the CODE. It is part of the function
> object and is distributed with the TaskDeployment messages.
>  - Data needs to be available in the driver program (cannot be a Flink
> DataSet, which lives distributed across the cluster)
>  - Should be used for constants or config parameters or simple scalar
> values.
>
> Summary: Small data that is available on the client (driver program)
>
>
>
> Broadcast set
> ------------------------------------------------------
>  - Refers to data that is produced by a Flink operation (DataSet) and that
> lives in the cluster, rather than on the client (or in the driver program)
>  - Data distribution is part of the distributed data flow and happens
> through the Flink network stack
>  - Can be much larger than the closure variables.
>  - Should be used when you want to make an intermediate result of a Flink
> computation accessible to all functions.
>
>
> Greetings,
> Stephan
>
>
>
> On Mon, Apr 27, 2015 at 10:23 AM, Fabian Hueske <fhue...@gmail.com> wrote:
>
>> You should also be aware that the value of a static variable is only
>> accessible within the same JVM.
>> Flink is a distributed system and runs in multiple JVMs. So if you set a
>> value in one JVM, it is not visible in another JVM (on a different node).
>>
>> In general, I would avoid using static variables in Flink programs.
>>
>> Best, Fabian
>>
>> 2015-04-26 9:54 GMT+02:00 Sebastian <s...@apache.org>:
>>
>>> Hi Hung,
>>>
>>> A broadcast variable can also refer to an intermediate result of a Flink
>>> computation.
>>>
>>> Best,
>>> Sebastian
>>>
>>>
>>> On 25.04.2015 21:10, HungChang wrote:
>>>
>>>> Hi,
>>>>
>>>> What would be the difference between using a global variable and
>>>> broadcasting it?
>>>>
>>>> A toy example:
>>>>
>>>> // Using global
>>>> {{...
>>>>     private static int num = 10;
>>>> }
>>>>
>>>> public class DivByTen implements FlatMapFunction<Tuple1<Double>,
>>>>         Tuple1<Double>> {
>>>>     @Override
>>>>     public void flatMap(Tuple1<Double> value,
>>>>             Collector<Tuple1<Double>> out) {
>>>>         // value is a Tuple1, so divide its field f0, not the tuple itself
>>>>         out.collect(new Tuple1<Double>(value.f0 / num));
>>>>     }
>>>> }}
>>>>
>>>> // Using broadcasting:
>>>> {...
>>>>     public static class DivByTen extends
>>>>             RichFlatMapFunction<Tuple1<Double>, Tuple1<Double>> {
>>>>
>>>>         private long num;
>>>>
>>>>         @Override
>>>>         public void open(Configuration parameters) throws Exception {
>>>>             super.open(parameters);
>>>>             // the broadcast set arrives as a List; take its first element
>>>>             num = getRuntimeContext().<Integer>getBroadcastVariable(
>>>>                     "num").get(0);
>>>>         }
>>>>
>>>>         @Override
>>>>         public void flatMap(Tuple1<Double> value,
>>>>                 Collector<Tuple1<Double>> out) throws Exception {
>>>>             out.collect(new Tuple1<Double>(value.f0 / num));
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> Best regards,
>>>>
>>>> Hung
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Difference-between-using-a-global-variable-and-broadcasting-a-variable-tp1128.html
>>>> Sent from the Apache Flink User Mailing List archive. mailing list
>>>> archive at Nabble.com.
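
For readers of the archive, here is a minimal plain-Java sketch of the closure case summarized above: a value held in a field of the function object travels with the serialized function. It assumes Java serialization as the shipping mechanism and does not use Flink at all; ClosureShippingSketch, DivBy, and roundTrip are hypothetical names chosen for illustration, not part of Flink's API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ClosureShippingSketch {

    /** Hypothetical stand-in for a user function; not a Flink interface. */
    static class DivBy implements Serializable {
        private static final long serialVersionUID = 1L;

        // Instance field: part of the serialized function object.
        private final int num;

        DivBy(int num) {
            this.num = num;
        }

        double apply(double value) {
            return value / num;
        }
    }

    /**
     * Serializes the function to bytes and reads it back, imitating the
     * trip from the client to a worker (an assumption of this sketch).
     */
    static DivBy roundTrip(DivBy fn) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(fn);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (DivBy) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // The value 10 is captured on the "client" side...
        DivBy onClient = new DivBy(10);
        // ...and is still there after the function has been shipped as bytes.
        DivBy onWorker = roundTrip(onClient);
        System.out.println(onWorker.apply(50.0)); // prints 5.0
    }
}
```

Java serialization skips static fields, which fits Fabian's advice: a static value set on the client is not carried along with the function object, whereas an instance field is.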