Hi Jon,

I'm pretty sure your input will be read only once. I may be wrong (please correct me if so), but your pipeline should be compiled as:

source --+--> flatMap(words) --> result / sink
         |
         +--> flatMap(chars) --> result / sink

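A minimal sketch for checking this, assuming the Flink Scala DataSet API is on the classpath. The sample lines, the object name `SharedSourcePlan` and the output paths are made up for illustration. It builds the two-branch job from the question and prints Flink's JSON description of the plan instead of running it:

```scala
import org.apache.flink.api.scala._

object SharedSourcePlan {
  // Builds the two-branch job and returns Flink's JSON description of the
  // plan. The job itself is not executed here.
  def buildPlan(): String = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Sample input, made up for illustration.
    val lines: DataSet[String] = env.fromElements("to be or", "not to be")

    // The two flatMaps from the question, both reading from `lines`.
    val words: DataSet[String] = lines.flatMap { _.split(" ") }
    val chars: DataSet[Char] = lines.flatMap { _.toCharArray }

    // Hypothetical output paths; nothing is written unless env.execute() runs.
    words.writeAsText("/tmp/words")
    chars.writeAsText("/tmp/chars")

    env.getExecutionPlan()
  }

  def main(args: Array[String]): Unit = println(buildPlan())
}
```

If the diagram above is right, the printed plan should show a single data source whose output feeds both flatMap operators.
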
As your input is streamed, each line goes through the pipeline and is processed by each flatMap. Each flatMap produces its own result and sends it to its sink. So every line is processed twice (once per flatMap), but in the same pass over the input.

Hope that helps.

2016-05-27 22:33 GMT+02:00 Jon Stewart <jstew...@strozfriedberg.com>:

> Hello,
>
> I am new to Apache Flink, so my apologies if this is a common question. I
> have a rather complex operation I'd like to apply to an item in a data set.
> Conceptually, the operation could produce many types of each data, each one
> that I'd like to flow into a different result set.
>
> In Flink, it looks like the output of a flatMap operation must be of the
> same type, so I would need to split my processing up from a complex map
> operation to several to express the flow. For example, I might want to
> split a data set of text lines into words as well as individual characters:
>
> val lines: DataSet[String] = // lines of text
> val words = lines.flatMap { _.split(" ") }
> val chars = lines.flatMap { _.toCharArray() }
>
> Since "words" and "chars" in the example above have the same input DataSet
> and both have a flatMap operation applied to them, will "lines" only be
> iterated once and have both operations computed simultaneously? The big
> problem I have is that my objects are considerably heavier-weight than
> lines of text, so I really only want to iterate them once while performing
> multiple operations on them.
>
> Thank in advance,
>
> Jon
>
>

--
Alexis Gendronneau

alexis.gendronn...@corp.ovh.com
a.gendronn...@gmail.com