Hello,
I am new to Apache Flink, so my apologies if this is a common question. I have
a rather complex operation I'd like to apply to an item in a data set.
Conceptually, the operation could produce many types of each data, each one
that I'd like to flow into a different result set.
In Flink, it looks like the output of a flatMap operation must be of the same
type, so I would need to split my processing up from a complex map operation to
several to express the flow. For example, I might want to split a data set of
text lines into words as well as individual characters:
val lines: DataSet[String] = // lines of text
val words = lines.flatMap { _.split(" ") }
val chars = lines.flatMap { _.toCharArray() }
Since "words" and "chars" in the example above have the same input DataSet and
both have a flatMap operation applied to them, will "lines" only be iterated
once and have both operations computed simultaneously? The big problem I have
is that my objects are considerably heavier-weight than lines of text, so I
really only want to iterate them once while performing multiple operations on
them.
Thank in advance,
Jon