Hi,

non-equi joins are only supported by building the cross product. This is essentially the nested-loop join strategy that a conventional database system would choose. However, such joins are prohibitively expensive when applied to large data sets. If one of your data sets is small and the other is large, you can do the join by broadcasting the smaller side to a MapFunction (withBroadcastSet() [1]) that takes the larger data set as its regular input, and evaluate the join condition inside the MapFunction.
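
To make the broadcast approach a bit more concrete, here is a minimal sketch in Scala. The data sets, the broadcast name "thresholds", and the >= condition are made up for illustration. It also uses a RichFlatMapFunction rather than a plain MapFunction, since a join can emit zero or several results per input element; that is my choice for the sketch, not something the API requires.

```scala
import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration
import org.apache.flink.util.Collector

import scala.collection.JavaConverters._

object BroadcastNonEquiJoin {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Hypothetical inputs: "large" holds (id, value), "small" holds (label, threshold).
    val large: DataSet[(Int, Int)] = env.fromElements((1, 5), (2, 12), (3, 20))
    val small: DataSet[(String, Int)] = env.fromElements(("low", 10), ("high", 15))

    val joined: DataSet[(Int, Int, String)] = large
      .flatMap(new RichFlatMapFunction[(Int, Int), (Int, Int, String)] {
        private var thresholds: Seq[(String, Int)] = Seq.empty

        // Read the broadcast set once per parallel task instance.
        override def open(parameters: Configuration): Unit = {
          thresholds = getRuntimeContext
            .getBroadcastVariable[(String, Int)]("thresholds")
            .asScala
            .toList
        }

        // Non-equi join condition: value >= threshold.
        override def flatMap(in: (Int, Int), out: Collector[(Int, Int, String)]): Unit = {
          for ((label, threshold) <- thresholds if in._2 >= threshold) {
            out.collect((in._1, in._2, label))
          }
        }
      })
      .withBroadcastSet(small, "thresholds")

    joined.print()
    // Depending on the Flink version, print() may only register a sink;
    // in that case the program also needs a call to env.execute().
  }
}
```

Every parallel instance keeps a full copy of the broadcast set in memory and scans it for each input element, so this only pays off when the broadcast side is really small.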

The problem with the Any key selector is that Flink needs to know the types when the program is optimized, because it generates type-specific serializers. I don't think an Any type works as a join key.

Best, Fabian

[1] http://flink.apache.org/docs/0.8/programming_guide.html#broadcast-variables

2015-02-21 10:28 GMT+01:00 Vinh June <[email protected]>:

> Hello,
>
> I have some questions concerning Join:
>
> 1. I would like to make join with different conditions, is there any way to
> create a Join with conditions different to "equalTo", for example, how would
> I make a join with > or >=
>
> 2. I have a DataSet[Map[String, Any]]. Is it possible to specify KeySelector
> using a map key? I tried to use below Scala code but it doesn't work
>
> Set1.join(Set2).where(_.get("key")).equalTo(_.get("key"))
>
> --
> View this message in context:
> http://apache-flink-incubator-user-mailing-list-archive.2336050.n4.nabble.com/Some-questions-about-Join-tp780.html
> Sent from the Apache Flink (Incubator) User Mailing List archive at Nabble.com.
