> > * unionAll preserve duplicate v/s union that does not
>
> This is true, if you want to eliminate duplicate items you should follow the union with a distinct().
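To make the semantics concrete, here is a minimal sketch in plain Python rather than Spark. The helpers `union_all` and `distinct` are hypothetical stand-ins that only model the behaviour discussed above: a union that keeps every duplicate, followed by a separate de-duplication step.

```python
def union_all(left, right):
    """Bag union: concatenate both inputs and keep every duplicate."""
    return list(left) + list(right)


def distinct(rows):
    """Drop duplicates. This model keeps first-seen order for readability;
    a distributed distinct() should not be assumed to preserve order."""
    return list(dict.fromkeys(rows))


a = [1, 2, 2, 3]
b = [3, 4]

combined = union_all(a, b)    # duplicates survive the union
deduped = distinct(combined)  # the explicit follow-up step removes them

print(combined)  # [1, 2, 2, 3, 3, 4]
print(deduped)   # [1, 2, 3, 4]
```

Note that the de-duplication step also collapses duplicates that were already present in a single input (the two 2s in `a`), not only those introduced by combining the two inputs.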
> > * SQL union and unionAll result in same output format i.e. another SQL v/s
> > different RDD types here.
> > * Understand the existing union contract issue. This may be a class
> > hierarchy discussion for SchemaRDD, UnionRDD etc. ?
>
> This is unfortunately going to be a limitation of the query DSL since it extends standard RDDs. It is not possible for us to return specialized types from functions that are already defined in RDD (such as union) as the base RDD class has a very opaque notion of schema, and at this point the API for RDDs is very fixed. If you use SQL however, you will always get back SchemaRDDs.
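The type limitation can be illustrated with a toy model. This is a sketch in plain Python, not Spark's actual classes: the `RDD` and `SchemaRDD` classes below only borrow the names, and `sql_union_all` is a hypothetical stand-in for issuing the union through SQL. It assumes, as described above, that `union` is defined once on the base class and knows nothing about schemas.

```python
class RDD:
    """Toy base class: holds rows and has no notion of schema."""

    def __init__(self, rows):
        self.rows = list(rows)

    def union(self, other):
        # Defined on the base class, so the result is a plain RDD.
        return RDD(self.rows + other.rows)


class SchemaRDD(RDD):
    """Toy subclass: the same rows plus a schema."""

    def __init__(self, rows, schema):
        super().__init__(rows)
        self.schema = schema


def sql_union_all(left, right):
    """Hypothetical stand-in for the SQL path, which keeps the schema."""
    if left.schema != right.schema:
        raise ValueError("schemas differ")
    return SchemaRDD(left.rows + right.rows, left.schema)


people = SchemaRDD([("alice", 30)], schema=("name", "age"))
more = SchemaRDD([("bob", 25)], schema=("name", "age"))

via_dsl = people.union(more)           # inherited method: schema is lost
via_sql = sql_union_all(people, more)  # SQL-style path: schema is kept

print(type(via_dsl).__name__, hasattr(via_dsl, "schema"))  # RDD False
print(type(via_sql).__name__, via_sql.schema)  # SchemaRDD ('name', 'age')
```

In this model the rows are identical on both paths; only the type of the result, and therefore the availability of the schema, differs.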