Alan,

On second thought, a union of two incompatible data streams can leave downstream operators in an undefined state, producing a mix of good output and garbage. This seems to break the rule of least surprise. What do you think?
Pi

On Wed, May 14, 2008 at 9:06 AM, pi song <[EMAIL PROTECTED]> wrote:

> Ok, will follow that.
>
> On 5/14/08, Alan Gates <[EMAIL PROTECTED]> wrote:
>>
>> I agree that option 3 is the correct course.
>>
>> One note, you say:
>>
>> If the schemas from all the input ports are not compatible, no problem,
>> because we won't process it.
>>
>> What do you mean by "won't process it"? We still have to allow a union
>> operation between two non-compatible inputs (otherwise we can only use
>> union when we have schemas). But the resulting union will not have a
>> schema (since the output no longer has a consistent schema).
>>
>> Alan.
>>
>> pi song wrote:
>>
>>> Union is an example of a bag (relational) operator that can have more
>>> than one input.
>>>
>>> If the schemas from all the input ports are the same, no problem.
>>> If the schemas from all the input ports are not compatible, no problem,
>>> because we won't process it.
>>> If the schemas from all the input ports are not the same but are
>>> compatible, here comes a problem.
>>>
>>> Example:
>>>
>>> C = UNION A, B;
>>>
>>> Schema(A) = < Int, Chararray >
>>> Schema(B) = < Double, Chararray >
>>>
>>> The output schema will be resolved to < Double, Chararray >. Here is the
>>> problem: the Union operator currently doesn't support casting in any
>>> layer. In this case, if we don't cast, the binary data of the Int will be
>>> picked up as a Double by the downstream operator! There are a few
>>> solutions:
>>>
>>> 1) Implement LOUnion and POUnion to support type casting internally.
>>> 2) Add casting support to the LOUnion operator and let the
>>>    LogicalToPhysical compiler generate an LOForEach for it.
>>> 3) Explicitly insert an LOForEach to do the necessary casting between
>>>    Union and the problematic input. This is analogous to the way we
>>>    implement implicit casting for expression operators.
>>> 4) Don't support the "not the same but compatible" case at all.
>>>
>>> I will do (3) because it makes the most sense to me and has the least
>>> impact on other modules. Does anyone have a problem with it?
>>>
>>> Pi
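For illustration, here is a minimal Java sketch of the three union cases and of what option 3 has to compute. It assumes a simplified type set in which numeric types widen in the order INT < LONG < FLOAT < DOUBLE and CHARARRAY only matches itself; Pig's real rules may differ. The names UnionCastSketch, merge and fieldsToCast are hypothetical and are not Pig's LOUnion/LOForEach API. merge returns null when any field pair is incompatible (the union output then has no schema, as Alan notes), and fieldsToCast lists which fields of one input would need a cast, which is the work an inserted LOForEach would do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Pig's planner code: resolve a merged UNION schema
// and find, per input, which fields need an explicit cast (option 3).
class UnionCastSketch {

    // Simplified field types (assumption). Numeric ones are declared
    // narrowest to widest so that enum order doubles as widening order.
    enum FieldType { INT, LONG, FLOAT, DOUBLE, CHARARRAY }

    static boolean isNumeric(FieldType t) {
        return t != FieldType.CHARARRAY;
    }

    // Returns the wider of two field types, or null if they cannot be merged.
    static FieldType mergeField(FieldType a, FieldType b) {
        if (a == b) return a;                       // same type: nothing to do
        if (isNumeric(a) && isNumeric(b)) {
            return a.compareTo(b) >= 0 ? a : b;     // compatible: pick the wider
        }
        return null;                                // e.g. INT vs CHARARRAY
    }

    // Merges all input schemas field by field. Returns null when arities
    // differ or any field pair is incompatible: the output has no schema.
    static List<FieldType> merge(List<List<FieldType>> inputs) {
        List<FieldType> result = new ArrayList<>(inputs.get(0));
        for (List<FieldType> schema : inputs.subList(1, inputs.size())) {
            if (schema.size() != result.size()) return null;
            for (int i = 0; i < result.size(); i++) {
                FieldType merged = mergeField(result.get(i), schema.get(i));
                if (merged == null) return null;
                result.set(i, merged);
            }
        }
        return result;
    }

    // Indices of the fields in one input that must be cast to reach the
    // merged schema. An empty list means that input needs no cast step.
    // Only meaningful when merge() returned a non-null schema.
    static List<Integer> fieldsToCast(List<FieldType> input, List<FieldType> merged) {
        List<Integer> casts = new ArrayList<>();
        for (int i = 0; i < merged.size(); i++) {
            if (input.get(i) != merged.get(i)) casts.add(i);
        }
        return casts;
    }

    public static void main(String[] args) {
        List<FieldType> a = List.of(FieldType.INT, FieldType.CHARARRAY);
        List<FieldType> b = List.of(FieldType.DOUBLE, FieldType.CHARARRAY);
        List<FieldType> merged = merge(List.of(a, b));
        System.out.println("merged schema: " + merged);
        System.out.println("fields to cast in A: " + fieldsToCast(a, merged));
        System.out.println("fields to cast in B: " + fieldsToCast(b, merged));

        // Why skipping the cast is dangerous: reinterpreting the bits of 42
        // as a double is not 42.0. This only illustrates reinterpretation;
        // it is not how Pig actually serializes tuples.
        double reinterpreted = Double.longBitsToDouble(42L);
        double cast = (double) 42;
        System.out.println("reinterpreted: " + reinterpreted + ", cast: " + cast);
    }
}
```

With Schema(A) = < Int, Chararray > and Schema(B) = < Double, Chararray >, the merged schema is < Double, Chararray > and only field 0 of A needs a cast, so a cast step is added in front of A alone while B feeds the union unchanged.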
