We can have that "strict typing" option in pig.properties and then make the
type-checking validation consume that config key. However, I want it turned
on by default.
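Here is a rough sketch of that proposal. It assumes a hypothetical key name, pig.strict.typing, because the actual key is not settled in this thread. The check reads the flag with a default of true. When a UNION's inputs are incompatible, it either rejects the union or records a warning and lets the union through without a schema. The class and method names are illustrative only and are not Pig's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Sketch only: the key name and this class are hypothetical, not Pig's API.
public class StrictTyping {

    // Hypothetical key; the thread only calls it the "strict typing" option.
    static final String STRICT_KEY = "pig.strict.typing";

    // Strict mode is ON unless pig.properties explicitly sets the key to "false".
    static boolean isStrict(Properties props) {
        return Boolean.parseBoolean(props.getProperty(STRICT_KEY, "true"));
    }

    // Called by the type checker when a UNION's input schemas are incompatible.
    // Strict mode: fail validation. Otherwise: record a warning and let the
    // union proceed; the caller then leaves the union output without a schema.
    static void onIncompatibleUnion(String detail, boolean strict, List<String> warnings) {
        String msg = "UNION of incompatible schemas: " + detail;
        if (strict) {
            throw new IllegalStateException(msg);
        }
        warnings.add(msg);
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        Properties relaxed = new Properties();
        relaxed.setProperty(STRICT_KEY, "false");

        List<String> warnings = new ArrayList<>();
        // Non-strict: the union is allowed, one warning is recorded.
        onIncompatibleUnion("<int, chararray> vs <bag>", isStrict(relaxed), warnings);
        System.out.println(isStrict(defaults) + " " + warnings.size()); // true 1

        // Default (strict): the same union is rejected.
        try {
            onIncompatibleUnion("<int, chararray> vs <bag>", isStrict(defaults), warnings);
        } catch (IllegalStateException e) {
            System.out.println("rejected");
        }
    }
}
```

Note that Boolean.parseBoolean treats any value other than "true" (case-insensitive) as false. A misspelled value in pig.properties would therefore silently turn strict mode off, so a real implementation might want to validate the value.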

Pi


On 5/15/08, Alan Gates <[EMAIL PROTECTED]> wrote:
>
> I agree this will be somewhat surprising; perhaps we should give a warning.
> But we need to preserve our philosophy that "Pigs eat anything". This
> would seem to dictate that we allow people to use union regardless of the
> schemas. One open question in my mind is whether we should have a "strict
> mode" (similar to 'use strict' in Perl) where things like this cause errors
> instead of (possibly) warnings.
>
> Alan.
>
> pi song wrote:
>
>> Alan,
>>
>> On second thought, a union of two incompatible data streams can leave
>> downstream operators in an undefined state, resulting in a mix of good
>> output and garbage. This seems to break the rule of least surprise. What
>> do you think?
>>
>> Pi
>>
>> On Wed, May 14, 2008 at 9:06 AM, pi song <[EMAIL PROTECTED]> wrote:
>>
>>> Ok, will follow that.
>>>
>>> On 5/14/08, Alan Gates <[EMAIL PROTECTED]> wrote:
>>>
>>>> I agree that option 3 is the correct course.
>>>>
>>>> One note, you say:
>>>>
>>>> If the schemas from all the input ports are not compatible, there is no
>>>> problem, because we won't process it.
>>>>
>>>> What do you mean by "won't process it"? We still have to allow a union
>>>> operation between two incompatible inputs (otherwise we could only use
>>>> union when we have schemas). But the resulting union will not have a
>>>> schema, since the output no longer has a consistent one.
>>>>
>>>> Alan.
>>>>
>>>> pi song wrote:
>>>>
>>>>> Union is an example of a bag (relational) operator that can have more
>>>>> than one input.
>>>>>
>>>>> If the schemas from all the input ports are the same, there is no problem.
>>>>> If the schemas from all the input ports are not compatible, there is no
>>>>> problem, because we won't process it.
>>>>> If the schemas from all the input ports are not the same but are
>>>>> compatible, we have a problem.
>>>>>
>>>>> Example:
>>>>>
>>>>> C = UNION A,B ;
>>>>>
>>>>> Schema(A) = < Int, Chararray >
>>>>> Schema(B) = < Double, Chararray >
>>>>>
>>>>> The output schema will get resolved to < Double, Chararray >. Here is
>>>>> the problem: the Union operator currently doesn't support casting at
>>>>> any layer. If we don't cast, the binary data of the Int will get picked
>>>>> up as a Double by the downstream operator! There are a couple of
>>>>> solutions for this:
>>>>>
>>>>> 1) Implement LOUnion and POUnion to support type casting internally.
>>>>> 2) Add casting support in the LOUnion operator and let the
>>>>>    LogicalToPhysical compiler generate an LOForEach for it.
>>>>> 3) Explicitly insert an LOForEach to do the necessary casting between
>>>>>    Union and the problematic input. This is analogous to the way we
>>>>>    implement implicit casting for expression operators.
>>>>> 4) Don't support the "not the same but compatible" case at all.
>>>>>
>>>>> I will do (3) because it makes the most sense to me and has the least
>>>>> impact on other modules. Does anyone have a problem with it?
>>>>>
>>>>> Pi
>>>>>
>>>>
>>
>>
>
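Here is a minimal sketch of option (3) for the UNION example quoted above (Schema(A) = < Int, Chararray >, Schema(B) = < Double, Chararray >). It merges the input schemas column by column and then works out which columns of each input need a cast, which is the job the inserted LOForEach would do. Several things here are assumptions for illustration and are not Pig's actual classes: the FieldType enum, the helper methods, and the numeric widening order int < long < float < double.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of option (3); FieldType and all methods here are hypothetical.
public class UnionCastPlan {

    // Numeric types are declared narrowest to widest, so ordinal() gives the
    // (assumed) widening order.
    enum FieldType {
        INT, LONG, FLOAT, DOUBLE, CHARARRAY;
        boolean isNumeric() { return this != CHARARRAY; }
    }

    // Merge two schemas column by column; returns null if they are not
    // compatible (different arity, or a numeric/non-numeric mismatch).
    static List<FieldType> mergeSchemas(List<FieldType> a, List<FieldType> b) {
        if (a.size() != b.size()) return null;
        List<FieldType> out = new ArrayList<>();
        for (int i = 0; i < a.size(); i++) {
            FieldType x = a.get(i), y = b.get(i);
            if (x == y) {
                out.add(x);
            } else if (x.isNumeric() && y.isNumeric()) {
                out.add(x.ordinal() > y.ordinal() ? x : y); // widen to the larger type
            } else {
                return null;
            }
        }
        return out;
    }

    // Column indices of one input whose type differs from the merged schema.
    static List<Integer> columnsNeedingCast(List<FieldType> input, List<FieldType> merged) {
        List<Integer> cols = new ArrayList<>();
        for (int i = 0; i < input.size(); i++) {
            if (input.get(i) != merged.get(i)) cols.add(i);
        }
        return cols;
    }

    // What the inserted foreach does to each tuple: convert the flagged columns.
    static List<Object> castTuple(List<Object> tuple, List<FieldType> merged, List<Integer> cols) {
        List<Object> out = new ArrayList<>(tuple);
        for (int i : cols) {
            Number n = (Number) tuple.get(i);
            switch (merged.get(i)) {
                case LONG:   out.set(i, n.longValue());   break;
                case FLOAT:  out.set(i, n.floatValue());  break;
                case DOUBLE: out.set(i, n.doubleValue()); break;
                default: throw new IllegalStateException("unexpected cast target " + merged.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<FieldType> schemaA = Arrays.asList(FieldType.INT, FieldType.CHARARRAY);
        List<FieldType> schemaB = Arrays.asList(FieldType.DOUBLE, FieldType.CHARARRAY);

        List<FieldType> merged = mergeSchemas(schemaA, schemaB);
        List<Integer> castA = columnsNeedingCast(schemaA, merged);
        List<Integer> castB = columnsNeedingCast(schemaB, merged);
        System.out.println(merged + " " + castA + " " + castB); // [DOUBLE, CHARARRAY] [0] []

        // Only input A gets a foreach; its Int column becomes a Double.
        List<Object> row = castTuple(Arrays.<Object>asList(3, "x"), merged, castA);
        System.out.println(row); // [3.0, x]
    }
}
```

Because the cast runs in a separate foreach ahead of the union, the Union operator itself stays cast-free. That is the reason given above for (3) having the least impact on other modules.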
