I see.

On 5/15/08, Alan Gates <[EMAIL PROTECTED]> wrote:
>
> I doubt you'll get the votes for turning it on by default.  Pig's founders
> have been fairly adamant that Pig continue to work in the no-metadata case.
> Turning this on by default would break that rule.
>
> Alan.
>
> pi song wrote:
>
>> We can put that "strict typing" option in pig.properties and have the
>> type-checking validation read that config key. However, I want it
>> turned on by default.
>>
>> Pi
>>
>>
>> On 5/15/08, Alan Gates <[EMAIL PROTECTED]> wrote:
>>
>>
>>> I agree this will be somewhat surprising; perhaps we should give a
>>> warning.  But we need to preserve our philosophy that "Pigs eat
>>> anything".  This would seem to dictate that we allow people to use union
>>> regardless of the schemas.  One open question in my mind is whether we
>>> have a "strict mode" (similar to 'use strict' in Perl) where things like
>>> this cause errors instead of (possibly) warnings.
>>>
>>> Alan.
>>>
>>> pi song wrote:
>>>
>>>
>>>
>>>> Alan,
>>>>
>>>> On second thought, a union of two incompatible data streams can cause
>>>> undefined state in downstream operators, resulting in a mix of good
>>>> output and garbage. This seems to break the rule of least surprise.
>>>> What do you think?
>>>>
>>>> Pi
>>>>
>>>> On Wed, May 14, 2008 at 9:06 AM, pi song <[EMAIL PROTECTED]> wrote:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>> Ok, will follow that.
>>>>>
>>>>>
>>>>> On 5/14/08, Alan Gates <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> I agree that option 3 is the correct course.
>>>>>>
>>>>>> One note, you say:
>>>>>>
>>>>>> In case that schemas from all the input ports are not compatible, no
>>>>>> problem because we won't process it.
>>>>>>
>>>>>> What do you mean by "won't process it"?  We still have to allow a union
>>>>>> operation between two non-compatible inputs (otherwise we can only use
>>>>>> union when we have schemas).  But the resulting union will not have a
>>>>>> schema (since the output no longer has a consistent schema).
>>>>>>
>>>>>> Alan.
>>>>>>
>>>>>>
>>>>>> pi song wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Union is an example of a bag (relational) operator that can have more
>>>>>>> than one input.
>>>>>>>
>>>>>>> In case that schemas from all the input ports are the same, no problem.
>>>>>>> In case that schemas from all the input ports are not compatible, no
>>>>>>> problem because we won't process it.
>>>>>>> In case that schemas from all the input ports are not the same, but
>>>>>>> compatible, here comes a problem.
>>>>>>>
>>>>>>> Example:
>>>>>>>
>>>>>>> C = UNION A,B ;
>>>>>>>
>>>>>>> Schema(A) = < Int, Chararray >
>>>>>>> Schema(B) = < Double, Chararray >
>>>>>>>
>>>>>>> The output schema will get resolved to < Double, Chararray >. Here is
>>>>>>> the problem. The Union operator at the moment doesn't support casting
>>>>>>> in any layer. In this case, if we don't cast it, the binary data of the
>>>>>>> Int will get picked up as a Double by the downstream operator!! There
>>>>>>> are a couple of solutions for this:
>>>>>>>
>>>>>>> 1) Implement LOUnion and POUnion to support type casting internally.
>>>>>>> 2) Add casting support to the LOUnion operator and let the
>>>>>>> LogicalToPhysical compiler generate an LOForEach for it.
>>>>>>> 3) Explicitly insert an LOForEach to do the necessary casting between
>>>>>>> the Union and the problematic input. This is analogous to the way we
>>>>>>> implement implicit casting for expression operators.
>>>>>>> 4) Don't support the "not same but compatible" case at all.
>>>>>>>
>>>>>>> I will do (3) because it makes the most sense to me and incurs the
>>>>>>> least impact on other modules. Does anyone have a problem with it?
>>>>>>>
>>>>>>> Pi
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>>
>
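
For readers following the thread, the schema resolution behind option (3) can be sketched roughly as below. This is only an illustration in Python, not Pig's actual implementation: the widening order in NUMERIC_RANK, the function names merge_type and resolve_union, and the strict flag are all hypothetical stand-ins. The sketch resolves the union's output schema, reports which input columns would need a cast inserted in front of the union (the role the inserted LOForEach plays in option 3), and returns no schema when the inputs are incompatible, unless the assumed strict flag turns that into an error.

```python
from typing import List, Optional, Tuple

# Hypothetical numeric widening order; not taken from Pig's real type lattice.
NUMERIC_RANK = {"int": 0, "long": 1, "float": 2, "double": 3}

Schema = Optional[List[str]]        # None means "no schema available"
Casts = List[List[Tuple[int, str]]]  # per input: (column index, target type)


def merge_type(a: str, b: str) -> Optional[str]:
    """Return the common type of a and b, or None if they are incompatible."""
    if a == b:
        return a
    if a in NUMERIC_RANK and b in NUMERIC_RANK:
        return a if NUMERIC_RANK[a] >= NUMERIC_RANK[b] else b
    return None


def resolve_union(left: Schema, right: Schema,
                  strict: bool = False) -> Tuple[Schema, Casts]:
    """Resolve the output schema of a two-input union.

    Returns (output_schema, casts). casts[i] lists the columns of input i
    that need a cast before the union. An output schema of None means the
    union is still allowed but its result carries no schema.
    """
    if left is None or right is None or len(left) != len(right):
        merged = None
    else:
        cols = [merge_type(a, b) for a, b in zip(left, right)]
        merged = None if None in cols else cols

    if merged is None:
        # Assumed strict-mode behavior: fail only when both schemas are
        # known and clash, so the no-metadata case keeps working.
        if strict and left is not None and right is not None:
            raise TypeError("union inputs have incompatible schemas")
        return None, [[], []]

    casts = [
        [(i, target) for i, (src, target) in enumerate(zip(schema, merged))
         if src != target]
        for schema in (left, right)
    ]
    return merged, casts


# The example from the thread: Schema(A) = <Int, Chararray>,
# Schema(B) = <Double, Chararray>.
schema, casts = resolve_union(["int", "chararray"], ["double", "chararray"])
print(schema)  # resolved output schema
print(casts)   # input A needs column 0 cast; input B needs nothing
```

Keeping the cast list separate from the union itself mirrors the reasoning given for option (3): the union operator stays unchanged, and only the problematic input gets an extra casting step in front of it.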
