[Pig Wiki] Update of "SemanticsCleanup" by AlanGates
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The "SemanticsCleanup" page has been changed by AlanGates.
http://wiki.apache.org/pig/SemanticsCleanup?action=diff&rev1=2&rev2=3

--

  || [[https://issues.apache.org/jira/browse/PIG-1584|PIG-1584]] || Grammar || Cogroup inner does not match the semantics of inner join. It is also not clear what value the inner keyword has for cogroup. Consider removing it. || ||
  || [[https://issues.apache.org/jira/browse/PIG-1538|PIG-1538]] || Nested types || Remove two level access || Maybe, if we can find a way to ignore calls to Schema.isTwoLevelAccessRequired(). ||
  || [[https://issues.apache.org/jira/browse/PIG-1536|PIG-1536]] || Schema || Pick one semantic for schema merges and use it consistently throughout Pig || no ||
+ || [[https://issues.apache.org/jira/browse/PIG-1371|PIG-1371]] || Nested types || unknown || ||
  || [[https://issues.apache.org/jira/browse/PIG-1341|PIG-1341]] || Dynamic type binding || Close as won't fix || yes ||
  || [[https://issues.apache.org/jira/browse/PIG-1281|PIG-1281]] || Dynamic type binding || In situations where a Hadoop shuffle key is assumed to be of type bytearray wrap the value in a tuple so that if the type is actually something else Hadoop can still process it. || yes ||
  || [[https://issues.apache.org/jira/browse/PIG-1277|PIG-1277]] || Nested types || Unknown || ||
+ || [[https://issues.apache.org/jira/browse/PIG-1222|PIG-1222]] || Dynamic type binding || The issue here is that Pig thinks the field is a bytearray while BinStorage actually produces a String. Need a way to handle these issues on the fly. || ||
  || [[https://issues.apache.org/jira/browse/PIG-1188|PIG-1188]] || Schema || Make sure Pig handles missing data in Tuples by returning a null rather than failing. || yes ||
  || [[https://issues.apache.org/jira/browse/PIG-1112|PIG-1112]] || Schema || When user provides AS to flatten of undefined bag or tuple, the contents of that AS are taken to be the schema of the bag or tuple. || yes ||
  || [[https://issues.apache.org/jira/browse/PIG-1065|PIG-1065]] || Dynamic type binding || In situations where a Hadoop shuffle key is assumed to be of type bytearray wrap the value in a tuple so that if the type is actually something else Hadoop can still process it. || yes ||
  || [[https://issues.apache.org/jira/browse/PIG-999|PIG-999]] || Dynamic type binding || In situations where a Hadoop shuffle key is assumed to be of type bytearray wrap the value in a tuple so that if the type is actually something else Hadoop can still process it. || yes ||
+ || [[https://issues.apache.org/jira/browse/PIG-847|PIG-847]] || Nested types || Remove two level access || maybe ||
+ || [[https://issues.apache.org/jira/browse/PIG-828|PIG-828]] || Nested types || According to the rules of Pig Latin, this should produce a bag with one field. Need to make sure that is what Pig is trying to do in this case. || yes ||
  || [[https://issues.apache.org/jira/browse/PIG-767|PIG-767]] || Nested types || Remove two level access; bring DUMP and DESCRIBE output into sync. || no ||
+ || [[https://issues.apache.org/jira/browse/PIG-749|PIG-749]] || Schema || Related to PIG-1112 || yes ||
  || [[https://issues.apache.org/jira/browse/PIG-730|PIG-730]] || Nested types || Make sure schema of union is the same as schema before union (suspect this is a two level access issue) || unclear ||
  || [[https://issues.apache.org/jira/browse/PIG-723|PIG-723]] || Nested types || Suspect this is a two level access issue || unclear ||
  || [[https://issues.apache.org/jira/browse/PIG-696|PIG-696]] || Dynamic type binding || Class cast exceptions such as this should result in a null value and a warning, not a failure. || yes ||
  || [[https://issues.apache.org/jira/browse/PIG-694|PIG-694]] || Nested types || Determine the semantics for merging tuples and bags. || unclear ||
+ || [[https://issues.apache.org/jira/browse/PIG-678|PIG-678]] || Grammar || Decide whether we want to support this extension. || yes ||
  || [[https://issues.apache.org/jira/browse/PIG-621|PIG-621]] || Dynamic type binding || Class cast exceptions such as this should result in a null value and a warning, not a failure. || yes ||
  || [[https://issues.apache.org/jira/browse/PIG-435|PIG-435]] || Schema || Decide definitely on what it means when users declare a schema for a load. || unclear ||
  || [[https://issues.apache.org/jira/browse/PIG-333|PIG-333]] || Dynamic type binding || Since it is specified that MIN and MAX treat unknown types as double, all the actual string data should be converted to NULLs, rather than cause errors. || yes ||
  || [[https://issues.apache.org/jira/browse/PIG-313|PIG-313]] || Grammar || I propose that we continue not supporting this. But we should detect it at compile time rather than at runtime. || yes ||
+
+ Bugs I need to
[Pig Wiki] Update of "SemanticsCleanup" by AlanGates
The "SemanticsCleanup" page has been changed by AlanGates.
http://wiki.apache.org/pig/SemanticsCleanup?action=diff&rev1=1&rev2=2

--

  The bugs have been placed into the following categories:
   * Schema: These are related to schemas that are improperly inferred, etc.
   * Grammar: Places where the grammar is unclear or produces unexpected results.
-  * Two Level Access: The concept of two level access was introduced long ago to deal with oddities in bag schemas. Ideally we will remove this. At least we have to improve it.
+  * Nested Types: Issues dealing with bags, tuples, and maps.
+  * Dynamic Type Binding: In certain situations Pig assumes a value to be of type byte array when it does not know the actual type, and handles whatever actual type it is at runtime. There are situations where this does not work properly.

  == Bug Table ==
- || *JIRA* || *Category* || *Proposed Solution* ||
+ || '''JIRA''' || '''Category''' || '''Proposed Solution''' || '''Backward Compatible''' ||
- || [[https://issues.apache.org/jira/browse/PIG-1627|PIG-1627]] || Schema || Flattening a bag with an unknown schema should produce a record with an unknown schema ||
+ || [[https://issues.apache.org/jira/browse/PIG-1627|PIG-1627]] || Schema || Flattening a bag with an unknown schema should produce a record with an unknown schema || no ||
- || [[https://issues.apache.org/jira/browse/PIG-1584|PIG-1584]] || Grammar || Cogroup inner does not match the semantics of inner join. It is also not clear what value the inner keyword has for cogroup. ||
+ || [[https://issues.apache.org/jira/browse/PIG-1584|PIG-1584]] || Grammar || Cogroup inner does not match the semantics of inner join. It is also not clear what value the inner keyword has for cogroup. Consider removing it. || ||
- || [[https://issues.apache.org/jira/browse/PIG-1538|PIG-1538]] || Two level access || Remove two level access ||
+ || [[https://issues.apache.org/jira/browse/PIG-1538|PIG-1538]] || Nested types || Remove two level access || Maybe, if we can find a way to ignore calls to Schema.isTwoLevelAccessRequired(). ||
- || [[https://issues.apache.org/jira/browse/PIG-1536|PIG-1536]] || Schema || Pig one semantic for schema merges and use it consistently throughout Pig ||
+ || [[https://issues.apache.org/jira/browse/PIG-1536|PIG-1536]] || Schema || Pick one semantic for schema merges and use it consistently throughout Pig || no ||
+ || [[https://issues.apache.org/jira/browse/PIG-1341|PIG-1341]] || Dynamic type binding || Close as won't fix || yes ||
+ || [[https://issues.apache.org/jira/browse/PIG-1281|PIG-1281]] || Dynamic type binding || In situations where a Hadoop shuffle key is assumed to be of type bytearray wrap the value in a tuple so that if the type is actually something else Hadoop can still process it. || yes ||
+ || [[https://issues.apache.org/jira/browse/PIG-1277|PIG-1277]] || Nested types || Unknown || ||
+ || [[https://issues.apache.org/jira/browse/PIG-1188|PIG-1188]] || Schema || Make sure Pig handles missing data in Tuples by returning a null rather than failing. || yes ||
+ || [[https://issues.apache.org/jira/browse/PIG-1112|PIG-1112]] || Schema || When user provides AS to flatten of undefined bag or tuple, the contents of that AS are taken to be the schema of the bag or tuple. || yes ||
+ || [[https://issues.apache.org/jira/browse/PIG-1065|PIG-1065]] || Dynamic type binding || In situations where a Hadoop shuffle key is assumed to be of type bytearray wrap the value in a tuple so that if the type is actually something else Hadoop can still process it. || yes ||
+ || [[https://issues.apache.org/jira/browse/PIG-999|PIG-999]] || Dynamic type binding || In situations where a Hadoop shuffle key is assumed to be of type bytearray wrap the value in a tuple so that if the type is actually something else Hadoop can still process it. || yes ||
+ || [[https://issues.apache.org/jira/browse/PIG-767|PIG-767]] || Nested types || Remove two level access; bring DUMP and DESCRIBE output into sync. || no ||
+ || [[https://issues.apache.org/jira/browse/PIG-730|PIG-730]] || Nested types || Make sure schema of union is the same as schema before union (suspect this is a two level access issue) || unclear ||
+ || [[https://issues.apache.org/jira/browse/PIG-723|PIG-723]] || Nested types || Suspect this is a two level access issue || unclear ||
+ || [[https://issues.apache.org/jira/browse/PIG-696|PIG-696]] || Dynamic type binding || Class cast exceptions such as this should result in a null value and a warning, not a failure. || yes ||
+ || [[https://issues.apache.org/jira/browse/PIG-694|PIG-694]] || Nested types || Determine the semantics for merging tuples and bags. || unclear ||
+ || [[https://issues.apache.org/jira/browse/PIG-621|PIG-621]] || Dynamic type binding |
[Pig Wiki] Update of "SemanticsCleanup" by AlanGates
The "SemanticsCleanup" page has been changed by AlanGates.
http://wiki.apache.org/pig/SemanticsCleanup

--

New page:
== Introduction ==
A number of bugs have been filed against Pig that roughly fall under the area of poorly defined or undefined semantics. In the 0.9 Pig release we would like to take on a number of these issues, clarifying semantics where they are unclear, defining them where they are undefined, and correcting them where they are clearly wrong. This page classifies the existing bugs and indicates what we believe the proper fix is for each of them.

== Categories ==
The bugs have been placed into the following categories:
 * Schema: These are related to schemas that are improperly inferred, etc.
 * Grammar: Places where the grammar is unclear or produces unexpected results.
 * Two Level Access: The concept of two level access was introduced long ago to deal with oddities in bag schemas. Ideally we will remove this. At the least we have to improve it.

== Bug Table ==
|| *JIRA* || *Category* || *Proposed Solution* ||
|| [[https://issues.apache.org/jira/browse/PIG-1627|PIG-1627]] || Schema || Flattening a bag with an unknown schema should produce a record with an unknown schema ||
|| [[https://issues.apache.org/jira/browse/PIG-1584|PIG-1584]] || Grammar || Cogroup inner does not match the semantics of inner join. It is also not clear what value the inner keyword has for cogroup. ||
|| [[https://issues.apache.org/jira/browse/PIG-1538|PIG-1538]] || Two level access || Remove two level access ||
|| [[https://issues.apache.org/jira/browse/PIG-1536|PIG-1536]] || Schema || Pig one semantic for schema merges and use it consistently throughout Pig ||
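
The fix proposed for PIG-1627 in the table can be pictured with a small model. This is only a sketch in Python, not Pig code: `Schema`, `flatten_schema`, and the use of `None` to stand for an unknown schema are hypothetical names chosen for illustration, and the real change would live in Pig's schema inference.

```python
from typing import List, Optional

# Hypothetical model: a schema is a list of field names,
# and None stands for "schema unknown".
Schema = Optional[List[str]]


def flatten_schema(other_fields: List[str], bag_schema: Schema) -> Schema:
    """Schema of a record built from some known fields plus a flattened bag.

    If the bag's schema is unknown, the number and names of the fields
    produced by flattening it cannot be known either, so the whole
    record is reported as having an unknown schema (the PIG-1627 proposal).
    """
    if bag_schema is None:
        return None
    # Known bag schema: its fields are spliced into the record in place.
    return other_fields + bag_schema


print(flatten_schema(["id"], ["a", "b"]))  # known bag schema
print(flatten_schema(["id"], None))        # unknown bag schema
```

The point of the model is that an unknown schema propagates outward: the record as a whole becomes unknown rather than being given a guessed field count.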