[Pig Wiki] Update of "SemanticsCleanup" by AlanGates

2010-09-21 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The "SemanticsCleanup" page has been changed by AlanGates.
http://wiki.apache.org/pig/SemanticsCleanup?action=diff&rev1=2&rev2=3

--

  || [[https://issues.apache.org/jira/browse/PIG-1584|PIG-1584]] || Grammar || 
Cogroup inner does not match the semantics of inner join.  It is also not clear 
what value the inner keyword has for cogroup. Consider removing it. || ||
  || [[https://issues.apache.org/jira/browse/PIG-1538|PIG-1538]] || Nested 
types || Remove two level access || Maybe, if we can find a way to ignore calls 
to Schema.isTwoLevelAccessRequired(). ||
  || [[https://issues.apache.org/jira/browse/PIG-1536|PIG-1536]] || Schema || 
Pick one semantic for schema merges and use it consistently throughout Pig || 
no ||
+ || [[https://issues.apache.org/jira/browse/PIG-1371|PIG-1371]] || Nested 
types || unknown || ||
  || [[https://issues.apache.org/jira/browse/PIG-1341|PIG-1341]] || Dynamic 
type binding || Close as won't fix || yes ||
  || [[https://issues.apache.org/jira/browse/PIG-1281|PIG-1281]] || Dynamic 
type binding || In situations where a Hadoop shuffle key is assumed to be of 
type bytearray wrap the value in a tuple so that if the type is actually 
something else Hadoop can still process it. || yes ||
  || [[https://issues.apache.org/jira/browse/PIG-1277|PIG-1277]] || Nested 
types || Unknown || ||
+ || [[https://issues.apache.org/jira/browse/PIG-1222|PIG-1222]] || Dynamic 
type binding || The issue here is that Pig thinks the field is a bytearray 
while BinStorage actually produces a String.  Need a way to handle these issues 
on the fly. || ||
  || [[https://issues.apache.org/jira/browse/PIG-1188|PIG-1188]] || Schema || 
Make sure Pig handles missing data in Tuples by returning a null rather than 
failing. || yes ||
  || [[https://issues.apache.org/jira/browse/PIG-1112|PIG-1112]] || Schema || 
When user provides AS to flatten of undefined bag or tuple, the contents of 
that AS are taken to be the schema of the bag or tuple. || yes ||
  || [[https://issues.apache.org/jira/browse/PIG-1065|PIG-1065]] || Dynamic 
type binding ||  In situations where a Hadoop shuffle key is assumed to be of 
type bytearray wrap the value in a tuple so that if the type is actually 
something else Hadoop can still process it. || yes ||
  || [[https://issues.apache.org/jira/browse/PIG-999|PIG-999]] || Dynamic type 
binding ||  In situations where a Hadoop shuffle key is assumed to be of type 
bytearray wrap the value in a tuple so that if the type is actually something 
else Hadoop can still process it. || yes ||
+ || [[https://issues.apache.org/jira/browse/PIG-847|PIG-847]] || Nested types 
|| Remove two level access || maybe ||
+ || [[https://issues.apache.org/jira/browse/PIG-828|PIG-828]] || Nested types 
|| According to the rules of Pig Latin, this should produce a bag with one 
field.  Need to make sure that is what Pig is trying to do in this case. || yes 
||
  || [[https://issues.apache.org/jira/browse/PIG-767|PIG-767]] || Nested types 
|| Remove two level access; bring DUMP and DESCRIBE output into sync. || no ||
+ || [[https://issues.apache.org/jira/browse/PIG-749|PIG-749]] || Schema || 
Related to PIG-1112 || yes ||
  || [[https://issues.apache.org/jira/browse/PIG-730|PIG-730]] || Nested types 
|| Make sure schema of union is the same as schema before union (suspect this is 
a two level access issue) || unclear ||
  || [[https://issues.apache.org/jira/browse/PIG-723|PIG-723]] || Nested types 
|| Suspect this is a two level access issue || unclear ||
  || [[https://issues.apache.org/jira/browse/PIG-696|PIG-696]] || Dynamic type 
binding || Class cast exceptions such as this should result in a null value and 
a warning, not a failure. || yes ||
  || [[https://issues.apache.org/jira/browse/PIG-694|PIG-694]] || Nested types 
|| Determine the semantics for merging tuples and bags. || unclear ||
+ || [[https://issues.apache.org/jira/browse/PIG-678|PIG-678]] || Grammar || 
Decide whether we want to support this extension. || yes ||
  || [[https://issues.apache.org/jira/browse/PIG-621|PIG-621]] || Dynamic type 
binding || Class cast exceptions such as this should result in a null value and 
a warning, not a failure. || yes ||
  || [[https://issues.apache.org/jira/browse/PIG-435|PIG-435]] || Schema || 
Decide definitively what it means when users declare a schema for a load. || 
unclear ||
  || [[https://issues.apache.org/jira/browse/PIG-333|PIG-333]] || Dynamic type 
binding || Since it is specified that MIN and MAX treat unknown types as 
double, all the actual string data should be converted to NULLs, rather than 
cause errors. || yes ||
  || [[https://issues.apache.org/jira/browse/PIG-313|PIG-313]] || Grammar || I 
propose that we continue not supporting this.  But we should detect it at 
compile time rather than at runtime. || yes ||
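
Several of the Dynamic type binding rows above propose that a failed cast should yield a null and a warning rather than a failure (PIG-696, PIG-621), and that MIN and MAX should treat unknown types as double and turn non-numeric strings into nulls (PIG-333). The following is a minimal Python sketch of that behavior, not Pig code. The function names are hypothetical, and the UTF-8 decoding of byte arrays is an assumption made for the illustration.

```python
import warnings


def to_double_or_null(value):
    """Cast a value of unknown type to double; return None (null) on failure."""
    if value is None:
        return None  # a null stays a null, no warning needed
    try:
        if isinstance(value, (bytes, bytearray)):
            # Assumed encoding; byte arrays are decoded before parsing.
            value = bytes(value).decode("utf-8")
        return float(value)
    except (TypeError, ValueError):
        # Warn instead of failing the whole job.
        warnings.warn(f"could not cast {value!r} to double; using null")
        return None


def min_as_double(values):
    """MIN over values of unknown type, treated as double; nulls are skipped."""
    converted = [to_double_or_null(v) for v in values]
    present = [d for d in converted if d is not None]
    return min(present) if present else None
```

Under these assumptions, a call such as `min_as_double(["3", "abc", b"1.5", None])` skips the value that cannot be cast (with a warning) and the null, and computes the minimum over the remaining doubles.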
+ 
+ Bugs I need to 

[Pig Wiki] Update of "SemanticsCleanup" by AlanGates

2010-09-20 Thread Apache Wiki

The "SemanticsCleanup" page has been changed by AlanGates.
http://wiki.apache.org/pig/SemanticsCleanup?action=diff&rev1=1&rev2=2

--

  The bugs have been placed into the following categories:
   * Schema:  These are related to schemas that are improperly inferred, etc.
   * Grammar:  Places where the grammar is unclear or produces unexpected 
results.
-  * Two Level Access:  The concept of two level access was introduced long ago 
to deal with oddities in bag schemas.  Ideally we will remove this.  At least 
we have to improve it.
+  * Nested Types:  Issues dealing with bags, tuples, and maps.
+  * Dynamic Type Binding:  In certain situations Pig assumes a value to be of 
type byte array when it does not know the actual type, and handles whatever 
actual type it is at runtime.  There are situations where this does not work 
properly.
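
To make the Dynamic Type Binding category concrete, here is a minimal Python sketch (not Pig code; the function names are hypothetical) of the mismatch it describes. One routine trusts the assumed byte array type, the other inspects the actual type at runtime. Encoding strings as UTF-8 is an assumption made for the illustration.

```python
def concat_assuming_bytearray(a, b):
    """Trusts the assumption that both fields are byte arrays."""
    # bytes(str) raises TypeError, mirroring a runtime failure when the
    # loader actually produced a string instead of a byte array.
    return bytes(a) + bytes(b)


def concat_with_runtime_check(a, b):
    """Looks at the actual type of each field before using it."""
    def as_bytes(value):
        if isinstance(value, (bytes, bytearray)):
            return bytes(value)
        if isinstance(value, str):
            return value.encode("utf-8")  # assumed encoding
        raise TypeError(f"unsupported field type: {type(value).__name__}")

    return as_bytes(a) + as_bytes(b)
```

With a string in the first field, `concat_assuming_bytearray("ab", b"cd")` raises a `TypeError`, while `concat_with_runtime_check("ab", b"cd")` handles the actual type and succeeds.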
  
  == Bug Table ==
- || *JIRA* || *Category* || *Proposed Solution* ||
+ || '''JIRA''' || '''Category''' || '''Proposed Solution''' || '''Backward 
Compatible''' ||
- || [[https://issues.apache.org/jira/browse/PIG-1627|PIG-1627]] || Schema || 
Flattening a bag with an unknown schema should produce a record with an unknown 
schema ||
+ || [[https://issues.apache.org/jira/browse/PIG-1627|PIG-1627]] || Schema || 
Flattening a bag with an unknown schema should produce a record with an unknown 
schema || no ||
- || [[https://issues.apache.org/jira/browse/PIG-1584|PIG-1584]] || Grammar || 
Cogroup inner does not match the semantics of inner join.  It is also not clear 
what value the inner keyword has for cogroup. ||
+ || [[https://issues.apache.org/jira/browse/PIG-1584|PIG-1584]] || Grammar || 
Cogroup inner does not match the semantics of inner join.  It is also not clear 
what value the inner keyword has for cogroup. Consider removing it. || ||
- || [[https://issues.apache.org/jira/browse/PIG-1538|PIG-1538]] || Two level 
access || Remove two level access ||
+ || [[https://issues.apache.org/jira/browse/PIG-1538|PIG-1538]] || Nested 
types || Remove two level access || Maybe, if we can find a way to ignore calls 
to Schema.isTwoLevelAccessRequired(). ||
- || [[https://issues.apache.org/jira/browse/PIG-1536|PIG-1536]] || Schema || 
Pig one semantic for schema merges and use it consistently throughout Pig ||
+ || [[https://issues.apache.org/jira/browse/PIG-1536|PIG-1536]] || Schema || 
Pick one semantic for schema merges and use it consistently throughout Pig || 
no ||
+ || [[https://issues.apache.org/jira/browse/PIG-1341|PIG-1341]] || Dynamic 
type binding || Close as won't fix || yes ||
+ || [[https://issues.apache.org/jira/browse/PIG-1281|PIG-1281]] || Dynamic 
type binding || In situations where a Hadoop shuffle key is assumed to be of 
type bytearray wrap the value in a tuple so that if the type is actually 
something else Hadoop can still process it. || yes ||
+ || [[https://issues.apache.org/jira/browse/PIG-1277|PIG-1277]] || Nested 
types || Unknown || ||
+ || [[https://issues.apache.org/jira/browse/PIG-1188|PIG-1188]] || Schema || 
Make sure Pig handles missing data in Tuples by returning a null rather than 
failing. || yes ||
+ || [[https://issues.apache.org/jira/browse/PIG-1112|PIG-1112]] || Schema || 
When user provides AS to flatten of undefined bag or tuple, the contents of 
that AS are taken to be the schema of the bag or tuple. || yes ||
+ || [[https://issues.apache.org/jira/browse/PIG-1065|PIG-1065]] || Dynamic 
type binding ||  In situations where a Hadoop shuffle key is assumed to be of 
type bytearray wrap the value in a tuple so that if the type is actually 
something else Hadoop can still process it. || yes ||
+ || [[https://issues.apache.org/jira/browse/PIG-999|PIG-999]] || Dynamic type 
binding ||  In situations where a Hadoop shuffle key is assumed to be of type 
bytearray wrap the value in a tuple so that if the type is actually something 
else Hadoop can still process it. || yes ||
+ || [[https://issues.apache.org/jira/browse/PIG-767|PIG-767]] || Nested types 
|| Remove two level access; bring DUMP and DESCRIBE output into sync. || no ||
+ || [[https://issues.apache.org/jira/browse/PIG-730|PIG-730]] || Nested types 
|| Make sure schema of union is the same as schema before union (suspect this is 
a two level access issue) || unclear ||
+ || [[https://issues.apache.org/jira/browse/PIG-723|PIG-723]] || Nested types 
|| Suspect this is a two level access issue || unclear ||
+ || [[https://issues.apache.org/jira/browse/PIG-696|PIG-696]] || Dynamic type 
binding || Class cast exceptions such as this should result in a null value and 
a warning, not a failure. || yes ||
+ || [[https://issues.apache.org/jira/browse/PIG-694|PIG-694]] || Nested types 
|| Determine the semantics for merging tuples and bags. || unclear ||
+ || [[https://issues.apache.org/jira/browse/PIG-621|PIG-621]] || Dynamic type 
binding |

[Pig Wiki] Update of "SemanticsCleanup" by AlanGates

2010-09-20 Thread Apache Wiki

The "SemanticsCleanup" page has been changed by AlanGates.
http://wiki.apache.org/pig/SemanticsCleanup

--

New page:
== Introduction ==
A number of bugs have been filed against Pig that roughly fall under the area 
of poorly defined or undefined semantics.  In the 0.9 Pig release
we would like to take on a number of these issues, clarifying semantics where 
they are unclear, defining them where they are undefined, and
correcting them where they are clearly wrong.  This page classifies the 
existing bugs and indicates what we believe the proper fix is for
each of them.

== Categories ==
The bugs have been placed into the following categories:
 * Schema:  These are related to schemas that are improperly inferred, etc.
 * Grammar:  Places where the grammar is unclear or produces unexpected results.
 * Two Level Access:  The concept of two level access was introduced long ago 
to deal with oddities in bag schemas.  Ideally we will remove this.  At least 
we have to improve it.

== Bug Table ==
|| *JIRA* || *Category* || *Proposed Solution* ||
|| [[https://issues.apache.org/jira/browse/PIG-1627|PIG-1627]] || Schema || 
Flattening a bag with an unknown schema should produce a record with an unknown 
schema ||
|| [[https://issues.apache.org/jira/browse/PIG-1584|PIG-1584]] || Grammar || 
Cogroup inner does not match the semantics of inner join.  It is also not clear 
what value the inner keyword has for cogroup. ||
|| [[https://issues.apache.org/jira/browse/PIG-1538|PIG-1538]] || Two level 
access || Remove two level access ||
|| [[https://issues.apache.org/jira/browse/PIG-1536|PIG-1536]] || Schema || Pig 
one semantic for schema merges and use it consistently throughout Pig ||