Hi Keren,

Hope this is not too late.

>> I am wondering why is LogicalFieldSchema containing a LogicalSchema member?

That's for nested tuple fields. For example, consider "( i:int, t:tuple(j:int) )". The field t:tuple needs to contain a list of field schemas of its own, so it needs a LogicalSchema.

Here is how you can verify it.
1) Debug Pig main in Eclipse.
2) Set a breakpoint in the LogicalFieldSchema constructor.
3) Run "a = load '/dev/null' as (i:int, t:tuple(j:int));" in Grunt.

Thanks,
Cheolsoo

On Thu, Aug 8, 2013 at 2:42 PM, Keren Ouaknine <[email protected]> wrote:

> Hi,
>
> A schema in Pig (LogicalSchema.java) is defined as an array list of
> LogicalFieldSchema whose class members are:
> - String alias
> - byte type
> - long uid
> - LogicalSchema schema
>
> I am wondering why is LogicalFieldSchema containing a LogicalSchema member?
> My guess so far is that perhaps there's a subschema used by some operators?
> I tried to figure out which operators might be using it and categorized the
> main ones as follows:
>
> ==> SCHEMA IS DEFINED BY INPUT SCHEMA ONLY
> LOAD
> DISTINCT
> FILTER
> ORDER BY
> SPLIT
>
> ==> SCHEMA IS DEFINED BY THE LIST OF "AS" IN THE FOREACH STATEMENT
> FOREACH
>
> ==> IF SCHEMA CAN BE DEFINED (SAME LENGTH AND CASTABLE) OR UNKNOWN SCHEMA
> UNION
>
> ==> SCHEMA IS DEFINED BY THE CONCATENATION OF THE TWO INPUT SCHEMAS (+
> ADDING THE ALIAS TO THE FIELD NAME x ==> A::x)
> JOIN
> *Are the two inputs here considered subschemas?*
>
> ==> SCHEMA: (key_to_order_by, bag)
> GROUP
>
> Thanks,
> Keren
>
> --
> Keren Ouaknine
> Web: www.kereno.com
>
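
P.S. To make the nested-tuple point in the reply concrete, here is a minimal standalone sketch of the recursive shape. It is not Pig's actual code: the class names (NestedSchemaSketch, Schema, FieldSchema), the type tags, and the toString format are all made up for illustration, and the uid member is left out. It only shows why a field schema needs to hold a schema of its own when the field is a tuple.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT Pig's real classes.
class NestedSchemaSketch {
    // Hypothetical type tags; Pig stores the type as a byte as well.
    static final byte INTEGER = 1;
    static final byte TUPLE = 2;

    // An ordered list of fields, playing the role of LogicalSchema.
    static class Schema {
        final List<FieldSchema> fields = new ArrayList<>();

        Schema add(FieldSchema field) {
            fields.add(field);
            return this;
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder("(");
            for (int i = 0; i < fields.size(); i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(fields.get(i));
            }
            return sb.append(")").toString();
        }
    }

    // One field, playing the role of LogicalFieldSchema.
    static class FieldSchema {
        final String alias;
        final byte type;
        final Schema schema; // null for scalar fields, set for tuple fields

        FieldSchema(String alias, byte type, Schema schema) {
            this.alias = alias;
            this.type = type;
            this.schema = schema;
        }

        @Override
        public String toString() {
            // A tuple field prints its nested schema; a scalar prints its type.
            return alias + ":" + (type == TUPLE ? "tuple" + schema : "int");
        }
    }

    // Builds the schema for "(i:int, t:tuple(j:int))".
    static Schema example() {
        Schema inner = new Schema().add(new FieldSchema("j", INTEGER, null));
        return new Schema()
                .add(new FieldSchema("i", INTEGER, null))
                .add(new FieldSchema("t", TUPLE, inner));
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints (i:int, t:tuple(j:int))
    }
}
```

The field i carries no nested schema, while t carries one with a single field j. That nesting is what the LogicalSchema member is there for.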
