The twoLevelAccessRequired flag is not quite a long-term solution to the
problem. The problem is that we treat the output of relations as bags, but
their schemas do NOT have twoLevelAccessRequired set to true. Only bag
constants and bags from input data have this flag set to true. We need to
either move to *all* bag schemas having a tuple schema that holds the real
schema and reflects the layout of the bag, or think of an alternative.
Implementing the solution will likely involve many more details that need to
be looked at. Once we arrive at a solution, this flag should be removed, since
it will no longer be needed. Otherwise Resource Schema would also need to have
this notion of two-level access for bag fields.
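
To illustrate the difference between the two lookups, here is a minimal
sketch. It does not use Pig's real Schema API: SketchField, SketchSchema,
TwoLevelAccessSketch, resolve() and position() are hypothetical stand-ins,
and only the name twoLevelAccessRequired comes from Schema.java. It assumes
a bag schema with the flag set holds exactly one tuple field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a field schema; NOT Pig's Schema.FieldSchema.
final class SketchField {
    final String alias;
    final SketchSchema inner; // non-null only for tuple fields in this sketch

    SketchField(String alias, SketchSchema inner) {
        this.alias = alias;
        this.inner = inner;
    }
}

// Hypothetical stand-in for a schema; NOT Pig's Schema class.
final class SketchSchema {
    final List<SketchField> fields = new ArrayList<>();
    // Same name as the flag in Schema.java; false is the "relation" default.
    boolean twoLevelAccessRequired = false;

    SketchSchema(SketchField... fs) {
        fields.addAll(Arrays.asList(fs));
    }

    // Single-level lookup: index of alias in this schema, or -1 if absent.
    int position(String alias) {
        for (int i = 0; i < fields.size(); i++) {
            if (fields.get(i).alias.equals(alias)) {
                return i;
            }
        }
        return -1;
    }

    // Resolve "b.alias" where this object is the schema of b.
    int resolve(String alias) {
        if (twoLevelAccessRequired) {
            // Assumption: the bag schema wraps exactly one tuple field,
            // so the lookup descends into that tuple's schema.
            return fields.get(0).inner.position(alias);
        }
        // Relation case: the item fields sit directly in the schema.
        return position(alias);
    }
}

class TwoLevelAccessSketch {
    // Relation b: (name, i). Fields are stored directly, flag stays false.
    static SketchSchema relationSchema() {
        return new SketchSchema(
            new SketchField("name", null),
            new SketchField("i", null));
    }

    // Loaded bag b: {t: (name, i)}. Fields are wrapped in a tuple, flag is true.
    static SketchSchema loadedBagSchema() {
        SketchSchema tuple = new SketchSchema(
            new SketchField("name", null),
            new SketchField("i", null));
        SketchSchema bag = new SketchSchema(new SketchField("t", tuple));
        bag.twoLevelAccessRequired = true;
        return bag;
    }

    public static void main(String[] args) {
        System.out.println("relation b.i   -> " + relationSchema().resolve("i"));
        System.out.println("loaded bag b.i -> " + loadedBagSchema().resolve("i"));
    }
}
```

In this sketch both kinds of schema resolve b.i to the same field, but only
because resolve() branches on the flag. A plain position("i") call on the
loaded-bag schema finds nothing, which is the inconsistency that giving all
bag schemas the same tuple-wrapped layout would remove.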

Pradeep.

-----Original Message-----
From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com] 
Sent: Tuesday, November 03, 2009 12:30 PM
To: pig-dev@hadoop.apache.org
Subject: Re: two-level access problem?

Thanks Pradeep,
I saw that comment. I guess my question is, given the solution this
comment describes, what are you referring to in the Load/Store
redesign doc when you say "we must fix the two level access issues
with schema of bags in current schema before we make these changes,
otherwise that same contagion will afflict us here?"

-D

On Tue, Nov 3, 2009 at 2:10 PM, Pradeep Kamath <prade...@yahoo-inc.com> wrote:
> From comments in Schema.java:
>    // In bags which have a schema with a tuple which contains
>    // the fields present in it, if we access the second field (say)
>    // we are actually trying to access the second field in the
>    // tuple in the bag. This is currently true for two cases:
>    // 1) bag constants - the schema of bag constant has a tuple
>    // which internally has the actual elements
>    // 2) When bags are loaded from input data, if the user
>    // specifies a schema with the "bag" type, he has to specify
>    // the bag as containing a tuple with the actual elements in
>    // the schema declaration. However in both the cases above,
>    // the user can still say b.i where b is the bag and i is
>    // an element in the bag's tuple schema. So in these cases,
>    // the access should translate to a lookup for "i" in the
>    // tuple schema present in the bag. To indicate this, the
>    // flag below is used. It is false by default because,
>    // currently we use bag as the type for relations. However
>    // the schema of a relation does NOT have a tuple fieldschema
>    // with items in it. Instead, the schema directly has the
>    // field schema of the items. So for a relation "b", the
>    // above b.i access would be a direct single level access
>    // of i in b's schema. This is treated as the "default" case
>    private boolean twoLevelAccessRequired = false;
>
> -----Original Message-----
> From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
> Sent: Monday, November 02, 2009 5:33 PM
> To: pig-dev@hadoop.apache.org
> Subject: two-level access problem?
>
> Could someone explain the nature of the "two-level access problem"
> referred to in the Load/Store redesign wiki and in the DataType code?
>
>
> Thanks,
> -D
>
