Hi Prashant:

I read about the map data type in the book "Programming Pig", which says:
"... By default there is no requirement that all values in a map must be of
the same type. It is legitimate to have a map with two keys name and age,
where the value for name is a chararray and the value for age is an int.
Beginning in Pig 0.9, a map can declare its values to all be of the same
type... "

I agree that all values in the map can be of the same type, but this is not
required in Pig.
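
For illustration, here is a minimal sketch of the two kinds of declaration,
assuming Pig 0.9 or later. The file names and the field name m are made up
for this example and do not come from my actual script:

```
-- Untyped map: values are loaded as bytearray, so the values under
-- different keys are free to hold different types.
A = load 'people.txt' as (m:map[]);

-- Typed map (Pig 0.9 and later): every value is declared to be an int.
B = load 'ages.txt' as (m:map[int]);

describe A;
describe B;
```

With the untyped form, a value looked up with m#'key' is a bytearray until
it is cast to something else.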

Best Regards,

Jerry


On Thu, Apr 18, 2013 at 10:37 AM, Jerry Lam <[email protected]> wrote:

> Hi Ruslan:
>
> I used PigStorage to store data that originally used Pig data types.
> It is strange (or a bug in Pig) that I cannot use PigStorage to read
> data that was stored using PigStorage, isn't it?
>
> Best Regards,
>
> Jerry
>
>
>
> On Wed, Apr 17, 2013 at 10:52 PM, Ruslan Al-Fakikh 
> <[email protected]>wrote:
>
>> The output:
>> ({ ([c#11,d#22]),([c#33,d#44]) })
>> ()
>> looks weird.
>>
>> Jerry, maybe the problem is in using PigStorage. As its javadoc says:
>>
>> A load function that parses a line of input into fields using a character
>> delimiter
>>
>> So I guess this is just for simple csv lines.
>> But you are trying to load a complicated Map structure as it was formatted
>> by previous storing.
>> Probably you'll need to write your own Loader for this. Another hint: using
>> the -schema parameter to PigStorage, but I am not sure it can help :(
>>
>> Ruslan
>>
>>
>> On Wed, Apr 17, 2013 at 11:48 PM, Jerry Lam <[email protected]> wrote:
>>
>> > Hi Ruslan:
>> >
>> > I did a describe B followed by a dump B, the output is:
>> > B: {b: {()}}
>> >
>> > ({ ([c#11,d#22]),([c#33,d#44]) })
>> > ()
>> >
>> > but when I executed
>> >
>> > C = foreach B generate flatten(b);
>> >
>> > dump C;
>> >
>> > I got the exception again...
>> >
>> > 2013-04-17 15:47:39,933 [Thread-26] WARN
>> >  org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
>> > java.lang.Exception: java.lang.ClassCastException:
>> > org.apache.pig.data.DataByteArray cannot be cast to
>> > org.apache.pig.data.DataBag
>> > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
>> > Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray
>> > cannot be cast to org.apache.pig.data.DataBag
>> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
>> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
>> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
>> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
>> > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
>> > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
>> > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
>> > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>> > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>> > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
>> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>> > at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
>> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>> > at java.lang.Thread.run(Thread.java:680)
>> >
>> >
>> > Best Regards,
>> >
>> > Jerry
>> >
>> >
>> > On Wed, Apr 17, 2013 at 3:26 PM, Ruslan Al-Fakikh <[email protected]
>> > >wrote:
>> >
>> > > I think that before doing the FLATTEN, you should be 100% sure that your
>> > > cast worked properly. Can you first DESCRIBE B and then DUMP B right away?
>> > > Or probably it just can't be cast in this way. Honestly I don't know
>> > > exactly how it works, but here:
>> > > http://pig.apache.org/docs/r0.10.0/basic.html#cast
>> > > I see that casting from a map to a bag should produce an error.
>> > > Hope that helps.
>> > >
>> > >
>> > > On Wed, Apr 17, 2013 at 9:38 PM, Jerry Lam <[email protected]>
>> wrote:
>> > >
>> > > > Hi Ruslan:
>> > > >
>> > > > Thanks for your help. I really appreciate it. It really puzzled me.
>> > > >
>> > > > I did a "describe B", the output is "B: {b: bytearray}".
>> > > >
>> > > > I then tried to cast it as suggested, I got:
>> > > > B = foreach A generate document#'b' as b:{};
>> > > > describe B;
>> > > > B: {b: {()}}
>> > > >
>> > > > Then I proceeded with:
>> > > > C = foreach B generate flatten(b);
>> > > >
>> > > > I got:
>> > > > 2013-04-17 13:38:04,601 [Thread-16] WARN
>> > > >  org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
>> > > > java.lang.Exception: java.lang.ClassCastException:
>> > > > org.apache.pig.data.DataByteArray cannot be cast to
>> > > > org.apache.pig.data.DataBag
>> > > > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
>> > > > Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray
>> > > > cannot be cast to org.apache.pig.data.DataBag
>> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
>> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
>> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
>> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
>> > > > at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
>> > > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
>> > > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
>> > > > at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>> > > > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>> > > > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
>> > > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>> > > > at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
>> > > > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> > > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> > > > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> > > > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>> > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>> > > > at java.lang.Thread.run(Thread.java:680)
>> > > >
>> > > > Best Regards,
>> > > >
>> > > > Jerry
>> > > >
>> > > >
>> > > > On Wed, Apr 17, 2013 at 1:24 PM, Ruslan Al-Fakikh <
>> > [email protected]
>> > > > >wrote:
>> > > >
>> > > > > Hey, and as for converting a map of tuples, probably I got you wrong. If
>> > > > > you can get to every value manually within FOREACH then I see no problem
>> > > > > in doing so.
>> > > > >
>> > > > >
>> > > > > On Wed, Apr 17, 2013 at 9:22 PM, Ruslan Al-Fakikh <
>> > > [email protected]
>> > > > > >wrote:
>> > > > >
>> > > > > > I am not sure whether you can convert a map to a tuple.
>> > > > > > But I am curious about one thing:
>> > > > > > you are trying to use 'b' as a Bag, right? Because FLATTEN needs it to be
>> > > > > > a Bag I guess:
>> > > > > > http://pig.apache.org/docs/r0.10.0/basic.html#flatten
>> > > > > > But it seems that Pig thinks that b is a byte array:
>> > > > > > java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be
>> > > > > > cast to org.apache.pig.data.DataBag
>> > > > > > Can you do this?:
>> > > > > > DESCRIBE B
>> > > > > >
>> > > > > > I suppose it can look like a Bag in the output of DUMP, but I think Pig
>> > > > > > doesn't know it is a Bag, maybe you'll need some kind of explicit cast?
>> > > > > >
>> > > > > >
>> > > > > > On Wed, Apr 17, 2013 at 9:11 PM, Jerry Lam <
>> [email protected]>
>> > > > wrote:
>> > > > > >
>> > > > > >> Hi Ruslan,
>> > > > > >>
>> > > > > >> I tried to debug each step already but no luck.
>> > > > > >> I did a dump (dump B;) after B = foreach A generate document#'b' as b;
>> > > > > >> I got {([c#11,d#22]),([c#33,d#44])}
>> > > > > >> but it failed when I did C = foreach B generate flatten(b);
>> > > > > >>
>> > > > > >> I don't have control over the input. It is passed as a Map of Maps. I guess
>> > > > > >> it makes lookup easier using a map with keys.
>> > > > > >>
>> > > > > >> Can I convert a map to a tuple?
>> > > > > >>
>> > > > > >> Best Regards,
>> > > > > >>
>> > > > > >> Jerry
>> > > > > >>
>> > > > > >>
>> > > > > >>
>> > > > > >> On Wed, Apr 17, 2013 at 11:57 AM, Ruslan Al-Fakikh <
>> > > > > [email protected]
>> > > > > >> >wrote:
>> > > > > >>
>> > > > > >> > Hi Jerry,
>> > > > > >> >
>> > > > > >> > I would recommend debugging the issue step by step. Right after this line:
>> > > > > >> > A = load 'data.txt' as document:[];
>> > > > > >> > run:
>> > > > > >> > DESCRIBE A;
>> > > > > >> > DUMP A;
>> > > > > >> > and so on for each later step.
>> > > > > >> >
>> > > > > >> > To be honest I haven't used maps that much. Just curious, why did you
>> > > > > >> > choose to use them? You can also use regular tuples for storing the
>> > > > > >> > relations. Also you can store the tuples with a schema file.
>> > > > > >> >
>> > > > > >> > Ruslan
>> > > > > >> >
>> > > > > >> >
>> > > > > >> > On Wed, Apr 17, 2013 at 5:28 AM, Jerry Lam <
>> > [email protected]>
>> > > > > >> wrote:
>> > > > > >> >
>> > > > > >> > > Hi pig users,
>> > > > > >> > >
>> > > > > >> > > I tried to load data using PigStorage that was previously stored using
>> > > > > >> > > PigStorage but it failed.
>> > > > > >> > >
>> > > > > >> > > Each line looks like this in the data file that is generated by PigStorage:
>> > > > > >> > > [a#hello,b#{([c#11,d#22]),([c#33,d#44])}]
>> > > > > >> > >
>> > > > > >> > > I did the following:
>> > > > > >> > > A = load 'data.txt' as document:[];
>> > > > > >> > > B = foreach A generate document#'b' as b;
>> > > > > >> > > C = foreach B generate flatten(b);
>> > > > > >> > > dump C;
>> > > > > >> > >
>> > > > > >> > > I expect to see the following output:
>> > > > > >> > > ([c#11,d#22])
>> > > > > >> > > ([c#33,d#44])
>> > > > > >> > >
>> > > > > >> > > Instead, I got:
>> > > > > >> > > java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be
>> > > > > >> > > cast to org.apache.pig.data.DataBag
>> > > > > >> > >
>> > > > > >> > > Has anyone encountered this problem before? How can I read the data back?
>> > > > > >> > >
>> > > > > >> > > Thanks,
>> > > > > >> > >
>> > > > > >> > > Jerry
>> > > > > >> > >
>> > > > > >> >
>> > > > > >>
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
