[ 
https://issues.apache.org/jira/browse/PIG-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845521#action_12845521
 ] 

Dmitriy V. Ryaboy commented on PIG-1285:
----------------------------------------

Thanks for the feedback.

Looking at the code, write() and readFields() are actually implemented in 
DefaultAbstractBag and have no dependencies on the memory manager. Is there a 
good reason not to allow (de)serialization of SingleTupleBags? It seems to me 
that we can simply change SingleTupleBag to extend DefaultAbstractBag and 
remove its own write() and readFields() methods, letting the inherited 
defaults take care of (de)serialization. Everything else would remain as-is: 
SingleTupleBag currently implements the complete interface, so it will 
override anything memory-related that DefaultAbstractBag does.
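
To make the idea concrete, here is a rough, self-contained sketch that does 
not use Pig's classes. AbstractBagSketch, SingleItemBagSketch and 
BagRoundTrip are hypothetical stand-ins, Strings stand in for Tuples, and the 
serialization methods are named write and readFields after Hadoop's Writable 
interface. The wire format is invented for illustration. The point is only 
that a subclass which overrides storage can inherit serialization unchanged:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for DefaultAbstractBag: serialization is written
// purely in terms of size(), iterator(), clear() and add().
abstract class AbstractBagSketch implements Iterable<String> {
    public abstract long size();
    public abstract void add(String item);
    public abstract void clear();

    // Writes the item count followed by each item.
    public void write(DataOutput out) throws IOException {
        out.writeLong(size());
        for (String item : this) {
            out.writeUTF(item);
        }
    }

    // Replaces the current contents with what write() produced.
    public void readFields(DataInput in) throws IOException {
        clear();
        long count = in.readLong();
        for (long i = 0; i < count; i++) {
            add(in.readUTF());
        }
    }
}

// Hypothetical stand-in for SingleTupleBag: holds at most one item,
// overrides all storage methods, and inherits write()/readFields().
class SingleItemBagSketch extends AbstractBagSketch {
    private String item; // null means the bag is empty

    @Override public long size() { return item == null ? 0 : 1; }
    @Override public void add(String newItem) { item = newItem; }
    @Override public void clear() { item = null; }

    @Override public Iterator<String> iterator() {
        List<String> items = (item == null)
                ? Collections.<String>emptyList()
                : Collections.singletonList(item);
        return items.iterator();
    }
}

// Serializes a bag to bytes and reads it back into a fresh instance.
class BagRoundTrip {
    static SingleItemBagSketch roundTrip(SingleItemBagSketch bag) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            bag.write(new DataOutputStream(bytes));
            SingleItemBagSketch copy = new SingleItemBagSketch();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the subclass never mentions serialization at all. Whether the 
real DefaultAbstractBag methods are equally independent of its internal 
storage is exactly what would need to be confirmed in the patch.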

What do you think?

> Allow SingleTupleBag to be serialized
> -------------------------------------
>
>                 Key: PIG-1285
>                 URL: https://issues.apache.org/jira/browse/PIG-1285
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>             Fix For: 0.7.0
>
>         Attachments: PIG-1285.patch
>
>
> Currently, Pig uses a SingleTupleBag for efficiency in the Combiner 
> optimization, where a full-blown spillable bag implementation is not needed.
> Unfortunately, this can create problems. The Initial.exec() code below fails 
> at run time with the message that a SingleTupleBag cannot be serialized:
> {code}
> @Override
> public Tuple exec(Tuple in) throws IOException {
>   // single record. just copy.
>   if (in == null) return null;
>   try {
>     Tuple resTuple = tupleFactory_.newTuple(in.size());
>     for (int i = 0; i < in.size(); i++) {
>       resTuple.set(i, in.get(i));
>     }
>     return resTuple;
>   } catch (IOException e) {
>     log.warn(e);
>     return null;
>   }
> }
> {code}
> The code below can fix the problem in the UDF, but it seems like something 
> that should be handled transparently, not requiring UDF authors to know about 
> SingleTupleBags.
> {code}
> @Override
> public Tuple exec(Tuple in) throws IOException {
>   // single record. just copy.
>   if (in == null) return null;
>
>   /*
>    * Unfortunately SingleTupleBags are not serializable. We cache whether
>    * a given index contains a bag in the map below, and copy all bags into
>    * DefaultBags before returning to avoid serialization exceptions.
>    */
>   Map<Integer, Boolean> isBagAtIndex = Maps.newHashMap();
>
>   try {
>     Tuple resTuple = tupleFactory_.newTuple(in.size());
>     for (int i = 0; i < in.size(); i++) {
>       Object obj = in.get(i);
>       if (!isBagAtIndex.containsKey(i)) {
>         isBagAtIndex.put(i, obj instanceof SingleTupleBag);
>       }
>       if (isBagAtIndex.get(i)) {
>         DataBag newBag = bagFactory_.newDefaultBag();
>         newBag.addAll((DataBag) obj);
>         obj = newBag;
>       }
>       resTuple.set(i, obj);
>     }
>     return resTuple;
>   } catch (IOException e) {
>     log.warn(e);
>     return null;
>   }
> }
> {code}

