Re: [jira] Commented: (PIG-30) Get rid of DataBag and always use BigDataBag

Utkarsh Srivastava Mon, 07 Jan 2008 09:48:15 -0800

Ok, all sounds good.

On Jan 7, 2008, at 8:30 AM, Alan Gates (JIRA) wrote:

[ https://issues.apache.org/jira/browse/PIG-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556616#action_12556616 ]
Alan Gates commented on PIG-30:
-------------------------------

Responses to Utkarsh's comments:
0. TreeSet.add() only adds an element if it is not already present (see http://java.sun.com/j2se/1.5.0/docs/api/java/util/TreeSet.html#add(E)). This guarantees that the element already in the tree will not be obliterated. That's why, if that call returns false, the code goes back and rereads from the file it read the last element from. This guarantees that we read from that file until either the file is empty or we find a new unique element to put in the TreeSet.
1. Good catch, I'll add a hashCode() implementation for DataBag.
2. They aren't quite as combinable as they first appear. The code in next() is identical and could be combined. DistinctDataBag.readFromTree() and SortedDataBag.readFromPriorityQ() create different containers and access them differently. I could put just the create and access methods in each and combine the rest of the logic. The addToQueue() functions in each are different and have different logic about how to add an element to the queue. I can work on this, but it may be a bit before I get to it.
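The merge discussed in points 0 and 2 can be sketched roughly as follows. This is not the code in bagrewrite.patch: the class names (MergeIterator, SortedMergeIterator, DistinctMergeIterator, MergeDemo), the hook methods, and the in-memory Iterator<Integer> "runs" standing in for sorted spill files are all hypothetical. It assumes every run is already sorted and, for the distinct case, contains no duplicates of its own. The shared next() emits the smallest buffered element and refills from the run it came from; only the container (PriorityQueue vs. TreeSet) and the add step differ, and the distinct version keeps rereading from the same run whenever TreeSet.add() returns false.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;
import java.util.TreeSet;

// Hypothetical sketch, not Pig's DataBag code: a shared k-way merge over sorted runs.
abstract class MergeIterator implements Iterator<Integer> {
    // One buffered element, remembering which run it was read from.
    static final class Entry {
        final int value;
        final int source;
        Entry(int value, int source) { this.value = value; this.source = source; }
    }

    // Orders entries by value only, so equal values from different runs compare as equal.
    static final Comparator<Entry> BY_VALUE = Comparator.comparingInt((Entry e) -> e.value);

    protected final List<Iterator<Integer>> runs;

    MergeIterator(List<Iterator<Integer>> runs) { this.runs = runs; }

    // Container-specific hooks (hypothetical names).
    protected abstract void addFromRun(int source);
    protected abstract Entry pollSmallest();
    protected abstract boolean isEmpty();

    // Loads the first element of every run; subclasses call this once their container exists.
    protected final void prime() {
        for (int i = 0; i < runs.size(); i++) addFromRun(i);
    }

    // Identical for both bag types: emit the smallest entry, then refill from its run.
    @Override public boolean hasNext() { return !isEmpty(); }

    @Override public Integer next() {
        if (isEmpty()) throw new NoSuchElementException();
        Entry smallest = pollSmallest();
        addFromRun(smallest.source);
        return smallest.value;
    }

    static List<Integer> drain(Iterator<Integer> it) {
        List<Integer> out = new ArrayList<>();
        while (it.hasNext()) out.add(it.next());
        return out;
    }
}

// Sorted merge: the priority queue keeps duplicates.
final class SortedMergeIterator extends MergeIterator {
    private final PriorityQueue<Entry> queue = new PriorityQueue<>(BY_VALUE);

    SortedMergeIterator(List<Iterator<Integer>> runs) { super(runs); prime(); }

    @Override protected void addFromRun(int source) {
        Iterator<Integer> run = runs.get(source);
        if (run.hasNext()) queue.add(new Entry(run.next(), source));
    }
    @Override protected Entry pollSmallest() { return queue.poll(); }
    @Override protected boolean isEmpty() { return queue.isEmpty(); }
}

// Distinct merge: TreeSet.add() returns false for a value already present and leaves the
// existing element in place, so keep reading the same run until a new value is accepted
// or the run is exhausted.
final class DistinctMergeIterator extends MergeIterator {
    private final TreeSet<Entry> tree = new TreeSet<>(BY_VALUE);

    DistinctMergeIterator(List<Iterator<Integer>> runs) { super(runs); prime(); }

    @Override protected void addFromRun(int source) {
        Iterator<Integer> run = runs.get(source);
        while (run.hasNext()) {
            if (tree.add(new Entry(run.next(), source))) return;
        }
    }
    @Override protected Entry pollSmallest() { return tree.pollFirst(); }
    @Override protected boolean isEmpty() { return tree.isEmpty(); }
}

class MergeDemo {
    // Three sorted, internally duplicate-free runs standing in for spill files.
    static List<Iterator<Integer>> runs() {
        return Arrays.asList(
            Arrays.asList(1, 4, 7).iterator(),
            Arrays.asList(1, 2, 7).iterator(),
            Arrays.asList(2, 3, 9).iterator());
    }

    public static void main(String[] args) {
        System.out.println(MergeIterator.drain(new SortedMergeIterator(runs())));   // [1, 1, 2, 2, 3, 4, 7, 7, 9]
        System.out.println(MergeIterator.drain(new DistinctMergeIterator(runs()))); // [1, 2, 3, 4, 7, 9]
    }
}
```

Pushing the container choice into a few small hooks is one way to get the split described in point 2: next() lives in one place, while the container creation, access, and the two different add-to-queue rules stay per class.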
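On point 1, the usual constraint is that hashCode() must agree with equals(): bags that compare equal must return the same hash. A minimal sketch, assuming bag equality means "same elements with the same multiplicities, in any order"; SimpleBag is a hypothetical stand-in, not Pig's DataBag, whose actual equality semantics may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a bag: an unordered collection that may hold duplicates.
final class SimpleBag {
    private final List<Object> items = new ArrayList<>();

    void add(Object o) { items.add(o); }

    // Counts occurrences so two bags compare equal regardless of insertion order.
    private Map<Object, Integer> counts() {
        Map<Object, Integer> m = new HashMap<>();
        for (Object o : items) m.merge(o, 1, Integer::sum);
        return m;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof SimpleBag)) return false;
        return counts().equals(((SimpleBag) other).counts());
    }

    // Summing element hashes is order-insensitive, so equal bags get equal hash codes.
    @Override
    public int hashCode() {
        int h = 0;
        for (Object o : items) h += Objects.hashCode(o);
        return h;
    }
}
```

Summing the element hashes is cheap and ignores order; hashing the elements in iteration order instead would give equal (reordered) bags different hash codes and break the contract.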
Get rid of DataBag and always use BigDataBag
--------------------------------------------

                Key: PIG-30
                URL: https://issues.apache.org/jira/browse/PIG-30
            Project: Pig
         Issue Type: Bug
         Components: data
           Reporter: Benjamin Reed
           Assignee: Alan Gates
        Attachments: bagrewrite.patch
We should never use DataBag directly; instead, we should always use BigDataBag. I think we already do this. The problem is that the logic in BigDataBag is hard to follow and it is made more complicated because it subclasses DataBag. We should merge these two classes together.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
