Ok, all sounds good.

On Jan 7, 2008, at 8:30 AM, Alan Gates (JIRA) wrote:


[ https://issues.apache.org/jira/browse/PIG-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556616#action_12556616 ]

Alan Gates commented on PIG-30:
-------------------------------

Responses to Utkarsh's comments:

0. TreeSet.add() only adds an element if it is not already present (see http://java.sun.com/j2se/1.5.0/docs/api/java/util/TreeSet.html#add(E)). This guarantees that the element already in the tree will not be overwritten. That's why, if that call returns false, the code goes back and reads again from the file the rejected element came from. This guarantees that we keep reading from that file until either the file is exhausted or we find a new unique element to put in the TreeSet.
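
To illustrate the point about TreeSet.add(), here is a minimal sketch of that read-until-accepted loop. It is not the code from bagrewrite.patch: the class DistinctMergeSketch, the Entry holder, and the methods addFromRun() and mergeDistinct() are hypothetical names, in-memory lists stand in for the spill files, and each run is assumed to be already sorted and free of internal duplicates.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

public class DistinctMergeSketch {
    // Hypothetical holder: an element plus the index of the run it was read from.
    static final class Entry implements Comparable<Entry> {
        final String value;
        final int source;
        Entry(String value, int source) { this.value = value; this.source = source; }
        // Ordering ignores the source, so equal values from different runs collide.
        @Override public int compareTo(Entry other) { return value.compareTo(other.value); }
    }

    // Keep reading from one run until the TreeSet accepts an element
    // (add() returns true) or the run is exhausted.
    static void addFromRun(TreeSet<Entry> tree, List<Iterator<String>> runs, int source) {
        Iterator<String> run = runs.get(source);
        while (run.hasNext()) {
            if (tree.add(new Entry(run.next(), source))) {
                return; // a new unique element is now in the tree
            }
            // add() returned false: the value is already present and was left
            // untouched, so try the next element of the same run.
        }
    }

    // Assumption: every run is sorted ascending and has no internal duplicates.
    static List<String> mergeDistinct(List<List<String>> sortedRuns) {
        List<Iterator<String>> runs = new ArrayList<>();
        for (List<String> r : sortedRuns) {
            runs.add(r.iterator());
        }
        TreeSet<Entry> tree = new TreeSet<>();
        for (int i = 0; i < runs.size(); i++) {
            addFromRun(tree, runs, i);
        }
        List<String> out = new ArrayList<>();
        while (!tree.isEmpty()) {
            Entry smallest = tree.pollFirst();
            out.add(smallest.value);
            // Refill from the run that supplied the element just removed.
            addFromRun(tree, runs, smallest.source);
        }
        return out;
    }
}
```

Because a rejected add() leaves the tree unchanged, the run that supplied the duplicate simply advances, and every run that still has data keeps at most one element in the tree.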

1. Good catch, I'll add a hashCode() implementation for DataBag.

2. They aren't quite as combinable as they first appear. The code in next() is identical, and could be combined. DistinctDataBag.readFromTree() and SortedDataBag.readFromPriorityQ() create different containers and access them differently. I could keep just the create and access methods in each class and combine the rest of the logic. The addToQueue() functions in each are different and have different logic about how to add an element to the queue. I can work on this, but it may be a bit before I get to it.
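
One possible shape for that split, as a rough sketch only: a shared base class owns the common loop, and each subclass supplies just the container creation and access. The names MergeRefactorSketch, MergingReader, offer(), pollSmallest(), and drain() are hypothetical and do not come from the patch; the real classes also deal with spill files, which this omits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.TreeSet;

public class MergeRefactorSketch {
    // Hypothetical base class: the shared logic lives here, the
    // container-specific create/access steps are left to subclasses.
    abstract static class MergingReader {
        protected abstract boolean offer(String element); // add to the container
        protected abstract String pollSmallest();          // remove smallest, null if empty

        // Shared logic: load every element, then drain in ascending order.
        final List<String> drain(Iterable<String> input) {
            for (String s : input) {
                offer(s);
            }
            List<String> out = new ArrayList<>();
            for (String s = pollSmallest(); s != null; s = pollSmallest()) {
                out.add(s);
            }
            return out;
        }
    }

    // Distinct variant: a TreeSet silently rejects duplicates.
    static final class DistinctReader extends MergingReader {
        private final TreeSet<String> tree = new TreeSet<>();
        @Override protected boolean offer(String e) { return tree.add(e); }
        @Override protected String pollSmallest() { return tree.pollFirst(); }
    }

    // Sorted variant: a PriorityQueue keeps duplicates.
    static final class SortedReader extends MergingReader {
        private final PriorityQueue<String> queue = new PriorityQueue<>();
        @Override protected boolean offer(String e) { return queue.add(e); }
        @Override protected String pollSmallest() { return queue.poll(); }
    }
}
```

The differing addToQueue() logic mentioned above would stay in the subclasses, behind a hook like offer().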

Get rid of DataBag and always use BigDataBag
--------------------------------------------

                Key: PIG-30
                URL: https://issues.apache.org/jira/browse/PIG-30
            Project: Pig
         Issue Type: Bug
         Components: data
           Reporter: Benjamin Reed
           Assignee: Alan Gates
        Attachments: bagrewrite.patch


We should never use DataBag directly; instead, we should always use BigDataBag. I think we already do this. The problem is that the logic in BigDataBag is hard to follow and it is made more complicated because it subclasses DataBag. We should merge these two classes together.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

