[jira] Updated: (XALANJ-984) Performance Enhancement to Xalan distinct function

Henry Zongaro (JIRA) Fri, 17 Dec 2004 10:53:30 -0800

     [ http://nagoya.apache.org/jira/browse/XALANJ-984?page=history ]


Henry Zongaro updated XALANJ-984:
---------------------------------

      Assign To:     (was: Xalan Developers Mailing List)
           type: Improvement  (was: Bug)
    Description: 
In Extensions.java, the distinct function uses the Hashtable object to track 
unique nodes.  The Hashtable object synchronizes all access to instances of 
itself.  In Xalan 2.3.1, the current code is as follows:

  public static NodeSet distinct(ExpressionContext myContext, NodeIterator ni)
          throws javax.xml.transform.TransformerException
  {

    // Set up our resulting NodeSet and the hashtable we use to keep track of
    // duplicate strings.

    NodeSet dist = new NodeSet();
    dist.setShouldCacheNodes(true);

    Hashtable stringTable = new Hashtable();

    Node currNode = ni.nextNode();

    while (currNode != null)
    {
      String key = myContext.toString(currNode);

      if (!stringTable.containsKey(key))
      {
        stringTable.put(key, currNode);
        dist.addElement(currNode);
      }
      currNode = ni.nextNode();
    }

    return dist;
  }

Since the Hashtable instance is used only locally within the method, there is 
no need for an object that synchronizes access to itself.  To improve 
performance, a HashSet should be used instead.  Furthermore, it is a good idea 
to manually clear the HashSet at the end of the method so that its entries can 
be garbage collected promptly.  The enhanced code is as follows:

  public static NodeSet distinct(ExpressionContext myContext, NodeIterator ni)
          throws javax.xml.transform.TransformerException
  {

    // Set up our resulting NodeSet and the HashSet we use to keep track of
    // duplicate strings.

    NodeSet dist = new NodeSet();
    dist.setShouldCacheNodes(true);

    HashSet stringSet = new HashSet();

    Node currNode = ni.nextNode();

    while (currNode != null)
    {
      String key = myContext.toString(currNode);

      if (stringSet.add(key))
      {
        dist.addElement(currNode);
      }
      currNode = ni.nextNode();
    }

    stringSet.clear();

    return dist;
  }
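
The enhanced code relies on HashSet.add returning true only when the value was 
not already in the set, which folds the containsKey/put pair into a single 
call.  The following is a minimal, stand-alone sketch of that idiom using plain 
strings instead of DOM nodes; the class and method names are hypothetical and 
are not part of Xalan:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; not part of Xalan.
class DistinctSketch {

    // Keeps the first occurrence of each string value, in input order.
    static List<String> distinctValues(List<String> values) {
        Set<String> seen = new HashSet<String>();
        List<String> result = new ArrayList<String>();
        for (String value : values) {
            // add() returns false if the value is already present,
            // so no separate contains() check is needed.
            if (seen.add(value)) {
                result.add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [a, b, c]
        System.out.println(distinctValues(Arrays.asList("a", "b", "a", "c", "b")));
    }
}
```

Because the set never leaves the method, no other thread can reach it, which 
is why an unsynchronized collection is sufficient here.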


If you want to "completely" ensure the HashSet is cleared and eligible for 
garbage collection even when a TransformerException is thrown, the following 
code could be used instead of the enhanced code above:

  public static NodeSet distinct(ExpressionContext myContext, NodeIterator ni)
          throws javax.xml.transform.TransformerException
  {

    // Set up our resulting NodeSet and the HashSet we use to keep track of
    // duplicate strings.

    NodeSet dist = new NodeSet();
    dist.setShouldCacheNodes(true);

    HashSet stringSet = new HashSet();

    try
    {
      Node currNode = ni.nextNode();

      while (currNode != null)
      {
        String key = myContext.toString(currNode);

        if (stringSet.add(key))
        {
          dist.addElement(currNode);
        }
        currNode = ni.nextNode();
      }
    }
    finally
    {
      stringSet.clear();
    }

    return dist;
  }
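
As a rough illustration of the try/finally variant, the sketch below shows 
that the finally block clears the set even when the loop body throws.  It is 
an assumption-laden toy: the class and method names are hypothetical, a null 
key stands in for a failing myContext.toString call, and the set is passed in 
by the caller only so that its state can be inspected afterwards.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class; not part of Xalan.
class FinallyClearSketch {

    // Counts distinct keys; a null key simulates a failure in mid-iteration.
    static int countDistinct(Set<String> seen, String[] keys) {
        int distinct = 0;
        try {
            for (String key : keys) {
                if (key == null) {
                    throw new IllegalArgumentException("simulated failure");
                }
                if (seen.add(key)) {
                    distinct++;
                }
            }
        } finally {
            // Runs on normal return and when an exception propagates.
            seen.clear();
        }
        return distinct;
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<String>();
        try {
            countDistinct(seen, new String[] {"a", null, "b"});
        } catch (IllegalArgumentException e) {
            // prints "set empty after failure: true"
            System.out.println("set empty after failure: " + seen.isEmpty());
        }
    }
}
```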

    Environment: 
Operating System: All
Platform: All

       Priority: Major
    Bugzilla Id:   (was: 8612)

> Performance Enhancement to Xalan distinct function
> --------------------------------------------------
>
>          Key: XALANJ-984
>          URL: http://nagoya.apache.org/jira/browse/XALANJ-984
>      Project: XalanJ2
>         Type: Improvement
>   Components: XPath-function
>     Versions: 2.3
>  Environment: Operating System: All
> Platform: All
>     Reporter: Larry Becker

