Hi JM,

Many thanks :)

I was just curious to know what makes it slow.

Now it's much clearer to me.

Best Regards.




Jack Chan. 

From: Jean-Marc Spaggiari
Date: 2013-10-19 04:46
To: [email protected]; cdj0579
Subject: Re: why HTableDescriptor.getFamiliesKeys is so lag?
Hi Jack,


From the code...


// method 1 will call
  /** 
   * Returns an array all the {@link HColumnDescriptor} of the column families 
   * of the table.
   *  
   * @return Array of all the HColumnDescriptors of the current table 
   * 
   * @see #getFamilies()
   */
  public HColumnDescriptor[] getColumnFamilies() {
    return getFamilies().toArray(new HColumnDescriptor[0]);
  }


where getFamilies() returns
Collections.unmodifiableCollection(this.families.values());


// method 2 will call
  /**
   * Returns all the column family names of the current table. The map of 
   * HTableDescriptor contains mapping of family name to HColumnDescriptors. 
   * This returns all the keys of the family map which represents the column 
   * family names of the table. 
   * 
   * @return Immutable sorted set of the keys of the families.
   */
  public Set<byte[]> getFamiliesKeys() {
    return Collections.unmodifiableSet(this.families.keySet());
  }


// method 3 will call
  /**
   * Returns an unmodifiable collection of all the {@link HColumnDescriptor} 
   * of all the column families of the table.
   *  
   * @return Immutable collection of {@link HColumnDescriptor} of all the
   * column families. 
   */
  public Collection<HColumnDescriptor> getFamilies() {
    return Collections.unmodifiableCollection(this.families.values());
  }




So methods 1 and 3 are almost the same thing: 1 is a wrapper around 3.


So let's look at the difference between 2 and 3. They do almost the same thing, 
but one wraps keySet() and the other wraps values(). Both call those methods 
on families, which is a TreeMap. So it sounds like TreeMap.values() is faster 
than TreeMap.keySet().


Looking into the TreeMap code (we are no longer in HBase here):
    public Collection<V> values() {
        Collection<V> vs = values;
        return (vs != null) ? vs : (values = new Values());
    }




values() just returns the internal values object if it already exists (which 
is most probably the case), while keySet() does almost the same thing but has 
to call another method too:


    /**
     * Returns a {@link Set} view of the keys contained in this map.
     * The set's iterator returns the keys in ascending order.
     * The set is backed by the map, so changes to the map are
     * reflected in the set, and vice-versa.  If the map is modified
     * while an iteration over the set is in progress (except through
     * the iterator's own <tt>remove</tt> operation), the results of
     * the iteration are undefined.  The set supports element removal,
     * which removes the corresponding mapping from the map, via the
     * <tt>Iterator.remove</tt>, <tt>Set.remove</tt>,
     * <tt>removeAll</tt>, <tt>retainAll</tt>, and <tt>clear</tt>
     * operations.  It does not support the <tt>add</tt> or <tt>addAll</tt>
     * operations.
     */
    public Set<K> keySet() {
        return navigableKeySet();
    }


    /**
     * @since 1.6
     */
    public NavigableSet<K> navigableKeySet() {
        KeySet<K> nks = navigableKeySet;
        return (nks != null) ? nks : (navigableKeySet = new KeySet(this));
    }
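
To make the caching visible, here is a minimal sketch. It assumes a plain
TreeMap<String, Integer> as a stand-in for the families map, and the class and
method names (TreeMapViewDemo, keySetIsCached, valuesIsCached) are made up for
illustration. It only checks that two consecutive calls hand back the same view
object, which is what the TreeMap source quoted above does; the Map API itself
does not promise this.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeMap;

class TreeMapViewDemo {
    /** True if two consecutive keySet() calls return the same view object. */
    static boolean keySetIsCached(TreeMap<String, Integer> map) {
        Set<String> first = map.keySet();
        Set<String> second = map.keySet();
        return first == second;
    }

    /** True if two consecutive values() calls return the same view object. */
    static boolean valuesIsCached(TreeMap<String, Integer> map) {
        Collection<Integer> first = map.values();
        Collection<Integer> second = map.values();
        return first == second;
    }

    public static void main(String[] args) {
        // Stand-in for the families map; the keys and values are arbitrary.
        TreeMap<String, Integer> families = new TreeMap<>();
        families.put("cf1", 1);
        families.put("cf2", 2);
        System.out.println("keySet cached: " + keySetIsCached(families));
        System.out.println("values cached: " + valuesIsCached(families));
    }
}
```

Either way, only the very first call on a given map allocates the view object;
later calls are a field read plus, for keySet(), one extra method call.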




So now there are two possibilities.


1) If you run each of your methods twice, most probably they will all be 
equally fast the second time.
2) The navigableKeySet() call from keySet() accounts for the extra time, which 
would really surprise me, since I would expect the compiler to optimize it away.
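
A rough way to check the first possibility is sketched below. It assumes a
plain TreeMap<String, String> instead of a real HTableDescriptor, and the names
(ViewTimingSketch, runTwice and so on) are hypothetical. It times one pass over
values() and one over keySet(), then repeats both, so first-use costs show up
only in round 0. This is not a rigorous benchmark, and the numbers will vary
from run to run.

```java
import java.util.Map;
import java.util.TreeMap;

class ViewTimingSketch {
    // Written to after each loop so the loop body is not trivially dead code.
    static volatile int sink;

    /** Iterates the key view once; returns elapsed nanoseconds. */
    static long timeKeys(Map<String, String> map) {
        long start = System.nanoTime();
        int total = 0;
        for (String key : map.keySet()) {
            total += key.length();
        }
        long elapsed = System.nanoTime() - start;
        sink = total;
        return elapsed;
    }

    /** Iterates the value view once; returns elapsed nanoseconds. */
    static long timeValues(Map<String, String> map) {
        long start = System.nanoTime();
        int total = 0;
        for (String value : map.values()) {
            total += value.length();
        }
        long elapsed = System.nanoTime() - start;
        sink = total;
        return elapsed;
    }

    /** result[round][0] = values() time, result[round][1] = keySet() time. */
    static long[][] runTwice(Map<String, String> map) {
        long[][] result = new long[2][2];
        for (int round = 0; round < 2; round++) {
            result[round][0] = timeValues(map);
            result[round][1] = timeKeys(map);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> families = new TreeMap<>();
        families.put("cf1", "descriptor1");
        families.put("cf2", "descriptor2");
        long[][] t = runTwice(families);
        for (int round = 0; round < 2; round++) {
            System.out.println("round " + round
                + ": values=" + t[round][0] + "ns, keySet=" + t[round][1] + "ns");
        }
    }
}
```

If the keySet() figure drops to roughly the values() figure in round 1, the
difference was a first-call effect rather than a cost of keySet() itself.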


Last, I'm not sure why those few hundred microseconds matter to you, but if it 
is because you need to call this method many times, then just cache the result 
on the client side.
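
Caching on the client side could look something like the sketch below. The
class name CachedFamilyNames and its loader argument are hypothetical; in real
code the loader would be whatever converts htd.getFamiliesKeys() to strings,
which is left out here so the example runs without HBase on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

class CachedFamilyNames {
    private final Supplier<List<String>> loader; // the expensive lookup (assumed)
    private List<String> cached;                 // null until the first get()

    CachedFamilyNames(Supplier<List<String>> loader) {
        this.loader = loader;
    }

    /** Calls the loader on the first use only; later calls reuse the copy. */
    synchronized List<String> get() {
        if (cached == null) {
            cached = Collections.unmodifiableList(new ArrayList<>(loader.get()));
        }
        return cached;
    }

    /** Drops the cached copy, e.g. after the table schema has been altered. */
    synchronized void invalidate() {
        cached = null;
    }

    public static void main(String[] args) {
        final int[] loads = {0};
        CachedFamilyNames names = new CachedFamilyNames(() -> {
            loads[0]++;
            return Arrays.asList("cf1", "cf2"); // stand-in for the real lookup
        });
        names.get();
        names.get();
        System.out.println("families: " + names.get() + ", loads: " + loads[0]);
    }
}
```

The trade-off is staleness: if column families are added or removed, the
cached list is wrong until invalidate() is called.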


HTH.


JM


On Thursday, 17 October 2013, Jack Chan wrote:

Hi all,
    I need to get all the column families of a given table. Looking into the 
class "org.apache.hadoop.hbase.HTableDescriptor", I found at least three 
methods that can be used.
    See the code below, where method 1, method 2 and method 3 all do the same 
thing:

/*___________code begin___________*/

HTable table = new HTable(config, "mytable");
HTableDescriptor htd = table.getTableDescriptor();
//method 1
TimeCounter tc = new TimeCounter().run();
HColumnDescriptor[] cfs = htd.getColumnFamilies();
for(int i=0;i< cfs.length;i++){
    System.out.println("column family:"+new String(cfs[i].getName()));
}
System.out.println("time with getColumnFamilies-->"+tc.stop().getMicroSeconds());

//method2
TimeCounter tc2 = new TimeCounter().run();
Set<byte[]> family_keys = htd.getFamiliesKeys();
for(byte[] _f :family_keys){
    System.out.println("column family:"+new String(_f));
}
System.out.println("time with getFamiliesKeys-->"+tc2.stop().getMicroSeconds());

//method3
TimeCounter tc3 = new TimeCounter().run();
Collection<HColumnDescriptor> family_co = htd.getFamilies();
for(HColumnDescriptor family_co_entry :family_co){
    System.out.println("column family:"+new String(family_co_entry.getName()));
}
System.out.println("time with getFamilies-->"+tc3.stop().getMicroSeconds());

/*___________________code end_____________________*/

I found that method 1 and method 3 perform about the same, around 120 us, 
but method 2 is slower, around 500 us.

I only need to retrieve the column families' names, so method 2 is exactly 
what I need.
But why is it so slow?

Thanks.



Jack Chan.
A new Apache-Camel rider.
sina-weibo:@for-each
