On Thu, Apr 11, 2013 at 7:15 AM, Jeff Kubina <[email protected]> wrote:
> Is there a method in the accumulo api to get the total bytes used and/or
> total key/value pairs for each tablet? I believe I can get the total bytes
> used per tablet using HDFS file size calls on the tables directory, but what
> about the total key/value pairs for each tablet?
>

Jeff,

You can scan the metadata table to get this info. A few pointers:

 * call Connector.tableOperations().tableIdMap() to convert your table name
   to a table id
 * do "new org.apache.accumulo.core.data.KeyExtent(Text, Text, Text)" to
   create a KeyExtent that represents the tablet you are interested in
 * call KeyExtent.toMetadataRange() to get a range to scan the metadata
   table
 * add the column family
   org.apache.accumulo.core.Constants.METADATA_DATAFILE_COLUMN_FAMILY to the
   metadata table scanner
 * take each value from this scan and create an
   org.apache.accumulo.core.util.MetadataTable.DataFileValue object, which
   will have the info you need

The file data in the metadata table may be an estimate or may not be
present. In the case of a split, the children of the split have estimated
file sizes. The sum of the children's info is correct until one of them
compacts. For bulk imported files, there is no info about file size or
number of entries. After a tablet is compacted, all of this info will be
correct. You could call Connector.tableOperations().compact(), passing in a
range that will compact just the tablet you want stats about.

Keith
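
To illustrate the last pointer, here is a minimal sketch of adding up the per-file values a tablet's metadata scan returns. It runs without Accumulo on the classpath, so the class TabletStatsSketch and its methods are hypothetical stand-ins, not Accumulo API. It also assumes each datafile value is the text "size,numEntries" with an optional third field; in real code DataFileValue does this parsing for you. The totals inherit the caveats above: they may be estimates after a split, and bulk imported files carry no size or entry info until the tablet compacts.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Accumulo API.
final class TabletStatsSketch {

    /** Bytes and key/value entry count for one file, or a total over files. */
    static final class Stats {
        final long bytes;
        final long entries;

        Stats(long bytes, long entries) {
            this.bytes = bytes;
            this.entries = entries;
        }
    }

    // ASSUMPTION: a datafile value is UTF-8 text "size,numEntries", possibly
    // followed by a third comma-separated field, which is ignored here.
    static Stats parse(byte[] encoded) {
        String[] fields = new String(encoded, StandardCharsets.UTF_8).split(",");
        if (fields.length < 2) {
            throw new IllegalArgumentException("expected at least size,numEntries");
        }
        return new Stats(Long.parseLong(fields[0]), Long.parseLong(fields[1]));
    }

    // Sum the values of every datafile entry scanned for one tablet.
    static Stats sum(List<byte[]> values) {
        long bytes = 0;
        long entries = 0;
        for (byte[] value : values) {
            Stats s = parse(value);
            bytes += s.bytes;
            entries += s.entries;
        }
        return new Stats(bytes, entries);
    }

    public static void main(String[] args) {
        // Made-up values standing in for the results of the metadata scan.
        List<byte[]> values = Arrays.asList(
            "1024,10".getBytes(StandardCharsets.UTF_8),
            "2048,25".getBytes(StandardCharsets.UTF_8));
        Stats total = sum(values);
        System.out.println(total.bytes + " bytes, " + total.entries + " entries");
    }
}
```

With a real scanner you would pass each entry's value bytes (or the size and entry count read from DataFileValue) into the same kind of running total.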
