Re: Metadata DataFileValue not Matching the Output of rfile-info Command

2018-02-13 Thread Michael Wall
Yes, compact the table and then count the entries.  You can also get close
by looking at the monitor page.  The tables list has a column called Entries,
which should be close to what you get by adding up those entries in the
metadata by hand.
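
A minimal sketch of the "count the entries by hand" step, in Python. It
assumes the metadata scan output has been saved as text lines shaped like the
example quoted in this thread (file:<path> [] <size>,<entries>); the function
names and the second sample line are made up for illustration. It sums the
entry counts and lists any "I" files still reporting 0, which means the
table has not been fully compacted yet.

```python
import re
from typing import NamedTuple, Optional


class FileEntry(NamedTuple):
    path: str
    size: int
    entries: int


# Assumed line shape: "... file:<path> [<visibility>] <size>,<entries>[,<time>]"
_LINE = re.compile(r"file:(\S+)\s+\[[^\]]*\]\s+(\d+),(\d+)(?:,\d+)?\s*$")


def parse_metadata_line(line: str) -> Optional[FileEntry]:
    """Parse one scan-output line; return None if it is not a file entry."""
    m = _LINE.search(line)
    if m is None:
        return None
    return FileEntry(m.group(1), int(m.group(2)), int(m.group(3)))


def summarize(lines):
    """Return (sum of entry counts, paths of "I" files that still report 0)."""
    total = 0
    uncounted = []
    for line in lines:
        entry = parse_metadata_line(line)
        if entry is None:
            continue
        total += entry.entries
        name = entry.path.rsplit("/", 1)[-1]
        if name.startswith("I") and entry.entries == 0:
            uncounted.append(entry.path)
    return total, uncounted


if __name__ == "__main__":
    sample = [
        # Line taken from the original question in this thread.
        "file:hdfs://apps/accumulo/tables///I001ahdz.rf []   48,0",
        # Hypothetical line for a file written by a compaction.
        "2< file:hdfs://example/accumulo/tables/2/default_tablet/A0000abc.rf []   1024,37",
    ]
    total, uncounted = summarize(sample)
    print("entries:", total)
    print("bulk files not yet counted:", uncounted)
```

If the second list is not empty, the total is a lower bound rather than the
real count.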


Re: Metadata DataFileValue not Matching the Output of rfile-info Command

2018-02-13 Thread Dong Zhou
I see. Yes, the file was loaded via bulk import.
I would like to find the most precise number of entries a table
contains. Would running a compaction and then scanning the metadata table
for the entry counts be a sufficient method?
Also, what would happen if a merge operation runs before the compaction?
Would it try to merge this tablet into other tablets, since the file size
and entry count look fairly small at the time it scans the metadata table?
Or would it compact the table before running the merge?

By the way, thanks for the quick reply. :)

Cheers,
-Dong



Re: Metadata DataFileValue not Matching the Output of rfile-info Command

2018-02-13 Thread Michael Wall
Hi Dong,

That file is the result of a bulk import.  I can tell because it starts
with a capital "I", see
http://accumulo.apache.org/1.8/accumulo_user_manual.html#_file_naming_conventions.
Bulk files are inspected on import to find all the ranges of data they
contain.  They are then assigned to all the tablets hosting that data.  So
one "I" file can belong to more than one tablet.  When that file is
included in a compaction, the data that is not part of the range the tablet
is hosting is not rewritten to the new files.

When inspecting "I" files, Accumulo does not keep track of how many keys
are in each range.  So for "I" files in the metadata table, the number of
keys is 0 until that file is compacted.
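
To make the above concrete, here is a toy model in Python. It is only a
sketch of the behavior described in this message, not Accumulo code: the
function names are made up, tablets are modeled as (previous end row, end row]
ranges built from split points, and the import step looks at every key where
the real import only inspects the file to find its ranges.

```python
def tablet_ranges(splits):
    """Turn split points into (prev_end, end) pairs; None means unbounded."""
    bounds = [None] + sorted(splits) + [None]
    return list(zip(bounds[:-1], bounds[1:]))


def in_range(key, rng):
    """A tablet hosts keys in (prev_end, end]: prev_end exclusive, end inclusive."""
    prev_end, end = rng
    return (prev_end is None or key > prev_end) and (end is None or key <= end)


def bulk_import(file_keys, splits):
    """Assign one "I" file to every tablet whose range it touches.

    The per-tablet entry count is recorded as 0 because it is not tracked
    at import time.
    """
    metadata = {}
    for rng in tablet_ranges(splits):
        if any(in_range(k, rng) for k in file_keys):
            metadata[rng] = {"file": "I-file", "entries": 0}
    return metadata


def compact(file_keys, rng):
    """Rewrite only the keys inside the tablet's range and count them."""
    kept = [k for k in file_keys if in_range(k, rng)]
    return kept, len(kept)


if __name__ == "__main__":
    keys = ["a", "c", "m", "x"]   # keys in one bulk-imported file
    splits = ["g", "r"]           # table has three tablets
    meta = bulk_import(keys, splits)
    print("after import:", meta)
    for rng in meta:
        kept, count = compact(keys, rng)
        meta[rng] = {"file": "compacted-file", "entries": count}
    print("after compaction:", meta)
```

In this model the single file shows up under all three tablets with 0
entries after import, and the per-tablet counts only add up to the real
number of keys once every tablet has compacted it.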

HTH

Mike



On Tue, Feb 13, 2018 at 1:37 PM Dong Zhou wrote:

> Hi all,
>
> We have noticed that the Accumulo metadata entry for a certain RFile
> reports a file size but no entry count.
> For example, ;
> file:hdfs://apps/accumulo/tables///I001ahdz.rf []   48,0
>
> From the metadata's perspective, it looks like the RFile contains zero
> entries, but if we run the rfile-info command against the same file, the
> output shows that the RFile has a bunch of entries. If we dump the RFile,
> we can see the actual data too.
>
> We wonder what the reason behind it is.
>
> Thanks,
> -Dong Zhou
>