Great details... but I need to sleep. I'll dig in more tomorrow. Sorry!
On Thu, Oct 3, 2013 at 11:20 PM, Dickson, Matt MR <[email protected]> wrote:
> UNOFFICIAL
>
> Hi Eric,
> Our answers are in blue. Just a note that we do have the write-ahead log
> disabled for ingest performance.
> We have a public holiday on Monday, so we may be delayed in our response.
>
> Cheers
> Matt
>
> ________________________________
> From: Eric Newton [mailto:[email protected]]
> Sent: Friday, 4 October 2013 11:20
> To: [email protected]
> Subject: Re: Efficient Tablet Merging [SEC=UNOFFICIAL]
>
> Any errors on those servers? Each server should be checking periodically
> for compactions; some crazy errors might escape error handling, though
> that is rare these days.
> In the tserver debug log there is a repeating error of "Internal error
> processing applyUpdates
> org.apache.accumulo.server.tabletserver.HoldTimeoutException: Commits are
> held"
>
> Also found in the tserver log:
> ERROR: Failed to find midpoint Filesystem closed
> WARN: Tablet .... has too many files, batch lookup cannot run
>
> Are you experiencing any table-level errors? Unable to read or write files?
> No table-level errors or read errors.
>
> How full is HDFS?
> 32%
>
> If you scan the !METADATA table, are you seeing any trend in the tablets
> that have problems?
> By getting the extent id of the tablets that are large, and then finding
> the range of each tablet using 'getsplits -v', I have scanned the
> !METADATA table and can see a massive number of *.rf files associated
> with the range. Is there anything in particular I should look at?
>
> At this point, we're looking for logged anomalies, the earlier the better.
> Anything red or yellow on the monitor pages.
> I ran one of the scans that hang and then saw the following:
>
> Several "WARN Exception saying java.lang.reflect.InvocationTargetException"
>
> Several "ERROR Unexpected error writing to log, retrying attempt 1
> InvocationTargetException
> Caused by LeaseExpiredException: Lease mismatch on /accumulo/wal/...
> owned by DFSClient_NOMAPREDUCE_56390516_13 but is accessed by
> DFSClient_NOMAPREDUCE_1080760417_13"
>
> "ERROR TTransportException: java.net.SocketTimeoutException: ... while
> waiting for channel to be ready for write. ..."
>
> A bunch of "WARN Tablet 234234234 has too many files..."
>
> On Thu, Oct 3, 2013 at 8:43 PM, Dickson, Matt MR
> <[email protected]> wrote:
>>
>> UNOFFICIAL
>>
>> We have restarted the tablet servers that contain tablets with high
>> volumes of files and did not see any majc's run.
>>
>> Some more details:
>> On 3 of our nodes we have 10-15 times the number of entries that are on
>> the other nodes. When I view the tablets for one of these nodes, there
>> are 2 tablets with almost 10 times the number of entries of the others.
>>
>> When we query on the date rowids, the queries now hang, and several
>> scans running on the 3 nodes with the higher entry counts are not
>> completing. Can I cancel these?
>>
>> In the logs we are getting "tablet ..... has too many files, batch
>> lookup can not run"
>>
>> At this point I'm stuck for ideas, so any suggestions would be great.
>>
>> ________________________________
>> From: Eric Newton [mailto:[email protected]]
>> Sent: Thursday, 3 October 2013 23:52
>> To: [email protected]
>> Subject: Re: Efficient Tablet Merging [SEC=UNOFFICIAL]
>>
>> You should have a major compaction running if your tablet has too many
>> files. If you don't, something is wrong. It does take some time to
>> re-write 10G of data.
>>
>> If many merges occurred on a single tablet server, you may have these
>> many-file tablets on the same server, and there may not be enough major
>> compaction threads to re-write those files right away. If that's true,
>> you may wish to restart the tablet server in order to get the tablets
>> pushed to other idle servers.
>>
>> Again, if you don't have major compactions running, you will want to
>> start looking for other problems.
>>
>> -Eric
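For readers hitting the same "too many files" warning: the per-tablet file
list can be pulled straight from the metadata table in the Accumulo shell.
A minimal sketch, assuming the extent quoted later in this thread (table id
3n, end row 20130914) and a 1.5-era shell; each key returned in the "file"
column family is one RFile backing that tablet:

    scan -t !METADATA -b 3n;20130914 -e 3n;20130914 -c file

Counting the returned entries gives roughly the file count that the "has
too many files" warning is complaining about.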
>>
>> On Thu, Oct 3, 2013 at 2:29 AM, Dickson, Matt MR
>> <[email protected]> wrote:
>>>
>>> UNOFFICIAL
>>>
>>> Hi Eric,
>>>
>>> We have gone with the second, more conservative option. We changed our
>>> split threshold to 10GB and then ran a merge over a week's worth of
>>> tablets, which has resulted in one tablet with a massive number of
>>> files. We then ran a query over that range and it returns a message
>>> saying:
>>>
>>> Tablet has too many files (3n;20130914;20130907...) retrying...
>>>
>>> We assumed that when the merge was done, a major compaction would be
>>> started, which would notice that the tablet is too large and split it
>>> into 10GB tablets. We assumed that we would not have to manually start
>>> any compaction, but that instead it would be scheduled at some point
>>> after the merge finished.
>>>
>>> We have completed three separate merges of week-long ranges and have
>>> now identified 3 tablet extents with too many files.
>>>
>>> Can you please explain what is supposed to happen? And whether, after
>>> the merge, the compact command needs to be run for those ranges (or
>>> will it run automatically, as we have not seen any started)?
>>>
>>> Cheers
>>> Matt
>>>
>>> ________________________________
>>> From: Eric Newton [mailto:[email protected]]
>>> Sent: Thursday, 3 October 2013 13:28
>>> To: [email protected]
>>> Subject: Re: Efficient Tablet Merging [SEC=UNOFFICIAL]
>>>
>>> I'll use ASCII graphics to demonstrate the size of a tablet.
>>>
>>> Small:  []
>>> Medium: [  ]
>>> Large:  [    ]
>>>
>>> Think of it like this... if you are running age-off, you probably have
>>> lots of little buckets of rows at the beginning and larger buckets at
>>> the end:
>>>
>>> [][][][][][][][][]...[  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ]
>>>
>>> What you probably want is something like this:
>>>
>>> [    ][    ][    ][    ][    ][    ][    ][    ]
>>>
>>> Some big buckets at the start, with old data, and some larger buckets
>>> for everything afterwards. But... this would probably work:
>>>
>>> [    ][    ][    ][    ][    ][    ][    ][    ][    ]
>>>
>>> Just a bunch of larger tablets throughout.
>>>
>>> So you need to set your merge size to "[    ]" (4G), and you can always
>>> keep creating smaller tablets for future rows with manual splits:
>>>
>>> [    ][    ][    ][    ][    ][    ][][][][][]
>>>
>>> So increase the split threshold to 4G, and merge on 4G, but continue to
>>> make manual splits for your current days, as necessary. Merge them away
>>> later.
>>>
>>> -Eric
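For reference, Eric's suggestion comes down to two shell commands. A sketch,
assuming a hypothetical table named mytable; adjust the 4G figures to your
own target tablet size:

    config -t mytable -s table.split.threshold=4G
    merge -t mytable -s 4G -v

The -s option on merge combines adjacent tablets only while the combined
result stays under the given size, so it can be re-run periodically as old
data ages off without disturbing tablets that are already at the target size.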
>>>
>>> On Wed, Oct 2, 2013 at 6:35 PM, Dickson, Matt MR
>>> <[email protected]> wrote:
>>>>
>>>> UNOFFICIAL
>>>>
>>>> Thanks Eric,
>>>>
>>>> If I do the merge with a size of 4G, does the split threshold also
>>>> need to be increased to 4G?
>>>>
>>>> ________________________________
>>>> From: Eric Newton [mailto:[email protected]]
>>>> Sent: Wednesday, 2 October 2013 23:05
>>>> To: [email protected]
>>>> Subject: Re: Efficient Tablet Merging [SEC=UNOFFICIAL]
>>>>
>>>> The most efficient way is kind of scary. If this is a production
>>>> system, I would not recommend it.
>>>>
>>>> First, find out the size of your 10x tablets. Let's say it's 10G. Set
>>>> your split threshold to 10G. Then merge all old tablets... all of them
>>>> into one tablet. This will dump thousands of files into a single
>>>> tablet, but it will soon split out again into the nice 10G tablets you
>>>> are looking for. The system will probably be unusable during this
>>>> operation.
>>>>
>>>> The more conservative way is to specify the merge in single steps (the
>>>> master will only coordinate a single merge on a table at a time
>>>> anyhow). You can do it by range or by size... I would do it by size,
>>>> especially if you are aging off your old data.
>>>>
>>>> Compacting the data won't have any effect on the speed of the merge.
>>>>
>>>> -Eric
>>>>
>>>> On Tue, Oct 1, 2013 at 11:58 PM, Dickson, Matt MR
>>>> <[email protected]> wrote:
>>>>>
>>>>> UNOFFICIAL
>>>>>
>>>>> I have a table for which we create splits of the form yyyymmdd-nnnn,
>>>>> where nnnn ranges from 0000 to 0840. The bulk of our data is loaded
>>>>> for the current date, with no data loaded for days older than 3 days,
>>>>> so from my understanding it would be wise to merge splits older than
>>>>> 3 days in order to reduce the overall tablet count. It would still be
>>>>> optimal to maintain some distribution of a day's tablets across the
>>>>> cluster, so I'm looking at merging splits in increments of 10, e.g.
>>>>> merge -b 20130901-0000 -e 20130901-0009, thereby reducing 840 splits
>>>>> per day to 84.
>>>>>
>>>>> Currently we have 120K tablets (split threshold 1G) on a cluster of
>>>>> 56 nodes, and our ingest has slowed as the data quantity and tablet
>>>>> count have grown. Initially we were achieving 200-300K entries/sec;
>>>>> now 50-100K.
>>>>>
>>>>> My question is, what is the best way to do this merge? Should we use
>>>>> the merge command with the size option set at something like 5G, or
>>>>> maybe use the compaction command?
>>>>>
>>>>> From my tests this process could take some time, so I'm keen to
>>>>> understand the most efficient approach.
>>>>>
>>>>> Thanks in advance,
>>>>> Matt Dickson
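For the incremental approach described above, one day's merge plus a
follow-up range compaction would look like the following. A sketch, assuming
a hypothetical table named mytable and a 1.5-era shell where compact accepts
-w to wait for completion; the range is the one from the merge example above:

    merge -t mytable -b 20130901-0000 -e 20130901-0009
    compact -t mytable -b 20130901-0000 -e 20130901-0009 -w

Note that a merge does not rewrite RFiles; the files of the merged-away
tablets are simply re-assigned to the surviving tablet, which is why a wide
merge leaves one tablet with a huge file count until a major compaction
(scheduled or manual) rewrites them.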
