Great details... but I need to sleep. I'll dig in more tomorrow. Sorry!
On Thu, Oct 3, 2013 at 11:20 PM, Dickson, Matt MR <[email protected]> wrote:
> UNOFFICIAL
>
> Hi Eric,
> Our answers are in blue. Just a note that we do have the write-ahead log
> disabled for ingest performance.
> We have a public holiday on Monday, so we may be delayed in our response.
>
> Cheers
> Matt
>
> ________________________________
> From: Eric Newton [mailto:[email protected]]
> Sent: Friday, 4 October 2013 11:20
> To: [email protected]
> Subject: Re: Efficient Tablet Merging [SEC=UNOFFICIAL]
>
> Any errors on those servers? Each server should be checking periodically
> for compactions; some crazy errors might escape error handling, though
> that is rare these days.
> In the tserver debug log there is a repeating error of "Internal error
> processing applyUpdates
> org.apache.accumulo.server.tabletserver.HoldTimeoutException: Commits are
> held"
>
> Also found in the tserver log:
> ERROR: Failed to find midpoint Filesystem closed
> WARN: Tablet .... has too many files, batch lookup cannot run
>
> Are you experiencing any table-level errors? Unable to read or write files?
> No table-level errors or read errors.
>
> How full is HDFS?
> 32%
>
> If you scan the !METADATA table, are you seeing any trend in the tablets
> that have problems?
> By getting the extent id of the tablets that are large, and then finding
> the range of each tablet using 'getsplits -v', I have scanned the
> !METADATA table and can see a massive number of *.rf files associated
> with the range. Is there anything in particular I should look at?
>
> At this point, we're looking for logged anomalies, the earlier the better.
> Anything red or yellow on the monitor pages.
> I ran one of the scans that hang and then saw the following:
>
> Several "WARN Exception saying java.lang.reflect.InvocationTargetException"
>
> Several "ERROR Unexpected error writing to log, retrying attempt 1
> InvocationTargetException
> Caused by LeaseExpiredException: Lease mismatch on /accumulo/wal/...
> owned by DFSClient_NOMAPREDUCE_56390516_13 but is accessed by
> DFSClient_NOMAPREDUCE_1080760417_13"
>
> "ERROR TTransportException: java.net.SocketTimeoutException: ... while
> waiting for channel to be ready for write. ..."
>
> A bunch of "WARN Tablet 234234234 has too many files..."
>
> On Thu, Oct 3, 2013 at 8:43 PM, Dickson, Matt MR
> <[email protected]> wrote:
>>
>> UNOFFICIAL
>>
>> We have restarted the tablet servers that contain tablets with high
>> volumes of files and did not see any majc's run.
>>
>> Some more details:
>> On 3 of our nodes we have 10-15 times the number of entries that are on
>> the other nodes. When I view the tablets for one of these nodes, there
>> are 2 tablets with almost 10 times the number of entries of the others.
>>
>> When we query on the date rowids, the queries now hang, and several
>> scans running on the 3 nodes with the higher entry counts are not
>> completing. Can I cancel these?
>>
>> In the logs we are getting "tablet ..... has too many files, batch
>> lookup can not run"
>>
>> At this point I'm stuck for ideas, so any suggestions would be great.
>>
>> ________________________________
>> From: Eric Newton [mailto:[email protected]]
>> Sent: Thursday, 3 October 2013 23:52
>> To: [email protected]
>> Subject: Re: Efficient Tablet Merging [SEC=UNOFFICIAL]
>>
>> You should have a major compaction running if your tablet has too many
>> files. If you don't, something is wrong. It does take some time to
>> re-write 10G of data.
>>
>> If many merges occurred on a single tablet server, you may have these
>> many-file tablets on the same server, and there may not be enough major
>> compaction threads to re-write those files right away. If that's true,
>> you may wish to restart the tablet server in order to get the tablets
>> pushed to other idle servers.
>>
>> Again, if you don't have major compactions running, you will want to
>> start looking for other problems.
>>
>> -Eric
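For readers hitting the same "too many files" warning: the per-tablet file
list can be pulled straight from the metadata table in the Accumulo shell.
A minimal sketch, assuming the extent quoted later in this thread (table id
3n, end row 20130914) and a 1.5-era shell; each key returned in the "file"
column family is one RFile backing that tablet:

    scan -t !METADATA -b 3n;20130914 -e 3n;20130914 -c file

Counting the returned entries gives roughly the file count that the "has
too many files" warning is complaining about.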
>>
>> On Thu, Oct 3, 2013 at 2:29 AM, Dickson, Matt MR
>> <[email protected]> wrote:
>>>
>>> UNOFFICIAL
>>>
>>> Hi Eric,
>>>
>>> We have gone with the second, more conservative option. We changed our
>>> split threshold to 10GB and then ran a merge over a week's worth of
>>> tablets, which has resulted in one tablet with a massive number of
>>> files. We then ran a query over that range and it returns a message
>>> saying:
>>>
>>> Tablet has too many files (3n;20130914;20130907...) retrying...
>>>
>>> We assumed that when the merge was done, a major compaction would be
>>> started, which would notice that the tablet is too large and split it
>>> into 10GB tablets. We assumed that we would not have to manually start
>>> any compaction, but that instead it would be scheduled at some point
>>> after the merge finished.
>>>
>>> We have completed three separate merges of week-long ranges and have
>>> now identified 3 tablet extents with too many files.
>>>
>>> Can you please explain what is supposed to happen? And whether, after
>>> the merge, the compact command needs to be run for those ranges (or
>>> will it run automatically, as we have not seen any started)?
>>>
>>> Cheers
>>> Matt
>>>
>>> ________________________________
>>> From: Eric Newton [mailto:[email protected]]
>>> Sent: Thursday, 3 October 2013 13:28
>>> To: [email protected]
>>> Subject: Re: Efficient Tablet Merging [SEC=UNOFFICIAL]
>>>
>>> I'll use ASCII graphics to demonstrate the size of a tablet.
>>>
>>> Small:  []
>>> Medium: [  ]
>>> Large:  [    ]
>>>
>>> Think of it like this... if you are running age-off, you probably have
>>> lots of little buckets of rows at the beginning and larger buckets at
>>> the end:
>>>
>>> [][][][][][][][][]...[  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ][  ]
>>>
>>> What you probably want is something like this:
>>>
>>> [    ][    ][    ][    ][    ][    ][    ][    ]
>>>
>>> Some big buckets at the start, with old data, and some larger buckets
>>> for everything afterwards. But... this would probably work:
>>>
>>> [    ][    ][    ][    ][    ][    ][    ][    ][    ]
>>>
>>> Just a bunch of larger tablets throughout.
>>>
>>> So you need to set your merge size to "[    ]" (4G), and you can always
>>> keep creating smaller tablets for future rows with manual splits:
>>>
>>> [    ][    ][    ][    ][    ][    ][][][][][]
>>>
>>> So increase the split threshold to 4G, and merge on 4G, but continue to
>>> make manual splits for your current days, as necessary. Merge them away
>>> later.
>>>
>>> -Eric
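For reference, Eric's suggestion comes down to two shell commands. A sketch,
assuming a hypothetical table named mytable; adjust the 4G figures to your
own target tablet size:

    config -t mytable -s table.split.threshold=4G
    merge -t mytable -s 4G -v

The -s option on merge combines adjacent tablets only while the combined
result stays under the given size, so it can be re-run periodically as old
data ages off without disturbing tablets that are already at the target size.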
>>>
>>> On Wed, Oct 2, 2013 at 6:35 PM, Dickson, Matt MR
>>> <[email protected]> wrote:
>>>>
>>>> UNOFFICIAL
>>>>
>>>> Thanks Eric,
>>>>
>>>> If I do the merge with a size of 4G, does the split threshold also
>>>> need to be increased to 4G?
>>>>
>>>> ________________________________
>>>> From: Eric Newton [mailto:[email protected]]
>>>> Sent: Wednesday, 2 October 2013 23:05
>>>> To: [email protected]
>>>> Subject: Re: Efficient Tablet Merging [SEC=UNOFFICIAL]
>>>>
>>>> The most efficient way is kind of scary. If this is a production
>>>> system, I would not recommend it.
>>>>
>>>> First, find out the size of your 10x tablets. Let's say it's 10G. Set
>>>> your split threshold to 10G. Then merge all old tablets... all of them
>>>> into one tablet. This will dump thousands of files into a single
>>>> tablet, but it will soon split out again into the nice 10G tablets you
>>>> are looking for. The system will probably be unusable during this
>>>> operation.
>>>>
>>>> The more conservative way is to specify the merge in single steps (the
>>>> master will only coordinate a single merge on a table at a time
>>>> anyhow). You can do it by range or by size... I would do it by size,
>>>> especially if you are aging off your old data.
>>>>
>>>> Compacting the data won't have any effect on the speed of the merge.
>>>>
>>>> -Eric
>>>>
>>>> On Tue, Oct 1, 2013 at 11:58 PM, Dickson, Matt MR
>>>> <[email protected]> wrote:
>>>>>
>>>>> UNOFFICIAL
>>>>>
>>>>> I have a table for which we create splits of the form yyyymmdd-nnnn,
>>>>> where nnnn ranges from 0000 to 0840. The bulk of our data is loaded
>>>>> for the current date, with no data loaded for days older than 3 days,
>>>>> so from my understanding it would be wise to merge splits older than
>>>>> 3 days in order to reduce the overall tablet count. It would still be
>>>>> optimal to maintain some distribution of a day's tablets across the
>>>>> cluster, so I'm looking at merging splits in increments of 10, e.g.
>>>>> merge -b 20130901-0000 -e 20130901-0009, thereby reducing 840 splits
>>>>> per day to 84.
>>>>>
>>>>> Currently we have 120K tablets (split threshold 1G) on a cluster of
>>>>> 56 nodes, and our ingest has slowed as the data quantity and tablet
>>>>> count have grown. Initially we were achieving 200-300K entries/sec;
>>>>> now 50-100K.
>>>>>
>>>>> My question is, what is the best way to do this merge? Should we use
>>>>> the merge command with the size option set at something like 5G, or
>>>>> maybe use the compaction command?
>>>>>
>>>>> From my tests this process could take some time, so I'm keen to
>>>>> understand the most efficient approach.
>>>>>
>>>>> Thanks in advance,
>>>>> Matt Dickson
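For the incremental approach described above, one day's merge plus a
follow-up range compaction would look like the following. A sketch, assuming
a hypothetical table named mytable and a 1.5-era shell where compact accepts
-w to wait for completion; the range is the one from the merge example above:

    merge -t mytable -b 20130901-0000 -e 20130901-0009
    compact -t mytable -b 20130901-0000 -e 20130901-0009 -w

Note that a merge does not rewrite RFiles; the files of the merged-away
tablets are simply re-assigned to the surviving tablet, which is why a wide
merge leaves one tablet with a huge file count until a major compaction
(scheduled or manual) rewrites them.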
