RE: Straggler problem in Accumulo BatchScans

Slater, David M. Wed, 21 Aug 2013 17:14:00 -0700

Thanks Eric,

Just to make sure I'm going in the right direction, this would involve 
extending the TabletBalancer class, correct? How do I add it to the table after 
that (and remove the old one)? I don't see it under the Connector's 
tableOperations().


Is using a load-balancer what you would recommend if I wanted to make sure that 
two different tables stored related information (e.g. data and indexes) on the 
same tablets?

Thanks,
David

From: Eric Newton [mailto:[email protected]]
Sent: Wednesday, August 21, 2013 8:03 PM
To: [email protected]
Subject: Re: Straggler problem in Accumulo BatchScans

A new balancer is a plug-in class that instructs the Master process where to 
place tablets.

If you know you need your tablets spread out over servers based on time (row 
id), you can do that.  It's pretty common, in fact.
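
As a rough illustration of spreading tablets over servers by time, here is a 
minimal Java sketch. It is a toy model only and does not use the Accumulo 
TabletBalancer API: the class name, method name, and the tablet/server labels 
are all hypothetical. It assumes tablets are listed in row-id (time) order and 
hands them out round-robin, so that a scan over a contiguous time range touches 
many servers instead of a few.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model only; hypothetical names, not the Accumulo TabletBalancer API. */
class TimeSpreadSketch {

    /**
     * Assigns tablets, given in row-id (time) order, to servers round-robin,
     * so neighbouring time ranges end up on different servers.
     */
    static Map<String, List<String>> assign(List<String> tabletsInRowOrder,
                                            List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        Map<String, List<String>> hosted = new TreeMap<>();
        for (String server : servers) {
            hosted.put(server, new ArrayList<>());
        }
        for (int i = 0; i < tabletsInRowOrder.size(); i++) {
            // Tablet i goes to server (i mod number of servers).
            String server = servers.get(i % servers.size());
            hosted.get(server).add(tabletsInRowOrder.get(i));
        }
        return hosted;
    }
}
```

With this placement, any run of N consecutive tablets puts at most 
ceil(N / servers) tablets on a single server. A real balancer would also have 
to deal with tablet splits, servers joining or leaving, and the cost of 
migrating tablets, none of which this sketch models.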

-Eric

On Wed, Aug 21, 2013 at 7:54 PM, Slater, David M. 
<[email protected]> wrote:
Hi Dave,

The table is currently organizing netflow data with its rowID of 
timestamp_netflowRecordID, some columns corresponding to various netflow 
quantities, and one column representing the entire netflow in binary form.

The table is about 1.2 TB, and I am scanning 5-40 GB per scan, which scans 
about 7-28 tablets.

What do you mean by a custom load balancer? Do you mean balancing the data on 
ingest, or balancing the query load? What would you recommend for balancing the 
query load if I can only retrieve the data from a particular tablet?

I've played with index/data caches, though I haven't used readahead threads or 
max open files. Is that referring to RFiles?

I'm noticing that most of the queries are CPU bound, and that read I/O is not 
being hit very hard. Is that typical behavior for scans?

Thanks,
David

From: Dave Marion [mailto:[email protected]]
Sent: Wednesday, August 21, 2013 7:29 PM
To: [email protected]
Subject: RE: Straggler problem in Accumulo BatchScans

How is the table organized?
What percent of the table are you scanning in these large operations?
Have you considered writing a custom load balancer?

I don't think that a tablet can be hosted on multiple servers. But you might be 
able to play around with the index/data caches, readahead threads (concurrent 
queries), and max open files to achieve better performance.

From: Slater, David M. [mailto:[email protected]]
Sent: Wednesday, August 21, 2013 7:09 PM
To: [email protected]
Subject: Straggler problem in Accumulo BatchScans

Hey, I have a 7-node cluster running Accumulo 1.4.1 and Hadoop 1.0.4.

When I run large BatchScanner operations, the number of tablets scanned per 
node is not uniform, leading to the overloaded nodes taking much longer to 
finish than the others. For queries that require all of the scans to finish 
before returning, this is a major latency issue. What are some practical means 
of load-balancing this to reduce delay?
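
To make the straggler effect concrete, here is a minimal Java sketch. It is a 
toy model under a loud assumption: every tablet takes about the same time to 
scan, so a node's finish time is proportional to the number of tablets it 
serves. The class and method names are hypothetical, and nothing here calls 
Accumulo. It compares the busiest node against the even share per node.

```java
/** Toy model of the straggler effect; assumes equal scan time per tablet. */
class StragglerSketch {

    /** A scan that waits for every node finishes when the busiest node does. */
    static int busiestNodeLoad(int[] tabletsPerNode) {
        int max = 0;
        for (int n : tabletsPerNode) {
            max = Math.max(max, n);
        }
        return max;
    }

    /** Best case: tablets spread as evenly as possible over the nodes. */
    static int evenLoad(int totalTablets, int nodes) {
        return (totalTablets + nodes - 1) / nodes; // ceiling division
    }

    /** Ratio of busiest node to even share; 1.0 means no straggler. */
    static double slowdown(int[] tabletsPerNode) {
        int total = 0;
        for (int n : tabletsPerNode) {
            total += n;
        }
        if (tabletsPerNode.length == 0 || total == 0) {
            return 1.0; // nothing to scan, so nothing to wait for
        }
        return (double) busiestNodeLoad(tabletsPerNode)
                / evenLoad(total, tabletsPerNode.length);
    }
}
```

For a made-up distribution such as {10, 3, 3, 3, 3, 3, 3} over 7 nodes, the 
even share is 4 tablets per node, but the query waits on the node serving 10, 
a slowdown of 2.5 under the equal-cost assumption.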

Is it possible for tablets to be hosted on multiple tablet servers, up to the 
replication factor of the underlying HDFS? Are there reasons this might be an 
undesirable design?

Thanks in advance,
David
