Re: increase "running scans" in monitor?

Josh Elser Tue, 02 Apr 2013 08:07:29 -0700

Hi Marc,

How many tablets are in the table you're running MR over (see the monitor)? Might adding some more splits to your table (`addsplits` in the Accumulo shell) get you better parallelism?
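To make the tablet question concrete, here is a rough sketch of why the tablet count matters. It assumes the Accumulo input formats create one input split (and so one map task) per tablet, which I believe is the default behaviour, and that tablets are spread evenly over the tablet servers. The function names and the tablet counts are hypothetical; only the 64 map slots and 8 servers come from this thread.

```python
def max_concurrent_scans(num_tablets, map_slots):
    """Upper bound on simultaneous scans, assuming one map task per tablet."""
    return min(num_tablets, map_slots)


def scans_per_tserver(num_tablets, map_slots, num_tservers):
    """Average bound per tablet server, assuming tablets are evenly balanced."""
    return max_concurrent_scans(num_tablets, map_slots) / num_tservers


if __name__ == "__main__":
    # Hypothetical tablet counts; 64 slots and 8 tablet servers as in the thread.
    for tablets in (8, 24, 64, 256):
        total = max_concurrent_scans(tablets, 64)
        per_server = scans_per_tserver(tablets, 64, 8)
        print(f"{tablets:4d} tablets: at most {total} scans, ~{per_server:.1f} per tserver")
```

Under these assumptions, a table with fewer tablets than map slots can never keep every slot scanning, no matter how the tablet servers are tuned.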

What does your data look like in your table? Lots of small rows? Few very large rows?


On 4/2/13 10:56 AM, Marc Reichman wrote:

Hello,
I am running an Accumulo-based MR job using the AccumuloRowInputFormat on 1.4.1. Config is more-or-less default, using the native-standalone 3GB template, but with the TServer memory put up to 2GB in accumulo-env.sh from its default. accumulo-site.xml has tserver.memory.maps.max at 1G, tserver.cache.data.size at 50M, and tserver.cache.index.size at 512M.
My tables are created with maxversions for all three types (scan, minc, majc) at 1 and compress type as gz.
I am finding, on an 8-node test cluster with 64 map task slots, that when a job is running, the 'Running Scans' count in the monitor is roughly 0-4 on average for each tablet server. When viewed at the table view, this puts the running scans anywhere from 4-24 on average. I would expect/hope the scans to be somewhere close to the map task count. To me, this means one of the following:
1. There is a configuration setting inhibiting the number of scans from accumulating (excuse the pun) to about the same number as my map tasks.
2. My map task job is CPU-intensive enough to introduce delays between scans and everything is fine.
3. Some combination of 1/2.
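The second possibility can be put in rough numbers. This is a minimal sketch, assuming each map task simply alternates between waiting on a scan batch and processing it, with no readahead overlapping the two. The function name and the timings are made-up placeholders, not measurements from either cluster.

```python
def expected_running_scans(map_tasks, scan_seconds, process_seconds):
    """Average number of map tasks waiting on a scan at any instant.

    Assumes every task repeats the same cycle: scan_seconds spent in a
    tablet server scan, then process_seconds of CPU work in the mapper.
    """
    cycle = scan_seconds + process_seconds
    if scan_seconds < 0 or process_seconds < 0 or cycle == 0:
        raise ValueError("timings must be non-negative and not both zero")
    return map_tasks * scan_seconds / cycle


if __name__ == "__main__":
    # Hypothetical timings: 1 s fetching a batch, 3 s processing it.
    print(expected_running_scans(64, 1.0, 3.0))
```

If the mappers spend most of each cycle on CPU work, a 'Running Scans' count well below the map task count would be expected under this model even with nothing throttling the scans.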
On an alternate cluster, 40 nodes with 320 task slots, we haven't seen anywhere near full capacity scanning with map tasks which have the same performance, and the problem seems much worse.
I am experimenting with some of the readahead configuration variables for the tablet servers in the meantime, but haven't found any smoking guns yet.
Thank you,
Marc


--
http://saucyandbossy.wordpress.com
