Re: Losing tservers - Unusually high Last Contact times

thomasa Wed, 21 May 2014 09:02:17 -0700

Increasing the timeout settings helped a little, but when I tried to increase
the number of map tasks for the workers, I ran into instability issues.


After re-reading my original post, I think I left out some important
details. The job I am trying to run is a MapReduce ingest that uses
BatchWriters to populate an Accumulo table. On previous, smaller clouds, I
controlled disk allocation and made sure to assign one disk per worker to
avoid write contention. On this larger cloud, disk management is transparent
to me, but I believe the physical disks backing the VMs are presented as one
large virtual pool. Write speeds on the big, unstable cloud are very fast,
3-4 times those of our smaller clouds, but I measured that by using dd to
write a file on just one VM. I think that when all 150+ nodes are writing to
disk at once, multiple nodes end up writing to the same physical disk, which
causes problematic iowait (at least 20-50%).
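One way to test the shared-disk theory is to sample iowait on every worker while the full ingest is running, rather than relying on a single-VM dd run. Below is a minimal sketch, assuming Linux guests that expose `/proc/stat`. It takes two samples of the aggregate `cpu` line and reports the fraction of CPU time spent in iowait between them, which is roughly what `top`, `vmstat`, or `iostat` would show. The class name `IowaitPercent` and the 5-second window are illustrative choices and are not part of Accumulo or Hadoop.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper (sketch): estimates system-wide iowait % on a Linux
 * node from two samples of the aggregate "cpu" line in /proc/stat.
 */
public class IowaitPercent {

    /** Parses "cpu  user nice system idle iowait irq softirq steal ..." into counters. */
    static long[] parseCpuLine(String line) {
        String[] parts = line.trim().split("\\s+");
        if (!parts[0].equals("cpu") || parts.length < 6) {
            throw new IllegalArgumentException("expected aggregate cpu line: " + line);
        }
        // Keep at most the first 8 counters (user..steal); guest/guest_nice are
        // already included in user/nice, so summing them would double count.
        int n = Math.min(parts.length - 1, 8);
        long[] counters = new long[n];
        for (int i = 0; i < n; i++) {
            counters[i] = Long.parseLong(parts[i + 1]);
        }
        return counters;
    }

    static long sum(long[] values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    /** Percentage of CPU time spent in iowait between two samples. */
    static double iowaitPercent(String before, String after) {
        long[] a = parseCpuLine(before);
        long[] b = parseCpuLine(after);
        long totalDelta = sum(b) - sum(a);
        long iowaitDelta = b[4] - a[4]; // index 4 is the iowait counter
        return totalDelta <= 0 ? 0.0 : 100.0 * iowaitDelta / totalDelta;
    }

    static String readCpuLine(Path stat) throws IOException {
        // The first line of /proc/stat is the aggregate across all CPUs.
        return Files.readAllLines(stat).get(0);
    }

    public static void main(String[] args) throws Exception {
        Path stat = Paths.get("/proc/stat");
        if (!Files.exists(stat)) {
            System.err.println("/proc/stat not found (not a Linux host?)");
            return;
        }
        String before = readCpuLine(stat);
        Thread.sleep(5000); // illustrative 5-second sample window
        String after = readCpuLine(stat);
        System.out.printf("iowait: %.1f%%%n", iowaitPercent(before, after));
    }
}
```

Running this on a few workers with the ingest at full size, and again with fewer map tasks per node, would show whether iowait tracks the number of concurrent writers, which is what you would expect if the VMs share physical disks.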

So, given my situation, what is the best way to configure Accumulo knowing
that the workers share disks and will contend for writes? Should I just scale
resources down for ingest to keep things stable, then ramp them back up for
non-ingest jobs?



--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/Losing-tservers-Unusually-high-Last-Contact-times-tp9950p10005.html
Sent from the Users mailing list archive at Nabble.com.
