I haven't had time to dig into it yet, but I'm hoping Zipkin will help
with some of these insights. (Unless that is the distributed trace you
were referring to?)
-Jonathan
On 08/26/2016 04:54 PM, Mario Pastorelli wrote:
I would like to understand the performance of a batch scan and would
like some hints on how to proceed. I have enabled distributed tracing,
and it tells me that some batch scanner threads take much longer than
others to complete, but this is not helpful enough because it does not
tell me why those threads take longer. My gut feeling is that one
batch thread is scanning more data than the others, which would mean
that the data is not well distributed for the query. However, I use a
random shard byte as a prefix of the keys, which should guarantee that
data in the same range is almost equally distributed among the
tservers. I enabled JMX on the tservers and attached jvisualvm to get
an idea of the state of each tserver, but I couldn't find anything
meaningful. I would like to know if there is a way to profile what is
going on in a single tserver for a single scan thread, and by this I
mean:
1. Where are the tablets required by a scan? On which tablet servers?
2. How fast were the index lookups for that scan?
3. How many bytes/records were read for that scan, without the iterators?
4. How many seeks are done by the scan and, if possible, why?
The main Accumulo UI is fine for getting an overview of Accumulo, but
it doesn't really give you any information about the performance of a
single query, and it seems to me that query performance is heavily
affected by what the iterators do. Profiling a single scan is much more
interesting. Is there a way to profile a single (batch) scan in
Accumulo so that I have a complete overview of the entire process of
reading records and sending them back to the driver?
Thanks,
Mario
--
Mario Pastorelli| TERALYTICS
*software engineer*
Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
phone:+41794381682
www.teralytics.net <http://www.teralytics.net/>