Hi, I'm using endpoint coprocessors to run intensive scans in parallel on some tables. I log the time each coprocessor takes to finish its work on its region. A coprocessor rarely takes more than a few seconds, 5 seconds at most (there are only 5 regions on the tables right now). As the cluster accumulates data, the HTable.coprocessorExec call takes longer and longer, even though the coprocessors themselves still finish quickly (under 5 seconds). Eventually all of my regionservers die: the coprocessorExec call times out, ZooKeeper kills the connection, and that takes the regionserver down.
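To make the gap between the two timings concrete, here is a minimal sketch in plain Java with no HBase dependency. It times each region's work separately and also times the whole fan-out from the client side. The names RegionWork and timedFanOut are hypothetical stand-ins for the per-region endpoint work and for the coprocessorExec fan-out; they are not HBase APIs. If the client-side time keeps growing while the slowest region time stays flat, the extra time is being spent outside the coprocessors themselves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecTiming {
    // Hypothetical stand-in for one region's endpoint work (not an HBase API).
    interface RegionWork { long run() throws Exception; }

    // Runs one task per region in parallel, roughly like a coprocessorExec fan-out.
    // Returns {slowest per-region time in ms, total client-side time in ms}.
    static long[] timedFanOut(ExecutorService pool, List<RegionWork> regions) throws Exception {
        long start = System.nanoTime();
        List<Future<Long>> futures = new ArrayList<>();
        for (RegionWork r : regions) {
            futures.add(pool.submit(() -> {
                long t0 = System.nanoTime();
                r.run();
                return (System.nanoTime() - t0) / 1_000_000;
            }));
        }
        long slowestRegionMs = 0;
        for (Future<Long> f : futures) {
            slowestRegionMs = Math.max(slowestRegionMs, f.get());
        }
        long clientMs = (System.nanoTime() - start) / 1_000_000;
        return new long[] { slowestRegionMs, clientMs };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(5);
        List<RegionWork> regions = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            final long sleepMs = 50L * (i + 1); // simulated per-region work
            regions.add(() -> { Thread.sleep(sleepMs); return sleepMs; });
        }
        long[] t = timedFanOut(pool, regions);
        // The difference between these two numbers is time spent outside the region work.
        System.out.println("slowest region ms=" + t[0] + ", client ms=" + t[1]);
        pool.shutdown();
    }
}
```

In the real setup, the per-region number would come from the coprocessor's own logging and the client number from timing the coprocessorExec call itself.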
In terms of code structure, coprocessorExec is called inside a for-loop that iterates over a List of objects. Each object is used to build the filters for the endpoint, and coprocessorExec is then called once per object.

What could be the bottleneck? Does calling the coprocessor repeatedly in a for-loop like this overload the regions and leave them no time for GC? Is there a way to ping a table to check whether it is ready for the endpoint call?

Thanks,
-Kevin
