Answers below.
________________________________
From: Jean-Daniel Cryans <[email protected]>
To: [email protected]
Sent: Friday, September 16, 2011 2:08 PM
Subject: Re: REcovering from SocketTimeout during scan in 90.3

On Fri, Sep 16, 2011 at 12:17 PM, Douglas Campbell <[email protected]> wrote:
> The min/max keys are for each region right? Are they pretty big?
>
> doug : Typically around 100 keys and each key is 24 bytes

A typical region would be like - stores=4, storefiles=4, storefileSizeMB=1005, memstoreSizeMB=46, storefileIndexSizeMB=6

Sorry, I meant to ask how big the regions were, not the rows.

> Are you sharing scanners between multiple threads?
>
> doug: no - but each Result from the scan is passed to a thread to merge with
> input and write back.

Yeah, this really isn't what I'm reading tho... Would it be possible
to see a full stack trace that contains those BLOCKED threads? (please
put it in a pastebin)

http://kpaste.net/02f67d

>> I had one or more runs where this error occurred and I wasn't taking care to
>> call scanner.close()

The other thing I was thinking, did you already implement the re-init
of the scanner? If so, what's the code like?

>>> The code traps runtime exception around the scanner iterator (pseudoish)

    while (toprocess.size() > 0 && !donescanning) {
        Scanner scanner = table.getScanner(buildScan(toprocess));
        try {
            for (Result r: scanner) {
                toprocess.remove(r.getRow());
                // fork thread with r
                if (toprocess.size() == 0) donescanning = true;
            }
        } catch (RuntimeException e) {
            scanner.close();
            if (e.getCause() instanceof IOException) { // probably hbase ex
                scanner = getScanner(buildScan(toprocess));
            } else {
                donescanning = true;
            }
        }
    }

>>> buildScan takes the keys and crams them in the filter.

Thx,

J-D
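
For reference, here is a minimal sketch of the same retry idea with the scanner closed on every path. It is not the code from this thread and it is not HBase API: RescanLoop, RowScanner, scanAll and maxAttempts are hypothetical names, and RowScanner only stands in for a ResultScanner obtained from table.getScanner(...). Two assumptions differ from the pseudocode above: retries are bounded, and a pass that finishes without an exception ends the loop even if some keys were never returned.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Function;

class RescanLoop {

    /** Hypothetical stand-in for an HBase ResultScanner: iterable and closeable. */
    interface RowScanner extends Iterable<String>, AutoCloseable {
        @Override
        void close(); // no checked exception, to keep the sketch short
    }

    /**
     * Scans until every key in toProcess has been seen, reopening the scanner
     * over the remaining keys when iteration fails with a RuntimeException
     * whose cause is an IOException. Returns the number of scanners opened.
     */
    static int scanAll(Set<String> toProcess,
                       Function<List<String>, RowScanner> openScanner,
                       Consumer<String> handleRow,
                       int maxAttempts) {
        int attempts = 0;
        while (!toProcess.isEmpty() && attempts < maxAttempts) {
            attempts++;
            // Snapshot the remaining keys, as buildScan(toprocess) would.
            List<String> remaining = new ArrayList<>(toProcess);
            // try-with-resources closes the scanner on success and on failure.
            try (RowScanner scanner = openScanner.apply(remaining)) {
                for (String row : scanner) {
                    toProcess.remove(row);
                    handleRow.accept(row); // e.g. hand the row to a worker thread
                }
                break; // assumption: a clean pass means nothing more will match
            } catch (RuntimeException e) {
                if (!(e.getCause() instanceof IOException)) {
                    throw e; // not a scan failure, so do not retry
                }
                // Otherwise loop: the next scanner covers only the remaining keys.
            }
        }
        return attempts;
    }
}
```

If toProcess is still non-empty when scanAll returns, either maxAttempts was reached or the missing keys were not in the table; the caller has to decide which case applies.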
