[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2014-11-16 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14214002#comment-14214002
 ] 

Vladimir Rodionov commented on HBASE-7336:
--

[~lhofhansl]

Yes that would be helpful for some use cases (compaction + 1 application 
scanner). There is a (not so simple) patch in HBASE-12031 that fixes the issue 
for all use cases (compaction + N application scanners) but it seems that 
nobody has tried it yet. Phoenix parallel intra region scanners would benefit 
from HBASE-12031 the most. 

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
  Components: Performance
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.4, 0.95.0

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2014-11-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14214031#comment-14214031
 ] 

Lars Hofhansl commented on HBASE-7336:
--

See patch in HBASE-12411, relatively easy way to have compactions work with 
their own scanners.

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
  Components: Performance
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.4, 0.95.0

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2014-11-15 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14213870#comment-14213870
 ] 

Lars Hofhansl commented on HBASE-7336:
--

[~vrodionov], I have a quite simple patch that would allow compaction to have 
their own, private readers. With what you've seen, would that be a benefit?

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
  Components: Performance
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.4, 0.95.0

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2014-07-25 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074812#comment-14074812
 ] 

Lars Hofhansl commented on HBASE-7336:
--

[~vrodionov], curious about how you will find good splitpoints *inside* a 
region. Regions can be assumed to be roughly of equal size (in terms of bytes, 
not rows), but inside a region the distribution of keys can be arbitrarily 
skewed, and hence unless you have more state you cannot find good splits inside 
a region.
(the region split points actually are a very rough histogram for data 
distribution)

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
  Components: Performance
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.4, 0.95.0

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2014-07-25 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074849#comment-14074849
 ] 

Vladimir Rodionov commented on HBASE-7336:
--

[~lhofhansl]

The major priority right now is to improve compaction and normal operations 
during compaction. Sure we need to track region stats to make optimal 
inter-region splits, but even without such a stats we can decrease data skew 
significantly. 

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
  Components: Performance
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.4, 0.95.0

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2014-07-16 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063209#comment-14063209
 ] 

Liang Xie commented on HBASE-7336:
--

Yes, i observed similar problem.  Long time ago i had a raw idea to impl a 
multi streams/readers prototype, maybe i can share the patch once ready:)

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
  Components: Performance
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.4, 0.95.0

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2014-07-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063630#comment-14063630
 ] 

Lars Hofhansl commented on HBASE-7336:
--

Local mode is not the same. It does not actually run HDFS, but just pretty ad 
hoc DFS wrapper.
Tests are only valid when tested again real HDFS - even when HDFS is in single 
node mode.
 
With actual HDFS I have not observed your numbers [~vrodionov].


 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
  Components: Performance
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.4, 0.95.0

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2014-07-16 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063652#comment-14063652
 ] 

Vladimir Rodionov commented on HBASE-7336:
--

HBase mini cluster runs HDFS and code path to access files/data is the same as 
in a real cluster mode, I think. At least, PackageSender/PackageReceiver(s) and 
all IPC stuff for NN and DN are present in stack traces.

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
  Components: Performance
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.4, 0.95.0

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2014-07-16 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063662#comment-14063662
 ] 

Vladimir Rodionov commented on HBASE-7336:
--

[~xieliang007]], I am working on efficient multiple scanners support for single 
store file as well. We  need this for several purposes:

* Improve single task granularity during query execution (currently, it is a 
single region)
* Improve scanner performance during compaction(s)
* Improve compaction performance during normal operations.

The patch will be submitted soon when we get all testing done.  

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
  Components: Performance
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.4, 0.95.0

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2014-07-15 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062356#comment-14062356
 ] 

Vladimir Rodionov commented on HBASE-7336:
--

h3. Effect of compaction on large scan operations and vice verse

The compaction scanner and user scanner will compete for the same input stream 
(DFSInputStream). This results in a sub optimal performance for both of them, 
becuase *there is no guarantee that next call to read HFile block from the 
lucky scanner will use the same streaming API and pre-cached data will still 
be valid*. Yep? Both scanners, periodically, switch between stream/pread API 
calls, hdfs cache can not be used (?), performance of both of them is defined 
by positional read performance (which is low for scan mode operation).

Is this correct assessment?

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
  Components: Performance
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.4, 0.95.0

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2014-07-15 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062733#comment-14062733
 ] 

Vladimir Rodionov commented on HBASE-7336:
--

DFSInputStream class is heavily synchronized (at least in HDFS 2.2) and 
regardless of a read op type (stream, positional) all readers will be waiting 
on a single lock eventually.  This is what I see in my local tests. 

1 scanner - 14 sec
2 scanners - 36 sec (!!!) 
4 scanners - too long to be true. 

I have no explanation yet, but something is wrong here.

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
  Components: Performance
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.4, 0.95.0

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2014-07-15 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062740#comment-14062740
 ] 

Vladimir Rodionov commented on HBASE-7336:
--

Forgot to mention:

HBase 0.98.3 hadoop2. All tests are in a local mode with HBase mini cluster.

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
  Components: Performance
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.4, 0.95.0

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2014-07-15 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062749#comment-14062749
 ] 

stack commented on HBASE-7336:
--

[~vrodionov] Can you try on hdfs?

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
  Components: Performance
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.4, 0.95.0

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2014-07-15 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062814#comment-14062814
 ] 

Vladimir Rodionov commented on HBASE-7336:
--

I monitor thread stack traces during test run. Usually, just one thread 
(Scanner) is running, all others are waiting on DFSInputStream in some places 
(as I said, too many synchronized methods). This is HDFS, not HBase.

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
  Components: Performance
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.4, 0.95.0

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2014-07-02 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050538#comment-14050538
 ] 

Vladimir Rodionov commented on HBASE-7336:
--

I am looking into this stuff now, trying to figure out how to make parallel 
scan on a region more efficient. The code in AbstractFSReader looks dangerous 
and does not provide any benefits in terms of MT performance.
{code}
protected int readAtOffset(FSDataInputStream istream,
byte[] dest, int destOffset, int size,
boolean peekIntoNextBlock, long fileOffset, boolean pread)
throws IOException {
  if (peekIntoNextBlock 
  destOffset + size + hdrSize  dest.length) {
// We are asked to read the next block's header as well, but there is
// not enough room in the array.
throw new IOException(Attempted to read  + size +  bytes and  +
hdrSize +  bytes of next header into a  + dest.length +
-byte array at offset  + destOffset);
  }

  if (!pread  streamLock.tryLock()) {
// Seek + read. Better for scanning.
try {
  istream.seek(fileOffset);

  long realOffset = istream.getPos();
  if (realOffset != fileOffset) {
throw new IOException(Tried to seek to  + fileOffset +  to 
+ read  + size +  bytes, but pos= + realOffset
+  after seek);
  }

  if (!peekIntoNextBlock) {
IOUtils.readFully(istream, dest, destOffset, size);
return -1;
  }

  // Try to read the next block header.
  if (!readWithExtra(istream, dest, destOffset, size, hdrSize))
return -1;
} finally {
  streamLock.unlock();
}
  } else {
// Positional read. Better for random reads; or when the streamLock is 
already locked.
int extraSize = peekIntoNextBlock ? hdrSize : 0;

int ret = istream.read(fileOffset, dest, destOffset, size + extraSize);
if (ret  size) {
  throw new IOException(Positional read of  + size +  bytes  +
  failed at offset  + fileOffset +  (returned  + ret + ));
}

if (ret == size || ret  size + extraSize) {
  // Could not read the next block's header, or did not try.
  return -1;
}
  }

  assert peekIntoNextBlock;
  return Bytes.toInt(dest, destOffset + size + BlockType.MAGIC_LENGTH) +
  hdrSize;
}
{code}

Positional reads in FSInputStream (DFSInputStream) are heavily synchronized. It 
is lock on stream than seek and read, unlock. Here is the code for 
FSInputStream:
{code}
  @Override
  public int read(long position, byte[] buffer, int offset, int length)
throws IOException {
synchronized (this) {
  long oldPos = getPos();
  int nread = -1;
  try {
seek(position);
nread = read(buffer, offset, length);
  } finally {
seek(oldPos);
  }
  return nread;
}
  }
{code}

DFSInputStream extends FSInputStream but does not override the above method. 
Taking into account that code is synchronized, it is hard to explain observed 
performance improvement published in this JIRA.


  


 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
  Components: Performance
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.4, 0.95.0

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2014-07-02 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050541#comment-14050541
 ] 

Vladimir Rodionov commented on HBASE-7336:
--

Upd. The code is not dangerous - it just does not do what it was supposed to do.

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
  Components: Performance
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.4, 0.95.0

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2014-07-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050768#comment-14050768
 ] 

Lars Hofhansl commented on HBASE-7336:
--

The trylock stuff is from the patch here:
# try to do seek + read (if requested, i.e. a scan)
# if that is not possible rather than locking, do a pread immediately.

My tests showed a significant improvement, you can always verify yourself. Note 
that this was 1.5 years ago.

Curious... If what you say is true there would be no difference at all between 
seek+read and pread. That would indeed be bad. Hmm.


 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
  Components: Performance
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.4, 0.95.0

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2014-07-02 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050777#comment-14050777
 ] 

Vladimir Rodionov commented on HBASE-7336:
--

I was not right, Lars. *DFSInputStream* overrides positional read - no locks. 
But there is something else ...

There is no much sense in allowing one random scanner run in a stream mode as 
since, there is no guarantee that next call to read HFile block from the 
lucky scanner will use the same streaming API and pre-cached data will still 
be valid. Some other scanner might dump this data before. Correct? 

You may try all *pread*'s, for all scanners and compare performance. I bet it 
will be close to what we have right now. 

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
  Components: Performance
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.4, 0.95.0

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2014-07-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050778#comment-14050778
 ] 

Lars Hofhansl commented on HBASE-7336:
--

DFSInputStream *does* override this method (checked Hadoop-trunk). The 
overridden method directly calculates the offset without locking as it should. 
All is good on this front.

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
  Components: Performance
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.4, 0.95.0

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2014-07-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050791#comment-14050791
 ] 

Lars Hofhansl commented on HBASE-7336:
--

I agree with your assessment on the seek+read vs pread. The current should not 
be worse than doing all pread, though.

I see you commented on HBASE-5979 as well, and that would be a correct way to 
fix this. Each scanner would have its own stream and hence seek+read should be 
better there.
The issue there is invalidation after a compaction or flush, although that 
needs some fixing anyway - I tried (unsuccessfully) in HBASE-10060. The issue 
there is that even the memory barriers taken for a lock that is uncontended 
99.% of the time is a significant performance problem, but I have not been 
able to remove it and still ensure correct behavior.

I'm glad you're looking at this, this area needs some TLC.


 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
  Components: Performance
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.4, 0.95.0

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2013-01-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13544426#comment-13544426
 ] 

Hudson commented on HBASE-7336:
---

Integrated in HBase-0.94-security-on-Hadoop-23 #10 (See 
[https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/10/])
HBASE-7336 Reapply, the OOMs were not caused by this. (Revision 1423084)
HBASE-7336 Revert due to OOMs on TestHFileBlock potentially caused by this. 
(Revision 1422767)
HBASE-7336 HFileBlock.readAtOffset does not work well with multiple threads 
(Revision 1421439)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java

larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java

larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java


 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13537918#comment-13537918
 ] 

Hudson commented on HBASE-7336:
---

Integrated in HBase-0.94-security #87 (See 
[https://builds.apache.org/job/HBase-0.94-security/87/])
HBASE-7336 Reapply, the OOMs were not caused by this. (Revision 1423084)
HBASE-7336 Revert due to OOMs on TestHFileBlock potentially caused by this. 
(Revision 1422767)
HBASE-7336 HFileBlock.readAtOffset does not work well with multiple threads 
(Revision 1421439)

 Result = SUCCESS
larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java

larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java

larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java


 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13533726#comment-13533726
 ] 

Hudson commented on HBASE-7336:
---

Integrated in HBase-0.94 #632 (See 
[https://builds.apache.org/job/HBase-0.94/632/])
HBASE-7336 Revert due to OOMs on TestHFileBlock potentially caused by this. 
(Revision 1422767)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java


 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-17 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13534149#comment-13534149
 ] 

Lars Hofhansl commented on HBASE-7336:
--

The 0.94 tests fail with the same OOM even without this patch, so I am going to 
reapply. Sorry for the noise, but I had to make sure.

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13534423#comment-13534423
 ] 

Hudson commented on HBASE-7336:
---

Integrated in HBase-0.94 #635 (See 
[https://builds.apache.org/job/HBase-0.94/635/])
HBASE-7336 Reapply, the OOMs were not caused by this. (Revision 1423084)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java


 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13533567#comment-13533567
 ] 

Lars Hofhansl commented on HBASE-7336:
--

TestHFileBlock.testConcurrentReading[1] is failing in the 0.94 test runs now 
(with OOMs).

I do not think this is because of this change, but at the same time I do not 
see any other related changes.
It does not fail locally.


 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13533583#comment-13533583
 ] 

Lars Hofhansl commented on HBASE-7336:
--

Running the test locally also seems to consume *less* memory with the patch. So 
I have no explanation really, why this test is failing now.

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13533606#comment-13533606
 ] 

Lars Hofhansl commented on HBASE-7336:
--

The latest test run passed.

Looking at testConcurrentReading, it is a time bounded test. Checking the 
number of blocks that the concurrent readers manage to read within the time 
bound is actually *larger* without this patch (i.e. this patch is slowing this 
down)... Presumably because more threads are now using pread.
(This test is reading random blocks randomly choosing pread vs not from many 
threads. And these blocks are very small too. So not sure how real world that 
is.)

On the other hand, without this patch scanners that scan large HFiles 
concurrently in a tight loop are useless (i.e. never make enough progress to 
not time out).


 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13533608#comment-13533608
 ] 

Lars Hofhansl commented on HBASE-7336:
--

Somewhat tempted to revert this patch.

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13533655#comment-13533655
 ] 

Lars Hofhansl commented on HBASE-7336:
--

Actually that test is not performance representative. When I revert this change 
and then have this test only do preads it is quite slow. If I have this test 
only do seek+read it is much faster. And that is even though the reads of these 
blocks are random (each thread on each iteration reads a random block), which 
should favor preads.

My changes then makes this slightly slower, because the likelihood of a pread 
is slightly higher.


 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13532736#comment-13532736
 ] 

Lars Hofhansl commented on HBASE-7336:
--

Note that the previous tests all were scans that did not return anything to the 
client.
Did another verification test with scans in blocks and returns data to the 
client. Even in that case the performance is better, because of improved 
concurrency.
(Three concurrent clients went from 74s to 65s)


 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13531265#comment-13531265
 ] 

stack commented on HBASE-7336:
--

+1 on committing this for now.

On Reader per long-running scanner, it will complicate the swapping in of new 
files on compaction but probably worth figuring out.  Lets file issues for 
further improvement.

Good stuff Lars.

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13531500#comment-13531500
 ] 

Hudson commented on HBASE-7336:
---

Integrated in HBase-0.94 #625 (See 
[https://builds.apache.org/job/HBase-0.94/625/])
HBASE-7336 HFileBlock.readAtOffset does not work well with multiple threads 
(Revision 1421439)

 Result = SUCCESS
larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java


 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13531506#comment-13531506
 ] 

Hudson commented on HBASE-7336:
---

Integrated in HBase-TRUNK #3618 (See 
[https://builds.apache.org/job/HBase-TRUNK/3618/])
HBASE-7336 HFileBlock.readAtOffset does not work well with multiple threads 
(Revision 1421440)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java


 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13531664#comment-13531664
 ] 

Hudson commented on HBASE-7336:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #295 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/295/])
HBASE-7336 HFileBlock.readAtOffset does not work well with multiple threads 
(Revision 1421440)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java


 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529730#comment-13529730
 ] 

Lars Hofhansl commented on HBASE-7336:
--

bq. Compactions should go get their own Reader?
That sounds like a save and important improvement.

In other cases it actually seems best to try to get a stream and fall back to 
pread if that fails.

Could drive # of reader by he size of the store file, something like a reader 
per n GB (n = 1 or 2 maybe). Then we round robin the readers.

Should I commit this for now (assuming it passes HadoopQA and no objections), 
and we investigate other options further? Or discuss a bit more to see if we 
kind other options?


 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529785#comment-13529785
 ] 

Hadoop QA commented on HBASE-7336:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12560514/7336-0.96.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
104 warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 23 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestMultiParallel

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3490//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3490//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3490//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3490//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3490//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3490//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3490//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3490//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3490//console

This message is automatically generated.

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530074#comment-13530074
 ] 

Lars Hofhansl commented on HBASE-7336:
--

TestMultiParallel passed locally.

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530512#comment-13530512
 ] 

Lars Hofhansl commented on HBASE-7336:
--

Any objections to committing this (0.94 and 0.96). I'm pretty sure it won't 
make things worse, and it provably improves some scenarios.

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530580#comment-13530580
 ] 

Lars Hofhansl commented on HBASE-7336:
--

Numbers with patch and read short circuit enabled for completeness:
One client: 15s
Two clients: 22s

(same as without short circuiting)

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-12 Thread liang xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530612#comment-13530612
 ] 

liang xie commented on HBASE-7336:
--

+1, LGTM,a nice test case, Lars

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529642#comment-13529642
 ] 

Lars Hofhansl commented on HBASE-7336:
--

Here's my test case: 20m rows, single column family, single column, blockcache 
disabled for the scan, no HDFS short circuiting, all data fits into the OS 
buffer cache. Full scan over all rows.

One client scanning: 15s (regionserver keeps one core busy ~120%)
Two clients scanning along the same set of blocks: They both time out. 
(regionserver is a 5-6% CPU, clearly just waiting)

Then I hacked readAtOffset to always do preads. Now:
One client scanning: 39s (regionserver at ~120%)
Two clients scanning: 39s each (regionserver at ~210%)

With short circuiting enabled:
One client: 15s (120% CPU)
Two clients: 41s each (160% CPU)

with pread:
One client: 18s (160% CPU)
Two clients: 19s each (250% CPU)

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529669#comment-13529669
 ] 

stack commented on HBASE-7336:
--

Should the tryLock have a timeout on it?

Else patch is an improvement for sure worth committing till we do better.

I wonder what the difference is if more than one Reader.

Looking at hdfs, each open goes to the namenode so can't open a reader per 
scanner... would cost a bit on setup?  If we could distingush short from long 
scan we could do pread on the shorts and open a reader per Scanner for the long 
runners?

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529682#comment-13529682
 ] 

Lars Hofhansl commented on HBASE-7336:
--

tryLock without a timeout returns immediately. I think in this case it should, 
waiting here would potentially wait (at least a threads context switch) on each 
single next or get request.

I think this issue would be most prominent in long scans, where the scans 
typically do not want to mess up the block cache.

Multiple readers is probably the right solution. How many, though? And I am not 
sure how this would interact with the block cache (would probably be OK).


 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

2012-12-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529712#comment-13529712
 ] 

stack commented on HBASE-7336:
--

bq. tryLock without a timeout returns immediately.

Sorry.  Thought for some reason it waited.

bq. How many, though? And I am not sure how this would interact with the block 
cache (would probably be OK).

One per scanner?  Can we look at a scanner and figure somehow if its long or 
short?  If start/stop row?  Limit?  Maybe allow someone pass a hint?  
Compactions should go get their own Reader?

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit is nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing in istream in HFileBlock.readAtOffset is the 
 culprit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira