> 
> 
> 1. After reading the materials you sent me, I am confused about how a Bloom
> filter saves I/O during random reads. Suppose I am not using a Bloom filter:
> to find whether a row (or row key) exists, we need to search the index
> block at the end of an HFile. That search happens in memory (I think the
> index block is always in memory; please correct me if I am wrong) using
> binary search, so it should be pretty fast. With a Bloom filter, we could be
> a bit faster by looking up the Bloom filter bit vector in memory. Since both
> the index binary search and the Bloom filter lookup happen in memory (no I/O
> is involved), what kind of I/O is saved? :-)
> 
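To make the distinction concrete, here is a small Python sketch of an index lookup. It is a simplification, not HBase's actual code, and the block keys are made up. A binary search over the first row key of each data block returns the one block that *could* contain a row, even when the row is absent. Deciding whether the row really exists still requires reading that block.

```python
import bisect

# Hypothetical in-memory block index (simplified): the first row key of
# each data block in one HFile, in sorted order.
block_first_keys = ["apple", "grape", "mango", "peach"]


def candidate_block(row_key):
    """Return the index of the only block whose key range covers row_key,
    or None if row_key sorts before the first block.

    A non-None result means "the row could be here", not "the row is here":
    confirming it means loading and scanning that block."""
    i = bisect.bisect_right(block_first_keys, row_key) - 1
    return i if i >= 0 else None


print(candidate_block("banana"))  # 0: block 0 covers "apple" up to "grape"
print(candidate_block("kiwi"))    # 1: block 1 covers "grape" up to "mango"
```

Both lookups return a block even if neither row was ever written, which is why the index alone cannot avoid the block read.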

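If the Bloom filter says the row *may* be present, the block is loaded;
otherwise it is not. Without a Bloom filter, the index only tells you which
block *could* hold the row (the one whose key range covers it). You still have
to load that block, from disk if it is not cached, and scan it to find out
whether the row actually exists. A store usually has several HFiles, so that
means one block read per HFile.

This can incur considerably more I/O.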
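A toy Python model of the saving, assuming a simplified store in which each HFile keeps a row-level Bloom filter over its row keys. `BloomFilter`, `HFile` and `get` are hypothetical names, and the hashing is not HBase's. The model counts one candidate-block read per HFile it checks. Every HFile is checked because one row's cells can be spread across several HFiles.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: m bits, k bit positions per key derived from SHA-256.
    No false negatives; occasional false positives."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, key):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key):
        return all(self.bits[p] for p in self._positions(key))


class HFile:
    """Hypothetical HFile model: its row keys plus a Bloom filter over them."""

    def __init__(self, rows):
        self.rows = set(rows)
        self.bloom = BloomFilter()
        for r in rows:
            self.bloom.add(r)


def get(row, hfiles, use_bloom):
    """Look up row in every HFile; return (found, candidate blocks read).
    Each counted read stands in for loading one data block (possibly a seek)."""
    found, blocks_read = False, 0
    for hf in hfiles:
        if use_bloom and not hf.bloom.might_contain(row):
            continue  # filter says "definitely absent": skip the block read
        blocks_read += 1  # load the candidate block and scan it
        found = found or row in hf.rows
    return found, blocks_read


hfiles = [HFile(["r1", "r2"]), HFile(["r3"]), HFile(["r4", "r5"])]
print(get("r3", hfiles, use_bloom=False))  # (True, 3)
print(get("r3", hfiles, use_bloom=True))   # fewer reads: filters that rule "r3" out are skipped
```

The in-memory filter check is cheap; what it saves is the block loads for HFiles that cannot contain the row.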


> 2. 
> 
> > One Hadoop job doing random reads is perfectly fine. But since you said
> > "Handling directly user traffic", I assumed you wanted to
> > expose HBase independently to every client request, thereby having as many
> > connections as the number of simultaneous requests.
>
> Sorry, I need to confirm this point again. I think you mean that establishing
> a new connection for each request is not good, and that using a connection
> pool or asynchronous I/O is preferred?
> 


Yes.
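A generic Python sketch of the connection-pool pattern: a fixed number of connections are opened once and borrowed per request, so concurrent requests do not each open their own. `make_connection` is a hypothetical stand-in for whatever client connection you use. The HBase client has its own facilities for sharing connections, so check its documentation rather than treating this as its API.

```python
import queue
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Minimal bounded pool: `size` connections are created up front and
    reused, instead of opening a new one for every request."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=None):
        conn = self._idle.get(timeout=timeout)  # blocks while all are in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for the next request


created = []


def make_connection():
    # Hypothetical stand-in for an expensive connect call.
    created.append(object())
    return created[-1]


pool = ConnectionPool(make_connection, size=4)


def handle_request(_):
    with pool.connection() as conn:
        return conn  # real code would issue its reads here


threads = [threading.Thread(target=handle_request, args=(i,)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(created))  # 4: 100 requests served by 4 connections
```

The pool size caps concurrent connections to the cluster regardless of how many client requests arrive at once; extra requests wait for a free connection.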
