Hi Chris,

Thanks for the explanation.

Regards,
Raj




________________________________
 From: Chris Embree <[email protected]>
To: [email protected]; Raj Hadoop <[email protected]> 
Sent: Monday, May 20, 2013 1:51 PM
Subject: Re: Low latency data access Vs High throughput of data
 


I'll take a swing at this one.

Low latency data access:  I hit the enter key (or submit button) and I expect 
results within seconds at most.  My database query time should be sub-second.
High throughput of data:  I want to scan millions of rows of data and count or 
sum some subset.  I expect this will take a few minutes (or much longer 
depending on complexity) to complete.  Think of more batch style jobs.
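
To make the contrast concrete, here is a rough Python sketch. It is only a toy: an in-memory list stands in for a table, and the names (rows, index, point_lookup, scan_sum) are made up for illustration, not part of HDFS or any database API.

```python
# Toy illustration only: an in-memory list stands in for a table.
# A real batch job would scan millions of rows; 100,000 keeps this light.
rows = [{"id": i, "amount": i % 100} for i in range(100_000)]

# Low latency access relies on an index (here a dict keyed by id),
# so fetching one row does not require reading the whole table.
index = {row["id"]: row for row in rows}


def point_lookup(row_id):
    """Return a single row; the work does not grow with table size."""
    return index[row_id]


def scan_sum(predicate):
    """Touch every row and sum the matching subset; work grows with table size."""
    return sum(row["amount"] for row in rows if predicate(row))


# Interactive-style request: one row, answer expected right away.
one_row = point_lookup(42)

# Batch-style request: read everything, aggregate a subset.
total = scan_sum(lambda row: row["id"] < 10)
```

The lookup is the "hit enter and get an answer" case. The scan is the batch case, where what matters is how many rows per second you can push through, not how quickly the first answer comes back.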

Caveats: This is really a map/reduce issue as well.  Setting up and scheduling 
an M/R job carries a fixed overhead before any data is processed.  There are a 
couple of projects working now to move toward lower latency data access.

Also, HDFS stores data in blocks and distributes them across many nodes.  This 
means that there will (almost) always be some network data transfer required to 
get the final answer, and that "slows" things down a bit, depending on 
throughput and various other factors.
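
A back-of-the-envelope model of those caveats, as a Python sketch. The startup time and throughput figures are assumed, illustrative numbers, not measurements of any real cluster, and job_seconds and overhead_fraction are hypothetical helper names.

```python
def job_seconds(data_mb, startup_s, throughput_mb_per_s):
    """Estimated wall-clock time: fixed startup cost plus time to stream the data."""
    return startup_s + data_mb / throughput_mb_per_s


# Assumed, illustrative numbers (not measurements of any real cluster).
STARTUP_S = 20.0          # job setup and scheduling overhead, in seconds
THROUGHPUT_MB_S = 1000.0  # aggregate read rate across all nodes, in MB/s


def overhead_fraction(data_mb):
    """Share of the total time spent on fixed overhead rather than reading data."""
    return STARTUP_S / job_seconds(data_mb, STARTUP_S, THROUGHPUT_MB_S)


# A 1 MB request still pays the full startup cost, so latency is poor.
small = job_seconds(1, STARTUP_S, THROUGHPUT_MB_S)

# A 1,000,000 MB (about 1 TB) scan is dominated by streaming time instead.
large = job_seconds(1_000_000, STARTUP_S, THROUGHPUT_MB_S)

print(f"small job: {small:.3f} s, overhead share {overhead_fraction(1):.1%}")
print(f"large job: {large:.1f} s, overhead share {overhead_fraction(1_000_000):.1%}")
```

Under these assumptions the tiny request spends nearly all of its time on overhead, while the large scan spends nearly all of its time moving data. That is the sense in which a system can have high throughput and still be a poor fit for low latency access.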

Hope that helps. :)



On Mon, May 20, 2013 at 10:48 AM, Raj Hadoop <[email protected]> wrote:

>Hi,
>
>
>I have a basic question on HDFS. I was reading that HDFS doesn't work well with 
>low latency data access. Rather, it is designed for high throughput 
>of data. Can you please explain in simple words the difference between 
>"Low latency data access Vs High throughput of data".
>
>
>
>Thanks,
>Raj
