On Thu, Feb 1, 2018 at 9:45 PM, Jp Gupta <newlife...@gmail.com> wrote:
> As an existing HBase user, we handle close to 20TB of data every day.
What does "handle" mean in this case? Are you inserting 20TB of new data
each day, so that your total dataset grows by that amount? How much data do
you retain? How many nodes are in your cluster? (I would guess many hundreds?)
> While we are contemplating moving to Kudu to take advantage of the new
> technology, I have yet to hear of a real industry use case where Kudu is
> being used to handle a huge amount of data.
If you are seeing Kudu as an "improved HBase", that isn't really accurate.
Of course there are some things we can do better than HBase, but there are
some things HBase can do better than Kudu.
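To make the tradeoff concrete: Kudu stores data in a columnar layout, so an
analytic scan only touches the columns it projects, while HBase's sorted
key-value model tends to have the edge for things like sparse schemas and
cell versioning. Here is a rough sketch of a projected scan with the Kudu
Java client; the master address, table name, and column names are made up
for illustration:

    import java.util.Arrays;
    import org.apache.kudu.client.*;

    public class ScanExample {
      public static void main(String[] args) throws Exception {
        // Connect to the cluster (hypothetical master address).
        KuduClient client =
            new KuduClient.KuduClientBuilder("kudu-master:7051").build();
        KuduTable table = client.openTable("metrics");

        // Project only the columns the query needs; with Kudu's
        // columnar storage the other columns are never read off disk.
        KuduScanner scanner = client.newScannerBuilder(table)
            .setProjectedColumnNames(Arrays.asList("host", "value"))
            .build();
        while (scanner.hasMoreRows()) {
          RowResultIterator rows = scanner.nextRows();
          while (rows.hasNext()) {
            RowResult row = rows.next();
            System.out.println(row.getString("host") + " " + row.getDouble("value"));
          }
        }
        client.shutdown();
      }
    }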
As for Kudu data sizes, I am aware of some organizations storing several
hundred TB in a Kudu cluster, but I have not yet heard of a use case with
1PB+. If you are looking to run at that scale, you may hit some issues, but
we stand ready to help you overcome them. I don't see any
fundamental problems that would prevent it, and I have run some basic smoke
tests of Kudu on ~800 nodes before.
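On the scaling mechanics: a Kudu table is split into tablets that get spread
across the tablet servers, so running on hundreds of nodes is mostly a
matter of defining enough partitions to keep every server busy. A
hypothetical sketch with the Java client (the table name, columns, and
bucket count are made up, and you would size the buckets to your cluster):

    import java.util.Arrays;
    import org.apache.kudu.ColumnSchema;
    import org.apache.kudu.Schema;
    import org.apache.kudu.Type;
    import org.apache.kudu.client.CreateTableOptions;
    import org.apache.kudu.client.KuduClient;

    public class CreateExample {
      public static void main(String[] args) throws Exception {
        KuduClient client =
            new KuduClient.KuduClientBuilder("kudu-master:7051").build();

        Schema schema = new Schema(Arrays.asList(
            new ColumnSchema.ColumnSchemaBuilder("id", Type.INT64).key(true).build(),
            new ColumnSchema.ColumnSchemaBuilder("payload", Type.STRING).build()));

        // 256 hash buckets -> 256 tablets spread across the tablet
        // servers, each replicated 3 ways.
        CreateTableOptions opts = new CreateTableOptions()
            .addHashPartitions(Arrays.asList("id"), 256)
            .setNumReplicas(3);

        client.createTable("events", schema, opts);
        client.shutdown();
      }
    }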
> Looking forward to your inputs on any organisation using Kudu where data
> volumes of more than 10 TB are ingested every day.
Hope some other users can chime in.
Software Engineer, Cloudera