Re: mass user realtime, high concurrency , hardware resource

Nicolas Paris Sun, 14 Feb 2016 08:38:33 -0800

@Ted

> My company (MapR), for instance, makes a database that would likely work,
> but many will not work.

You must be thinking about MapR-DB, right? Can't Drill query MapR-DB
directly? What is the bottleneck in such a situation? Does Drill wait for
MapR-DB to return the result set? Where can we find these technical details?

Thanks a lot,

2016-02-14 17:22 GMT+01:00 Ted Dunning <[email protected]>:

> Your question doesn't quite provide enough information to permit an
> answer.
>
> First, number of users doesn't really tell us how many queries will happen
> in real time. What you need there is the number of queries per second. That
> number can vary for the same number of users by a factor of 1000 quite
> easily.
>
> Second, querying 10 billion rows can take less than a millisecond for the
> query itself. Or minutes. What kind of query do you mean to do?
>
> If you are talking about something like messaging and personalization with
> high usage per day and very fast response requirements, then Drill is
> likely to be inappropriate almost by definition. The problem is that Drill
> spends a lot of time (100 ms or more) thinking about how to execute the
> query on the theory that most queries in Drill will be complex enough that
> this planning will allow savings of seconds or tens of seconds. That is a
> fine trade-off for complex queries. If you just want to show the last ten
> messages for a user, it is a very bad trade-off and you should probably use
> something other than SQL to do this.
>
> The modern trend for data oriented web access is to expose a REST interface
> that is framed in terms of your business needs. This interface and the
> resulting data are manipulated using browser-resident JavaScript. To make
> this efficient, you want a database that has direct JavaScript access and,
> preferably, one that has a storage module written already for meteor.js or
> a similar package.
>
> If you are talking about something like account maintenance where you need
> to do complex queries very rarely (say once a month), then SQL is much more
> plausible since users are likely to accept a long (1 s or more) delay to
> access information. Even here, a good data abstraction in terms of a REST
> microservice is likely to be good.
>
> So... back to your question.
>
> Let's compute some query rates:
>
> 100 million users accessing the system once per month mostly during peak
> hours will cause 100 million accesses / (30 days * 20,000 peak seconds /
> day) = 100e6 / 600e3 < 200 queries per second.
>
> 100 million users with 10 million active users accessing the system 100
> times per day during peak hours will cause 100 * 10 million queries/day /
> 20,000 peak seconds / day = 1e9 / 20e3 = 50e3 queries per second.
>
> Note that the number of *concurrent* queries does not matter here. For the
> first rate, if you have a system that responds in 1 ms, there will be
> essentially no concurrency, whereas a horrible system that responds in
> 10 s will require concurrency of 2000 simultaneous queries. What you want
> is throughput at acceptable response time.
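
The concurrency figures in this paragraph follow from multiplying arrival rate by response time (the relationship usually called Little's law). A minimal sketch, with the hypothetical helper name `in_flight_queries` and the 200 queries per second rate taken from the first scenario:

```python
def in_flight_queries(queries_per_second, response_time_seconds):
    """Average number of queries in progress at once (rate x response time)."""
    return queries_per_second * response_time_seconds


# First rate from the thread: about 200 queries per second.
fast_system = in_flight_queries(200, 0.001)  # 1 ms response time
slow_system = in_flight_queries(200, 10)     # 10 s response time

print(f"fast: {fast_system:.1f} in flight, slow: {slow_system:.0f} in flight")
```

Concurrency is therefore an outcome of throughput and latency rather than an independent requirement.
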
>
> The number of concurrent users is also clearly almost irrelevant. What you
> want is usage patterns x user population to get queries per second.
>
> So the answer is that the first rate of 200 queries per second could
> probably be dealt with using Drill given enough hardware, but a
> programmatic interface to even a fairly small-scale database would
> probably be much, much better. In general, I expect that a
> document-oriented database will suit your needs much better than a
> relational one. My guess is that not very many analytical query engines
> like Drill will be cost-effective in this range. Almost every
> document-oriented database will be able to handle this.
>
> For the second rate of 50,000 per second, there are very few SQL-based
> systems that will suit your needs; programmatic access to a
> document-oriented database is likely to be your only cost-effective
> solution here. My company (MapR), for instance, makes a database that
> would likely work, but many will not work.
>
> On Sun, Feb 14, 2016 at 4:27 AM, <[email protected]> wrote:
>
> > Hi, I want to build an app that needs to support real-time queries from
> > several hundred million users over about ten billion rows. Does Apache
> > Drill fit this requirement? Does it support high concurrency? Does it
> > need massive hardware resources to achieve low-latency performance? How
> > do resources trade off against performance? I use HBase as the database.
> >
> >
> >                         李启明 from China
>
