@Ted

"My company (MapR), for instance, makes a database that would likely work,
but many will not work."

You must be thinking of MapR-DB, right? Can't Drill query MapR-DB directly?
What is the bottleneck in that situation? Does Drill wait for MapR-DB to
return the result set? Where can we find the technical details that answer
these questions?

Thanks a lot,

2016-02-14 17:22 GMT+01:00 Ted Dunning <[email protected]>:

> Your question doesn't really provide enough information to permit an
> answer.
>
> First, number of users doesn't really tell us how many queries will happen
> in real time. What you need there is the number of queries per second. That
> number can vary for the same number of users by a factor of 1000 quite
> easily.
>
> Second, querying 10 billion rows can take less than a millisecond for the
> query itself. Or minutes. What kind of query do you mean to do?
>
> If you are talking about something like messaging and personalization with
> high usage per day and very fast response requirements, then Drill is
> likely to be inappropriate almost by definition. The problem is that Drill
> spends a lot of time (100 ms or more) thinking about how to execute the
> query on the theory that most queries in Drill will be complex enough that
> this planning will allow savings of seconds or tens of seconds. That is a
> fine trade-off for complex queries. If you just want to show the last ten
> messages for a user, it is a very bad trade-off and you should probably use
> something other than SQL to do this.
>
> The modern trend for data oriented web access is to expose a REST interface
> that is framed in terms of your business needs. This interface and the
> resulting data are manipulated using browser resident Javascript. To make
> this efficient, you want a database that has direct Javascript access and,
> preferably, one that has a storage module written already for meteor.js or
> a similar package.
>
> If you are talking about something like account maintenance where you need
> to do complex queries very rarely (say once a month), then SQL is much more
> plausible since users are likely to accept a long (1 s or more) delay to
> access information. Even here, a good data abstraction in terms of a REST
> microservice is likely to be good.
>
> So... back to your question.
>
> Let's compute some query rates:
>
> 100 million users accessing the system once per month mostly during peak
> hours will cause 100 million accesses / (30 days * 20,000 peak seconds /
> day) = 100e6 / 600e3 < 200 queries per second.
>
> 100 million users with 10 million active users accessing the system 100
> times per day during peak hours will cause (100 * 10 million queries / day)
> / (20,000 peak seconds / day) = 1e9 / 20e3 = 50e3 queries per second.
>
> Note that the number of *concurrent* queries does not matter here. For the
> first rate, if you have a system that responds in 1 ms, there will be
> essentially no concurrency, whereas a horrible system that responds in
> 10 s will require concurrency of 2000 simultaneous queries. What you want
> is throughput at acceptable response time.
>
> Concurrent users is also clearly almost irrelevant. What you want is usage
> patterns x user population to get queries per second.
>
> So the answer is that the first rate of 200 queries per second could
> probably be dealt with using Drill given enough hardware, but a
> programmatic interface to even a fairly small scale database would
> probably be much, much better. In general, I expect that a document
> oriented database will suit your needs much better than a relational one.
> My guess is that not very many analytical query engines like Drill will be
> cost effective in this range. Almost every document oriented database will
> be able to handle this.
>
> For the second rate of 50,000 per second, there are very few SQL based
> systems that will suit your needs; programmatic access to a document
> oriented database is likely to be your only cost effective solution here.
> My company (MapR), for instance, makes a database that would likely work,
> but many will not work.
>
> On Sun, Feb 14, 2016 at 4:27 AM, <[email protected]> wrote:
>
> > Hi, I want to build an app that needs to support real-time queries from
> > several hundred million users against about ten billion rows. Does
> > Apache Drill fit this requirement? Does it support high concurrency?
> > Does it need massive hardware resources to achieve low-latency
> > performance? Can hardware resources be traded for performance? I use
> > HBase as the database.
> >
> >
> > 李启明 from China
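
The query-rate arithmetic in Ted's message can be reproduced with a short Python sketch. The figures (20,000 peak seconds per day, 100 million users, 10 million active users, 1 ms and 10 s response times) come from the thread; the function names are made up for illustration. The concurrency estimate uses Little's law (in-flight queries = arrival rate x response time), which the message applies without naming.

```python
# Sketch of the back-of-envelope sizing from the thread.
# Function names are hypothetical; the numbers are the thread's assumptions.

PEAK_SECONDS_PER_DAY = 20_000  # peak-hour window assumed in the thread


def queries_per_second(users, accesses_per_user_per_day,
                       peak_seconds_per_day=PEAK_SECONDS_PER_DAY):
    """Average query rate if all accesses fall within the peak window."""
    return users * accesses_per_user_per_day / peak_seconds_per_day


def concurrent_queries(qps, response_time_s):
    """Little's law: average number of queries in flight at once."""
    return qps * response_time_s


# Case 1: 100 million users, each accessing the system once per 30 days.
low_rate = queries_per_second(100e6, 1 / 30)

# Case 2: 10 million active users, each accessing the system 100 times a day.
high_rate = queries_per_second(10e6, 100)

if __name__ == "__main__":
    print(f"case 1: {low_rate:.0f} queries/s")
    print(f"case 2: {high_rate:.0f} queries/s")
    # The thread rounds case 1 up to 200 queries/s for the concurrency estimate.
    print(f"in flight at 1 ms: {concurrent_queries(200, 0.001):.1f}")
    print(f"in flight at 10 s: {concurrent_queries(200, 10):.0f}")
```

The same offered load produces very different concurrency depending on response time, which is why the message treats throughput at an acceptable response time as the number to size for.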
