Your question doesn't really provide enough information to permit an answer.
First, the number of users doesn't tell us how many queries will happen in real time. What you need is the number of queries per second, and that number can easily vary by a factor of 1000 for the same number of users.

Second, querying 10 billion rows can take less than a millisecond for the query itself. Or minutes. What kind of query do you mean to do?

If you are talking about something like messaging and personalization, with high usage per day and very fast response requirements, then Drill is likely to be inappropriate almost by definition. The problem is that Drill spends a lot of time (100 ms or more) planning how to execute the query, on the theory that most queries in Drill will be complex enough that this planning saves seconds or tens of seconds. That is a fine trade-off for complex queries. If you just want to show the last ten messages for a user, it is a very bad trade-off and you should probably use something other than SQL to do this.

The modern trend for data-oriented web access is to expose a REST interface framed in terms of your business needs. This interface and the resulting data are manipulated using browser-resident Javascript. To make this efficient, you want a database that has direct Javascript access and, preferably, one that already has a storage module written for meteor.js or a similar package.

If you are talking about something like account maintenance, where you need to do complex queries very rarely (say once a month), then SQL is much more plausible, since users are likely to accept a long (1 s or more) delay to access information. Even here, a good data abstraction in the form of a REST microservice is likely to be worthwhile.

So... back to your question. Let's compute some query rates:

100 million users accessing the system once per month, mostly during peak hours, will cause 100 million accesses / (30 days * 20,000 peak seconds / day) = 100e6 / 600e3 < 200 queries per second.
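Here is that back-of-envelope arithmetic as a small Python sketch. The 20,000 "peak seconds per day" figure is the rough assumption from above (traffic concentrated in a few peak hours), not a measured number:

```python
# Back-of-envelope query rate for scenario 1:
# 100 million users, each accessing once per month, mostly at peak times.
users = 100_000_000
accesses_per_user_per_month = 1
days_per_month = 30
peak_seconds_per_day = 20_000  # assumed: traffic squeezed into a few peak hours

qps = (users * accesses_per_user_per_month) / (days_per_month * peak_seconds_per_day)
print(round(qps))  # roughly 167, i.e. under 200 queries per second
```

The same formula (population x usage rate / peak seconds) gives the numbers for any other scenario; only the inputs change.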
100 million users, with 10 million active users accessing the system 100 times per day during peak hours, will cause 100 * 10 million queries/day / 20,000 peak seconds/day = 1e9 / 20e3 = 50e3 queries per second.

Note that the number of *concurrent* queries does not matter here. For the first rate, a system that responds in 1 ms will see essentially no concurrency, whereas a horrible system that responds in 10 s will require a concurrency of 2000 simultaneous queries. What you want is throughput at an acceptable response time. Concurrent users is likewise almost irrelevant; what you want is usage patterns x user population to get queries per second.

So the answer is that the first rate of 200 queries per second could probably be handled by Drill given enough hardware, but a programmatic interface to even a fairly small-scale database would probably be much, much better. In general, I expect that a document-oriented database will suit your needs much better than a relational one. My guess is that not many analytical query engines like Drill will be cost-effective in this range, while almost every document-oriented database will be able to handle it.

For the second rate of 50,000 queries per second, there are very few SQL-based systems that will suit your needs; programmatic access to a document-oriented database is likely to be your only cost-effective solution here. My company (MapR), for instance, makes a database that would likely work, but many others will not.

On Sun, Feb 14, 2016 at 4:27 AM, <[email protected]> wrote:
> hi, i want to build a app which need for support sevaral hundred
> million users realtime query from about ten billion row records. dose
> apache drill fit for this requirement? dose it support High concurrency?
> dose it need mass hardware resource to archive the low latency
> performance? resource exchange performance? i use hbase as database.
>
>
> 李启明 from China
