Hi, for queries like that there are a couple of HAWQ features that will help you.

One is columnar storage, such as Parquet. This helps when you are only selecting columns a, b, c but the table has columns a, b, ..., z, because only the referenced columns need to be read from disk.

The other is partitioning, which reduces the initial data set without having to read the data. How to choose the partitioning depends on your query patterns and on the selectivity of the column values. For example, in your query you could partition on column a. But as mentioned, if a only had the values 1 and 2, that would only halve the number of rows being scanned, etc. A sketch of both is below.
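As a minimal sketch of what that could look like (the column types and partition values here are assumptions, since you didn't post your schema), a Parquet-backed, list-partitioned table in HAWQ might be declared like this:

    -- Parquet storage so only the selected columns are read from disk
    CREATE TABLE t (
        a int,
        b varchar(64),
        c int
        -- ... remaining columns up to z
    )
    WITH (APPENDONLY=true, ORIENTATION=parquet)
    DISTRIBUTED RANDOMLY
    -- one partition per distinct value of a; adjust to your real values
    PARTITION BY LIST (a)
    (
        PARTITION a1 VALUES (1),
        PARTITION a2 VALUES (2),
        DEFAULT PARTITION other
    );

With that in place, a predicate like WHERE a = 1 lets the planner eliminate every partition except a1 before any data is scanned.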
Another observation is that you are selecting individual rows in your example rather than grouped results. Potentially this could result in a lot of data having to be returned by the query. Is that the case? How many rows would you expect queries to return? (See the sketch below the quoted message for a grouped alternative.)

As for your 10 seconds: it is certainly possible thanks to HAWQ's linear scalability, but it depends on a number of factors.

hth
Martin

On Fri, Jan 29, 2016 at 5:34 AM, 陶进 <[email protected]> wrote:
> hi guys,
>
> We have several huge tables, and some of them will have more than 10
> billion rows. Each table has the same columns, and each row is about
> 100 bytes.
>
> Our queries run on a single table to filter and sort some records, such
> as: select a,b,c from t where a=1 and b='hello' order by 1,2.
>
> Now we use mongodb, and the biggest table has 4 billion rows. It can
> return in 10 seconds. Now we want to use hawq as our query engine.
> Could it run the above query in 10 seconds? What server hardware and
> how many nodes would we need?
>
> Thanks.
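Following up on the grouped-results point above the quote: if you don't need every matching row back, a hypothetical aggregate version of your query keeps the result set small no matter how many rows the filter matches (grouping on column c is just an assumption for illustration):

    -- Hypothetical: summarize matches instead of shipping each row back
    SELECT c, count(*) AS matches
    FROM t
    WHERE a = 1 AND b = 'hello'
    GROUP BY c
    ORDER BY c;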
