Hi, for queries like that there are a couple of HAWQ features that will help you.

One is columnar storage, such as Parquet. This helps when you are only selecting columns a, b, c but the table has columns a, b, ..., z, because only the referenced columns need to be read from disk.

The other is partitioning, which reduces the initial data set without having to read the data. How to choose the partitioning depends on your query patterns and on the selectivity of the column values. For example, in your query you could partition on column a. But as mentioned, if a only had the values 1 and 2, that would only halve the number of rows being scanned, etc. A sketch of both is below.
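As a minimal sketch of what that could look like (the column types and partition values here are assumptions, since you didn't post your schema), a Parquet-backed, list-partitioned table in HAWQ might be declared like this:

    -- Parquet storage so only the selected columns are read from disk
    CREATE TABLE t (
        a int,
        b varchar(64),
        c int
        -- ... remaining columns up to z
    )
    WITH (APPENDONLY=true, ORIENTATION=parquet)
    DISTRIBUTED RANDOMLY
    -- one partition per distinct value of a; adjust to your real values
    PARTITION BY LIST (a)
    (
        PARTITION a1 VALUES (1),
        PARTITION a2 VALUES (2),
        DEFAULT PARTITION other
    );

With that in place, a predicate like WHERE a = 1 lets the planner eliminate every partition except a1 before any data is scanned.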
Another observation is that you are selecting individual rows in your example rather than grouped results. Potentially this could result in a lot of data having to be returned by the query. Is that the case? How many rows would you expect queries to return? (See the sketch below the quoted message for a grouped alternative.)

As for your 10 seconds: it is certainly possible thanks to HAWQ's linear scalability, but it depends on a number of factors.

hth
Martin

On Fri, Jan 29, 2016 at 5:34 AM, 陶进 <[email protected]> wrote:
> hi guys,
>
> We have several huge tables, and some of them will have more than 10
> billion rows. Each table has the same columns, and each row is about
> 100 bytes.
>
> Our queries run on a single table to filter and sort some records, such
> as: select a,b,c from t where a=1 and b='hello' order by 1,2.
>
> Now we use mongodb, and the biggest table has 4 billion rows. It can
> return in 10 seconds. Now we want to use hawq as our query engine.
> Could it run the above query in 10 seconds? What server hardware and
> how many nodes would we need?
>
> Thanks.
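Following up on the grouped-results point above the quote: if you don't need every matching row back, a hypothetical aggregate version of your query keeps the result set small no matter how many rows the filter matches (grouping on column c is just an assumption for illustration):

    -- Hypothetical: summarize matches instead of shipping each row back
    SELECT c, count(*) AS matches
    FROM t
    WHERE a = 1 AND b = 'hello'
    GROUP BY c
    ORDER BY c;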
