billions with 'B': looks like MongoDB is web-scale after all!

On Fri, Jan 29, 2016 at 11:59 AM, Alexey Grishchenko wrote:
> The main thing for you to consider is that HAWQ does not have indexes, so
> the only way to limit the amount of data it scans is to use partitioning
> plus columnar tables (Parquet).
>
> In contrast, Greenplum has indexes, and if your query returns hundreds of
> records from a 10'000'000'000-row table, that might be a good thing for
> you. But you should be careful here: if you have "where" conditions on
> different columns, you might end up building many indexes, which could
> leave you in a situation where the index size for the table is greater
> than the size of its data.
>
> On Fri, Jan 29, 2016 at 10:26 AM, 陶进 <[email protected]> wrote:
> > hi Martin,
> >
> > Many thanks for your kind help.
> >
> > I could find few performance cases for Greenplum/HAWQ on Google,
> > especially on 10-billion-row data. Your reply inspires confidence in
> > me. :-)
> >
> > Our real-time queries only return hundreds of rows from a huge table.
> > I'll test and tune HAWQ once our machines are available to verify the
> > performance.
> >
> > Thank you again for your prompt reply.
> >
> > Best regards!
> >
> > Tony.
> >
> > On 2016/1/29 17:29, Martin Visser wrote:
> >
> > Hi,
> >
> > For queries like that, there are a couple of HAWQ features that will
> > help you. One is columnar storage such as Parquet. This helps when you
> > are only selecting columns a, b, c and the table has columns a, b, ...,
> > z. The other feature that will help you is partitioning, which reduces
> > the initial set without having to read the data. How to choose the
> > partitioning will depend on your query patterns and the selectivity of
> > the column values. For example, in your query you could partition on
> > column a. But as mentioned, if a only had the values 1 and 2, that
> > would only halve the number of rows being scanned.
> >
> > Another observation is that you are selecting individual rows in your
> > example rather than grouped results. Potentially this could result in
> > a lot of data having to be returned by the query. Is that the case?
> > How many rows would you expect queries to return?
> >
> > The answer to your 10-second question is that it is certainly possible
> > thanks to HAWQ's linear scalability, but it depends on a number of
> > factors.
> >
> > hth
> > Martin
> >
> > On Fri, Jan 29, 2016 at 5:34 AM, 陶进 <[email protected]> wrote:
> >
> >> hi guys,
> >>
> >> We have several huge tables, and some of them have more than 10
> >> billion rows. Each table has the same columns, and each row is about
> >> 100 bytes.
> >>
> >> Our queries run on a single table to filter and sort some records,
> >> such as:
> >>
> >>   select a, b, c from t where a=1 and b='hello' order by 1, 2;
> >>
> >> Now we use MongoDB, and the biggest table has 4 billion rows; it can
> >> return in 10 seconds. We want to use HAWQ as our query engine. Could
> >> it run the above query in 10 seconds? What server hardware would we
> >> need, and how many nodes?
> >>
> >> Thanks.
>
> --
> Best regards,
> Alexey Grishchenko
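To make the partitioning-plus-Parquet advice above concrete, here is a
minimal HAWQ DDL sketch. It assumes the table layout from the original
question (columns a, b, c; filters on a and b); the column types, the
partition names, and the list of values for column a are all hypothetical:

    -- Hypothetical table matching the thread's example query.
    -- Parquet orientation means a scan reads only the referenced
    -- columns instead of the whole ~100-byte row.
    CREATE TABLE t (a int, b varchar, c int)
    WITH (appendonly=true, orientation=parquet, compresstype=snappy)
    DISTRIBUTED RANDOMLY
    PARTITION BY LIST (a)
    (
      PARTITION a_1 VALUES (1),   -- hypothetical partition bounds
      PARTITION a_2 VALUES (2),
      DEFAULT PARTITION other
    );

    -- The predicate on the partition column lets the planner prune
    -- every partition except a_1 before any data is read.
    select a, b, c from t where a=1 and b='hello' order by 1, 2;

As Martin notes, this only pays off if column a is selective: with just
two distinct values, pruning halves the scan rather than shrinking it to
hundreds of rows.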
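On the Greenplum side, the index alternative Alexey describes might look
like the sketch below, assuming the same table; the index names are
hypothetical:

    -- A composite btree covering both filter columns lets a highly
    -- selective query fetch hundreds of rows without scanning the
    -- full 10-billion-row table.
    CREATE INDEX t_a_b_idx ON t (a, b);

    -- Each additional index built to cover a different "where"
    -- pattern adds its own storage; as Alexey warns, the combined
    -- index footprint can end up larger than the table data itself.
    CREATE INDEX t_c_idx ON t (c);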
