Thanks Eric and Qifan. I am sorry to reply after so long a time because I'm on the Chinese mid-autumn festival holiday. According to Qifan's advice, I upload a log which contains the DDL and the query plan hope to get more advice.
and to Eric, I summit a jira about the block-encoding and the compression, https://issues.apache.org/jira/browse/TRAFODION-2195, so I only use the hbase compression. Qifan Chen <qifan.c...@esgyn.com>于2016年9月12日周一 下午10:43写道: > Hi Qiao, > > You can also send us the DDL and the query plan to help with the tuning. > > To generate a query plan, do the following from sqlci, and the plan is in > text file mylog. > > > 1. log mylog clear; > 2. prepare xx from <your query>; > 3. explain xx; > 4. explain options 'f' xx; > 5. exit; > > > Thanks --Qifan > > On Mon, Sep 12, 2016 at 8:21 AM, Eric Owhadi <eric.owh...@esgyn.com> > wrote: > >> Hello Qiao, >> When you say whatever the table ddl it limits things a lot, as table ddl >> will help define several things that will drastically improve the degree >> of >> parallelism and the table size. >> >> The DOP (degree of parallelism) of the scan operator is constrained by the >> number of regions your table uses. So if you want to increase DOP, you >> need >> to partition your table using the syntax like: >> create table customer_demographics_salt >> ( >> cd_demo_sk int not null >> , cd_gender char(1) >> , cd_marital_status char(1) >> , cd_education_status char(20) >> , cd_purchase_estimate int >> , cd_credit_rating char(10) >> , cd_dep_count int >> , cd_dep_employed_count int >> , cd_dep_college_count int >> , primary key (cd_demo_sk) >> ) >> salt using 12 partitions >> ATTRIBUTES ALIGNED FORMAT >> HBASE_OPTIONS >> ( >> DATA_BLOCK_ENCODING = 'FAST_DIFF', >> COMPRESSION = 'SNAPPY' >> ); >> >> you can experiment with different values of number of partitions (but >> pick a >> multiple of 3 since you have 3 nodes). >> >> Then the optimizer will pick the DOP with a compromise of resource usage >> vs >> gain in speed. >> If you want to force higher DOP than what optimizer selected, you can use >> : >> CQD parallel_num_esps '12'; >> To force it to one ESP per partition (assuming you picked 12 partitions). >> You can verify what optimizer picked as DOP by doing an explain on the >> query. >> >> Other important factors plays in performance: >> - use of aligned format (see above example) >> - careful choice of the primary key (bad idea to use a varchar with big >> max >> size) >> - It is good idea to use compression and encoding (see the most common >> options we use above) >> - you can also increase the HBASE_OPTIONS BLOCKSIZE parameter, but there >> is >> a drawback: increasing it will increase performance of SCAN but decrease >> performance of keyed access. That is why I did not include it in the >> example >> above. >> >> Hope this helps, >> Regards, >> Eric >> >> >> -----Original Message----- >> From: 乔彦克 [mailto:qya...@gmail.com] >> Sent: Monday, September 12, 2016 1:22 AM >> To: user@trafodion.incubator.apache.org; >> d...@trafodion.incubator.apache.org >> Subject: trafodion query optimization >> >> Hi all, >> I executed the sum and count query on my table where the cluster has >> three nodes. I found that the sum query is not well parallel executed(not >> all the three nodes get high load when executing the sum query) and the >> cpu >> load is very high while the memory load is very low(the machines have 16 >> cores and 16GB memory). My sum query on the 12 million data sets takes >> about >> 2 minutes and a half time. >> So my question is that is there any optimization advice that I can use >> to improve the query performance and maximize the usage of my machines, >> what >> ever the configuration or the table ddl. >> Any replies is appreciated. >> >> Thanks, >> Qiao >> > > > > -- > Regards, --Qifan > >
mylog
Description: Binary data