Re: Problem on carbondata quering performance tuning

2018-04-02 Thread BabuLal
Hi 

Thanks for using Carbondata.

Based on Information you provided , Please try below solutions /Points.

*A.  Tune Resource Allocation *

   You have 55 core/NM , and given spark.executor.cores= 54 which means
one NM will have only one Executor and total you will have only 4 Executor
even you have given spark.executor.instances 10 .  
For Query Execution we need to have more Executor .
Cluster Capacity :- 
Total NM=4
Core/NM=55
Memory/NM=102

Ideally(most of the case) per Executor 12-15 GB memory  enough .Based on
this we can open 6 Executors in one NM ( 102/15) So according to this you
can configure below parameter and try again

spark.executor.memory 15g
spark.executor.cores 9
spark.executor.instances 24 

Please make sure that Yarn RM shows these 24 containers running(Excluding AM
container).


*B. Table Optimization *
1. Out of 5 table one table yuan_yuan10_STORE_SALES  is Big table having
~1.4 Billion Records  and it has  columns
SS_SOLD_DATE_SK,SS_ITEM_SK,SS_CUSTOMER_SK as DICTIONARY_INCLUDE , is any of
the column is High cardinality columns ? for High cardinality columns better
to have DICTIONARY_EXCLUDE you can check size of Metedata Folder in carbon
store location. 

2.  ss_sold_date_sk has between filter ,so better to have Int data type of
it.

*C.  Information For Next Analysis *

Please provide below detail 
1. Can you check SparkUI and check how much time CarbonScan RDD Stage has
taken and how much time Aggregate Stage taken ? You can Check DAG . Or send
spark event files or SparkUI snapshot . 
2.  How many task for each Stage ? 
3.  In Driver How much time spend between Parsing and below statement 
  18/04/01 20:49:01 INFO CarbonScanRDD: 
 Identified no.of.blocks: 1,
 no.of.tasks: 1,
 no.of.nodes: 0,
 parallelism: 1
 
4. Configure enable.query.statistics=true in carbon.proeprties and
Send/Analyze the Time taken by Carbon in executor side.(like time spend in
IO/Dictionary load..) 


For Data loading :- If data are loading with Local Sort then your
configuration is correct (1 Node ,1 Executor) 

Please check with Solution A. it may solve issue, if still exists then
provide requested Information in PointC .

Thanks
Babu



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: Problem on carbondata quering performance tuning

2018-04-02 Thread Liang Chen
HI

Which carbondata+spark version?   and can you provide the full configuration
inside "carbondata.properties"


Mick Yuan wrote
> Hi,all
>   I have a quering performane tuning case on carbondata.
> 
>   *Enviroment is as below:*:
>   spark on yarn 
>   4 nodemanagers
>   102G,55 cores each nodemanager
> 
>   *Spark properties:*
> spark.master yarn
> spark.submit.deployMode client
> spark.serializer org.apache.spark.serializer.KryoSerializer
> spark.driver.memory 2g
> spark.executor.memory 90g
> spark.executor.cores 54
> spark.sql.hive.convertmetastoreParquet false
> spark.executor.instances 10
> spark.sql.shuffle.partitions 432
> spark.speculation true
>   *Carbondata properties:*
> carbon.storelocation=hdfs://hacluster/Opt/CarbonStore
> carbon.ddl.base.hdfs.url=hdfs://hacluster/opt/data
> carbon.badRecords.location=/opt/Carbon/Spark/badrecords
> carbon.sort.file.buffer.size=80
> carbon.graph.rowset.size=10
> carbon.number.of.cores.while.loading=24
> carbon.sort.size=100
> carbon.enableXXHash=true
> carbon.number.of.cores.while.alterPartition=2
> carbon.number.of.cores.while.compacting=2
> carbon.compaction.level.threshold=4,3
> carbon.major.compaction.size=1024
> carbon.number.of.cores=4
> carbon.inmemory.record.size=12
> 
>   *Create table sql:*
> CREATE TABLE
> yuan_yuan10_DATE_DIM
> (
> D_YEAR string,
> D_MOY string,
> D_DATE_SK string,
> D_DATE_ID string,
> D_DATE string,
> D_MONTH_SEQ string,
> D_WEEK_SEQ string,
> D_QUARTER_SEQ string,
> D_DOW string,
> D_DOM string,
> D_QOY string,
> D_FY_YEAR string,
> D_FY_QUARTER_SEQ string,
> D_FY_WEEK_SEQ string,
> D_DAY_NAME string,
> D_QUARTER_NAME string,
> D_HOLIDAY string,
> D_WEEKEND string,
> D_FOLLOWING_HOLIDAY string,
> D_FIRST_DOM string,
> D_LAST_DOM string,
> D_SAME_DAY_LY string,
> D_SAME_DAY_LQ string,
> D_CURRENT_DAY string,
> D_CURRENT_WEEK string,
> D_CURRENT_MONTH string,
> D_CURRENT_QUARTER string,
> D_CURRENT_YEAR string,
> REMARK string,
> HYREN_S_DATE string,
> HYREN_E_DATE string,
> HYREN_MD5_VAL string
> )
> stored BY 'carbondata' TBLPROPERTIES
> (
> 'TABLE_BLOCKSIZE'='256',
> 'SORT_COLUMNS'='D_YEAR,D_MOY,D_DATE_SK',
> 'DICTIONARY_INCLUDE'='D_MOY,D_YEAR,D_DATE_SK'
> );
> 
> 
> 
> CREATE TABLE
> yuan_yuan10_STORE_SALES
> (
> SS_SOLD_DATE_SK string,
> SS_ITEM_SK string,
> SS_CUSTOMER_SK string,
> SS_STORE_SK string,
> SS_SOLD_TIME_SK string,
> SS_CDEMO_SK string,
> SS_HDEMO_SK string,
> SS_ADDR_SK string,
> SS_PROMO_SK string,
> SS_TICKET_NUMBER string,
> SS_QUANTITY string,
> SS_WHOLESALE_COST string,
> SS_LIST_PRICE string,
> SS_SALES_PRICE string,
> SS_EXT_DISCOUNT_AMT string,
> SS_EXT_SALES_PRICE string,
> SS_EXT_WHOLESALE_COST string,
> SS_EXT_LIST_PRICE string,
> SS_EXT_TAX string,
> SS_COUPON_AMT string,
> SS_NET_PAID string,
> SS_NET_PAID_INC_TAX string,
> SS_NET_PROFIT string,
> REMARK string,
> HYREN_S_DATE string,
> HYREN_E_DATE string,
> HYREN_MD5_VAL string
> )
> stored BY 'carbondata' TBLPROPERTIES
> (
> 'TABLE_BLOCKSIZE'='256',
> 'SORT_COLUMNS'='SS_SOLD_DATE_SK,SS_ITEM_SK,SS_CUSTOMER_SK',
> 'DICTIONARY_INCLUDE'='SS_SOLD_DATE_SK,SS_ITEM_SK,SS_CUSTOMER_SK'
> );
> 
> 
> CREATE TABLE
> yuan_yuan10_ITEM
> (
> I_MANAGER_ID string,
> I_ITEM_SK string,
> I_ITEM_ID string,
> I_REC_START_DATE string,
> I_REC_END_DATE string,
> I_ITEM_DESC string,
> I_CURRENT_PRICE string,
> I_WHOLESALE_COST string,
> I_BRAND_ID string,
> I_BRAND string,
> I_CLASS_ID string,
> I_CLASS string,
> I_CATEGORY_ID string,
> I_CATEGORY string,
> I_MANUFACT_ID string,
> I_MANUFACT string,
> I_SIZE string,
> I_FORMULATION string,
> I_COLOR string,
> I_UNITS string,
> I_CONTAINER string,
> I_PRODUCT_NAME string,
> REMARK string,
> HYREN_S_DATE string,
> HYREN_E_DATE string,
> HYREN_MD5_VAL string
> )
> stored BY 'carbondata' TBLPROPERTIES
> (
> 'TABLE_BLOCKSIZE'='256',
> 'SORT_COLUMNS'='I_MANAGER_ID,I_ITEM_SK',
> 'DICTIONARY_INCLUDE'='I_MANAGER_ID,I_ITEM_SK'
> );
> 
> 
> CREATE TABLE
> yuan_yuan10_CUSTOMER
> (
> C_CURRENT_ADDR_SK string,
> C_CUSTOMER_SK string,
> C_CUSTOMER_ID string,
> C_CURRENT_CDEMO_SK string,
> C_CURRENT_HDEMO_SK string,
> C_FIRST_SHIPTO_DATE_SK string,
> 

Problem on carbondata quering performance tuning

2018-04-02 Thread Mick Yuan
Hi,all
  I have a quering performane tuning case on carbondata.

  *Enviroment is as below:*:
  spark on yarn 
  4 nodemanagers
  102G,55 cores each nodemanager

  *Spark properties:*
spark.master yarn
spark.submit.deployMode client
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory 2g
spark.executor.memory 90g
spark.executor.cores 54
spark.sql.hive.convertmetastoreParquet false
spark.executor.instances 10
spark.sql.shuffle.partitions 432
spark.speculation true
  *Carbondata properties:*
carbon.storelocation=hdfs://hacluster/Opt/CarbonStore
carbon.ddl.base.hdfs.url=hdfs://hacluster/opt/data
carbon.badRecords.location=/opt/Carbon/Spark/badrecords
carbon.sort.file.buffer.size=80
carbon.graph.rowset.size=10
carbon.number.of.cores.while.loading=24
carbon.sort.size=100
carbon.enableXXHash=true
carbon.number.of.cores.while.alterPartition=2
carbon.number.of.cores.while.compacting=2
carbon.compaction.level.threshold=4,3
carbon.major.compaction.size=1024
carbon.number.of.cores=4
carbon.inmemory.record.size=12

  *Create table sql:*
CREATE TABLE
yuan_yuan10_DATE_DIM
(
D_YEAR string,
D_MOY string,
D_DATE_SK string,
D_DATE_ID string,
D_DATE string,
D_MONTH_SEQ string,
D_WEEK_SEQ string,
D_QUARTER_SEQ string,
D_DOW string,
D_DOM string,
D_QOY string,
D_FY_YEAR string,
D_FY_QUARTER_SEQ string,
D_FY_WEEK_SEQ string,
D_DAY_NAME string,
D_QUARTER_NAME string,
D_HOLIDAY string,
D_WEEKEND string,
D_FOLLOWING_HOLIDAY string,
D_FIRST_DOM string,
D_LAST_DOM string,
D_SAME_DAY_LY string,
D_SAME_DAY_LQ string,
D_CURRENT_DAY string,
D_CURRENT_WEEK string,
D_CURRENT_MONTH string,
D_CURRENT_QUARTER string,
D_CURRENT_YEAR string,
REMARK string,
HYREN_S_DATE string,
HYREN_E_DATE string,
HYREN_MD5_VAL string
)
stored BY 'carbondata' TBLPROPERTIES
(
'TABLE_BLOCKSIZE'='256',
'SORT_COLUMNS'='D_YEAR,D_MOY,D_DATE_SK',
'DICTIONARY_INCLUDE'='D_MOY,D_YEAR,D_DATE_SK'
);



CREATE TABLE
yuan_yuan10_STORE_SALES
(
SS_SOLD_DATE_SK string,
SS_ITEM_SK string,
SS_CUSTOMER_SK string,
SS_STORE_SK string,
SS_SOLD_TIME_SK string,
SS_CDEMO_SK string,
SS_HDEMO_SK string,
SS_ADDR_SK string,
SS_PROMO_SK string,
SS_TICKET_NUMBER string,
SS_QUANTITY string,
SS_WHOLESALE_COST string,
SS_LIST_PRICE string,
SS_SALES_PRICE string,
SS_EXT_DISCOUNT_AMT string,
SS_EXT_SALES_PRICE string,
SS_EXT_WHOLESALE_COST string,
SS_EXT_LIST_PRICE string,
SS_EXT_TAX string,
SS_COUPON_AMT string,
SS_NET_PAID string,
SS_NET_PAID_INC_TAX string,
SS_NET_PROFIT string,
REMARK string,
HYREN_S_DATE string,
HYREN_E_DATE string,
HYREN_MD5_VAL string
)
stored BY 'carbondata' TBLPROPERTIES
(
'TABLE_BLOCKSIZE'='256',
'SORT_COLUMNS'='SS_SOLD_DATE_SK,SS_ITEM_SK,SS_CUSTOMER_SK',
'DICTIONARY_INCLUDE'='SS_SOLD_DATE_SK,SS_ITEM_SK,SS_CUSTOMER_SK'
);


CREATE TABLE
yuan_yuan10_ITEM
(
I_MANAGER_ID string,
I_ITEM_SK string,
I_ITEM_ID string,
I_REC_START_DATE string,
I_REC_END_DATE string,
I_ITEM_DESC string,
I_CURRENT_PRICE string,
I_WHOLESALE_COST string,
I_BRAND_ID string,
I_BRAND string,
I_CLASS_ID string,
I_CLASS string,
I_CATEGORY_ID string,
I_CATEGORY string,
I_MANUFACT_ID string,
I_MANUFACT string,
I_SIZE string,
I_FORMULATION string,
I_COLOR string,
I_UNITS string,
I_CONTAINER string,
I_PRODUCT_NAME string,
REMARK string,
HYREN_S_DATE string,
HYREN_E_DATE string,
HYREN_MD5_VAL string
)
stored BY 'carbondata' TBLPROPERTIES
(
'TABLE_BLOCKSIZE'='256',
'SORT_COLUMNS'='I_MANAGER_ID,I_ITEM_SK',
'DICTIONARY_INCLUDE'='I_MANAGER_ID,I_ITEM_SK'
);


CREATE TABLE
yuan_yuan10_CUSTOMER
(
C_CURRENT_ADDR_SK string,
C_CUSTOMER_SK string,
C_CUSTOMER_ID string,
C_CURRENT_CDEMO_SK string,
C_CURRENT_HDEMO_SK string,
C_FIRST_SHIPTO_DATE_SK string,
C_FIRST_SALES_DATE_SK string,
C_SALUTATION string,
C_FIRST_NAME string,
C_LAST_NAME string,
C_PREFERRED_CUST_FLAG string,
C_BIRTH_DAY string,
C_BIRTH_MONTH string,
C_BIRTH_YEAR string,
C_BIRTH_COUNTRY string,
C_LOGIN string,
C_EMAIL_ADDRESS string,
C_LAST_REVIEW_DATE string,
REMRK string,
HYREN_S_DATE string,
HYREN_E_DATE string,
HYREN_MD5_VAL