Hello,

Are there any properties that I can set to improve the performance of
the ANALYZE TABLE ... COMPUTE STATISTICS statement? My data sits in S3,
and it takes about one second per file to read each file's schema from S3.

2018-08-24T03:25:57,525 INFO  [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1
main([])]: orc.ReaderImpl (ReaderImpl.java:rowsOptions(79)) - Reading ORC
rows from s3://file_1
2018-08-24T03:25:57,526 INFO  [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1
main([])]: impl.RecordReaderImpl (RecordReaderImpl.java:<init>(187)) -
Reader schema not provided -- using file schema

2018-08-24T03:25:58,395 INFO  [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1
main([])]: orc.ReaderImpl (ReaderImpl.java:rowsOptions(79)) - Reading ORC
rows from s3://file_2
2018-08-24T03:25:58,395 INFO  [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1
main([])]: impl.RecordReaderImpl (RecordReaderImpl.java:<init>(187)) -
Reader schema not provided -- using file schema

It takes around 80 seconds for 76 files with a total size of 23 GB.


2018-08-24T03:27:07,673 INFO  [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1
main([])]: exec.Task (SessionState.java:printInfo(1111)) - Table
dept_data_services.lc_credit_2018_08_20_temp stats: [numFiles=76,
numRows=101341845, totalSize=26500166568, rawDataSize=294491898741]
2018-08-24T03:27:07,673 INFO  [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1
main([])]: ql.Driver (Driver.java:execute(2050)) - Completed executing
command(queryId=lcapp_20180824032545_aba21e71-ea4b-4214-8793-705a5e0367f0);
Time taken: 81.169 seconds
2018-08-24T03:27:07,674 INFO  [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1
main([])]: ql.Driver (SessionState.java:printInfo(1111)) - OK
2018-08-24T03:27:07,681 INFO  [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1
main([])]: CliDriver (SessionState.java:printInfo(1111)) - Time taken:
81.992 seconds

If I run the same command with only a few columns, the query runs 60%
faster. Is there any property that I can modify to reduce the time taken
for this read?

Regards
Prabhakar Reddy
