Hello,

Are there any properties that I can set to improve the performance of the ANALYZE TABLE ... COMPUTE STATISTICS statement? My data sits in S3, and I see it is taking about one second per file to read the schema of each file from S3.
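As a rough check of that per-file figure, here is a minimal Python sketch that uses only numbers taken from the log in this message: the timestamps of two consecutive "Reading ORC rows" lines, and the total time and file count Hive reports. The variable names are illustrative and are not part of Hive.

```python
from datetime import datetime

# Timestamp layout used in the Hive log lines (milliseconds after the comma).
FMT = "%Y-%m-%dT%H:%M:%S,%f"

# Timestamps of two consecutive "Reading ORC rows" lines (file_1, then file_2).
first_open = datetime.strptime("2018-08-24T03:25:57,525", FMT)
second_open = datetime.strptime("2018-08-24T03:25:58,395", FMT)
gap_seconds = (second_open - first_open).total_seconds()

# Totals reported by Hive for the whole statement.
total_seconds = 81.169
num_files = 76
seconds_per_file = total_seconds / num_files

print(f"gap between consecutive files: {gap_seconds:.3f} s")
print(f"average time per file: {seconds_per_file:.3f} s")
```

Both numbers come out close to one second, which suggests the files are being opened one after another rather than in parallel.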

2018-08-24T03:25:57,525 INFO [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 main([])]: orc.ReaderImpl (ReaderImpl.java:rowsOptions(79)) - Reading ORC rows from s3://file_1
2018-08-24T03:25:57,526 INFO [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 main([])]: impl.RecordReaderImpl (RecordReaderImpl.java:<init>(187)) - Reader schema not provided -- using file schema
2018-08-24T03:25:58,395 INFO [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 main([])]: orc.ReaderImpl (ReaderImpl.java:rowsOptions(79)) - Reading ORC rows from s3://file_2
2018-08-24T03:25:58,395 INFO [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 main([])]: impl.RecordReaderImpl (RecordReaderImpl.java:<init>(187)) - Reader schema not provided -- using file schema

It takes around 80 seconds for 76 files with a total size of 23 GB.

2018-08-24T03:27:07,673 INFO [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 main([])]: exec.Task (SessionState.java:printInfo(1111)) - Table dept_data_services.lc_credit_2018_08_20_temp stats: [numFiles=76, numRows=101341845, totalSize=26500166568, rawDataSize=294491898741]
2018-08-24T03:27:07,673 INFO [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 main([])]: ql.Driver (Driver.java:execute(2050)) - Completed executing command(queryId=lcapp_20180824032545_aba21e71-ea4b-4214-8793-705a5e0367f0); Time taken: 81.169 seconds
2018-08-24T03:27:07,674 INFO [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 main([])]: ql.Driver (SessionState.java:printInfo(1111)) - OK
2018-08-24T03:27:07,681 INFO [2b2c0a06-7da5-4fcd-83a7-4931b8e1b4b1 main([])]: CliDriver (SessionState.java:printInfo(1111)) - Time taken: 81.992 seconds

If I run the same command with fewer columns, the query runs 60% faster. Is there any property that I can modify to reduce the time taken for this read?

Regards,
Prabhakar Reddy