Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet files
......................................................................

IMPALA-3989: Display skew warning for poorly formatted Parquet files

Parquet files are scanned at the granularity of row groups. Each row
group belongs to one or more splits, and each split is scanned by a
separate Parquet scanner. If some row groups span multiple splits, then
some scanners will most likely perform remote reads while others
perform no scans at all. This contributes to skew across the cluster,
because the distribution of scans is uneven.

This change adds a counter (NumScannersWithNoReads) to the scan node's
runtime profile to track the number of Parquet scanners that end up
doing no reads because of poor formatting.

Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a
Reviewed-on: http://gerrit.cloudera.org:8080/5400
Reviewed-by: Dan Hecht <[email protected]>
Tested-by: Impala Public Jenkins
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M tests/query_test/test_scanners.py
3 files changed, 107 insertions(+), 9 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dan Hecht: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/5400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
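
Illustrative note on the skew described in the commit message: the toy model below is not the code from this change. It assumes, purely for illustration, that a row group is read by the scanner whose split contains the row group's midpoint; the commit message does not state the assignment rule. The names Split and count_scanners_with_no_reads are hypothetical and only loosely mirror the idea behind the NumScannersWithNoReads counter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for one scan range (one scanner per split)."""
    offset: int  # first byte of the split within the file
    length: int  # number of bytes covered by the split


def count_scanners_with_no_reads(row_groups, splits):
    """Count splits whose scanner would own no row group.

    row_groups: list of (start, end) byte ranges, end exclusive.
    Assumption (not from the commit message): a row group is owned by
    the split that contains the row group's midpoint.
    """
    idle = 0
    for split in splits:
        split_end = split.offset + split.length
        owns_any = any(
            split.offset <= (start + end) // 2 < split_end
            for start, end in row_groups
        )
        if not owns_any:
            idle += 1
    return idle


# Four 100-byte splits over a 400-byte file.
splits = [Split(offset=o, length=100) for o in (0, 100, 200, 300)]

# Poorly formatted: a single row group spans all four splits.
poorly_formatted = count_scanners_with_no_reads([(0, 400)], splits)

# Well formatted: one row group per split.
well_formatted = count_scanners_with_no_reads(
    [(0, 100), (100, 200), (200, 300), (300, 400)], splits
)

print(poorly_formatted)  # 3 scanners do no reads
print(well_formatted)    # 0 scanners do no reads
```

In this model, the single oversized row group leaves three of the four scanners idle, while the one scanner that owns it must also read bytes outside its own split, which is the uneven distribution the commit message refers to.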
