[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/10450 ) Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1410 PS2, Line 1410: partitionsToUpdateFileMdByPath = getPartitionsByPath(partitionsToUpdate); : loadMetadataAndDiskIds(partitionsToUpdateFileMdByPath, true);
> The new change behaves very much like the old code except for the case when
I don't think so. In line 1404, the dirtyPartitions are all added to partitionsToRemove. When dirtyPartitions exist, partitionsToRemove won't be empty, so the if-branch won't be chosen and the else-branch performs the same as the old code.
-- To view, visit http://gerrit.cloudera.org:8080/10450 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a Gerrit-Change-Number: 10450 Gerrit-PatchSet: 2 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sun, 20 May 2018 01:27:25 + Gerrit-HasComments: Yes
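To make the branch argument concrete, here is a minimal Java model of it. It is a hypothetical sketch, not the HdfsTable code: it takes from the comment that every dirty partition is added to partitionsToRemove, and additionally assumes that dropped partitions land there too and that the fast path is guarded by partitionsToRemove being empty. `BranchChoiceSketch` and `chooseBranch` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the branch selection discussed for HdfsTable.java
 * (lines 1404-1410 in PS2). Names and structure are illustrative only.
 */
public class BranchChoiceSketch {
    /** Returns "fileMetadataOnly" for the new fast path, "fullReload" otherwise. */
    static String chooseBranch(List<String> partitionsToUpdate,
                               List<String> droppedPartitions,
                               List<String> dirtyPartitions) {
        // Assumption: dropped partitions are queued for removal.
        List<String> partitionsToRemove = new ArrayList<>(droppedPartitions);
        // Per the review comment, every dirty partition is also queued for removal.
        partitionsToRemove.addAll(dirtyPartitions);
        if (partitionsToUpdate != null && partitionsToRemove.isEmpty()) {
            return "fileMetadataOnly"; // only file metadata is reloaded
        }
        return "fullReload"; // behaves like the old code
    }

    public static void main(String[] args) {
        List<String> none = List.of();
        System.out.println(chooseBranch(List.of("p=1"), none, none));           // fileMetadataOnly
        System.out.println(chooseBranch(List.of("p=1"), none, List.of("p=2"))); // fullReload
    }
}
```

In this model a single dirty partition is enough to force the else-branch, which is why the fast path can never add the drop-and-reload overhead described in the earlier reply.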
[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10440 ) Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify state .. Patch Set 4: This change did not cherry-pick successfully into branch 2.x. To resolve this, please do the cherry-pick manually and submit it to Gerrit at refs/for/2.x or add an exception to the branch 2.x copy of bin/ignored_commits.json. Thanks, your friendly bot at https://jenkins.impala.io/job/cherrypick-2.x-and-test/520/ . -- To view, visit http://gerrit.cloudera.org:8080/10440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c Gerrit-Change-Number: 10440 Gerrit-PatchSet: 4 Gerrit-Owner: Dan Hecht Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 19 May 2018 23:10:21 + Gerrit-HasComments: No
[Impala-ASF-CR](2.x) IMPALA-6941: load more text scanner compression plugins
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10462 ) Change subject: IMPALA-6941: load more text scanner compression plugins .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/10462 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: comment Gerrit-Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6 Gerrit-Change-Number: 10462 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 19 May 2018 23:08:32 + Gerrit-HasComments: No
[Impala-ASF-CR](2.x) IMPALA-6941: load more text scanner compression plugins
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10462 ) Change subject: IMPALA-6941: load more text scanner compression plugins .. IMPALA-6941: load more text scanner compression plugins Add extensions for LZ4 and ZSTD (which are supported by Hadoop). Even without a plugin, this results in better behaviour because we no longer treat files with these previously unknown extensions as uncompressed text. Also allow loading tables containing files with unsupported compression types. Previously there was weird behaviour when we knew of the file extension but didn't support querying the table: the catalog would load the table, but the impalad would fail when processing the catalog update. The simplest fix is to just allow loading the tables. Similarly, make the "LOAD DATA" operation more permissive: we can copy files into a directory even if we can't decompress them. Switch to always checking the plugin version: running a mismatched plugin is inherently unsafe. Testing: The positive case where LZO is loaded is exercised. Added coverage for the negative case where LZO is disabled. Fixed test gaps:
* Querying an LZO table with the LZO plugin not available.
* Interacting with tables with known but unsupported text compressions.
* Querying files with unknown compression suffixes (which are treated as uncompressed text).
Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6 Reviewed-on: http://gerrit.cloudera.org:8080/10165 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins Reviewed-on: http://gerrit.cloudera.org:8080/10462 --- M be/src/exec/CMakeLists.txt D be/src/exec/hdfs-lzo-text-scanner.cc D be/src/exec/hdfs-lzo-text-scanner.h A be/src/exec/hdfs-plugin-text-scanner.cc A be/src/exec/hdfs-plugin-text-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M common/fbs/CatalogObjects.fbs M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java A testdata/workloads/functional-query/queries/QueryTest/disable-lzo-plugin.test A testdata/workloads/functional-query/queries/QueryTest/unsupported-compression-partitions.test A tests/custom_cluster/test_scanner_plugin.py M tests/metadata/test_partition_metadata.py 17 files changed, 459 insertions(+), 280 deletions(-) Approvals: Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/10462 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: merged Gerrit-Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6 Gerrit-Change-Number: 10462 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
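The commit message hinges on mapping a file's suffix to a compression type, with unknown suffixes falling back to uncompressed text. The following is a minimal, hypothetical Java sketch of that idea, not the code in HdfsCompression.java: the class name, the reduced enum, and the exact suffix strings (`gz`, `lzo`, `lz4`, `zst`) are assumptions for illustration.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of suffix-based compression detection for text files.
 * The enum and suffix strings are assumptions, not Impala's actual mapping.
 */
public class CompressionSuffixSketch {
    enum Compression { NONE, GZIP, LZO, LZ4, ZSTD }

    static Compression fromFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return Compression.NONE; // no suffix at all: plain text
        }
        String suffix = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (suffix) {
            case "gz":  return Compression.GZIP;
            case "lzo": return Compression.LZO;
            case "lz4": return Compression.LZ4;  // newly recognized (assumed suffix)
            case "zst": return Compression.ZSTD; // newly recognized (assumed suffix)
            default:    return Compression.NONE; // unknown suffix: treated as uncompressed text
        }
    }
}
```

Recognizing a suffix is separate from being able to scan it: in the scheme the commit describes, a table with a known but unsupported compression can still be loaded, and only querying it fails.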
[Impala-ASF-CR] IMPALA-5931: Generates scan ranges in planner for s3/adls
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8523 ) Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls .. Patch Set 14: running more tests... -- To view, visit http://gerrit.cloudera.org:8080/8523 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5 Gerrit-Change-Number: 8523 Gerrit-PatchSet: 14 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Sat, 19 May 2018 20:26:10 + Gerrit-HasComments: No
[Impala-ASF-CR](2.x) IMPALA-6941: load more text scanner compression plugins
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10462 ) Change subject: IMPALA-6941: load more text scanner compression plugins .. Patch Set 1: Code-Review+2 resolve conflict around startup flags. -- To view, visit http://gerrit.cloudera.org:8080/10462 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: comment Gerrit-Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6 Gerrit-Change-Number: 10462 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 19 May 2018 19:25:55 + Gerrit-HasComments: No
[Impala-ASF-CR](2.x) IMPALA-6941: load more text scanner compression plugins
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10462 ) Change subject: IMPALA-6941: load more text scanner compression plugins .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2513/ -- To view, visit http://gerrit.cloudera.org:8080/10462 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: comment Gerrit-Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6 Gerrit-Change-Number: 10462 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 19 May 2018 19:26:03 + Gerrit-HasComments: No
[Impala-ASF-CR](2.x) IMPALA-6941: load more text scanner compression plugins
Hello Impala Public Jenkins, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/10462 to review the following change. Change subject: IMPALA-6941: load more text scanner compression plugins .. IMPALA-6941: load more text scanner compression plugins Add extensions for LZ4 and ZSTD (which are supported by Hadoop). Even without a plugin, this results in better behaviour because we no longer treat files with these previously unknown extensions as uncompressed text. Also allow loading tables containing files with unsupported compression types. Previously there was weird behaviour when we knew of the file extension but didn't support querying the table: the catalog would load the table, but the impalad would fail when processing the catalog update. The simplest fix is to just allow loading the tables. Similarly, make the "LOAD DATA" operation more permissive: we can copy files into a directory even if we can't decompress them. Switch to always checking the plugin version: running a mismatched plugin is inherently unsafe. Testing: The positive case where LZO is loaded is exercised. Added coverage for the negative case where LZO is disabled. Fixed test gaps:
* Querying an LZO table with the LZO plugin not available.
* Interacting with tables with known but unsupported text compressions.
* Querying files with unknown compression suffixes (which are treated as uncompressed text).
Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6 Reviewed-on: http://gerrit.cloudera.org:8080/10165 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M be/src/exec/CMakeLists.txt D be/src/exec/hdfs-lzo-text-scanner.cc D be/src/exec/hdfs-lzo-text-scanner.h A be/src/exec/hdfs-plugin-text-scanner.cc A be/src/exec/hdfs-plugin-text-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M common/fbs/CatalogObjects.fbs M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java A testdata/workloads/functional-query/queries/QueryTest/disable-lzo-plugin.test A testdata/workloads/functional-query/queries/QueryTest/unsupported-compression-partitions.test A tests/custom_cluster/test_scanner_plugin.py M tests/metadata/test_partition_metadata.py 17 files changed, 459 insertions(+), 280 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/10462/1 -- To view, visit http://gerrit.cloudera.org:8080/10462 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: newchange Gerrit-Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6 Gerrit-Change-Number: 10462 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.
Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/10450 ) Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations. .. Patch Set 2:
> > (2 comments)
> >
> > Just interested in this optimization. May I ask some questions?
> >
> > Looks like we are optimizing the case when partitionsToUpdate != null and partitions were neither dropped, created. Can we optimize the case that partitionsToUpdate != null and some partitions are dropped? For example when an INSERT OVERWRITE statement updates the majority of the partitions and only drops few of them.
>
> Will this case not introduce inconsistency between HMS and Impala?
The problem with this optimization is that it introduces inconsistency, something my change also introduces when an ALTER TABLE is done and HMS crashes.
-- To view, visit http://gerrit.cloudera.org:8080/10450 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a Gerrit-Change-Number: 10450 Gerrit-PatchSet: 2 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 19 May 2018 18:18:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.
Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/10450 ) Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations. .. Patch Set 2:
> (2 comments)
>
> Just interested in this optimization. May I ask some questions?
>
> Looks like we are optimizing the case when partitionsToUpdate != null and partitions were neither dropped, created. Can we optimize the case that partitionsToUpdate != null and some partitions are dropped? For example when an INSERT OVERWRITE statement updates the majority of the partitions and only drops few of them.
Will this case not introduce inconsistency between HMS and Impala?
-- To view, visit http://gerrit.cloudera.org:8080/10450 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a Gerrit-Change-Number: 10450 Gerrit-PatchSet: 2 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 19 May 2018 18:15:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.
Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/10450 ) Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations. .. Patch Set 2: (2 comments)
> (2 comments)
>
> Just interested in this optimization. May I ask some questions?
>
> Looks like we are optimizing the case when partitionsToUpdate != null and partitions were neither dropped, created. Can we optimize the case that partitionsToUpdate != null and some partitions are dropped? For example when an INSERT OVERWRITE statement updates the majority of the partitions and only drops few of them.
Will this case not introduce inconsistency between HMS and Impala?
http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1409 PS2, Line 1409: size() == 0
> nit: can be simplified by isEmpty()
OK
http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1410 PS2, Line 1410: partitionsToUpdateFileMdByPath = getPartitionsByPath(partitionsToUpdate); : loadMetadataAndDiskIds(partitionsToUpdateFileMdByPath, true);
> Looks like the original codes perform the same as these two lines. Since dr
The new change behaves very much like the old code, except when dirtyPartitions exist; in that case there is the overhead of dropping the dirty partitions and reloading them from the Metastore.
-- To view, visit http://gerrit.cloudera.org:8080/10450 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a Gerrit-Change-Number: 10450 Gerrit-PatchSet: 2 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 19 May 2018 18:14:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6994: Avoid reloading a table's HMS data for file-only operations.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/10450 ) Change subject: IMPALA-6994: Avoid reloading a table's HMS data for file-only operations. .. Patch Set 2: (2 comments)
Just interested in this optimization. May I ask some questions? It looks like we are optimizing the case where partitionsToUpdate != null and partitions were neither dropped nor created. Can we also optimize the case where partitionsToUpdate != null and some partitions are dropped? For example, when an INSERT OVERWRITE statement updates the majority of the partitions and only drops a few of them.
http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1409 PS2, Line 1409: size() == 0
nit: can be simplified by isEmpty()
http://gerrit.cloudera.org:8080/#/c/10450/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1410 PS2, Line 1410: partitionsToUpdateFileMdByPath = getPartitionsByPath(partitionsToUpdate); : loadMetadataAndDiskIds(partitionsToUpdateFileMdByPath, true);
It looks like the original code performs the same as these two lines, since dropPartitions, loadPartitionsFromMetastore and loadPartitionsFromMetastore return quickly when their first parameter is empty. A non-empty partitionExist implies partitionsToUpdate != null, so the original code will perform the same as these two lines. My question is: which HMS requests do we save?
-- To view, visit http://gerrit.cloudera.org:8080/10450 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iaabdf38af3f30c65ada9734eb471dbfa6ecdd74a Gerrit-Change-Number: 10450 Gerrit-PatchSet: 2 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 19 May 2018 13:17:41 + Gerrit-HasComments: Yes
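The argument that the original path does the same work when nothing was dropped or created can be modeled in a few lines. This is a hypothetical Java sketch, not HdfsTable code: it assumes, as the comment states, that each helper returns immediately on empty input, and additionally assumes that a non-empty metastore load costs exactly one simulated HMS call. The method names only mirror those mentioned in the review.

```java
import java.util.List;

/**
 * Hypothetical sketch of the early-return argument; not Impala's implementation.
 * Counts simulated Hive Metastore (HMS) round trips.
 */
public class EarlyReturnSketch {
    private int hmsCalls = 0; // simulated HMS round trips

    // Stand-in for dropPartitions(): only touches cached catalog state here.
    private void dropPartitions(List<String> partitions) {
        if (partitions.isEmpty()) return; // fast path: nothing to drop
        // Removal from the in-memory partition map is omitted in this sketch.
    }

    // Stand-in for loadPartitionsFromMetastore(): one HMS call when non-empty (assumed).
    private void loadPartitionsFromMetastore(List<String> partitionNames) {
        if (partitionNames.isEmpty()) return; // fast path: no HMS call
        hmsCalls++;
    }

    /** Runs the "original" path and returns how many HMS calls it made. */
    int runOriginalPath(List<String> partitionsToRemove, List<String> partitionsToLoad) {
        dropPartitions(partitionsToRemove);
        loadPartitionsFromMetastore(partitionsToLoad);
        // loadMetadataAndDiskIds(...) would follow; this model treats it as
        // file-system work only, so it adds no HMS calls.
        return hmsCalls;
    }

    public static void main(String[] args) {
        System.out.println(new EarlyReturnSketch().runOriginalPath(List.of(), List.of()));           // 0
        System.out.println(new EarlyReturnSketch().runOriginalPath(List.of("p=1"), List.of("p=1"))); // 1
    }
}
```

Under these assumptions the original path already makes zero HMS calls when both lists are empty, which is exactly why the reviewer asks which HMS requests the new branch actually saves.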
[Impala-ASF-CR] IMPALA-5706: Spilling sort optimisations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9943 ) Change subject: IMPALA-5706: Spilling sort optimisations .. Patch Set 13: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2512/ -- To view, visit http://gerrit.cloudera.org:8080/9943 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74857c1694802e81f1cfc765d2b4e8bc644387f9 Gerrit-Change-Number: 9943 Gerrit-PatchSet: 13 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 19 May 2018 10:18:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5706: Spilling sort optimisations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9943 ) Change subject: IMPALA-5706: Spilling sort optimisations .. Patch Set 13: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2512/ -- To view, visit http://gerrit.cloudera.org:8080/9943 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74857c1694802e81f1cfc765d2b4e8bc644387f9 Gerrit-Change-Number: 9943 Gerrit-PatchSet: 13 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 19 May 2018 07:18:38 + Gerrit-HasComments: No