[Impala-ASF-CR] IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17171 ) Change subject: IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8340/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17171 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea Gerrit-Change-Number: 17171 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 11 Mar 2021 07:57:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10367: Impala-shell internal error - UnboundLocalError, local variable 'retry_msg' referenced before assign
Abhishek Rawat has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17172 ) Change subject: IMPALA-10367: Impala-shell internal error - UnboundLocalError, local variable 'retry_msg' referenced before assign .. IMPALA-10367: Impala-shell internal error - UnboundLocalError, local variable 'retry_msg' referenced before assign ImpalaHS2Client._open_session() has a 'retry_msg' variable which was not initialized in the code path where retry was disabled. If an exception was hit with retry disabled, an UnboundLocalError was raised at runtime. The fix is to initialize 'retry_msg' in the non-retry code path. Testing: - Forced an exception in ImpalaHS2Client._open_session() and verified that a proper error message was generated. - Ran impala-shell e2e and custom cluster tests. Change-Id: I50a08a62a332de759022d0a4862e74f5a81945d9 --- M shell/impala_client.py 1 file changed, 2 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17172/1 -- To view, visit http://gerrit.cloudera.org:8080/17172 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I50a08a62a332de759022d0a4862e74f5a81945d9 Gerrit-Change-Number: 17172 Gerrit-PatchSet: 1 Gerrit-Owner: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17171 to look at the new patch set (#4). Change subject: IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil .. IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil HDFS FileSystem provides a listStatusIterator() API for listing remote storage using a RemoteIterator. We use it to list files when loading table file metadata. It's not guaranteed that a RemoteIterator is still usable after its hasNext() or next() throws an IOException, so we should stop the loop in this case. Otherwise, we may end up in an infinite loop. Without HADOOP-16685, it's also not guaranteed that FileSystem.listStatusIterator() will throw a FileNotFoundException when the path doesn't exist. This patch refactors the file listing iterators so we don't need to depend on these two assumptions. The basic idea is: - On one side, we should not depend on other RemoteIterators' behavior after an exception. - On the other side, we try to make our own iterators more robust against transient sub-directories, so that table loading won't fail because of them. Tests: - Loop test_insert_stress.py 100 times. Verified that non-existing subdirs are skipped and that inserts are stable under high concurrency. Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea --- M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java 1 file changed, 100 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17171/4 -- To view, visit http://gerrit.cloudera.org:8080/17171 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea Gerrit-Change-Number: 17171 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
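The loop rule in the commit message (stop at the first IOException instead of polling the iterator again) can be sketched in plain Java. This is a hedged illustration, not the FileSystemUtil code: `RemoteIter` is a hypothetical stand-in with the same shape as Hadoop's `RemoteIterator` (so the example needs no Hadoop jar), and `ListingLoop` and `ScriptedIter` are made-up names. A loop that caught the exception and called `hasNext()` again would spin forever on an iterator like `ScriptedIter(true, ...)`, whose `hasNext()` keeps throwing.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical interface with the same shape as Hadoop's
// org.apache.hadoop.fs.RemoteIterator<E>: both methods may throw IOException.
interface RemoteIter<E> {
  boolean hasNext() throws IOException;
  E next() throws IOException;
}

final class ListingLoop {
  // Drains the iterator. An IOException from hasNext() or next() ends the loop
  // and is propagated; the iterator is never polled again, because its state
  // after a failure is unspecified.
  static <E> List<E> drain(RemoteIter<E> it) throws IOException {
    List<E> out = new ArrayList<>();
    while (it.hasNext()) {
      out.add(it.next());
    }
    return out;
  }
}

// Hypothetical test double: yields the given items, then either reports the end
// normally or throws from every later hasNext() call (a "broken" iterator).
final class ScriptedIter implements RemoteIter<String> {
  private final String[] items;
  private final boolean failAtEnd;
  private int pos = 0;
  int hasNextCalls = 0;

  ScriptedIter(boolean failAtEnd, String... items) {
    this.failAtEnd = failAtEnd;
    this.items = items;
  }

  @Override
  public boolean hasNext() throws IOException {
    hasNextCalls++;
    if (pos < items.length) return true;
    if (failAtEnd) throw new IOException("simulated listing failure");
    return false;
  }

  @Override
  public String next() {
    if (pos >= items.length) throw new NoSuchElementException();
    return items[pos++];
  }
}
```

With `new ScriptedIter(true, "a", "b")`, `drain()` calls `hasNext()` exactly three times and then propagates the IOException, so the caller decides how to handle the failed listing.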
[Impala-ASF-CR] IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17171 ) Change subject: IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil .. Patch Set 3: When I looped tests/stress/test_insert_stress.py to run it several times, I found that the inserts fail more easily with a TableLoadingException, which then leaves the table metadata in a bad state. The exception is:
E0311 11:59:42.633015 20506 ParallelFileMetadataLoader.java:166] Refreshing file and block metadata for 1 paths for table test_inserts_ab08196b.test_concurrent_inserts encountered an error loading data for path hdfs://localhost:20500/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts
Java exception follows:
java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://localhost:20500/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/5e4afdf5a6978311_9f9d0baa does not exist.
    at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:513)
    at com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:86)
    at org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:163)
    at org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:115)
    at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:747)
    at org.apache.impala.catalog.HdfsTable.updateUnpartitionedTableFileMd(HdfsTable.java:1296)
    at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1182)
    at org.apache.impala.service.CatalogOpExecutor.loadTableMetadata(CatalogOpExecutor.java:1015)
    at org.apache.impala.service.CatalogOpExecutor.updateCatalog(CatalogOpExecutor.java:4808)
    at org.apache.impala.service.JniCatalog.updateCatalog(JniCatalog.java:327)
Caused by: java.io.FileNotFoundException: File hdfs://localhost:20500/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/5e4afdf5a6978311_9f9d0baa does not exist.
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1273)
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1247)
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1192)
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1188)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1206)
    at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2126)
    at org.apache.hadoop.fs.FileSystem$5.handleFileStat(FileSystem.java:2314)
    at org.apache.hadoop.fs.FileSystem$5.hasNext(FileSystem.java:2291)
    at org.apache.impala.common.FileSystemUtil$FilterIterator.hasNext(FileSystemUtil.java:813)
    at org.apache.impala.catalog.FileMetadataLoader.load(FileMetadataLoader.java:202)
    at org.apache.impala.catalog.ParallelFileMetadataLoader.lambda$loadInternal$1(ParallelFileMetadataLoader.java:157)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:322)
    at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
    at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:66)
    at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:36)
    at org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:157)
    ... 7 more
The reason is that we have removed the catching of FileNotFoundException in FilterIterator. When listing files with locations, we use FileSystem#listFiles(), which returns a RemoteIterator similar to our RecursingIterator except for the handling of non-existing subdirs in its hasNext(). To make file listing with locations more robust as well, PS3 uses our RecursingIterator when the fs is not S3. -- To view, visit http://gerrit.cloudera.org:8080/17171 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
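The "skip non-existing subdirs" behavior discussed here can be sketched as a recursive listing that tolerates a FileNotFoundException when descending into a subdirectory, while a missing root still fails. This is a hedged sketch, not Impala's RecursingIterator: `FakeFs` is a hypothetical in-memory stand-in for a FileSystem (child names ending in "/" mark directories, and a directory absent from the map models one that vanished after its parent was listed), and `RecursingFileIterator` is a made-up name.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical in-memory file system: directory path -> child names.
final class FakeFs {
  static final class Entry {
    final String path;
    final boolean isDir;
    Entry(String path, boolean isDir) { this.path = path; this.isDir = isDir; }
  }

  final Map<String, List<String>> dirs = new HashMap<>();

  // Throws FileNotFoundException if the directory does not (or no longer) exist.
  List<Entry> list(String dir) throws FileNotFoundException {
    List<String> children = dirs.get(dir);
    if (children == null) throw new FileNotFoundException("File " + dir + " does not exist.");
    List<Entry> out = new ArrayList<>();
    for (String c : children) {
      boolean isDir = c.endsWith("/");
      String name = isDir ? c.substring(0, c.length() - 1) : c;
      out.add(new Entry(dir + "/" + name, isDir));
    }
    return out;
  }
}

final class RecursingFileIterator {
  private final FakeFs fs;
  private final Deque<String> pendingDirs = new ArrayDeque<>();
  private final Deque<String> pendingFiles = new ArrayDeque<>();
  final List<String> skippedDirs = new ArrayList<>();

  RecursingFileIterator(FakeFs fs, String root) throws IOException {
    this.fs = fs;
    addChildren(fs.list(root));  // a missing root is still reported to the caller
  }

  private void addChildren(List<FakeFs.Entry> entries) {
    for (FakeFs.Entry e : entries) {
      if (e.isDir) pendingDirs.push(e.path); else pendingFiles.add(e.path);
    }
  }

  boolean hasNext() throws IOException {
    while (pendingFiles.isEmpty() && !pendingDirs.isEmpty()) {
      String dir = pendingDirs.pop();
      try {
        addChildren(fs.list(dir));
      } catch (FileNotFoundException e) {
        // Listed by its parent but gone now (e.g. a transient staging dir):
        // skip it instead of failing the whole listing.
        skippedDirs.add(dir);
      }
    }
    return !pendingFiles.isEmpty();
  }

  String next() throws IOException {
    if (!hasNext()) throw new NoSuchElementException();
    return pendingFiles.poll();
  }
}
```

Only FileNotFoundException from a subdirectory is swallowed here; in a real listing, any other IOException would propagate and end the loop, in line with the first point of the commit message.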
[Impala-ASF-CR] IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17171 to look at the new patch set (#3). Change subject: IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil .. IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil HDFS FileSystem provides a listStatusIterator() API for listing remote storage using a RemoteIterator. We use it to list files when loading table file metadata. It's not guaranteed that a RemoteIterator is still usable after its hasNext() or next() throws an IOException, so we should stop the loop in this case. Otherwise, we may end up in an infinite loop. Without HADOOP-16685, it's also not guaranteed that FileSystem.listStatusIterator() will throw a FileNotFoundException when the path doesn't exist. This patch refactors the file listing iterators so we don't need to depend on these two assumptions. Tests: - Loop test_insert_stress.py 100 times. Verified that non-existing subdirs are skipped and that inserts are stable under high concurrency. Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea --- M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java 1 file changed, 100 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17171/3 -- To view, visit http://gerrit.cloudera.org:8080/17171 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea Gerrit-Change-Number: 17171 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17171 ) Change subject: IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8339/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17171 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea Gerrit-Change-Number: 17171 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 11 Mar 2021 03:49:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17171 to look at the new patch set (#2). Change subject: IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil .. IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil HDFS FileSystem provides a listStatusIterator() API for listing remote storage using a RemoteIterator. We use it to list files when loading table file metadata. It's not guaranteed that a RemoteIterator is still usable after its hasNext() or next() throws an IOException, so we should stop the loop in this case. Otherwise, we may end up in an infinite loop. Without HADOOP-16685, it's also not guaranteed that FileSystem.listStatusIterator() will throw a FileNotFoundException when the path doesn't exist. This patch refactors the file listing iterators so we don't need to depend on these two assumptions. Tests: - Run test_insert_stress.py. Verified that non-existing subdirs are skipped. Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea --- M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java 1 file changed, 37 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17171/2 -- To view, visit http://gerrit.cloudera.org:8080/17171 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea Gerrit-Change-Number: 17171 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17171 ) Change subject: IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8338/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17171 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea Gerrit-Change-Number: 17171 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 11 Mar 2021 02:44:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 ) Change subject: IMPALA-10551: Add result sink support for external frontends .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8337/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072 Gerrit-Change-Number: 17144 Gerrit-PatchSet: 6 Gerrit-Owner: John Sherman Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Thu, 11 Mar 2021 02:42:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 ) Change subject: IMPALA-10551: Add result sink support for external frontends .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8336/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072 Gerrit-Change-Number: 17144 Gerrit-PatchSet: 5 Gerrit-Owner: John Sherman Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Thu, 11 Mar 2021 02:35:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil
Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17171 ) Change subject: IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil .. IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil HDFS FileSystem provides a listStatusIterator() API for listing remote storage using a RemoteIterator. We use it to list files when loading table file metadata. It's not guaranteed that a RemoteIterator is still usable after its hasNext() or next() throws an IOException, so we should stop the loop in this case. Otherwise, we may end up in an infinite loop. Without HADOOP-16685, it's also not guaranteed that FileSystem.listStatusIterator() will throw a FileNotFoundException when the path doesn't exist. This patch refactors the file listing iterators so we don't need to depend on these two assumptions. Tests: - Run test_insert_stress.py. Verified that non-existing subdirs are skipped. Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea --- M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java 1 file changed, 37 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17171/1 -- To view, visit http://gerrit.cloudera.org:8080/17171 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea Gerrit-Change-Number: 17171 Gerrit-PatchSet: 1 Gerrit-Owner: Quanlong Huang
[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 ) Change subject: IMPALA-10551: Add result sink support for external frontends .. Patch Set 6: (7 comments) Thanks Joe, I think the suggested review comments have made this a better patch overall. http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.h@308 PS3, Line 308: /// Non-null if and only if the query produces results for the client; i.e. is of : /// TStmtType::QUERY. Coordinator uses these to pull results from plan tree and return : /// them to the client in GetNext(), and also to access the fragment instance's runtime : /// state. : /// : /// Result rows are materialized by this fragment instance in its own thread. They are : /// materialized into a QueryResultSet provided to the coordinator during GetNext(). : /// : /// Owned by the QueryState. Set in Exec(). : FragmentInstanceState* coord_instance_ = nullptr; > The way we are using coord_instance_ for a query with a result sink doesn't I've added a method to query-exec-params and call it in a few places in coordinator.cc to determine if there is a result sink - not sure if that is the best way. http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@193 PS3, Line 193: // set coord_instance_ and coord_sink_ : if (exec_params_.GetCoordFragment() != nullptr) { : // this blocks until all fragment instances have finished their Prepare phase : Status query_status = query_state_->GetFInstanceState(query_id(), &coord_instance_); : if (!query_status.ok()) return UpdateExecState(query_status, nullptr, FLAGS_hostname); : // We expected this query to have a coordinator instance. 
: DCHECK(coord_instance_ != nullptr); : // When GetFInstanceState() returns the coordinator instance, the Prepare phase is : // done and the FragmentInstanceState's root sink will be set up. : coord_sink_ = coord_instance_->GetRootSink(); : DCHECK(coord_sink_ != nullptr); : } : return Status::OK(); : } : > As mentioned in coordinator.h, I think coord_instance_ should be nullptr fo Done http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@798 PS3, Line 798: // staging directory. > I would prefer to review this when I can see how it is being called. The newest review has the call. http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@805 PS3, Line 805: DCHECK(query_ctx().__isset.desc_tbl_serialized); > If this returns an error (which it should if the query is not successful), This code is based on FinalizeHdfsDml, which follows a similar pattern. So I wonder if something else ends up cleaning it up in the error case. http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@894 PS3, Line 894: && query_state_->query_options().spool_query_results : && query_state_->query_options().spool_all_results_for_retries) { : // Wait until the BufferedPlanRootSink spooled all results or any errors stopping : // it, e.g. batch queue full, cancellation or failures. : auto sink = static_cast<BufferedPlanRootSink*>(coord_sink_); : if (sink->WaitForAllResultsSpooled()) { : VLOG_QUERY << "Cannot spool all results in the allocated result spooling space." 
: " Query retry will be skipped if any results have been returned."; : } : > If the below if statement does not apply to queries with result sinks, then Done http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@905 PS3, Line 905: } > I think it would make sense to treat a query with a result sink like a DML Done http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@940 PS3, Line 940: > If a query with a result sink goes through the current DML path, then it wo Done -- To view, visit http://gerrit.cloudera.org:8080/17144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072 Gerrit-Change-Number: 17144 Gerrit-PatchSet: 6 Gerrit-Owner: John Sherman Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell
[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends
Hello Aman Sinha, Thomas Tauber-Marshall, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17144 to look at the new patch set (#6). Change subject: IMPALA-10551: Add result sink support for external frontends .. IMPALA-10551: Add result sink support for external frontends - The intended purpose of these changes is to allow external frontends to receive query results via files rather than streaming the results through the thrift interface. - External frontends are expected to provide an FeFsTable implementation that describes the desired location to store results. - External frontends are responsible for managing the files after the query is completed. - Testing has been manual and through an implementation of an external frontend. Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072 Reviewed-by: Aman Sinha --- M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/query-exec-params.cc M be/src/runtime/query-exec-params.h M common/thrift/DataSinks.thrift M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/TableSink.java 9 files changed, 97 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17144/6 -- To view, visit http://gerrit.cloudera.org:8080/17144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072 Gerrit-Change-Number: 17144 Gerrit-PatchSet: 6 Gerrit-Owner: John Sherman Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 ) Change subject: IMPALA-10551: Add result sink support for external frontends .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/17144/5/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java: http://gerrit.cloudera.org:8080/#/c/17144/5/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@245 PS5, Line 245: exprs.addAll(outputExprs_.subList(0, targetTable_.getNonClusteringColumns().size())); line too long (91 > 90) -- To view, visit http://gerrit.cloudera.org:8080/17144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072 Gerrit-Change-Number: 17144 Gerrit-PatchSet: 5 Gerrit-Owner: John Sherman Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Thu, 11 Mar 2021 02:17:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends
Hello Aman Sinha, Thomas Tauber-Marshall, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17144 to look at the new patch set (#5). Change subject: IMPALA-10551: Add result sink support for external frontends .. IMPALA-10551: Add result sink support for external frontends - The intended purpose of these changes is to allow external frontends to receive query results via files rather than streaming the results through the thrift interface. - External frontends are expected to provide an FeFsTable implementation that describes the desired location to store results. - External frontends are responsible for managing the files after the query is completed. - Testing has been manual and through an implementation of an external frontend. Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072 Reviewed-by: Aman Sinha --- M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/query-exec-params.cc M be/src/runtime/query-exec-params.h M common/thrift/DataSinks.thrift M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/TableSink.java 9 files changed, 96 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17144/5 -- To view, visit http://gerrit.cloudera.org:8080/17144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072 Gerrit-Change-Number: 17144 Gerrit-PatchSet: 5 Gerrit-Owner: John Sherman Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17075 ) Change subject: IMPALA-10494: Making use of the min/max column stats to improve min/max filters .. Patch Set 19: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8335/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17075 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df Gerrit-Change-Number: 17075 Gerrit-PatchSet: 19 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 11 Mar 2021 01:20:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8334/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 11 Mar 2021 01:10:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters
Qifan Chen has uploaded a new patch set (#19). ( http://gerrit.cloudera.org:8080/17075 ) Change subject: IMPALA-10494: Making use of the min/max column stats to improve min/max filters .. IMPALA-10494: Making use of the min/max column stats to improve min/max filters This patch adds the functionality to compute the minimal and the maximal value for a column of type integers, float or double for parquet tables, and to make use of the new stats to discard min/max filters, in both hash join builders and Parquet scanners, whose coverage are too close to the actual range defined by the column min and max. The computation and dislay of the new column min/max stats are done for Parquet tables only and can be controlled by two new Boolean query options (default to false): 1. compute_column_minmax_stats 2. show_column_minmax_stats Usage examples. set compute_column_minmax_stats=true; compute stats tpcds_parquet.store_sales; set show_column_minmax_stats=true; show column stats tpcds_parquet.store_sales; +---+--+-...---+-+-+ | Column| Type | #Falses | Min | Max | +---+--+-...---+-+-+ | ss_sold_time_sk | INT | -1 | 28800 | 75599 | | ss_item_sk| BIGINT | -1 | 1 | 18000 | | ss_customer_sk| INT | -1 | 1 | 10 | | ss_cdemo_sk | INT | -1 | 15 | 1920797 | | ss_hdemo_sk | INT | -1 | 1 | 7200| | ss_addr_sk| INT | -1 | 1 | 5 | | ss_store_sk | INT | -1 | 1 | 10 | | ss_promo_sk | INT | -1 | 1 | 300 | | ss_ticket_number | BIGINT | -1 | 1 | 24 | | ss_quantity | INT | -1 | 1 | 100 | | ss_wholesale_cost | DECIMAL(7,2) | -1 | -1 | -1 | | ss_list_price | DECIMAL(7,2) | -1 | -1 | -1 | | ss_sales_price| DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_discount_amt | DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_sales_price| DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_wholesale_cost | DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_list_price | DECIMAL(7,2) | -1 | -1 | -1 | | ss_ext_tax| DECIMAL(7,2) | -1 | -1 | -1 | | ss_coupon_amt | DECIMAL(7,2) | -1 | -1 | -1 | | ss_net_paid | DECIMAL(7,2) | -1 | -1 | -1 | | 
ss_net_paid_inc_tax | DECIMAL(7,2) | -1 | -1 | -1 | | ss_net_profit | DECIMAL(7,2) | -1 | -1 | -1 | | ss_sold_date_sk | INT | -1 | 2450816 | 2452642 | +---+--+-...---+-+-+ Only the min/max values for non-partition columns are stored in HMS. The min/max values for partition columns are computed in coordinator. The min-max filters, in C++ class or protobuf form, are augmented to deal with the always true state better. Once always true is set, the actual min and max values in the filter are no longer populated. Testing: - Added new compute/show stats tests for integers, float and double column data types in compute-stats-column-minmax.test; - Added new tests in overlap_min_max_filters.test to demonstrate the usefulness of column stats to quickly disable useless filters in both hash join builder and Parquet scanner; - Added tests in min-max-filter-test.cc to demonstrate method Or(), ToProtobuf() and constructor can deal with always true flag well; - core tests. TODO: 1. Test compute stats for timestamp and date columns; Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df --- M be/src/exec/catalog-op-executor.cc M be/src/exec/filter-context.cc M be/src/exec/filter-context.h M be/src/exec/hdfs-scanner.h M be/src/exec/incr-stats-util-test.cc M be/src/exec/incr-stats-util.cc M be/src/exec/incr-stats-util.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/service/hs2-util.cc M be/src/service/hs2-util.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/min-max-filter-test.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/CatalogObjects.thrift M common/thrift/Frontend.thrift M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M 
fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java M
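A minimal Python sketch of the two filter behaviors described in IMPALA-10494, not the patch's C++ code: a min/max filter is discarded when its [min, max] range covers most of the column's [min, max] stats range, and merging (Or) with an always-true filter yields an always-true filter whose min/max are left unpopulated. The `MinMaxFilter` class, the overlap-ratio definition and the 0.9 threshold are hypothetical; the change description does not say how "too close" is measured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MinMaxFilter:
    """Hypothetical stand-in for a min/max runtime filter.

    When always_true is set, min_value/max_value stay unpopulated (None),
    mirroring the always-true handling described in the commit message.
    """
    always_true: bool = False
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def always_true_filter() -> MinMaxFilter:
    return MinMaxFilter(always_true=True)


def merge_or(a: MinMaxFilter, b: MinMaxFilter) -> MinMaxFilter:
    """Union of two filters; an always-true input absorbs the other one."""
    if a.always_true or b.always_true:
        return always_true_filter()
    return MinMaxFilter(False, min(a.min_value, b.min_value),
                        max(a.max_value, b.max_value))


def overlap_ratio(f: MinMaxFilter, col_min: float, col_max: float) -> float:
    """Fraction of the column stats range [col_min, col_max] covered by f."""
    if f.always_true:
        return 1.0
    if col_max == col_min:
        # Single-valued column: the filter either admits that value or not.
        return 1.0 if f.min_value <= col_min <= f.max_value else 0.0
    lo = max(f.min_value, col_min)
    hi = min(f.max_value, col_max)
    if hi <= lo:
        return 0.0
    return (hi - lo) / (col_max - col_min)


def should_discard(f: MinMaxFilter, col_min: float, col_max: float,
                   threshold: float = 0.9) -> bool:
    """A filter covering most of the column range rejects almost nothing."""
    return overlap_ratio(f, col_min, col_max) >= threshold
```

With the ss_sold_time_sk stats shown above (28800 to 75599), a filter of [30000, 75000] covers about 96% of the column range and would be discarded under this sketch, while [30000, 31000] covers about 2% and would be kept.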
[Impala-ASF-CR] IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit .. Patch Set 2: (2 comments) Patch set 2 disables result spooling in tests that set scratch_limit > -1. http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@ PS1, Line : un > Agree. I will rephrase into this: Done http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@1124 PS1, Line 1124: if (query_options->spool_query_results && scratch_limit != -1) { > Good point. Quick git grep show some more places where scratch_limit is set Done -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 11 Mar 2021 00:53:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17166 to look at the new patch set (#2). Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit .. IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit IMPALA-9856 enables result spooling by default. However, if the query option scratch_limit is set lower than max_spilled_result_spooling_mem, the query might fail in the middle of execution due to insufficient scratch space. This patch validates that, when result spooling is enabled, max_spilled_result_spooling_mem <= scratch_limit. This patch also fixes the validation between max_result_spooling_mem and max_spilled_result_spooling_mem so that it treats both 0 and -1 as unbounded. Testing: - Lower max_spilled_result_spooling_mem in test_with_high_scratch_limit and test_with_low_scratch_limit. - Toggle off spool_query_results in tests that set scratch_limit=0 to ensure that result spilling will not happen. - Add test_with_scratch_limit_less_than_max_spilled_result_spooling_mem. - Add be test QueryOptions.ResultSpoolingWithScratchLimit. - Pass exhaustive tests.
Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 --- M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/service/query-options-test.cc M be/src/service/query-options.cc M testdata/workloads/functional-query/queries/QueryTest/spilling-naaj-no-deny-reservation.test M testdata/workloads/functional-query/queries/QueryTest/spilling-naaj.test M testdata/workloads/tpch/queries/sort-reservation-usage-single-node.test M tests/query_test/test_scratch_limit.py 7 files changed, 129 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/2 -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
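The validation rules in IMPALA-10565 can be sketched as a small Python function. This is an illustrative model, not the query-options.cc code. It assumes the following: 0 and -1 both mean "unbounded" for the two spooling-memory options; scratch_limit == -1 means no scratch limit, which matches the `scratch_limit != -1` condition quoted in the review; an unbounded max_spilled_result_spooling_mem fails the check against a bounded scratch_limit; and max_result_spooling_mem must not exceed a bounded max_spilled_result_spooling_mem. The function name and error strings are hypothetical.

```python
from typing import List

# Values treated as "no limit" for the two result-spooling memory options.
UNBOUNDED = (0, -1)


def is_unbounded(value: int) -> bool:
    return value in UNBOUNDED


def validate_spooling_options(spool_query_results: bool, scratch_limit: int,
                              max_result_spooling_mem: int,
                              max_spilled_result_spooling_mem: int) -> List[str]:
    """Return validation errors; an empty list means the options are consistent."""
    errors = []
    spilled = max_spilled_result_spooling_mem
    if is_unbounded(max_result_spooling_mem):
        # An unbounded in-memory limit requires an unbounded spilled limit too.
        if not is_unbounded(spilled):
            errors.append("max_result_spooling_mem is unbounded, so "
                          "max_spilled_result_spooling_mem must be unbounded (0 or -1)")
    elif not is_unbounded(spilled) and max_result_spooling_mem > spilled:
        errors.append("max_result_spooling_mem must be <= "
                      "max_spilled_result_spooling_mem")
    # Only check against scratch space when spooling is on and scratch is limited.
    if spool_query_results and scratch_limit != -1:
        if is_unbounded(spilled) or spilled > scratch_limit:
            errors.append("max_spilled_result_spooling_mem must be <= scratch_limit "
                          "when spool_query_results is enabled")
    return errors
```

In this model, setting spool_query_results=false skips the scratch check entirely. That is why the updated tests that set scratch_limit=0 also toggle result spooling off.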
[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends
Hello Aman Sinha, Thomas Tauber-Marshall, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17144 to look at the new patch set (#4). Change subject: IMPALA-10551: Add result sink support for external frontends .. IMPALA-10551: Add result sink support for external frontends - The intended purpose of these changes is to allow external frontends to receive query results via files rather than streaming the results through the thrift interface. - External frontends are expected to provide an FeFsTable implementation that describes the desired location to store results. - External frontends are responsible for managing the files after the query is completed. - Testing has been manual and through an implementation of an external frontend. Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072 Reviewed-by: Aman Sinha --- M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M common/thrift/DataSinks.thrift M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/TableSink.java 9 files changed, 113 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17144/4 -- To view, visit http://gerrit.cloudera.org:8080/17144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072 Gerrit-Change-Number: 17144 Gerrit-PatchSet: 4 Gerrit-Owner: John Sherman Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@ PS1, Line : $1 > slightly misleading,does not necessarily have to be the same, could be 0 or Agree. I will rephrase it as: "If max_result_spooling_mem is set to unbounded ($0) max_spilled_result_spooling_mem must be set to unbounded (0 or -1) as well." http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@1124 PS1, Line 1124: if (query_options->spool_query_results && scratch_limit != -1) { > since now we have spilling on by default, spool_query_results is set to 1GB Good point. A quick git grep shows more places where scratch_limit is set to 0, such as spilling-naaj-no-deny-reservation.test. I will also set spool_query_results=0 in those places. -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 10 Mar 2021 23:28:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 ) Change subject: IMPALA-10564: Return error when inserting an invalid decimal value .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8332/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98 Gerrit-Change-Number: 17168 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 10 Mar 2021 22:37:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 ) Change subject: IMPALA-10564: Return error when inserting an invalid decimal value .. Patch Set 2: Thanks for adding the check for Kudu tables and the unit tests. Thinking about this a bit more, one concern I have is this: if a user is doing an INSERT-SELECT of a billion-row (or similarly large) table, would a single overflowing decimal value error out the whole operation, or would the rest of the inserts go through? I suppose this depends on the ABORT_ON_ERROR setting? ETL operations normally expect invalid/dirty rows to be logged to a separate location while the insertion of the other rows continues. We should make sure this new behavior does not regress that workflow. Any thoughts on that? -- To view, visit http://gerrit.cloudera.org:8080/17168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98 Gerrit-Change-Number: 17168 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 10 Mar 2021 22:33:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 ) Change subject: IMPALA-10564: Return error when inserting an invalid decimal value .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8331/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98 Gerrit-Change-Number: 17168 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 10 Mar 2021 22:28:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value
Wenzhe Zhou has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17168 ) Change subject: IMPALA-10564: Return error when inserting an invalid decimal value .. IMPALA-10564: Return error when inserting an invalid decimal value When CTAS or INSERT-SELECT statements insert rows into a table with decimal columns, Impala inserts NULL for overflowed decimal values instead of returning an error. This issue happens when the data expression for the decimal column in the SELECT sub-query contains at least one alias. It is similar to IMPALA-6340, but IMPALA-6340 only fixed the cases where the data expression for the decimal column is a constant, so that the overflowed decimal value could be detected by the frontend during expression analysis. If there is an alias (variable) in the data expression for the decimal column, only the backend can detect the decimal overflow. This patch adds a check of the query status of RuntimeState in TableWriter when ScalarExprEvaluator returns NULL for a decimal column. If there is an error, the query fails without inserting NULL for the decimal column. We did not change the behaviour for decimal_v1: NULL is still inserted into the table for invalid decimal values, with a warning message. Tests: - Manually ran queries with overflowed decimal values by using CTAS and INSERT-SELECT statements. Verified that the queries failed without inserting NULL, as expected. - Manually ran queries with overflowed decimal values and decimal_v2 set to false. The result is the same as before: NULLs were inserted into the table for invalid decimal values, with a warning message. - Added unit tests for INSERT-SELECT and CTAS. - Passed core tests.
Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98 --- M be/src/exec/hdfs-text-table-writer.cc M be/src/exec/kudu-table-sink.cc M be/src/exec/parquet/hdfs-parquet-table-writer.cc A testdata/workloads/functional-query/queries/QueryTest/decimal-insert-overflow-exprs.test M tests/query_test/test_decimal_queries.py 5 files changed, 140 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17168/3 -- To view, visit http://gerrit.cloudera.org:8080/17168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98 Gerrit-Change-Number: 17168 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou
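The control flow IMPALA-10564 describes can be modeled with a minimal Python sketch. The evaluator returns NULL (`None`) for an overflowed decimal. Under decimal_v2 it also records an error in the runtime state, and the table writer checks that state before writing a NULL. Under decimal_v1 only a warning is recorded, so the NULL is written as before. `RuntimeState`, `eval_decimal`, `append_decimal` and `QueryFailed` are hypothetical stand-ins for the C++ RuntimeState, ScalarExprEvaluator and table-writer code. The overflow test also ignores rounding to the column's scale.

```python
from decimal import Decimal
from typing import List, Optional


class QueryFailed(Exception):
    """Raised when the writer sees a NULL caused by a recorded query error."""


class RuntimeState:
    """Hypothetical stand-in for the per-query RuntimeState."""

    def __init__(self) -> None:
        self.error: Optional[str] = None  # models a non-OK query status
        self.warnings: List[str] = []


def eval_decimal(value: Decimal, precision: int, scale: int,
                 decimal_v2: bool, state: RuntimeState) -> Optional[Decimal]:
    """Hypothetical evaluator: returns None if value does not fit DECIMAL(p,s)."""
    if abs(value) >= Decimal(10) ** (precision - scale):
        msg = "Decimal expression overflowed"
        if decimal_v2:
            state.error = msg           # decimal_v2: overflow is an error
        else:
            state.warnings.append(msg)  # decimal_v1: NULL plus a warning
        return None
    return value


def append_decimal(out_row: list, value: Optional[Decimal],
                   state: RuntimeState) -> None:
    """Writer side: a NULL is only written if no error has been recorded."""
    if value is None and state.error is not None:
        raise QueryFailed(state.error)
    out_row.append(value)
```

A NULL that is genuinely present in the source data still passes through, because no error is recorded in that case.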
[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 ) Change subject: IMPALA-10564: Return error when inserting an invalid decimal value .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/17168/2/tests/query_test/test_decimal_queries.py File tests/query_test/test_decimal_queries.py: http://gerrit.cloudera.org:8080/#/c/17168/2/tests/query_test/test_decimal_queries.py@96 PS2, Line 96: @pytest.mark.execute_serially > flake8: E302 expected 2 blank lines, found 1 Done -- To view, visit http://gerrit.cloudera.org:8080/17168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98 Gerrit-Change-Number: 17168 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 10 Mar 2021 22:15:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 ) Change subject: IMPALA-10564: Return error when inserting an invalid decimal value .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/17168/2/tests/query_test/test_decimal_queries.py File tests/query_test/test_decimal_queries.py: http://gerrit.cloudera.org:8080/#/c/17168/2/tests/query_test/test_decimal_queries.py@96 PS2, Line 96: @pytest.mark.execute_serially flake8: E302 expected 2 blank lines, found 1 -- To view, visit http://gerrit.cloudera.org:8080/17168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98 Gerrit-Change-Number: 17168 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 10 Mar 2021 22:09:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value
Wenzhe Zhou has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17168 ) Change subject: IMPALA-10564: Return error when inserting an invalid decimal value .. IMPALA-10564: Return error when inserting an invalid decimal value When CTAS or INSERT-SELECT statements insert rows into a table with decimal columns, Impala inserts NULL for overflowed decimal values instead of returning an error. This issue happens when the data expression for the decimal column in the SELECT sub-query contains at least one alias. It is similar to IMPALA-6340, but IMPALA-6340 only fixed the cases where the data expression for the decimal column is a constant, so that the overflowed decimal value could be detected by the frontend during expression analysis. If there is an alias (variable) in the data expression for the decimal column, only the backend can detect the decimal overflow. This patch adds a check of the query status of RuntimeState in TableWriter when ScalarExprEvaluator returns NULL for a decimal column. If there is an error, the query fails without inserting NULL for the decimal column. We did not change the behaviour for decimal_v1: NULL is still inserted into the table for invalid decimal values, with a warning message. Tests: - Manually ran queries with overflowed decimal values by using CTAS and INSERT-SELECT statements. Verified that the queries failed without inserting NULL, as expected. - Manually ran queries with overflowed decimal values and decimal_v2 set to false. The result is the same as before: NULLs were inserted into the table for invalid decimal values, with a warning message. - Added unit tests for INSERT-SELECT and CTAS. - Passed core tests.
Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98 --- M be/src/exec/hdfs-text-table-writer.cc M be/src/exec/kudu-table-sink.cc M be/src/exec/parquet/hdfs-parquet-table-writer.cc A testdata/workloads/functional-query/queries/QueryTest/decimal-insert-overflow-exprs.test M tests/query_test/test_decimal_queries.py 5 files changed, 139 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17168/2 -- To view, visit http://gerrit.cloudera.org:8080/17168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98 Gerrit-Change-Number: 17168 Gerrit-PatchSet: 2 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@ PS1, Line : $1 Slightly misleading: it does not necessarily have to be the same, it could be 0 or -1, right? http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@1124 PS1, Line 1124: if (query_options->spool_query_results && scratch_limit != -1) { Since we now have spilling on by default, spool_query_results is set to 1GB; I wonder if this check will break older workloads that set the scratch limit to something and don't expect it to throw an error. -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 10 Mar 2021 21:54:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP IMPALA-10564: Return error when inserting an overflowed decimal value
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17168 ) Change subject: WIP IMPALA-10564: Return error when inserting an overflowed decimal value .. Patch Set 1: (2 comments) Fixed the issue for HDFS and Kudu tables. HBase doesn't have this issue since it does not insert NULL into the table. http://gerrit.cloudera.org:8080/#/c/17168/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17168/1//COMMIT_MSG@7 PS1, Line 7: overflowed > See related comment below.. it can occur for other error conditions besides Done http://gerrit.cloudera.org:8080/#/c/17168/1/be/src/exec/hdfs-text-table-writer.cc File be/src/exec/hdfs-text-table-writer.cc: http://gerrit.cloudera.org:8080/#/c/17168/1/be/src/exec/hdfs-text-table-writer.cc@107 PS1, Line 107: overflowed > It may not always be overflow. For instance I get the same NULL value inse Done -- To view, visit http://gerrit.cloudera.org:8080/17168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98 Gerrit-Change-Number: 17168 Gerrit-PatchSet: 1 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 10 Mar 2021 21:48:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10549: Register transactions from external frontend DML
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17122 ) Change subject: IMPALA-10549: Register transactions from external frontend DML .. Patch Set 12: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6950/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17122 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia8863b8d9d281a5d164f10de9c5ee52cf3be63db Gerrit-Change-Number: 17122 Gerrit-PatchSet: 12 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: John Sherman Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Wed, 10 Mar 2021 17:58:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17130 ) Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8330/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17130 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67 Gerrit-Change-Number: 17130 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 10 Mar 2021 13:31:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17130 ) Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables .. Patch Set 2: (3 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@233 PS1, Line 233: Preconditions.checkState(createStmt_.getIcebergPartitionSpecs().size() == 1); > Maybe we can add a comment here for this check. Done http://gerrit.cloudera.org:8080/#/c/17130/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java: http://gerrit.cloudera.org:8080/#/c/17130/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java@243 PS2, Line 243: transfromToTHdfsTable > typo: transform Done http://gerrit.cloudera.org:8080/#/c/17130/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java@262 PS2, Line 262: THdfsTable hdfsTable = new THdfsTable(localFsTable.getHdfsBaseDir(), : getColumnNames(), localFsTable.getNullPartitionKeyValue(), : FeFsTable.DEFAULT_NULL_COLUMN_VALUE, idToPartition, tPrototypePartition); : return hdfsTable; > Maybe we can return this new THdfsTable directly. 
Done -- To view, visit http://gerrit.cloudera.org:8080/17130 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67 Gerrit-Change-Number: 17130 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng Gerrit-Comment-Date: Wed, 10 Mar 2021 13:11:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17130 to look at the new patch set (#3). Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables .. IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables This patch adds support for CREATE TABLE AS SELECT statements for Iceberg tables. CTAS statements work like the following in Impala: 1. Analysis of the whole CTAS statement 2. Divide CTAS into a CREATE stmt and an INSERT stmt 3. Create a temporary in-memory target table from the CREATE stmt 4. Analyse the INSERT statement by using the temporary target table 5. If everything is OK so far, create the target table 6. Execute the INSERT query For Iceberg tables, the non-trivial part was creating the temporary target table without actually creating it via the Iceberg API. I've created a new class 'IcebergCtasTarget' that mimics an FeIceberg table. It can be used with both catalog V1 and V2.
Testing * e2e CTAS tests in iceberg-ctas.test * SHOW CREATE TABLE stmts in show-create-table.test Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java A fe/src/main/java/org/apache/impala/catalog/CtasTargetTable.java M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java M fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java M fe/src/main/java/org/apache/impala/util/IcebergUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java A testdata/workloads/functional-query/queries/QueryTest/iceberg-ctas.test M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test M tests/metadata/test_show_create_table.py M tests/query_test/test_iceberg.py 18 files changed, 686 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/17130/3 -- To view, visit http://gerrit.cloudera.org:8080/17130 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67 Gerrit-Change-Number: 17130 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: wangsheng
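The ordering of the six CTAS steps in IMPALA-10222 can be illustrated with a minimal Python sketch. The real table is created in step 5 only after the INSERT has been analyzed against a temporary in-memory target in step 4, so an analysis failure leaves the catalog untouched. This is a simplified model under stated assumptions. `InMemoryCtasTarget` only loosely mirrors the role of `IcebergCtasTarget`. The toy `Catalog`, the column-count check in `analyze_insert`, and `run_ctas` are hypothetical, and steps 1 and 2 are assumed to have already happened.

```python
from typing import Callable, Dict, List


class InMemoryCtasTarget:
    """Hypothetical stand-in for IcebergCtasTarget: describes the table to be
    created (name and columns) without creating anything in the catalog."""

    def __init__(self, name: str, columns: List[str]) -> None:
        self.name = name
        self.columns = columns


class Catalog:
    """Toy catalog that records the tables actually created."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[str]] = {}

    def create_table(self, name: str, columns: List[str]) -> None:
        self.tables[name] = list(columns)


def analyze_insert(target: InMemoryCtasTarget, select_exprs: List[str]) -> None:
    # Simplified step 4: the INSERT must produce one value per target column.
    if len(select_exprs) != len(target.columns):
        raise ValueError(f"{target.name}: expected {len(target.columns)} "
                         f"columns, got {len(select_exprs)}")


def run_ctas(catalog: Catalog, name: str, columns: List[str],
             select_exprs: List[str],
             execute_insert: Callable[[str, List[str]], None]) -> None:
    # Steps 1-2 assumed done: the CTAS was analyzed and split into a CREATE
    # part (name, columns) and an INSERT part (select_exprs).
    target = InMemoryCtasTarget(name, columns)  # step 3: temporary target
    analyze_insert(target, select_exprs)        # step 4: analyze the INSERT
    catalog.create_table(name, columns)         # step 5: create for real
    execute_insert(name, select_exprs)          # step 6: run the INSERT
```

Analyzing against a stand-in object is what allows step 4 to run before anything is created through the Iceberg API.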
[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 ) Change subject: IMPALA-7712: Support Google Cloud Storage .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8329/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17121 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b Gerrit-Change-Number: 17121 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 10 Mar 2021 12:56:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17121 ) Change subject: IMPALA-7712: Support Google Cloud Storage .. Patch Set 8: Fixed the infinite loop found in tests/stress/test_insert_stress.py. Still investigating the other two timeout issues of IMPALA-10563. -- To view, visit http://gerrit.cloudera.org:8080/17121 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b Gerrit-Change-Number: 17121 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 10 Mar 2021 12:37:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage
Hello Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17121 to look at the new patch set (#8). Change subject: IMPALA-7712: Support Google Cloud Storage .. IMPALA-7712: Support Google Cloud Storage This patch adds support for GCS (Google Cloud Storage). Using the gcs-connector, the implementation is similar to other remote FileSystems. New flags for GCS: - num_gcs_io_threads: Number of GCS I/O threads. Defaults to 16. Follow-up: - Support for spilling to GCS will be addressed in IMPALA-10561. - Some tests are skipped for further investigation (IMPALA-10562, IMPALA-10563). Tests: - Compile and create hdfs test data on a GCE instance. Upload test data to a GCS bucket. Modify all locations in HMS DB to point to the GCS bucket. Remove some hdfs caching params. Run CORE tests. - Compile and load snapshot data to a GCS bucket. Run CORE tests. Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b --- M be/src/exec/hdfs-table-sink.cc M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/tmp-file-mgr.cc M be/src/util/hdfs-util.cc M be/src/util/hdfs-util.h M bin/impala-config.sh M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M java/executor-deps/pom.xml M java/pom.xml M testdata/bin/create-load-data.sh M testdata/bin/load-test-warehouse-snapshot.sh M testdata/bin/run-all.sh M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py M tests/authorization/test_ranger.py M tests/common/impala_test_suite.py M tests/common/skip.py M tests/custom_cluster/test_admission_controller.py M tests/custom_cluster/test_coordinators.py M tests/custom_cluster/test_event_processing.py M tests/custom_cluster/test_hdfs_fd_caching.py M tests/custom_cluster/test_hive_parquet_codec_interop.py M tests/custom_cluster/test_hive_text_codec_interop.py M
tests/custom_cluster/test_insert_behaviour.py M tests/custom_cluster/test_lineage.py M tests/custom_cluster/test_local_catalog.py M tests/custom_cluster/test_local_tz_conversion.py M tests/custom_cluster/test_metadata_replicas.py M tests/custom_cluster/test_parquet_max_page_header.py M tests/custom_cluster/test_permanent_udfs.py M tests/custom_cluster/test_query_retries.py M tests/custom_cluster/test_restart_services.py M tests/custom_cluster/test_topic_update_frequency.py M tests/data_errors/test_data_errors.py M tests/failure/test_failpoints.py M tests/metadata/test_catalogd_debug_actions.py M tests/metadata/test_compute_stats.py M tests/metadata/test_ddl.py M tests/metadata/test_hdfs_encryption.py M tests/metadata/test_hdfs_permissions.py M tests/metadata/test_hms_integration.py M tests/metadata/test_metadata_query_statements.py M tests/metadata/test_partition_metadata.py M tests/metadata/test_refresh_partition.py M tests/metadata/test_reset_metadata.py M tests/metadata/test_stale_metadata.py M tests/metadata/test_testcase_builder.py M tests/metadata/test_views_compatibility.py M tests/query_test/test_acid.py M tests/query_test/test_aggregation.py M tests/query_test/test_date_queries.py M tests/query_test/test_hbase_queries.py M tests/query_test/test_hdfs_caching.py M tests/query_test/test_insert_behaviour.py M tests/query_test/test_insert_parquet.py M tests/query_test/test_insert_permutation.py M tests/query_test/test_join_queries.py M tests/query_test/test_nested_types.py M tests/query_test/test_observability.py M tests/query_test/test_partitioning.py M tests/query_test/test_resource_limits.py M tests/query_test/test_scanners.py M tests/shell/test_shell_commandline.py M tests/stress/test_acid_stress.py M tests/stress/test_ddl_stress.py M tests/util/filesystem_utils.py 68 files changed, 302 insertions(+), 64 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17121/8 -- To view, visit http://gerrit.cloudera.org:8080/17121 To 
unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b Gerrit-Change-Number: 17121 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang
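The new num_gcs_io_threads flag sizes the I/O thread pool used for GCS reads. The Python below is a minimal sketch, not Impala's DiskIoMgr logic (which lives in be/src/runtime/io/disk-io-mgr.cc). It assumes GCS paths use the gcs-connector's gs:// scheme and that all GCS paths share one pool. The helpers is_gcs_path and io_threads_for_path and the fallback value DEFAULT_IO_THREADS are hypothetical; only the default of 16 comes from the commit message.

```python
from urllib.parse import urlparse

# Default from the commit message: num_gcs_io_threads defaults to 16.
NUM_GCS_IO_THREADS = 16

# Hypothetical thread count for paths that are not on GCS.
DEFAULT_IO_THREADS = 1


def is_gcs_path(path: str) -> bool:
    """Return True if the path uses the gs:// scheme (assumed gcs-connector scheme)."""
    # urlparse lower-cases the scheme, so "GS://..." is also recognized.
    return urlparse(path).scheme == "gs"


def io_threads_for_path(path: str,
                        num_gcs_io_threads: int = NUM_GCS_IO_THREADS,
                        default_io_threads: int = DEFAULT_IO_THREADS) -> int:
    """Hypothetical: size of the I/O thread pool that would serve this path.

    All GCS paths share one pool sized by num_gcs_io_threads; every other
    path falls back to default_io_threads.
    """
    if num_gcs_io_threads <= 0:
        raise ValueError("num_gcs_io_threads must be positive")
    return num_gcs_io_threads if is_gcs_path(path) else default_io_threads
```

For example, io_threads_for_path("gs://bucket/warehouse/t/0.parq") returns 16. Overriding the flag value, as in io_threads_for_path("gs://bucket/x", num_gcs_io_threads=32), returns 32.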
[Impala-ASF-CR] IMPALA-9470: Use Parquet Bloom filters - Part 1
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-9470: Use Parquet Bloom filters - Part 1

Patch Set 15: Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8328/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

--
To view, visit http://gerrit.cloudera.org:8080/17026

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 15
Gerrit-Owner: Daniel Becker
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Daniel Becker
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Wed, 10 Mar 2021 11:52:22 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9470: Use Parquet Bloom filters - Part 1
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-9470: Use Parquet Bloom filters - Part 1

Patch Set 15:

(182 comments)

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h
File be/src/thirdparty/xxhash/xxhash.h:

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@70
PS15, Line 70: https://fastcompression.blogspot.com/2019/03/presenting-xxh3.html?showComment=1552696407071#c3490092340461170735
line too long (112 > 90)

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@92
PS15, Line 92: * https://fastcompression.blogspot.com/2018/03/xxhash-for-small-keys-impressive-power.html
line too long (96 > 90)

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@113
PS15, Line 113: # elif defined (__cplusplus) || (defined (__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L) /* C99 */)
line too long (104 > 90)

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@243
PS15, Line 243: # define XXH3_64bits_reset_withSecret XXH_NAME2(XXH_NAMESPACE, XXH3_64bits_reset_withSecret)
line too long (93 > 90)

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@253
PS15, Line 253: # define XXH3_128bits_reset_withSeed XXH_NAME2(XXH_NAMESPACE, XXH3_128bits_reset_withSeed)
line too long (91 > 90)

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@254
PS15, Line 254: # define XXH3_128bits_reset_withSecret XXH_NAME2(XXH_NAMESPACE, XXH3_128bits_reset_withSecret)
line too long (95 > 90)

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@270
PS15, Line 270: #define XXH_VERSION_NUMBER (XXH_VERSION_MAJOR *100*100 + XXH_VERSION_MINOR *100 + XXH_VERSION_RELEASE)
line too long (103 > 90)

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@429
PS15, Line 429: * @param statePtr A pointer to an @ref XXH32_state_t allocated with @ref XXH32_createState().
line too long (94 > 90)

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@441
PS15, Line 441: XXH_PUBLIC_API void XXH32_copyState(XXH32_state_t* dst_state, const XXH32_state_t* src_state);
line too long (94 > 90)

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@476
PS15, Line 476: XXH_PUBLIC_API XXH_errorcode XXH32_update (XXH32_state_t* statePtr, const void* input, size_t length);
line too long (102 > 90)

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@628
PS15, Line 628: XXH_PUBLIC_API void XXH64_copyState(XXH64_state_t* dst_state, const XXH64_state_t* src_state);
line too long (94 > 90)

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@631
PS15, Line 631: XXH_PUBLIC_API XXH_errorcode XXH64_update (XXH64_state_t* statePtr, const void* input, size_t length);
line too long (102 > 90)

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@700
PS15, Line 700: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSeed(const void* data, size_t len, XXH64_hash_t seed);
line too long (98 > 90)

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@724
PS15, Line 724: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSecret(const void* data, size_t len, const void* secret, size_t secretSize);
line too long (120 > 90)

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@743
PS15, Line 743: XXH_PUBLIC_API void XXH3_copyState(XXH3_state_t* dst_state, const XXH3_state_t* src_state);
line too long (91 > 90)

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@756
PS15, Line 756: XXH_PUBLIC_API XXH_errorcode XXH3_64bits_reset_withSeed(XXH3_state_t* statePtr, XXH64_hash_t seed);
line too long (99 > 90)

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@766
PS15, Line 766: XXH_PUBLIC_API XXH_errorcode XXH3_64bits_reset_withSecret(XXH3_state_t* statePtr, const void* secret, size_t secretSize);
line too long (121 > 90)

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@768
PS15, Line 768: XXH_PUBLIC_API XXH_errorcode XXH3_64bits_update (XXH3_state_t* statePtr, const void* input, size_t length);
line too long (107 > 90)

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@791
PS15, Line 791: XXH_PUBLIC_API XXH128_hash_t XXH3_128bits_withSeed(const void* data, size_t len, XXH64_hash_t seed);
line too long (100 > 90)

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@792
PS15, Line 792: XXH_PUBLIC_API XXH128_hash_t XXH3_128bits_withSecret(const void* data, size_t len, const void* secret, size_t secretSize);
line too long (122 > 90)
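Every bot comment in that message follows one rule: flag any line longer than 90 characters and report "line too long (N > 90)". The Python below is a minimal sketch of such a check. It is not the bot's actual implementation. It assumes length is counted in characters with the trailing newline excluded, and long_lines is a hypothetical helper name.

```python
from typing import Iterator, Tuple

# Limit reported by the review bot in messages such as "(112 > 90)".
MAX_LINE_LENGTH = 90


def long_lines(text: str, limit: int = MAX_LINE_LENGTH) -> Iterator[Tuple[int, str]]:
    """Yield (1-based line number, message) for each line longer than limit."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        if len(line) > limit:
            yield lineno, f"line too long ({len(line)} > {limit})"
```

A line of exactly 90 characters passes, and a 91-character line yields "line too long (91 > 90)". This matches the smallest overage in the comments above (Line 253 and Line 743).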
[Impala-ASF-CR] IMPALA-9470: Use Parquet Bloom filters - Part 1
Daniel Becker has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-9470: Use Parquet Bloom filters - Part 1

IMPALA-9470: Use Parquet Bloom filters - Part 1

This change adds read support for Parquet Bloom filters for some types.
The supported Parquet type - Impala type pairs are the following:

+--------------+------------------------+
| Parquet type | Impala type            |
+--------------+------------------------+
| INT32        | TINYINT, SMALLINT, INT |
| INT64        | BIGINT                 |
| FLOAT        | FLOAT                  |
| DOUBLE       | DOUBLE                 |
| BYTE_ARRAY   | STRING                 |
+--------------+------------------------+

If a Bloom filter is available for a column that is fully dictionary
encoded, the Bloom filter is not used, as the dictionary can give exact
results in filtering.

Testing:
 - Added tests/query_test/test_parquet_bloom_filter.py, which tests that
   Parquet Bloom filtering works for the supported types and that we do
   not incorrectly discard row groups for the unsupported type VARCHAR.

Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
---
M LICENSE.txt
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exprs/expr-value.h
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
M be/src/kudu/util/block_bloom_filter.cc
M be/src/kudu/util/block_bloom_filter.h
M be/src/runtime/bufferpool/buffer-pool-internal.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
A be/src/thirdparty/xxhash/README.md
A be/src/thirdparty/xxhash/xxhash.h
M be/src/util/CMakeLists.txt
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/impala-bloom-filter-buffer-allocator.cc
A be/src/util/impala-bloom-filter-buffer-allocator.h
A be/src/util/parquet-bloom-filter.cc
A be/src/util/parquet-bloom-filter.h
M bin/rat_exclude_files.txt
M bin/run_clang_tidy.sh
M common/thrift/parquet.thrift
A testdata/data/parquet-bloom-filtering.parquet
A testdata/workloads/functional-query/queries/QueryTest/parquet-bloom-filter.test
A tests/query_test/test_parquet_bloom_filter.py
27 files changed, 6,848 insertions(+), 123 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/17026/15
--
To view, visit http://gerrit.cloudera.org:8080/17026

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 15
Gerrit-Owner: Daniel Becker
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Daniel Becker
Gerrit-Reviewer: Impala Public Jenkins
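The commit message gives two rules: which Parquet/Impala type pairs are supported, and skipping the Bloom filter when a column is fully dictionary encoded. The Python below is a minimal sketch of both rules, plus a membership check. That check follows the split-block Bloom filter described in the Parquet format specification. It is not the patch's code, which lives in be/src/util/parquet-bloom-filter.cc.

The sketch makes these assumptions:
 - Values are already hashed to unsigned 64-bit integers. The Parquet spec uses XXH64 with seed 0 for this, and the patch also vendors be/src/thirdparty/xxhash/xxhash.h.
 - should_use_bloom_filter and SplitBlockBloomFilter are hypothetical names.

```python
from typing import List

# Supported Parquet physical type -> Impala types, per the commit message.
SUPPORTED_PAIRS = {
    "INT32": {"TINYINT", "SMALLINT", "INT"},
    "INT64": {"BIGINT"},
    "FLOAT": {"FLOAT"},
    "DOUBLE": {"DOUBLE"},
    "BYTE_ARRAY": {"STRING"},
}

# Salt constants of the split-block Bloom filter in the Parquet format spec.
SALT = (0x47B6137B, 0x44974D91, 0x8824AD5B, 0xA2B7289D,
        0x705495C7, 0x2DF1424B, 0x9EFC4947, 0x5C6BFB31)
MASK32 = 0xFFFFFFFF


def should_use_bloom_filter(parquet_type: str, impala_type: str,
                            has_bloom_filter: bool, fully_dict_encoded: bool) -> bool:
    """Hypothetical helper: decide whether to consult a column's Bloom filter."""
    if not has_bloom_filter or fully_dict_encoded:
        # A full dictionary already answers membership exactly.
        return False
    return impala_type in SUPPORTED_PAIRS.get(parquet_type, set())


class SplitBlockBloomFilter:
    """Hypothetical sketch of a split-block Bloom filter over 64-bit hashes."""

    def __init__(self, num_blocks: int):
        if num_blocks <= 0:
            raise ValueError("num_blocks must be positive")
        # Each block holds 256 bits as eight 32-bit words.
        self.blocks: List[List[int]] = [[0] * 8 for _ in range(num_blocks)]

    def _block_index(self, h: int) -> int:
        # The upper 32 bits of the hash pick a block in [0, num_blocks).
        return ((h >> 32) * len(self.blocks)) >> 32

    @staticmethod
    def _mask(x: int) -> List[int]:
        # For each word, set the bit given by the top 5 bits of (x * salt) mod 2^32.
        return [1 << (((x * s) & MASK32) >> 27) for s in SALT]

    def insert(self, h: int) -> None:
        block = self.blocks[self._block_index(h)]
        for i, bit in enumerate(self._mask(h & MASK32)):
            block[i] |= bit

    def might_contain(self, h: int) -> bool:
        # False means "definitely absent", so a reader could skip the row group.
        block = self.blocks[self._block_index(h)]
        return all(block[i] & bit for i, bit in enumerate(self._mask(h & MASK32)))
```

A Bloom filter can return false positives but never false negatives. That is why only a False answer from might_contain would justify discarding a row group. It is also why an unsupported type such as VARCHAR must never be checked against the filter.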