[Impala-ASF-CR] IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil

2021-03-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17171 )

Change subject: IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8340/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea
Gerrit-Change-Number: 17171
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 11 Mar 2021 07:57:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10367: Impala-shell internal error - UnboundLocalError, local variable 'retry msg' referenced before assign

2021-03-10 Thread Abhishek Rawat (Code Review)
Abhishek Rawat has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17172


Change subject: IMPALA-10367: Impala-shell internal error - UnboundLocalError, 
local variable 'retry_msg' referenced before assign
..

IMPALA-10367: Impala-shell internal error -
UnboundLocalError, local variable 'retry_msg' referenced before assign

ImpalaHS2Client._open_session() has a 'retry_msg' variable which was
not initialized in the code-path where retry was disabled. If an
exception was hit with retry disabled, a compile time error was
generated.

The fix is to initialize 'retry_msg' in the non retry code-path.

Testing:
- Forced exception in ImpalaHS2Client._open_session() and verified that
proper error message was generated.
- Ran impala-shell e2e and custom cluster tests.

Change-Id: I50a08a62a332de759022d0a4862e74f5a81945d9
---
M shell/impala_client.py
1 file changed, 2 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17172/1
--
To view, visit http://gerrit.cloudera.org:8080/17172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I50a08a62a332de759022d0a4862e74f5a81945d9
Gerrit-Change-Number: 17172
Gerrit-PatchSet: 1
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil

2021-03-10 Thread Quanlong Huang (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17171

to look at the new patch set (#4).

Change subject: IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil
..

IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil

HDFS FileSystem provides a listStatusIterator() API for listing remote
storage using a RemoteIterator. We use it to list files when loading
table file metadata.

It's not guaranteed that a RemoteIterator can survive when its hasNext()
or next() throws IOExceptions. We should stop the loop in this case.
Otherwise, we may go into a dead loop.

Without HADOOP-16685, it's also not guaranteed that
FileSystem.listStatusIterator() will thrown a FileNotFoundException when
the path doesn't exist.

This patch refactors the file listing iterators so we don't need to
depend on these two assumptions. The basic idea is:
 - On one side, we should not depends on other RemoteIterator's behavior
   after exception.
 - On the other side, we try to make our own iterators more robust on
   transient sub-directories. So table loading won't be failed by them.

Tests:
 - Loop test_insert_stress.py 100 times. Verified the non-existing
   subdirs are skipped and inserts are stable in a high concurrency.

Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea
---
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
1 file changed, 100 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17171/4
--
To view, visit http://gerrit.cloudera.org:8080/17171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea
Gerrit-Change-Number: 17171
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil

2021-03-10 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17171 )

Change subject: IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil
..


Patch Set 3:

When I loop over tests/stress/test_insert_stress.py to run it several times, I 
found the inserts are more easy to fail by TableLoadingException. And then lead 
the table metadata to a bad state. The exception is

E0311 11:59:42.633015 20506 ParallelFileMetadataLoader.java:166] Refreshing 
file and block metadata for 1 paths for table 
test_inserts_ab08196b.test_concurrent_inserts encountered an error loading data 
for path 
hdfs://localhost:20500/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts
Java exception follows:
java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File 
hdfs://localhost:20500/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/5e4afdf5a6978311_9f9d0baa
 does not exist.
at 
com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552)
at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:513)
at 
com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:86)
at 
org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:163)
at 
org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:115)
at 
org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:747)
at 
org.apache.impala.catalog.HdfsTable.updateUnpartitionedTableFileMd(HdfsTable.java:1296)
at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1182)
at 
org.apache.impala.service.CatalogOpExecutor.loadTableMetadata(CatalogOpExecutor.java:1015)
at 
org.apache.impala.service.CatalogOpExecutor.updateCatalog(CatalogOpExecutor.java:4808)
at 
org.apache.impala.service.JniCatalog.updateCatalog(JniCatalog.java:327)
Caused by: java.io.FileNotFoundException: File 
hdfs://localhost:20500/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/5e4afdf5a6978311_9f9d0baa
 does not exist.
at 
org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1273)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1247)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1192)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1188)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1206)
at 
org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2126)
at 
org.apache.hadoop.fs.FileSystem$5.handleFileStat(FileSystem.java:2314)
at org.apache.hadoop.fs.FileSystem$5.hasNext(FileSystem.java:2291)
at 
org.apache.impala.common.FileSystemUtil$FilterIterator.hasNext(FileSystemUtil.java:813)
at 
org.apache.impala.catalog.FileMetadataLoader.load(FileMetadataLoader.java:202)
at 
org.apache.impala.catalog.ParallelFileMetadataLoader.lambda$loadInternal$1(ParallelFileMetadataLoader.java:157)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at 
com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:322)
at 
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
at 
com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:66)
at 
com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:36)
at 
org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:157)
... 7 more

The reason is that we have removed the catching of FileNotFoundException in 
FilterIterator. When listing files with locations, we use 
FileSystem#listFiles() which returns a RemoteIterator similar to our 
RecursingIterator except the handling of non-exisitng subdir in its hasNext(). 
To make the file listing with location more robust as well, PS3 use our 
RecursingIterator when fs is not S3.


--
To view, visit http://gerrit.cloudera.org:8080/17171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings


[Impala-ASF-CR] IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil

2021-03-10 Thread Quanlong Huang (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17171

to look at the new patch set (#3).

Change subject: IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil
..

IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil

HDFS FileSystem provides a listStatusIterator() API for listing remote
storage using a RemoteIterator. We use it to list files when loading
table file metadata.

It's not guaranteed that a RemoteIterator can survive when its hasNext()
or next() throws IOExceptions. We should stop the loop in this case.
Otherwise, we may go into a dead loop.

Without HADOOP-16685, it's also not guaranteed that
FileSystem.listStatusIterator() will thrown a FileNotFoundException when
the path doesn't exist.

This patch refactors the file listing iterators so we don't need to
depend on these two assumptions.

Tests:
 - Loop test_insert_stress.py 100 times. Verified the non-existing
   subdirs are skipped and inserts are stable in a high concurrency.

Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea
---
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
1 file changed, 100 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17171/3
--
To view, visit http://gerrit.cloudera.org:8080/17171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea
Gerrit-Change-Number: 17171
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil

2021-03-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17171 )

Change subject: IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8339/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea
Gerrit-Change-Number: 17171
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 11 Mar 2021 03:49:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil

2021-03-10 Thread Quanlong Huang (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17171

to look at the new patch set (#2).

Change subject: IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil
..

IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil

HDFS FileSystem provides a listStatusIterator() API for listing remote
storage using a RemoteIterator. We use it to list files when loading
table file metadata.

It's not guaranteed that a RemoteIterator can survive when its hasNext()
or next() throws IOExceptions. We should stop the loop in this case.
Otherwise, we may go into a dead loop.

Without HADOOP-16685, it's also not guaranteed that
FileSystem.listStatusIterator() will thrown a FileNotFoundException when
the path doesn't exist.

This patch refactors the file listing iterators so we don't need to
depend on these two assumptions.

Tests:
 - Run test_insert_stress.py. Verified the non-existing subdirs are
   skipped.

Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea
---
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
1 file changed, 37 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17171/2
--
To view, visit http://gerrit.cloudera.org:8080/17171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea
Gerrit-Change-Number: 17171
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil

2021-03-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17171 )

Change subject: IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8338/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea
Gerrit-Change-Number: 17171
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 11 Mar 2021 02:44:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8337/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 6
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Thu, 11 Mar 2021 02:42:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8336/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 5
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Thu, 11 Mar 2021 02:35:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil

2021-03-10 Thread Quanlong Huang (Code Review)
Quanlong Huang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17171


Change subject: IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil
..

IMPALA-10579: Fix usage of RemoteIterator in FileSystemUtil

HDFS FileSystem provides a listStatusIterator() API for listing remote
storage using a RemoteIterator. We use it to list files when loading
table file metadata.

It's not guaranteed that a RemoteIterator can survive when its hasNext()
or next() throws IOExceptions. We should stop the loop in this case.
Otherwise, we may go into a dead loop.

Without HADOOP-16685, it's also not guaranteed that
FileSystem.listStatusIterator() will thrown a FileNotFoundException when
the path doesn't exist.

This patch refactors the file listing iterators so we don't need to
depend on these two assumptions.

Tests:
 - Run test_insert_stress.py. Verified the non-existing subdirs are
   skipped.

Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea
---
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
1 file changed, 37 insertions(+), 18 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17171/1
--
To view, visit http://gerrit.cloudera.org:8080/17171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I859bd4f976c51a34eb6a03cefd2ddcdf11656cea
Gerrit-Change-Number: 17171
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-10 Thread John Sherman (Code Review)
John Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
..


Patch Set 6:

(7 comments)

Thanks Joe, I think the suggested review comments has made this a better patch 
overall.

http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.h@308
PS3, Line 308:   /// Non-null if and only if the query produces results for the 
client; i.e. is of
 :   /// TStmtType::QUERY. Coordinator uses these to pull results 
from plan tree and return
 :   /// them to the client in GetNext(), and also to access the 
fragment instance's runtime
 :   /// state.
 :   ///
 :   /// Result rows are materialized by this fragment instance in 
its own thread. They are
 :   /// materialized into a QueryResultSet provided to the 
coordinator during GetNext().
 :   ///
 :   /// Owned by the QueryState. Set in Exec().
 :   FragmentInstanceState* coord_instance_ = nullptr;
> The way we are using coord_instance_ for a query with a result sink doesn't
I've added a method to query-exec-params and call it in a few places to 
determine if there is a result sink in coordinator.cc - not sure if that is the 
best way.


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@193
PS3, Line 193:   // set coord_instance_ and coord_sink_
 :   if (exec_params_.GetCoordFragment() != nullptr) {
 : // this blocks until all fragment instances have finished 
their Prepare phase
 : Status query_status = 
query_state_->GetFInstanceState(query_id(), _instance_);
 : if (!query_status.ok()) return UpdateExecState(query_status, 
nullptr, FLAGS_hostname);
 : // We expected this query to have a coordinator instance.
 : DCHECK(coord_instance_ != nullptr);
 : // When GetFInstanceState() returns the coordinator 
instance, the Prepare phase is
 : // done and the FragmentInstanceState's root sink will be 
set up.
 : coord_sink_ = coord_instance_->GetRootSink();
 : DCHECK(coord_sink_ != nullptr);
 :   }
 :   return Status::OK();
 : }
 :
> As mentioned in coordinator.h, I think coord_instance_ should be nullptr fo
Done


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@798
PS3, Line 798:   // staging directory.
> I would prefer to review this when I can see how it is being called.
The newest review has the call.


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@805
PS3, Line 805:   DCHECK(query_ctx().__isset.desc_tbl_serialized);
> If this returns an error (which it should if the query is not successful),
This code is based on FinalizeHdfsDml which follows a similar pattern. So I 
wonder if something else ends up cleaning it up in the error case.


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@894
PS3, Line 894: && query_state_->query_options().spool_query_results
 : && 
query_state_->query_options().spool_all_results_for_retries) {
 :   // Wait until the BufferedPlanRootSink spooled all results 
or any errors stopping
 :   // it, e.g. batch queue full, cancellation or failures.
 :   auto sink = 
static_cast(coord_sink_);
 :   if (sink->WaitForAllResultsSpooled()) {
 : VLOG_QUERY << "Cannot spool all results in the allocated 
result spooling space."
 : " Query retry will be skipped if any results have 
been returned.";
 :   }
 :
> If the below if statement does not apply to queries with result sinks, then
Done


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@905
PS3, Line 905:   }
> I think it would make sense to treat a query with a result sink like a DML
Done


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@940
PS3, Line 940:
> If a query with a result sink goes through the current DML path, then it wo
Done



--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 6
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-10 Thread John Sherman (Code Review)
Hello Aman Sinha, Thomas Tauber-Marshall, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17144

to look at the new patch set (#6).

Change subject: IMPALA-10551: Add result sink support for external frontends
..

IMPALA-10551: Add result sink support for external frontends

- The intended purpose of these changes is to allow external frontends
  to receive query results via files rather than streaming the results
  through the thrift interface.
- External frontends are expected to provide an FeFsTable implementation
  that describes the desired location to store results.
- External frontends are responsible for managing the files after the
  query is completed.
- Testing has been manual and through an implementation of an external
  frontend.

Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Reviewed-by: Aman Sinha 
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/query-exec-params.cc
M be/src/runtime/query-exec-params.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
9 files changed, 97 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17144/6
--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 6
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17144/5/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/17144/5/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@245
PS5, Line 245:   exprs.addAll(outputExprs_.subList(0, 
targetTable_.getNonClusteringColumns().size()));
line too long (91 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 5
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Thu, 11 Mar 2021 02:17:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-10 Thread John Sherman (Code Review)
Hello Aman Sinha, Thomas Tauber-Marshall, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17144

to look at the new patch set (#5).

Change subject: IMPALA-10551: Add result sink support for external frontends
..

IMPALA-10551: Add result sink support for external frontends

- The intended purpose of these changes is to allow external frontends
  to receive query results via files rather than streaming the results
  through the thrift interface.
- External frontends are expected to provide an FeFsTable implementation
  that describes the desired location to store results.
- External frontends are responsible for managing the files after the
  query is completed.
- Testing has been manual and through an implementation of an external
  frontend.

Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Reviewed-by: Aman Sinha 
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/query-exec-params.cc
M be/src/runtime/query-exec-params.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
9 files changed, 96 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17144/5
--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 5
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters

2021-03-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17075 )

Change subject: IMPALA-10494: Making use of the min/max column stats to improve 
min/max filters
..


Patch Set 19:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8335/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
Gerrit-Change-Number: 17075
Gerrit-PatchSet: 19
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 11 Mar 2021 01:20:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10565: Check max spilled result spooling mem vs scratch limit

2021-03-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs 
scratch_limit
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8334/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 11 Mar 2021 01:10:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10494: Making use of the min/max column stats to improve min/max filters

2021-03-10 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#19). ( 
http://gerrit.cloudera.org:8080/17075 )

Change subject: IMPALA-10494: Making use of the min/max column stats to improve 
min/max filters
..

IMPALA-10494: Making use of the min/max column stats to improve min/max filters

This patch adds the functionality to compute the minimal and the maximal
value for a column of type integers, float or double for parquet tables,
and to make use of the new stats to discard min/max filters, in both hash
join builders and Parquet scanners, whose coverage are too close to the
actual range defined by the column min and max.

The computation and dislay of the new column min/max stats are done
for Parquet tables only and can be controlled by two new Boolean query
options (default to false):
  1. compute_column_minmax_stats
  2. show_column_minmax_stats

Usage examples.

  set compute_column_minmax_stats=true;
  compute stats tpcds_parquet.store_sales;

  set show_column_minmax_stats=true;
  show column stats tpcds_parquet.store_sales;

+---+--+-...---+-+-+
| Column| Type |   #Falses | Min | Max |
+---+--+-...---+-+-+
| ss_sold_time_sk   | INT  |   -1  | 28800   | 75599   |
| ss_item_sk| BIGINT   |   -1  | 1   | 18000   |
| ss_customer_sk| INT  |   -1  | 1   | 10  |
| ss_cdemo_sk   | INT  |   -1  | 15  | 1920797 |
| ss_hdemo_sk   | INT  |   -1  | 1   | 7200|
| ss_addr_sk| INT  |   -1  | 1   | 5   |
| ss_store_sk   | INT  |   -1  | 1   | 10  |
| ss_promo_sk   | INT  |   -1  | 1   | 300 |
| ss_ticket_number  | BIGINT   |   -1  | 1   | 24  |
| ss_quantity   | INT  |   -1  | 1   | 100 |
| ss_wholesale_cost | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_list_price | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_sales_price| DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_discount_amt   | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_sales_price| DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_wholesale_cost | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_list_price | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_ext_tax| DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_coupon_amt | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_net_paid   | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_net_paid_inc_tax   | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_net_profit | DECIMAL(7,2) |   -1  | -1  | -1  |
| ss_sold_date_sk   | INT  |   -1  | 2450816 | 2452642 |
+---+--+-...---+-+-+

Only the min/max values for non-partition columns are stored in HMS.
The min/max values for partition columns are computed in coordinator.

The min-max filters, in C++ class or protobuf form, are augmented to
deal with the always true state better. Once always true is set, the
actual min and max values in the filter are no longer populated.

Testing:
 - Added new compute/show stats tests for integers, float and double
   column data types in compute-stats-column-minmax.test;
 - Added new tests in overlap_min_max_filters.test to demonstrate the
   usefulness of column stats to quickly disable useless filters in
   both hash join builder and Parquet scanner;
 - Added tests in min-max-filter-test.cc to demonstrate method Or(),
   ToProtobuf() and constructor can deal with always true flag well;
 - core tests.

TODO:
 1. Test compute stats for timestamp and date columns;

Change-Id: I08581b44419bb8da5940cbf98502132acd1c86df
---
M be/src/exec/catalog-op-executor.cc
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/incr-stats-util-test.cc
M be/src/exec/incr-stats-util.cc
M be/src/exec/incr-stats-util.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/service/hs2-util.cc
M be/src/service/hs2-util.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/CatalogObjects.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
M 

[Impala-ASF-CR] IMPALA-10565: Check max spilled result spooling mem vs scratch limit

2021-03-10 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs 
scratch_limit
..


Patch Set 2:

(2 comments)

Patch set 2 disable result spooling in tests that set scratch_limit > -1.

http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@
PS1, Line : un
> Agree. I will rephrase into this:
Done


http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@1124
PS1, Line 1124:   if (query_options->spool_query_results && scratch_limit != 
-1) {
> Good point. Quick git grep show some more places where scratch_limit is set
Done



--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Thu, 11 Mar 2021 00:53:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10565: Check max spilled result spooling mem vs scratch limit

2021-03-10 Thread Riza Suminto (Code Review)
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17166

to look at the new patch set (#2).

Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs 
scratch_limit
..

IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit

IMPALA-9856 enables result spooling by default. However, if the query
option scratch_limit is set lower than max_spilled_result_spooling_mem,
the query might fail in the middle of execution due to insufficient
scratch space. This patch validation that when result spooling is
enabled, max_spilled_result_spooling_mem <= scratch_limit.

This patch also fix validation between max_result_spooling_mem and
max_spilled_result_spooling_mem that should treat both value 0 and -1 as
unbounded.

Testing:
- Lower max_spilled_result_spooling_mem in test_with_high_scratch_limit
  and test_with_low_scratch_limit.
- Toggle off spool_query_results in tests that set scratch_limit=0 to
  ensure that result spilling will not happen.
- Add test_with_scratch_limit_less_than_max_spilled_result_spooling_mem.
- Add be test QueryOptions.ResultSpoolingWithScratchLimit.
- Pass exhaustive tests.

Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
---
M be/src/runtime/bufferpool/buffer-pool-test.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M 
testdata/workloads/functional-query/queries/QueryTest/spilling-naaj-no-deny-reservation.test
M testdata/workloads/functional-query/queries/QueryTest/spilling-naaj.test
M testdata/workloads/tpch/queries/sort-reservation-usage-single-node.test
M tests/query_test/test_scratch_limit.py
7 files changed, 129 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/2
--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

2021-03-10 Thread John Sherman (Code Review)
Hello Aman Sinha, Thomas Tauber-Marshall, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17144

to look at the new patch set (#4).

Change subject: IMPALA-10551: Add result sink support for external frontends
..

IMPALA-10551: Add result sink support for external frontends

- The intended purpose of these changes is to allow external frontends
  to receive query results via files rather than streaming the results
  through the thrift interface.
- External frontends are expected to provide an FeFsTable implementation
  that describes the desired location to store results.
- External frontends are responsible for managing the files after the
  query is completed.
- Testing has been manual and through an implementation of an external
  frontend.

Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Reviewed-by: Aman Sinha 
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
9 files changed, 113 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17144/4
--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 4
Gerrit-Owner: John Sherman 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10565: Check max spilled result spooling mem vs scratch limit

2021-03-10 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs 
scratch_limit
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@
PS1, Line : $1
> slightly misleading,does not necessarily have to be the same, could be 0 or
Agree. I will rephrase into this:
"If max_result_spooling_mem is set to unbounded ($0) 
max_spilled_result_spooling_mem must be set to unbounded (0 or -1) as well."


http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@1124
PS1, Line 1124:   if (query_options->spool_query_results && scratch_limit != 
-1) {
> since now we have spilling on by default, spool_query_results is set to 1GB
Good point. Quick git grep show some more places where scratch_limit is set to 
0, like some in spilling-naaj-no-deny-reservation.test. I will also set 
spool_query_results=0 in these places.



--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 10 Mar 2021 23:28:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

2021-03-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal 
value
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8332/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 10 Mar 2021 22:37:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

2021-03-10 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal 
value
..


Patch Set 2:

Thanks for adding the check for Kudu tables and the unit tests.  When thinking 
about this a bit more, one concern I have is if a user was doing an 
INSERT-SELECT of a billion row (or relatively large) table, would just one 
decimal value overflowing error out the whole operation or will the rest of the 
inserts go through ? I suppose this depends on the ABORT_ON_ERROR setting ?

ETL operations normally expect invalid/dirty rows to be logged into a separate 
location while continuing with the insertion of other rows. We should make sure 
this new behavior does not regress that workflow.  Any thoughts on that ?


--
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 10 Mar 2021 22:33:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

2021-03-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal 
value
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8331/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 10 Mar 2021 22:28:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

2021-03-10 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal 
value
..

IMPALA-10564: Return error when inserting an invalid decimal value

When using CTAS statements or INSERT-SELECT statements to insert rows to
table with decimal columns, Impala insert NULL for overflowed decimal
values, instead of returning error. This issue happens when the data
expression for the decimal column in SELECT sub-query consists at least
one alias.
This issue is similar as IMPALA-6340, but IMPALA-6340 only fixed the
issue for the cases with the data expression for the decimal columns as
constants so that the overflowed decimal values could be detected by
frontend during expression analysis. If there is an alias (variable) in
the data expression for the decimal column, only backend could detect
decimal overflow.

This patch added checking for the query status of RuntimeState in
TableWriter when ScalarExprEvaluator return NULL for decimal column.
If there is an error, the query will be failed without inserting NULL
for decimal column.
We did not change the behaviour for decimal_v1. NULL will be inserted
to the table for invalid decimal values with warning message.

Tests:
 - Manually ran queries with overflowed decimal values by using CTAS
   and INSERT-SELECT statements. Verified that queries failed without
   inserting NULL as expected.
 - Manually ran queries with overflowed decimal values and decimal_v2
   set as false. The result is same as before - NULLs were inserted
   to table for invalid decimal values with warning message.
 - Added unit-tests for INSERT-SELECT and CTAS.
 - Passed core tests.

Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
---
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
A 
testdata/workloads/functional-query/queries/QueryTest/decimal-insert-overflow-exprs.test
M tests/query_test/test_decimal_queries.py
5 files changed, 140 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17168/3
--
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

2021-03-10 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal 
value
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17168/2/tests/query_test/test_decimal_queries.py
File tests/query_test/test_decimal_queries.py:

http://gerrit.cloudera.org:8080/#/c/17168/2/tests/query_test/test_decimal_queries.py@96
PS2, Line 96: @pytest.mark.execute_serially
> flake8: E302 expected 2 blank lines, found 1
Done



--
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 10 Mar 2021 22:15:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

2021-03-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal 
value
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17168/2/tests/query_test/test_decimal_queries.py
File tests/query_test/test_decimal_queries.py:

http://gerrit.cloudera.org:8080/#/c/17168/2/tests/query_test/test_decimal_queries.py@96
PS2, Line 96: @pytest.mark.execute_serially
flake8: E302 expected 2 blank lines, found 1



--
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 10 Mar 2021 22:09:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10564: Return error when inserting an invalid decimal value

2021-03-10 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/17168 )

Change subject: IMPALA-10564: Return error when inserting an invalid decimal 
value
..

IMPALA-10564: Return error when inserting an invalid decimal value

When using CTAS statements or INSERT-SELECT statements to insert rows to
table with decimal columns, Impala insert NULL for overflowed decimal
values, instead of returning error. This issue happens when the data
expression for the decimal column in SELECT sub-query consists at least
one alias.
This issue is similar as IMPALA-6340, but IMPALA-6340 only fixed the
issue for the cases with the data expression for the decimal columns as
constants so that the overflowed decimal values could be detected by
frontend during expression analysis. If there is an alias (variable) in
the data expression for the decimal column, only backend could detect
decimal overflow.

This patch added checking for the query status of RuntimeState in
TableWriter when ScalarExprEvaluator return NULL for decimal column.
If there is an error, the query will be failed without inserting NULL
for decimal column.
We did not change the behaviour for decimal_v1. NULL will be inserted
to the table for invalid decimal values with warning message.

Tests:
 - Manually ran queries with overflowed decimal values by using CTAS
   and INSERT-SELECT statements. Verified that queries failed without
   inserting NULL as expected.
 - Manually ran queries with overflowed decimal values and decimal_v2
   set as false. The result is same as before - NULLs were inserted
   to table for invalid decimal values with warning message.
 - Added unit-tests for INSERT-SELECT and CTAS.
 - Passed core tests.

Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
---
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/kudu-table-sink.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
A 
testdata/workloads/functional-query/queries/QueryTest/decimal-insert-overflow-exprs.test
M tests/query_test/test_decimal_queries.py
5 files changed, 139 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17168/2
--
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-10565: Check max spilled result spooling mem vs scratch limit

2021-03-10 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17166 )

Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs 
scratch_limit
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@
PS1, Line : $1
slightly misleading,does not necessarily have to be the same, could be 0 or -1, 
right?


http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@1124
PS1, Line 1124:   if (query_options->spool_query_results && scratch_limit != 
-1) {
since now we have spilling on by default, spool_query_results is set to 1GB, i 
wonder if this check will break older workloads that set the scratch limit to 
something and dont expect it to throw an error.



--
To view, visit http://gerrit.cloudera.org:8080/17166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9
Gerrit-Change-Number: 17166
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 10 Mar 2021 21:54:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP IMPALA-10564: Return error when inserting an overflowed decimal value

2021-03-10 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17168 )

Change subject: WIP IMPALA-10564: Return error when inserting an overflowed 
decimal value
..


Patch Set 1:

(2 comments)

Fixed the issue for HDFS and Kudu table. HBase don't have this issue since it 
does not insert NULL to the table.

http://gerrit.cloudera.org:8080/#/c/17168/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17168/1//COMMIT_MSG@7
PS1, Line 7: overflowed
> See related comment below.. it can occur for other error conditions besides
Done


http://gerrit.cloudera.org:8080/#/c/17168/1/be/src/exec/hdfs-text-table-writer.cc
File be/src/exec/hdfs-text-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/17168/1/be/src/exec/hdfs-text-table-writer.cc@107
PS1, Line 107: overflowed
> It may not always be overflow.  For instance I get the same NULL value inse
Done



--
To view, visit http://gerrit.cloudera.org:8080/17168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64ce4ed194af81ef06401ffc1124e12f05b8da98
Gerrit-Change-Number: 17168
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 10 Mar 2021 21:48:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10549: Register transactions from external frontend DML

2021-03-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17122 )

Change subject: IMPALA-10549: Register transactions from external frontend DML
..


Patch Set 12:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6950/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17122
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8863b8d9d281a5d164f10de9c5ee52cf3be63db
Gerrit-Change-Number: 17122
Gerrit-PatchSet: 12
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: John Sherman 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 10 Mar 2021 17:58:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

2021-03-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17130 )

Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8330/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Gerrit-Change-Number: 17130
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 10 Mar 2021 13:31:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

2021-03-10 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17130 )

Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
..


Patch Set 2:

(3 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/17130/1/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@233
PS1, Line 233: 
Preconditions.checkState(createStmt_.getIcebergPartitionSpecs().size() == 1);
> Maybe we can add a comment here for this check.
Done


http://gerrit.cloudera.org:8080/#/c/17130/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java:

http://gerrit.cloudera.org:8080/#/c/17130/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java@243
PS2, Line 243: transfromToTHdfsTable
> typo: transform
Done


http://gerrit.cloudera.org:8080/#/c/17130/2/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java@262
PS2, Line 262: THdfsTable hdfsTable = new 
THdfsTable(localFsTable.getHdfsBaseDir(),
 : getColumnNames(), 
localFsTable.getNullPartitionKeyValue(),
 : FeFsTable.DEFAULT_NULL_COLUMN_VALUE, idToPartition, 
tPrototypePartition);
 : return hdfsTable;
> Maybe we can return this new THdfsTable directly.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Gerrit-Change-Number: 17130
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 10 Mar 2021 13:11:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

2021-03-10 Thread Zoltan Borok-Nagy (Code Review)
Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17130

to look at the new patch set (#3).

Change subject: IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables
..

IMPALA-10222: CREATE TABLE AS SELECT for Iceberg tables

This patch adds support for CREATE TABLE AS SELECT statements
for Iceberg tables.

CTAS statements work like the following in Impala:

1. Analysis of the whole CTAS statement
2. Divide CTAS to CREATE stmt and INSERT stmt
3. Create temporary in-memory target table from the CREATE stmt
4. Analyse the INSERT statement by using the temporary target table
5. If everything is OK so far, create the target table
6. Execute the INSERT query

For Iceberg tables the non-trivial thing was to create the temporary
target table without actually creating it via Iceberg API. I've created
a new class 'IcebergCtasTarget' that mimics an FeIceberg table. It can be
used with catalog V1 and V2 as well.

Testing
 * e2e CTAS tests in iceberg-ctas.test
 * SHOW CREATE TABLE stmts in show-create-table.test

Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/catalog/CtasTargetTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
A testdata/workloads/functional-query/queries/QueryTest/iceberg-ctas.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
M tests/metadata/test_show_create_table.py
M tests/query_test/test_iceberg.py
18 files changed, 686 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/17130/3
--
To view, visit http://gerrit.cloudera.org:8080/17130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I81d2084e401b9fa74d5ad161b51fd3e2aa3fcc67
Gerrit-Change-Number: 17130
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 


[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

2021-03-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8329/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17121
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 10 Mar 2021 12:56:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

2021-03-10 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17121 )

Change subject: IMPALA-7712: Support Google Cloud Storage
..


Patch Set 8:

Fixed the deadloop found in tests/stress/test_insert_stress.py.
Still investigating the other two timeout issues of IMPALA-10563


--
To view, visit http://gerrit.cloudera.org:8080/17121
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 10 Mar 2021 12:37:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7712: Support Google Cloud Storage

2021-03-10 Thread Quanlong Huang (Code Review)
Hello Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17121

to look at the new patch set (#8).

Change subject: IMPALA-7712: Support Google Cloud Storage
..

IMPALA-7712: Support Google Cloud Storage

This patch adds support for GCS(Google Cloud Storage). Using the
gcs-connector, the implementation is similar to other remote
FileSystems.

New flags for GCS:
 - num_gcs_io_threads: Number of GCS I/O threads. Defaults to be 16.

Follow-up:
 - Support for spilling to GCS will be addressed in IMPALA-10561.
 - Some tests are skipped for further investigation (IMPALA-10562,
   IMPALA-10563).

Tests:
 - Compile and create hdfs test data on a GCE instance. Upload test data
   to a GCS bucket. Modify all locations in HMS DB to point to the GCS
   bucket. Remove some hdfs caching params. Run CORE tests.
 - Compile and load snapshot data to a GCS bucket. Run CORE tests.

Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
---
M be/src/exec/hdfs-table-sink.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/runtime/tmp-file-mgr.cc
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M java/executor-deps/pom.xml
M java/pom.xml
M testdata/bin/create-load-data.sh
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/bin/run-all.sh
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py
M tests/authorization/test_ranger.py
M tests/common/impala_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_admission_controller.py
M tests/custom_cluster/test_coordinators.py
M tests/custom_cluster/test_event_processing.py
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/custom_cluster/test_hive_parquet_codec_interop.py
M tests/custom_cluster/test_hive_text_codec_interop.py
M tests/custom_cluster/test_insert_behaviour.py
M tests/custom_cluster/test_lineage.py
M tests/custom_cluster/test_local_catalog.py
M tests/custom_cluster/test_local_tz_conversion.py
M tests/custom_cluster/test_metadata_replicas.py
M tests/custom_cluster/test_parquet_max_page_header.py
M tests/custom_cluster/test_permanent_udfs.py
M tests/custom_cluster/test_query_retries.py
M tests/custom_cluster/test_restart_services.py
M tests/custom_cluster/test_topic_update_frequency.py
M tests/data_errors/test_data_errors.py
M tests/failure/test_failpoints.py
M tests/metadata/test_catalogd_debug_actions.py
M tests/metadata/test_compute_stats.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hdfs_encryption.py
M tests/metadata/test_hdfs_permissions.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/metadata/test_refresh_partition.py
M tests/metadata/test_reset_metadata.py
M tests/metadata/test_stale_metadata.py
M tests/metadata/test_testcase_builder.py
M tests/metadata/test_views_compatibility.py
M tests/query_test/test_acid.py
M tests/query_test/test_aggregation.py
M tests/query_test/test_date_queries.py
M tests/query_test/test_hbase_queries.py
M tests/query_test/test_hdfs_caching.py
M tests/query_test/test_insert_behaviour.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_insert_permutation.py
M tests/query_test/test_join_queries.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_observability.py
M tests/query_test/test_partitioning.py
M tests/query_test/test_resource_limits.py
M tests/query_test/test_scanners.py
M tests/shell/test_shell_commandline.py
M tests/stress/test_acid_stress.py
M tests/stress/test_ddl_stress.py
M tests/util/filesystem_utils.py
68 files changed, 302 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/17121/8
--
To view, visit http://gerrit.cloudera.org:8080/17121
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia91ec956de3b620cccf6a1244b56b7da7a45b32b
Gerrit-Change-Number: 17121
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-9470: Use Parquet Bloom filters - Part 1

2021-03-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-9470: Use Parquet Bloom filters - Part 1
..


Patch Set 15:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8328/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 15
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 10 Mar 2021 11:52:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9470: Use Parquet Bloom filters - Part 1

2021-03-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-9470: Use Parquet Bloom filters - Part 1
..


Patch Set 15:

(182 comments)

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h
File be/src/thirdparty/xxhash/xxhash.h:

http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@70
PS15, Line 70: 
https://fastcompression.blogspot.com/2019/03/presenting-xxh3.html?showComment=1552696407071#c3490092340461170735
line too long (112 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@92
PS15, Line 92:  *  
https://fastcompression.blogspot.com/2018/03/xxhash-for-small-keys-impressive-power.html
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@113
PS15, Line 113: #  elif defined (__cplusplus) || (defined (__STDC_VERSION__) && 
(__STDC_VERSION__ >= 199901L) /* C99 */)
line too long (104 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@243
PS15, Line 243: #  define XXH3_64bits_reset_withSecret XXH_NAME2(XXH_NAMESPACE, 
XXH3_64bits_reset_withSecret)
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@253
PS15, Line 253: #  define XXH3_128bits_reset_withSeed XXH_NAME2(XXH_NAMESPACE, 
XXH3_128bits_reset_withSeed)
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@254
PS15, Line 254: #  define XXH3_128bits_reset_withSecret 
XXH_NAME2(XXH_NAMESPACE, XXH3_128bits_reset_withSecret)
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@270
PS15, Line 270: #define XXH_VERSION_NUMBER  (XXH_VERSION_MAJOR *100*100 + 
XXH_VERSION_MINOR *100 + XXH_VERSION_RELEASE)
line too long (103 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@429
PS15, Line 429:  * @param statePtr A pointer to an @ref XXH32_state_t allocated 
with @ref XXH32_createState().
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@441
PS15, Line 441: XXH_PUBLIC_API void XXH32_copyState(XXH32_state_t* dst_state, 
const XXH32_state_t* src_state);
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@476
PS15, Line 476: XXH_PUBLIC_API XXH_errorcode XXH32_update (XXH32_state_t* 
statePtr, const void* input, size_t length);
line too long (102 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@628
PS15, Line 628: XXH_PUBLIC_API void XXH64_copyState(XXH64_state_t* dst_state, 
const XXH64_state_t* src_state);
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@631
PS15, Line 631: XXH_PUBLIC_API XXH_errorcode XXH64_update (XXH64_state_t* 
statePtr, const void* input, size_t length);
line too long (102 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@700
PS15, Line 700: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSeed(const void* 
data, size_t len, XXH64_hash_t seed);
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@724
PS15, Line 724: XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_withSecret(const void* 
data, size_t len, const void* secret, size_t secretSize);
line too long (120 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@743
PS15, Line 743: XXH_PUBLIC_API void XXH3_copyState(XXH3_state_t* dst_state, 
const XXH3_state_t* src_state);
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@756
PS15, Line 756: XXH_PUBLIC_API XXH_errorcode 
XXH3_64bits_reset_withSeed(XXH3_state_t* statePtr, XXH64_hash_t seed);
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@766
PS15, Line 766: XXH_PUBLIC_API XXH_errorcode 
XXH3_64bits_reset_withSecret(XXH3_state_t* statePtr, const void* secret, size_t 
secretSize);
line too long (121 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@768
PS15, Line 768: XXH_PUBLIC_API XXH_errorcode XXH3_64bits_update (XXH3_state_t* 
statePtr, const void* input, size_t length);
line too long (107 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@791
PS15, Line 791: XXH_PUBLIC_API XXH128_hash_t XXH3_128bits_withSeed(const void* 
data, size_t len, XXH64_hash_t seed);
line too long (100 > 90)


http://gerrit.cloudera.org:8080/#/c/17026/15/be/src/thirdparty/xxhash/xxhash.h@792
PS15, Line 792: XXH_PUBLIC_API XXH128_hash_t XXH3_128bits_withSecret(const 
void* data, size_t len, const void* secret, size_t secretSize);
line too long (122 > 90)



[Impala-ASF-CR] IMPALA-9470: Use Parquet Bloom filters - Part 1

2021-03-10 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#15). ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-9470: Use Parquet Bloom filters - Part 1
..

IMPALA-9470: Use Parquet Bloom filters - Part 1

This change adds read support for Parquet Bloom filters for some types.
The supported Parquet type - Impala type pairs are the following:

 ---
|Parquet type |  Impala type|
|---|
|INT32|  TINYINT, SMALLINT, INT |
|INT64|  BIGINT |
|FLOAT|  FLOAT  |
|DOUBLE   |  DOUBLE |
|BYTE_ARRAY   |  STRING |
 ---

If a Bloom filter is available for a column that is fully dictionary
encoded, the Bloom filter is not used as the dictionary can give exact
results in filtering.

Testing:
  - Added tests/query_test/test_parquet_bloom_filter.py that tests that
Parquet Bloom filtering works for the supported types and that we do
not incorrectly discard row groups for the unsupported type VARCHAR.

Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
---
M LICENSE.txt
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exprs/expr-value.h
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
M be/src/kudu/util/block_bloom_filter.cc
M be/src/kudu/util/block_bloom_filter.h
M be/src/runtime/bufferpool/buffer-pool-internal.h
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/bufferpool/buffer-pool.h
A be/src/thirdparty/xxhash/README.md
A be/src/thirdparty/xxhash/xxhash.h
M be/src/util/CMakeLists.txt
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/impala-bloom-filter-buffer-allocator.cc
A be/src/util/impala-bloom-filter-buffer-allocator.h
A be/src/util/parquet-bloom-filter.cc
A be/src/util/parquet-bloom-filter.h
M bin/rat_exclude_files.txt
M bin/run_clang_tidy.sh
M common/thrift/parquet.thrift
A testdata/data/parquet-bloom-filtering.parquet
A 
testdata/workloads/functional-query/queries/QueryTest/parquet-bloom-filter.test
A tests/query_test/test_parquet_bloom_filter.py
27 files changed, 6,848 insertions(+), 123 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/17026/15
--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 15
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins